Re: [openstack-dev] [tempest] Small doubt in Tempest setup

2018-08-06 Thread Attila Fazekas
I tried to be quick and got it wrong. ;-)

Here are the working ways:

On Mon, Aug 6, 2018 at 3:49 PM, Attila Fazekas  wrote:

> Please use ostestr or stestr instead of testr.
>
> $ git clone https://github.com/openstack/tempest
> $ cd tempest/
> $ stestr init
> $ stestr list
>
> $ git clone https://github.com/openstack/tempest
> $ cd tempest/
> $ ostestr -l  # old way, also works, does two steps
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest] Small doubt in Tempest setup

2018-08-06 Thread Attila Fazekas
Please use ostestr or stestr instead of testr.

$ git clone https://github.com/openstack/tempest
$ cd tempest/
$ stestr --list

$ ostestr -l  # old way, also works

These tools handle the config creation implicitly.
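For reference, the implicit configuration these tools rely on is a small `.stestr.conf` at the repository root; a sketch of what it typically contains (the values here are from memory, check tempest's actual file):

```ini
# .stestr.conf -- read implicitly by stestr/ostestr
[DEFAULT]
# where test discovery starts (tempest keeps its discovery hook here)
test_path=./tempest/test_discover
# group tests from the same class into the same worker
group_regex=([^\.]*\.)*
```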


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-29 Thread Attila Fazekas
I have overlay2 and super fast disk I/O (memory cheat + SSD);
just the CPU frequency is not high. The CPU is a Broadwell,
and it actually has a lot more cores (E5-2630V4). Even a 5-year-old gamer CPU
can be 2 times faster on a single core, but cannot compete with all of the cores ;-)

This machine has seen faster setup times, but I'll return to this in
another topic.

On Tue, Sep 26, 2017 at 6:16 PM, Michał Jastrzębski <inc...@gmail.com>
wrote:

> On 26 September 2017 at 07:34, Attila Fazekas <afaze...@redhat.com> wrote:
> > decompressing those registry tar.gz takes ~0.5 min on 2.2 GHz CPU.
> >
> > Fully pulling all container takes something like ~4.5 min (from
> localhost,
> > one leaf request at a time),
> > but on the gate vm  we usually have 4 core,
> > so it is possible to go bellow 2 min with better pulling strategy,
> > unless we hit some disk limit.
>
> Check your $docker info. If you kept defaults, storage driver will be
> devicemapper on loopback, which is awfully slow and not very reliable.
> Overlay2 is much better and should speed things up quite a bit. For me
> deployment of 5 node openstack on vms similar to gate took 6min (I had
> registry available in same network). Also if you pull single image it
> will download all base images as well, so next one will be
> significantly faster.
>
> >
> > On Sat, Sep 23, 2017 at 5:12 AM, Michał Jastrzębski <inc...@gmail.com>
> > wrote:
> >>
> >> On 22 September 2017 at 17:21, Paul Belanger <pabelan...@redhat.com>
> >> wrote:
> >> > On Fri, Sep 22, 2017 at 02:31:20PM +, Jeremy Stanley wrote:
> >> >> On 2017-09-22 15:04:43 +0200 (+0200), Attila Fazekas wrote:
> >> >> > "if DevStack gets custom images prepped to make its jobs
> >> >> > run faster, won't Triple-O, Kolla, et cetera want the same and
> where
> >> >> > do we draw that line?). "
> >> >> >
> >> >> > IMHO we can try to have only one big image per distribution,
> >> >> > where the packages are the union of the packages requested by all
> >> >> > team,
> >> >> > minus the packages blacklisted by any team.
> >> >> [...]
> >> >>
> >> >> Until you realize that some projects want packages from UCA, from
> >> >> RDO, from EPEL, from third-party package repositories. Version
> >> >> conflicts mean they'll still spend time uninstalling the versions
> >> >> they don't want and downloading/installing the ones they do so we
> >> >> have to optimize for one particular set and make the rest
> >> >> second-class citizens in that scenario.
> >> >>
> >> >> Also, preinstalling packages means we _don't_ test that projects
> >> >> actually properly declare their system-level dependencies any
> >> >> longer. I don't know if anyone's concerned about that currently, but
> >> >> it used to be the case that we'd regularly add/break the package
> >> >> dependency declarations in DevStack because of running on images
> >> >> where the things it expected were preinstalled.
> >> >> --
> >> >> Jeremy Stanley
> >> >
> >> > +1
> >> >
> >> > We spend a lot of effort trying to keep the 6 images we have in
> nodepool
> >> > working
> >> > today, I can't imagine how much work it would be to start adding more
> >> > images per
> >> > project.
> >> >
> >> > Personally, I'd like to audit things again once we roll out zuulv3, I
> am
> >> > sure
> >> > there are some tweaks we could make to help speed up things.
> >>
> >> I don't understand, why would you add images per project? We have all
> >> the images there.. What I'm talking about is to leverage what we'll
> >> have soon (registry) to lower time of gates/DIB infra requirements
> >> (DIB would hardly need to refresh images...)
> >>
> >> >

Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-26 Thread Attila Fazekas
Decompressing those registry tar.gz files takes ~0.5 min on a 2.2 GHz CPU.

Fully pulling all containers takes something like ~4.5 min (from localhost,
one leaf request at a time), but on the gate VM we usually have 4 cores,
so it is possible to go below 2 min with a better pulling strategy,
unless we hit some disk limit.
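A better pulling strategy could be as simple as pulling with one worker per core, so that the CPU-bound layer decompression runs in parallel. This is only a sketch: the image names are made up and `echo` stands in for the real `docker pull` so it runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

IMAGES = ["base", "nova", "neutron", "keystone"]  # hypothetical image names

def pull(image, dry_run=True):
    # dry_run substitutes `echo` for a real `docker pull <image>`
    cmd = ["echo", "docker", "pull", image] if dry_run else ["docker", "pull", image]
    subprocess.run(cmd, check=True)
    return image

def pull_all(images, workers=4):
    # one worker per core keeps the CPU-bound decompression busy in parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pull, images))
```

With a real registry the wall-clock win comes from overlapping download latency and decompression across workers, which is the point made above.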


On Sat, Sep 23, 2017 at 5:12 AM, Michał Jastrzębski <inc...@gmail.com>
wrote:

> On 22 September 2017 at 17:21, Paul Belanger <pabelan...@redhat.com>
> wrote:
> > On Fri, Sep 22, 2017 at 02:31:20PM +, Jeremy Stanley wrote:
> >> On 2017-09-22 15:04:43 +0200 (+0200), Attila Fazekas wrote:
> >> > "if DevStack gets custom images prepped to make its jobs
> >> > run faster, won't Triple-O, Kolla, et cetera want the same and where
> >> > do we draw that line?). "
> >> >
> >> > IMHO we can try to have only one big image per distribution,
> >> > where the packages are the union of the packages requested by all
> team,
> >> > minus the packages blacklisted by any team.
> >> [...]
> >>
> >> Until you realize that some projects want packages from UCA, from
> >> RDO, from EPEL, from third-party package repositories. Version
> >> conflicts mean they'll still spend time uninstalling the versions
> >> they don't want and downloading/installing the ones they do so we
> >> have to optimize for one particular set and make the rest
> >> second-class citizens in that scenario.
> >>
> >> Also, preinstalling packages means we _don't_ test that projects
> >> actually properly declare their system-level dependencies any
> >> longer. I don't know if anyone's concerned about that currently, but
> >> it used to be the case that we'd regularly add/break the package
> >> dependency declarations in DevStack because of running on images
> >> where the things it expected were preinstalled.
> >> --
> >> Jeremy Stanley
> >
> > +1
> >
> > We spend a lot of effort trying to keep the 6 images we have in nodepool
> working
> > today, I can't imagine how much work it would be to start adding more
> images per
> > project.
> >
> > Personally, I'd like to audit things again once we roll out zuulv3, I am
> sure
> > there are some tweaks we could make to help speed up things.
>
> I don't understand, why would you add images per project? We have all
> the images there.. What I'm talking about is to leverage what we'll
> have soon (registry) to lower time of gates/DIB infra requirements
> (DIB would hardly need to refresh images...)
>


[openstack-dev] [devstack] pike time growth in August

2017-09-22 Thread Attila Fazekas
The main offenders reported by devstack do not seem to explain the
growth visible on OpenStack Health [1].
The logs also started to disappear, which does not make it easy to figure out.


Which code/infra changes could be related?


http://status.openstack.org/openstack-health/#/test/devstack?resolutionKey=day=P6M


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-22 Thread Attila Fazekas
"if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?). "

IMHO we can try to have only one big image per distribution,
where the packages are the union of the packages requested by all teams,
minus the packages blacklisted by any team.

You would need to provide bug link(s) (distribution/upstream bug) for
blacklisting a package.

It is very unlikely that we will run out of disk space just because of too
many packages; usually, if a package causes harm to anything, it is a
distro/upstream bug which is expected to be solved within 1-2 cycles in the
worst-case scenario.

If the above approach is proven not to work, we need to draw the line based
on the expected usage frequency.
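The union-minus-blacklist list could be computed mechanically; the team names and package names below are made up for illustration.

```python
# Hypothetical per-team package requests and a blacklist (each blacklist
# entry would carry a distro/upstream bug link in practice).
requests = {
    "devstack": {"qemu-kvm", "libvirt-daemon", "rabbitmq-server"},
    "kolla": {"docker.io", "qemu-kvm", "firewalld"},
}
blacklist = {"firewalld"}  # e.g. a known conflict with libvirt

def image_packages(requests, blacklist):
    # union of every team's requests, minus anything any team blacklisted
    union = set().union(*requests.values())
    return sorted(union - blacklist)
```

A blacklist by any one team wins over a request by another, which matches the "minus the packages blacklisted by any team" rule above.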




On Wed, Sep 20, 2017 at 3:46 PM, Jeremy Stanley <fu...@yuggoth.org> wrote:

> On 2017-09-20 15:17:28 +0200 (+0200), Attila Fazekas wrote:
> [...]
> > The image building was the good old working solution and unless
> > the image build become a super expensive thing, this is still the
> > best option.
> [...]
>
> It became a super expensive thing, and that's the main reason we
> stopped doing it. Now that Nodepool has grown support for
> distributed/parallel image building and uploading, the cost model
> may have changed a bit in that regard so I agree it doesn't hurt to
> revisit that decision. Nevertheless it will take a fair amount of
> convincing that the savings balances out the costs (not just in
> resource consumption but also administrative overhead and community
> impact... if DevStack gets custom images prepped to make its jobs
> run faster, won't Triple-O, Kolla, et cetera want the same and where
> do we draw that line?).
> --
> Jeremy Stanley


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-20 Thread Attila Fazekas
On Wed, Sep 20, 2017 at 3:11 AM, Ian Wienand  wrote:

> On 09/20/2017 09:30 AM, David Moreau Simard wrote:
>
>> At what point does it become beneficial to build more than one image per
>> OS
>> that is more aggressively tuned/optimized for a particular purpose ?
>>
>
> ... and we can put -dsvm- in the jobs names to indicate it should run
> on these nodes :)
>
> Older hands than myself will remember even more issues, but the
> "thicker" the base-image has been has traditionally just lead to a lot
> more corners for corner-cases can hide in.  We saw this all the time
> with "snapshot" images where we'd be based on upstream images that
> would change ever so slightly and break things, leading to
> diskimage-builder and the -minimal build approach.
>
> That said, in a zuulv3 world where we are not caching all git and have
> considerably smaller images, a nodepool that has a scheduler that
> accounts for flavor sizes and could conceivably understand similar for
> images, and where we're building with discrete elements that could
> "bolt-on" things like a list-of-packages install sanely to daily
> builds ... it's not impossible to imagine.
>
> -i


The problem is that these package install steps are not really I/O
bottlenecked in most cases; even at a regular DSL speed you can frequently
see the decompress and post-config steps take more time.

The site-local cache/mirror has a visible benefit, but it does not
eliminate the issues.

The main enemy is the single-threaded, CPU-intensive operation in most
install/config related scripts; the second most common issue is serially
requesting high-latency steps, which in the end saturates neither the CPU
nor the I/O.

The fat images are generally cheaper even if your cloud has only 1 Gb
Ethernet for image transfer.
You gain more by baking the packages into the image than the 1 GbE can
steal from you, because you also save the time that would be lost on
CPU-intensive operations or random disk access.

It is safe to add all distro packages used by devstack to the cloud image.

Historically we had issues with some base image packages whose presence
changed the behavior of some components, for example firewalld vs. libvirt
(likely an already solved issue); these packages got explicitly removed by
devstack when necessary.
Those packages were not requested by devstack!

Fedora/CentOS also has/had issues with pypi packages overlapping on the
main filesystem (too long a story, pointing fingers ..); generally it is
not a good idea to add packages from pypi to an image whose content might
be overridden by the distro's package manager.

The distribution package install time delays the gate response;
when the slowest running job is delayed by this, the whole response is
delayed.

It is a user-facing latency issue, which should be solved even if the cost
were higher.

The image building was the good old working solution, and unless the image
build becomes a super expensive thing, this is still the best option.

A site-local mirror is also expected to help make the image build step(s)
faster and safer.

The other option is the ready scripts.



[openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-19 Thread Attila Fazekas
The gate-tempest-dsvm-neutron-full-ubuntu-xenial job is 20-30 min slower
than it is supposed to be / used to be.

The extra time has multiple reasons, and it is not because we test more :(.
Usually we are just less smart than before.

A huge time increment is visible in devstack as well.
devstack is advertised as:

Running devstack ... this takes 10 - 15 minutes (logs in
logs/devstacklog.txt.gz)

The actual time is 20 - 25 min according to OpenStack Health:
http://status.openstack.org/openstack-health/#/test/devstack?resolutionKey=day=P6M


Let's start with the first obvious difference compared to the old-time
jobs: the jobs do 120-220 sec of apt-get install, and the packages defined
in /files/debs/general are missing from the images before starting the job.

We used to bake multiple packages into the images, based on the package
list provided by devstack, in order to save time.

Why does this not happen anymore?
Is anybody working on solving this issue?
Does any blocking technical issue/challenge exist?
Was it a design decision?

We have similar issue with pypi usage as well.

PS.:
It is generally a good idea to group these kinds of package install
commands into one huge pip/apt-get/yum invocation, because these tools have
a significant start-up time and they also need to process the dependency
graph at install/update.
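As a sketch of the grouping (the package names are made up, and `echo` stands in for the real apt-get so the snippet is runnable anywhere):

```shell
# Hypothetical package list; in devstack these come from files/debs/general.
PKGS="qemu-kvm libvirt-daemon rabbitmq-server"
# One invocation pays the tool start-up and dependency-resolution
# cost once, instead of once per package.
echo apt-get install -y $PKGS
```

The same applies to pip: `pip install a b c` resolves the dependency graph once, while three separate invocations resolve it three times.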


Re: [openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-26 Thread Attila Fazekas
On Mon, Jul 24, 2017 at 10:53 AM, Dmitry Tantsur <dtant...@redhat.com>
wrote:

> These questions are to the operators, and should be asked on
> openstack-operators IMO (maybe with tuning the overall tone to be a bit
> less aggressive).
>

So the question looks like this without tuning:
 - Do you think it is a good idea to spam the users with internal data
which is useless for them unless they want to use it against you?


>
> On 07/24/2017 10:23 AM, Attila Fazekas wrote:
>
>> Thanks for your answer.
>>
>> The real question is do we agree in the
>> internalULR usage what suggested in [1] is a bad security practice
>> and should not be told to operators at all.
>>
>> Also we should try to get rid off the enpointTypes in keystone v4.
>>
>
> Let's not seriously talk about keystone v4 at this point, we haven't
> gotten rid of v2 so far.
>
Eventually it will come, but until then the story told to operators could
be that we are going to remove the interfaces (admin/internal/public) from
the keystone catalog.


>
>> Do we have any good (not just making happy funny dev envs) to keep
>> endpoint types ?
>>
>
> I suspect any external SSL termination proxy. And anything else that will
> make the URLs exposed to end users look different from ones exposed to
> services.
>

The only real question is how many people would mind using SSL/TLS also
internally across the services, when https is the one provided to the
end users.

It does not mean the LB-to-backend connection needs to be SSL; it can still
remain HTTP regardless of the catalog entry.

If the internal, both-way no-SSL communication is really important for the
deployers and we do not want to change how the keystone API behaves, we
might put the service URLs next to the auth URLs in the
keystone_authtoken-like sections.

Many services have multiple keystone_authtoken-like sections [3],
but for example `heat` does not have a dedicated auth-like section for all
related services.

The options available in keystoneauth1 are usually directly exposed in the
service config, so introducing a `catalog_override` option which accepts a
JSON file could be the simplest option.

Again, this is only required if you really want to use a different protocol
internally than for the public. This should not be in a security best
practice guide either, but if there is a real user request for it, so be it.
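To be clear, `catalog_override` does not exist today; as a sketch, the JSON file such an option might accept could simply map service types to the URLs a service should use internally (the addresses below are illustrative):

```json
{
  "identity": "http://192.0.2.10:5000/v3",
  "compute": "http://192.0.2.10:8774/v2.1",
  "network": "http://192.0.2.10:9696"
}
```

Any service type not listed would fall back to the (public) catalog, so the end users never see the internal addresses.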


>
> Speaking of DNS, I also suspect there may be a micro-optimization in not
> making the services use it when talking to each other, while still
> providing names to end users.
>
>

If we are speaking about micro-optimization, the above way would open up
the way for services to choose `same host` or `same segment` service
instances when it makes sense (it usually does not).

Most networking libraries have built-in intelligence to cache DNS
responses; a DNS lookup typically causes <0.1 ms extra latency, and an
OpenStack service frequently needs more than 5 ms to respond.

But if you really want to do some micro-optimization here, there are
multiple small DNS services available which can run on localhost and
provide a faster response than a remote one; they are also able to hide
DNS infrastructure downtimes.

The devstack VMs are using unbound for DNS caching.

As always, you can use the /etc/hosts file to bypass DNS lookups; however,
/etc/hosts is not expected to do round-robin, but if you were happy without
DNS you will not notice it.

nscd might have surprising caching behavior, but it is available in all
Linux distros; you likely want to decrease the negative-time-to-live values
in most cases.
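For example, the per-database TTLs in /etc/nscd.conf look like this (the values are illustrative, not a recommendation):

```
# /etc/nscd.conf excerpt: cache failed host lookups only briefly,
# successful ones for longer
negative-time-to-live  hosts  5
positive-time-to-live  hosts  600
```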


[3]
https://docs.openstack.org/ocata/config-reference/shared-file-systems/samples/manila.conf.html


>
>>
>>
>> On Fri, Jul 21, 2017 at 1:37 PM, Giulio Fidente <gfide...@redhat.com
>> <mailto:gfide...@redhat.com>> wrote:
>>
>> Only a comment about the status in TripleO
>>
>> On 07/21/2017 12:40 PM, Attila Fazekas wrote:
>>
>> [...]
>>
>> > We should seriously consider using names instead of ip address also
>> > on the devstack gates to avoid people thinking the catalog entries
>> > meant to be used with ip address and keystone is a replacement for
>> DNS.
>>
>> this is configurable, you can have names or ips in the keystone
>> endpoints ... actually you can chose to use names or ips independently
>> for each service and even for the different endpoints
>> (Internal/Admin/Public) of the same service
>>
>> if an operator, like you suggested, configures the DNS to resolve
>> different IPs for the same name basing on where the request comes
>> from,
>> then he can use the same 'hostname' for all Public, Admin and Internal
>> endpoin

Re: [openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-25 Thread Attila Fazekas
>> "While it may not be a setup that everyone wants, for some deployers
having a public and internal is important."
To be more precise:
In some deployments, based on the initial directives, the operators ran
into an issue where the `public and internal named urls in keystone` looked
like a possible solution.

>> "Those deployers do not seem to mind the RFC1918 showing up in the
catalog, "

These deployers are either showing the public interfaces only to trusted
administrators (/power users) or are not aware of the possible risks.


>> "if they're doing point-to-point firewalling (as they should be) the
private addresses should not be considered 'secret'  "

Based on your trust in your network setup, you should also consider
removing the authentication from (for example) mariadb, because if you
really think your services are safe from the public, you do not need it
either.

A random example of why this address leakage can be dangerous:
 - Based on your private addresses, the attacker is able to figure out (or
just better guess) where your non-OpenStack backend services are
 - One of your backend services has weak or no authentication
 - You have an OpenStack service which is able to connect to an arbitrary
address on user request (connection to the backend is explicitly allowed by
the firewall)
 ---> possible one-hit exploitation

If you think these were never present in an OpenStack deployment together,
think again and read the CVEs and the deployment guides and scripts.

We must not make the crackers' task easier by exposing internal
information.
The addresses are frequently not dangerous alone, but in an OpenStack-sized
thing they can become very dangerous together with other `minor` issues.

Another randomly picked issue regarding internal URL exposure:
 1. you have a service which has some internal info which would be useful
for another service
 2. you expose the internal data to all users on the API
 3. you have a different backend where the same field is confidential by
nature

We will run into similar issues again and again if we are not strict about
not exposing internal info.

Whatever you think you are solving by internal URLs can be solved in
multiple other ways without leaking information; in most cases we do not
even need to modify OpenStack service code to solve it.

Because the internal URLs are useless for the unprivileged users, they
should not receive them at all; even though we might have users who simply
do not care about this, they will not die if we move to a more secure
solution.


>
> On Fri, Jul 21, 2017 at 1:37 PM, Giulio Fidente <gfide...@redhat.com
>> <mailto:gfide...@redhat.com>> wrote:
>>
>>     Only a comment about the status in TripleO
>>
>> On 07/21/2017 12:40 PM, Attila Fazekas wrote:
>>
>> [...]
>>
>> > We should seriously consider using names instead of ip address also
>> > on the devstack gates to avoid people thinking the catalog entries
>> > meant to be used with ip address and keystone is a replacement for
>> DNS.
>>
>> this is configurable, you can have names or ips in the keystone
>> endpoints ... actually you can chose to use names or ips independently
>> for each service and even for the different endpoints
>> (Internal/Admin/Public) of the same service
>>
>> if an operator, like you suggested, configures the DNS to resolve
>> different IPs for the same name basing on where the request comes
>> from,
>> then he can use the same 'hostname' for all Public, Admin and Internal
>> endpoints which I *think* is what you're suggesting
>>
>> also using names is the default when ssl is enabled
>>
>> check environments/ssl/tls-endpoints-public-dns.yaml and note how
>> EndpointMap can resolve to CLOUDNAME or IP_ADDRESS
>>
>> adding Juan on CC as he did a great work around this and can help
>> further
>> --
>> Giulio Fidente
>> GPG KEY: 08D733BA
>>
>>
>>
>>


Re: [openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-24 Thread Attila Fazekas
Thanks for your answer.

The real question is: do we agree that the internalURL usage suggested
in [1] is a bad security practice and should not be told to operators
at all?

Also, we should try to get rid of the endpoint types in keystone v4.

Do we have any good reason (not just making happy funny dev envs) to keep
the endpoint types?



On Fri, Jul 21, 2017 at 1:37 PM, Giulio Fidente <gfide...@redhat.com> wrote:

> Only a comment about the status in TripleO
>
> On 07/21/2017 12:40 PM, Attila Fazekas wrote:
>
> [...]
>
> > We should seriously consider using names instead of ip address also
> > on the devstack gates to avoid people thinking the catalog entries
> > meant to be used with ip address and keystone is a replacement for DNS.
>
> this is configurable, you can have names or ips in the keystone
> endpoints ... actually you can chose to use names or ips independently
> for each service and even for the different endpoints
> (Internal/Admin/Public) of the same service
>
> if an operator, like you suggested, configures the DNS to resolve
> different IPs for the same name basing on where the request comes from,
> then he can use the same 'hostname' for all Public, Admin and Internal
> endpoints which I *think* is what you're suggesting
>
> also using names is the default when ssl is enabled
>
> check environments/ssl/tls-endpoints-public-dns.yaml and note how
> EndpointMap can resolve to CLOUDNAME or IP_ADDRESS
>
> adding Juan on CC as he did a great work around this and can help further
> --
> Giulio Fidente
> GPG KEY: 08D733BA
>


[openstack-dev] [TripleO][keystone] internal endpoints vs sanity

2017-07-21 Thread Attila Fazekas
Hi All,

I thought it was already a well-known fact that the endpoint types are
there ONLY for historical reasons; today they just exist to confuse whoever
tries to deploy OpenStack, but it is considered a deprecated concept and it
will die out sooner or later.

The keystone v3 API already allows not defining internal or admin
endpoints at all.

I just noticed the current documentation encourages internal endpoint
usage. [1]

Is there anybody here who thinks it is a great idea to show private
addresses to the end users?
Some people might even consider this CWE-200, but I hope it at least looks
bad to everyone.

The internal endpoints should not be used for telling internal information
to the OpenStack services themselves. We are not putting the mariadb and
rabbitmq addresses into the catalog either; we have config files for that.

Ideally the end users should not even know whether we are using different
network paths, so the internalURL entries should not be different addresses
than the public one, or they should not be defined at all.

I hope nobody really thinks the public catalog entries are expected to
contain IP addresses instead of domain names by any best practice guide.

We are just using IP addresses in the catalog for dev/test environments,
but in an ideal case the identity URL should start with https://, and it
should continue with a domain name which has several A and AAAA entries,
and the certificate would not be a self-signed one for a private IP
address.

Is there anybody who really thinks we are putting http:///..
into the catalog on the gate because it is the best practice?

You can configure your DNS server properly [2] or use the /etc/hosts file
when, for some reason, you want some nodes to use a different IP address
for reaching the OpenStack services.
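The DNS-server approach [2] can be sketched with BIND views (split-horizon DNS); the zone name, file names, and the internal network below are made up:

```
// named.conf sketch: the same name resolves differently depending on
// where the query comes from
view "internal" {
    match-clients { 10.0.0.0/8; };
    zone "cloud.example.com" {
        type master;
        file "cloud.example.com.internal";  // resolves to internal addresses
    };
};
view "public" {
    match-clients { any; };
    zone "cloud.example.com" {
        type master;
        file "cloud.example.com.public";    // resolves to public addresses
    };
};
```

With this in place the catalog can carry a single hostname for all endpoints, and the network path selection happens in DNS, not in keystone.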

Keystone does not need to solve anything there;
these issues were solved decades before OpenStack even existed.

I cannot take the single internalURL usage as a serious response for
`isolated networks`, because it does not scale when you want to divide your
network even more.
Adding internal2URL, internal3URL is not a great idea either.

We should seriously consider using names instead of IP addresses also
on the devstack gates, to avoid people thinking the catalog entries are
meant to be used with IP addresses and keystone is a replacement for DNS.

Using https is likely a bad idea in a regular dev environment,
but I hope we agree that sending unencrypted credentials over the wire
is not a recommended best practice.

Best Regards,
Attila


[1]
https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-recommendations.html

[2]
https://serverfault.com/questions/332440/dns-bind-how-to-return-a-different-ip-based-on-requests-subnet


[openstack-dev] [keystone] We still have a not identical HEAD response

2017-07-11 Thread Attila Fazekas
Hi all,

Long time ago it was discussed to make the keystone HEAD responses
right [1], as the RFC [2][3] recommends:

"  A response to the HEAD method is identical to what an equivalent
   request made with a GET would have been, except it lacks a body. "

So, the status code needs to be identical as well!

Recently it turned out that keystone is still not correct in all cases [4].

'Get role inference rule' (GET) and 'Confirm role inference rule' (HEAD)
have the same URL pattern, but they differ in the status code (200/204),
which is not allowed! [5]

This is the only documented case where both HEAD and GET are defined and
the HEAD has a 204 response.
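The RFC expectation can be captured in a small checker; the session object and URL below are stand-ins (not the real keystone client), and `FakeSession` just simulates the 200-vs-204 mismatch described above:

```python
from dataclasses import dataclass

@dataclass
class Resp:
    status: int
    body: bytes

class FakeSession:
    # Simulates the keystone bug: GET 200 vs HEAD 204 on the same URL.
    def request(self, method, url):
        if method == "HEAD":
            return Resp(204, b"")
        return Resp(200, b"{...}")

def head_matches_get(session, url):
    # RFC 7231: a HEAD response is identical to GET's, minus the body,
    # so the status codes must match and the HEAD body must be empty.
    g = session.request("GET", url)
    h = session.request("HEAD", url)
    return g.status == h.status and h.body == b""
```

Running this check against the role-inference URLs would flag exactly the bug in [4].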

Are you going to fix this [4] as it was fixed before [6]?

Best Regards,
Attila

PS.:
 Here is the tempest change for accepting the right code [7].

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-July/039140.html
[2] https://tools.ietf.org/html/rfc7231#section-4.3.2
[3] https://tools.ietf.org/html/rfc7234#section-4.3.5
[4] https://bugs.launchpad.net/keystone/+bug/1701541
[5]
https://developer.openstack.org/api-ref/identity/v3/?expanded=confirm-role-inference-rule-detail,get-role-inference-rule-detail
[6] https://bugs.launchpad.net/keystone/+bug/1334368
[7] https://review.openstack.org/#/c/479286/


Re: [openstack-dev] [qa] Create subnetpool on dynamic credentials

2017-05-22 Thread Attila Fazekas
In order to twist things even more ;-),
we should consider making tempest work in environments where users,
instead of getting an IPv4 floating IP, are allowed to get a globally
routable
IPv6 range (a prefix/subnet from a subnetpool).

Tempest should be able to do connectivity tests against VMs
hosted in these subnets.

This should work regardless of the test account usage,
and it likely requires some extra tweaks in our devstack environments as
well.
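For illustration, the kind of allocation meant here -- handing each tenant a routable /64 carved from a larger pool prefix -- can be sketched with just the stdlib (the documentation-range prefix is made up; neutron subnetpools do the real bookkeeping server-side):

```python
# Conceptual sketch of prefix allocation from a subnetpool: tenants get
# /64 subnets carved out of a /56 pool. Addresses are from the IPv6
# documentation range and are illustrative only.
import ipaddress

pool = ipaddress.ip_network("2001:db8:ab00::/56")
allocator = pool.subnets(new_prefix=64)        # yields 256 distinct /64s

tenant_subnets = [next(allocator) for _ in range(3)]
for subnet in tenant_subnets:
    print(subnet)   # 2001:db8:ab00::/64, 2001:db8:ab00:1::/64, ...

# Every allocated prefix is unique within the pool, so VM addresses are
# directly routable -- no floating IP / NAT step needed.
assert all(s.subnet_of(pool) for s in tenant_subnets)
```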

Best Regards,
Attila

On Mon, May 22, 2017 at 3:22 PM, Andrea Frittoli 
wrote:

> Hi Hongbin,
>
> If several of your test cases require a subnet pool, I think the simplest
> solution would be creating one in the resource creation step of the tests.
> As I understand it, subnet pools can be created by regular projects (they
> do not require admin credentials).
>
> The main advantage that I can think of for having subnet pools provisioned
> as part of the credential provider code is that - in case of
> pre-provisioned credentials - the subnet pool would be created and delete
> once per test user as opposed to once per test class.
>
> That said I'm not opposed to the proposal in general, but if possible I
> would prefer to avoid adding complexity to an already complex part of the
> code.
>
> andrea
>
> On Sun, May 21, 2017 at 2:54 AM Hongbin Lu  wrote:
>
>> Hi QA team,
>>
>>
>>
>> I have a proposal to create subnetpool/subnet pair on dynamic
>> credentials: https://review.openstack.org/#/c/466440/ . We (Zun team)
>> have use cases for using subnets with subnetpools. I wanted to get some
>> early feedback on this proposal. Will this proposal be accepted? If not,
>> would appreciate alternative suggestion if any. Thanks in advance.
>>
>>
>>
>> Best regards,
>>
>> Hongbin
>> 
>>
>
>
>


Re: [openstack-dev] [tempest] Proposing Fanglei Zhu for Tempest core

2017-05-18 Thread Attila Fazekas
+1, Totally agree.

Best Regards,
Attila

On Tue, May 16, 2017 at 10:22 AM, Andrea Frittoli  wrote:

> Hello team,
>
> I'm very pleased to propose Fanglei Zhu (zhufl) for Tempest core.
>
> Over the past two cycle Fanglei has been steadily contributing to Tempest
> and its community.
> She's done a great deal of work in making Tempest code cleaner, easier to
> read, maintain and
> debug, fixing bugs and removing cruft. Both her code as well as her
> reviews demonstrate a
> very good understanding of Tempest internals and of the project future
> direction.
> I believe Fanglei will make an excellent addition to the team.
>
> As per the usual, if the current Tempest core team members would please
> vote +1
> or -1(veto) to the nomination when you get a chance. We'll keep the polls
> open
> for 5 days or until everyone has voted.
>
> References:
> https://review.openstack.org/#/q/owner:zhu.fanglei%2540zte.com.cn
> https://review.openstack.org/#/q/reviewer:zhufl
>
> Thank you,
>
> Andrea (andreaf)
>
>
>


Re: [openstack-dev] [tripleo] pingtest vs tempest

2017-04-18 Thread Attila Fazekas
On Tue, Apr 18, 2017 at 11:04 AM, Arx Cruz  wrote:

>
>
> On Tue, Apr 18, 2017 at 10:42 AM, Steven Hardy  wrote:
>
>> On Mon, Apr 17, 2017 at 12:48:32PM -0400, Justin Kilpatrick wrote:
>> > On Mon, Apr 17, 2017 at 12:28 PM, Ben Nemec 
>> wrote:
>> > > Tempest isn't really either of those things.  According to another
>> message
>> > > in this thread it takes around 15 minutes to run just the smoke tests.
>> > > That's unacceptable for a lot of our CI jobs.
>> >
>>
>
> I rather spend 15 minutes running tempest than add a regression or a new
> bug, which already happen in the past.
>
The smoke tests might not be the best test selection anyway; you should
pick some scenarios which, for example,
do snapshots of images and volumes. Yes, these are the slow ones,
but they can run in parallel.

Very likely you do not really want to run all tempest tests, but 10~20
minutes
sounds reasonable for a sanity test.

The tempest config utility should also be extended with some parallel
capability,
and should be able to use already downloaded resources (part of the image).

The tempest/testr/subunit worker balance is not always the best;
technically it would be possible to do dynamic balancing, but it would require
a lot of work.
Let me know when it becomes the main concern, and I can check what can/cannot
be done.



>
>> > Ben, is the issue merely the time it takes? Is it the affect that time
>> > taken has on hardware availability?
>>
>> It's both, but the main constraint is the infra job timeout, which is
>> about
>> 2.5hrs - if you look at our current jobs many regularly get close to (and
>> sometimes exceed this), so we just don't have the time budget available to
>> run exhasutive tests every commit.
>>
>
> We have green light from infra to increase the job timeout to 5 hours, we
> do that in our periodic full tempest job.
>

Sounds good, but I am afraid it could hurt more than help; it could
delay other things getting fixed by a lot,
especially if we get some extra flakiness because of foobar.

You cannot have all possible tripleo configs on the gate anyway,
so something will pass which will require a quick fix.

IMHO the only real solution is making the before-test-run steps faster or
shorter.

Do you have any option to start the tempest-running jobs in a more
developed state?
I mean, having more things already done at start time
(images/snapshots)
and just doing a fast upgrade at the beginning of the job.

An OpenStack installation can be completed in a `fast` way (~a minute) on
RHEL/Fedora systems
after the yum steps; also, if you are able to aggregate all yum steps into a
single
command execution (transaction), you can generally save a lot of time.

There are plenty of things that can be made more efficient before the test
run;
once you start considering everything that accounts for more than 30 sec
of time as evil, this can happen soon.

For example, just executing the CPython interpreter for the openstack
commands adds up to more than 30 sec;
the work they are doing can be done in a much, much faster way.
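A rough way to see that overhead (this only measures bare interpreter start-up; a real `openstack` CLI invocation is far worse, because it also imports the whole client stack):

```python
# Compare N fresh CPython interpreter spawns against doing the "work"
# in-process. The per-spawn cost is pure overhead repeated for every
# CLI invocation during a job.
import subprocess
import sys
import time

N = 20

start = time.monotonic()
for _ in range(N):
    subprocess.run([sys.executable, "-c", "pass"], check=True)
spawn_time = time.monotonic() - start

start = time.monotonic()
for _ in range(N):
    pass                      # the same no-op, without a new interpreter
inproc_time = time.monotonic() - start

print(f"{N} spawns: {spawn_time:.2f}s, in-process: {inproc_time:.6f}s")
```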

A lot of install steps actually do not depend on each other,
which allows more things to be done in parallel; we generally can have more
cores than GHz.



>
>>
>> > Should we focus on how much testing we can get into N time period?
>> > Then how do we decide an optimal N
>> > for our constraints?
>>
>> Well yeah, but that's pretty much how/why we ended up with pingtest, it's
>> simple, fast, and provides an efficient way to do smoke tests, e.g
>> creating
>> just one heat resource is enough to prove multiple OpenStack services are
>> running, as well as the DB/RPC etc etc.
>>
>> > I've been working on a full up functional test for OpenStack CI builds
>> > for a long time now, it works but takes
>> > more than 10 hours. IF you're interested in results kick through to
>> > Kibana here [0]. Let me know off list if you
>> > have any issues, the presentation of this data is all experimental
>> still.
>>
>> This kind of thing is great, and I'd support more exhaustive testing via
>> periodic jobs etc, but the reality is we need to focus on "bang for buck"
>> e.g the deepest possible coverage in the most minimal amount of time for
>> our per-commit tests - we rely on the project gates to provide a full API
>> surface test, and we need to focus on more basic things like "did the
>> service
>> start", and "is the API accessible".  Simple crud operations on a subset
>> of
>> the API's is totally fine for this IMO, whether via pingtest or some other
>> means.
>>
>>
> Right now we do have a periodic job running full tempest, with a few
> skips, and because of the lack of tempest tests in the patches, it's being
> pretty hard to keep it stable enough to have a 100% pass, and of course,
> also the installation very often fails (like in the last five days).
> For example, [1] is the latest run we have in periodic job that we get
> results from tempest, and we have 114 failures that was 

Re: [openstack-dev] [gate][neutron][infra] tempest jobs timing out due to general sluggishness of the node?

2017-02-10 Thread Attila Fazekas
I wonder, can we switch to CINDER_ISCSI_HELPER="lioadm"  ?

On Fri, Feb 10, 2017 at 9:17 AM, Miguel Angel Ajo Pelayo <
majop...@redhat.com> wrote:

> I believe those are traces left by the reference implementation of cinder
> setting very high debug level on tgtd. I'm not sure if that's related or
> the culprit at all (probably the culprit is a mix of things).
>
> I wonder if we could disable such verbosity on tgtd, which certainly is
> going to slow down things.
>
> On Fri, Feb 10, 2017 at 9:07 AM, Antonio Ojea  wrote:
>
>> I guess it's an infra issue, specifically related to the storage, or the
>> network that provide the storage.
>>
>> If you look at the syslog file [1] , there are a lot of this entries:
>>
>> Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_task_tx_start(2024) no more data
>> Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_task_tx_start(1996) found a task 71 131072 0 0
>> Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_data_rsp_build(1136) 131072 131072 0 26214471
>> Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: __cmd_done(1281) (nil) 0x2563000 0 131072
>>
>> grep tgtd syslog.txt.gz| wc
>>   139602 1710808 15699432
>>
>> [1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsv
>> m-neutron-dvr-ubuntu-xenial/35aa22f/logs/syslog.txt.gz
>>
>>
>>
>> On Fri, Feb 10, 2017 at 5:59 AM, Ihar Hrachyshka 
>> wrote:
>>
>>> Hi all,
>>>
>>> I noticed lately a number of job failures in neutron gate that all
>>> result in job timeouts. I describe
>>> gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
>>> timeouts happening in other jobs too.
>>>
>>> The failure mode is all operations, ./stack.sh and each tempest test
>>> take significantly more time (like 50% to 150% more, which results in
>>> job timeout triggered). An example of what I mean can be found in [1].
>>>
>>> A good run usually takes ~20 minutes to stack up devstack; then ~40
>>> minutes to pass full suite; a bad run usually takes ~30 minutes for
>>> ./stack.sh; and then 1:20h+ until it is killed due to timeout.
>>>
>>> It affects different clouds (we see rax, internap, infracloud-vanilla,
>>> ovh jobs affected; we haven't seen osic though). It can't be e.g. slow
>>> pypi or apt mirrors because then we would see slowdown in ./stack.sh
>>> phase only.
>>>
>>> We can't be sure that CPUs are the same, and devstack does not seem to
>>> dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
>>> if it would help anyway). Neither we have a way to learn whether
>>> slowliness could be a result of adherence to RFC1149. ;)
>>>
>>> We discussed the matter in neutron channel [2] though couldn't figure
>>> out the culprit, or where to go next. At this point we assume it's not
>>> neutron's fault, and we hope others (infra?) may have suggestions on
>>> where to look.
>>>
>>> [1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsv
>>> m-neutron-dvr-ubuntu-xenial/35aa22f/console.html#_2017-02-09
>>> _04_47_12_874550
>>> [2] http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/
>>> %23openstack-neutron.2017-02-10.log.html#t2017-02-10T04:06:01
>>>
>>> Thanks,
>>> Ihar
>>>
>>> 
>>>
>>
>>
>> 
>>
>>
>
>
>

Re: [openstack-dev] [keystone] Do we really need two listening ports ?

2017-02-02 Thread Attila Fazekas
Today the '-admin' version is almost able to fully pass tempest:
http://logs.openstack.org/91/428091/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/9e4f5e6/logs/stackviz/#/stdin

https://review.openstack.org/#/c/428091/ (without the port merge, but that
could be the next step)

At first glance it does not seem impossible to have a
'keystone-wsgi' entry which is good
enough to pass tempest, and possibly all real user use cases as well,
even though it might not be 100% bug compatible with the old version, and
it requires some tweaks to the routes on the keystone side.

It would be nice to have a wsgi entry in the keystone repository
which also passes at least test_user_update_own_password, and is
advertised as `the merged` entry by
the keystone project.






On Wed, Feb 1, 2017 at 4:35 PM, Dolph Mathews <dolph.math...@gmail.com>
wrote:

> On Wed, Feb 1, 2017 at 6:59 AM Thomas Goirand <z...@debian.org> wrote:
>
>> On 02/01/2017 10:54 AM, Attila Fazekas wrote:
>> > Hi all,
>> >
>> > Typically we have two keystone service listening on two separate ports
>> > 35357 and 5000.
>> >
>> > Historically one of the port had limited functionality, but today I do
>> > not see why we want
>> > to have two separate service/port from the same code base for similar
>> > purposes.
>>
>
> If you're running v2, you do need two endpoints (admin and public;
> keystone does not really have a use case for an internal endpoint). The
> specific port numbers don't particularly matter (other than 35357 is
> conveniently registered with IANA) and should not be hardcoded or assumed
> by clients (and are not, AFAIK). In the case of v2, it is effectively a
> different service running on each port; there's at least one unfortunately
> subtle difference in behavior between admin and public.
>
> If you're *only* running v3, you can run a single process and put the same
> endpoint URL in the service catalog, for both the admin and public
> endpoint. Arbitrary ports don't matter (so just use 443).
>
>
>> >
>> > Effective we use double amount of memory than it is really required,
>> > because both port is served by completely different worker instances,
>> > typically from the same physical server.
>> >
>> > I wonder, would it be difficult to use only a single port or at least
>> > the same pool of workers for all keystone(identity, auth..) purposes?
>> >
>> > Best Regards,
>> > Attila
>>
>> This has been discussed and agreed a long time ago, but nobody did the
>> work.
>
>
> A lot of work has gone into freeing keystone from having to run on two
> ports (Adam Young, in particular, deserves a ton of credit here). You just
> need to consume that operational flexibility.
>
>
>> Please do get rid of the 2nd port. And when you're at it, also get
>> rid of the admin and internal endpoint in the service catalog.
>
>
> v3 has never presumed anything other than a public endpoint. Admin and
> internal are strictly optional and only exist for backwards compatibility
> with v2 (so, just use v3).
>
>
>>
>
>
>> Cheers,
>>
>> Thomas Goirand (zigo)
>>
>>
>> 
>>
> --
> -Dolph
>
>
>


[openstack-dev] [keystone] Do we really need two listening ports ?

2017-02-01 Thread Attila Fazekas
Hi all,

Typically we have two keystone services listening on two separate ports,
35357 and 5000.

Historically one of the ports had limited functionality, but today I do not
see why we want
to have two separate services/ports from the same code base for similar
purposes.

Effectively we use double the amount of memory that is really required,
because both ports are served by completely different worker instances,
typically on the same physical server.

I wonder, would it be difficult to use only a single port, or at least the
same pool of workers, for all keystone (identity, auth, ...) purposes?

Best Regards,
Attila


Re: [openstack-dev] [QA][all] Propose to remove negative tests from Tempest

2016-03-19 Thread Attila Fazekas
Most negative tests are supposed to be very simple and we should not spend
too much time on them.

The right questions:
Are we able to run 100 negative tests/sec?
Where is the time spent?

If we are able to solve the main issue,
we probably do not need to worry about how many negative tests we have.

Not every negative test is a simple dumb thing.
If we did smarter test selection, we would likely
need to keep only the slower ones,
so in the end almost nothing would be gained.


BTW, we can increase the number of tempest workers.

Best Regards,
Attila

- Original Message -
> From: "Ken'ichi Ohmichi" 
> To: "OpenStack Development Mailing List" 
> Sent: Thursday, March 17, 2016 2:20:11 AM
> Subject: [openstack-dev] [QA][all] Propose to remove negative tests from  
> Tempest
> 
> Hi
> 
> I have one proposal[1] related to negative tests in Tempest, and
> hoping opinions before doing that.
> 
> Now Tempest contains negative tests and sometimes patches are being
> posted for adding more negative tests, but I'd like to propose
> removing them from Tempest instead.
> 
> Negative tests verify surfaces of REST APIs for each component without
> any integrations between components. That doesn't seem integration
> tests which are scope of Tempest.
> In addition, we need to spend the test operating time on different
> component's gate if adding negative tests into Tempest. For example,
> we are operating negative tests of Keystone and more
> components on the gate of Nova. That is meaningless, so we need to
> avoid more negative tests into Tempest now.
> 
> If wanting to add negative tests, it is a nice option to implement
> these tests on each component repo with Tempest plugin interface. We
> can avoid operating negative tests on different component gates and
> each component team can decide what negative tests are valuable on the
> gate.
> 
> In long term, all negative tests will be migrated into each component
> repo with Tempest plugin interface. We will be able to operate
> valuable negative tests only on each gate.
> 
> Any thoughts?
> 
> Thanks
> Ken Ohmichi
> 
> ---
> [1]: https://review.openstack.org/#/c/293197/
> 
> 



Re: [openstack-dev] [cross-project] [all] Quotas -- service vs. library

2016-03-16 Thread Attila Fazekas

NO to any kind of extra quota service.

In other places I saw other reasons for a quota service or similar;
the actual cost of this approach is higher than most people would think, so NO.


Maybe a library,
but I do not want to see, for example, the bad pattern used in nova spread
everywhere.

The quota usage handling MUST happen in the same DB transaction as the
resource record (volume, server, ...) create/update/delete.

There is no need for:
- reservation-expirer services or periodic tasks
- quota usage corrector shell scripts or whatever
- multiple commits


We have a transaction-capable DB to help us;
not using it would be lame.

[2] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061338.html

- Original Message -
> From: "Nikhil Komawar" 
> To: "OpenStack Development Mailing List" 
> Sent: Wednesday, March 16, 2016 7:25:26 AM
> Subject: [openstack-dev] [cross-project] [all] Quotas -- service vs. library
> 
> Hello everyone,
> 
> tl;dr;
> I'm writing to request some feedback on whether the cross project Quotas
> work should move ahead as a service or a library or going to a far
> extent I'd ask should this even be in a common repository, would
> projects prefer to implement everything from scratch in-tree? Should we
> limit it to a guideline spec?
> 
> But before I ask anymore, I want to specifically thank Doug Hellmann,
> Joshua Harlow, Davanum Srinivas, Sean Dague, Sean McGinnis and  Andrew
> Laski for the early feedback that has helped provide some good shape to
> the already discussions.
> 
> Some more context on what the happenings:
> We've this in progress spec [1] up for providing context and platform
> for such discussions. I will rephrase it to say that we plan to
> introduce a new 'entity' in the Openstack realm that may be a library or
> a service. Both concepts have trade-offs and the WG wanted to get more
> ideas around such trade-offs from the larger community.
> 
> Service:
> This would entail creating a new project and will introduce managing
> tables for quotas for all the projects that will use this service. For
> example if Nova, Glance, and Cinder decide to use it, this 'entity' will
> be responsible for handling the enforcement, management and DB upgrades
> of the quotas logic for all resources for all three projects. This means
> less pain for projects during the implementation and maintenance phase,
> holistic view of the cloud and almost a guarantee of best practices
> followed (no clutter or guessing around what different projects are
> doing). However, it results into a big dependency; all projects rely on
> this one service for right enforcement, avoiding races (if do not
> incline on implementing some of that in-tree) and DB
> migrations/upgrades. It will be at the core of the cloud and prone to
> attack vectors, bugs and margin of error.
> 
> Library:
> A library could be thought of in two different ways:
> 1) Something that does not deal with backed DB models, provides a
> generic enforcement and management engine. To think ahead a little bit
> it may be a ABC or even a few standard implementation vectors that can
> be imported into a project space. The project will have it's own API for
> quotas and the drivers will enforce different types of logic; per se
> flat quota driver or hierarchical quota driver with custom/project
> specific logic in project tree. Project maintains it's own DB and
> upgrades thereof.
> 2) A library that has models for DB tables that the project can import
> from. Thus the individual projects will have a handy outline of what the
> tables should look like, implicitly considering the right table values,
> arguments, etc. Project has it's own API and implements drivers in-tree
> by importing this semi-defined structure. Project maintains it's own
> upgrades but will be somewhat influenced by the common repo.
> 
> Library would keep things simple for the common repository and sourcing
> of code can be done asynchronously as per project plans and priorities
> without having a strong dependency. On the other hand, there is a
> likelihood of re-implementing similar patterns in different projects
> with individual projects taking responsibility to keep things up to
> date. Attack vectors, bugs and margin of error are project responsibilities
> 
> Third option is to avoid all of this and simply give guidelines, best
> practices, right packages to each projects to implement quotas in-house.
> Somewhat undesirable at this point, I'd say. But we're all ears!
> 
> Thank you for reading and I anticipate more feedback.
> 
> [1] https://review.openstack.org/#/c/284454/
> 
> --
> 
> Thanks,
> Nikhil
> 
> 
> 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-12 Thread Attila Fazekas




- Original Message -
 From: Robert Collins robe...@robertcollins.net
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Tuesday, May 12, 2015 3:06:21 AM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 On 12 May 2015 at 10:12, Attila Fazekas afaze...@redhat.com wrote:
 
 
 
 
 
  If you can illustrate a test script that demonstrates the actual failing
  of OS threads that does not occur greenlets here, that would make it
  immediately apparent what it is you're getting at here.
 
 
  http://www.fpaste.org/220824/raw/
 
  I just put together hello word C example and a hello word threading
  example,
  and replaced the print with sleep(3).
 
  When I use the sleep(3) from python, the 5 thread program runs in ~3
  second,
  when I use the sleep(3) from native code, it runs ~15 sec.
 
  So yes, it is very likely a GIL lock wait related issue,
  when the native code is not assisting.
 
 Your test code isn't releasing the GIL here, and I'd expect C DB
 drivers to be releasing the GIL: you've illustrated how a C extension
 can hold the GIL, but not whether thats happening.

Yes.

And you are right, the C driver wrapper releases the GIL at every important
mysql C driver call (Py_BEGIN_ALLOW_THREADS).

Good to know :)
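The earlier fpaste experiment can be reproduced in pure Python via ctypes, since `CDLL` releases the GIL around each foreign call (the `Py_BEGIN_ALLOW_THREADS` behaviour mentioned above) while `PyDLL` keeps it held. A sketch, assuming a Linux libc:

```python
# Three threads each call libc sleep(1). With the GIL released the
# sleeps overlap (~1s total); with the GIL held they serialize (~3s),
# matching the original 5-thread/15s observation.
import ctypes
import ctypes.util
import threading
import time

libc_path = ctypes.util.find_library("c") or "libc.so.6"
releasing = ctypes.CDLL(libc_path)    # releases the GIL per call
holding = ctypes.PyDLL(libc_path)     # never releases the GIL

def run_threads(lib, nthreads=3):
    threads = [threading.Thread(target=lib.sleep, args=(1,))
               for _ in range(nthreads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

t_released = run_threads(releasing)   # ~1s: threads sleep concurrently
t_held = run_threads(holding)         # ~3s: one thread at a time
print(f"GIL released: {t_released:.1f}s, GIL held: {t_held:.1f}s")
```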


 
  Do you need a DB example, by using the mysql C driver,
  and waiting in an actual I/O primitive ?
 
 waiting in an I/O primitive is fine as long as the GIL has been released.

http://www.fpaste.org/221101/

Actually, the eventlet version of the play/test code
produces the mentioned error:
'Lock wait timeout exceeded; try restarting transaction'.

I have not seen the above issue with regular python threads.

The driver does not cooperate with the event hub :(


PS.:
The 'Deadlock found when trying to get lock; try restarting transaction'
would be a different situation, and it is not related to the eventlet issue.

 
 -Rob
 
 
 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 
 



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: John Garbutt j...@johngarbutt.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Cc: Dan Smith d...@danplanet.com
 Sent: Saturday, May 9, 2015 12:45:26 PM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:
  On 4/30/15 11:16 AM, Dan Smith wrote:
  There is an open discussion to replace mysql-python with PyMySQL, but
  PyMySQL has worse performance:
 
  https://wiki.openstack.org/wiki/PyMySQL_evaluation
 
  My major concern with not moving to something different (i.e. not based
  on the C library) is the threading problem. Especially as we move in the
  direction of cellsv2 in nova, not blocking the process while waiting for
  a reply from mysql is going to be critical. Further, I think that we're
  likely to get back a lot of performance from a supports-eventlet
  database connection because of the parallelism that conductor currently
  can only provide in exchange for the footprint of forking into lots of
  workers.
 
  If we're going to move, shouldn't we be looking at something that
  supports our threading model?
 
  yes, but at the same time, we should change our threading model at the
  level
  of where APIs are accessed to refer to a database, at the very least using
  a
  threadpool behind eventlet.   CRUD-oriented database access is faster using
  traditional threads, even in Python, than using an eventlet-like system or
  using explicit async.  The tests at
  http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
  show this.With traditional threads, we can stay on the C-based MySQL
  APIs and take full advantage of their speed.
 
 Sorry to go back in time, I wanted to go back to an important point.
 
 It seems we have three possible approaches:
 * C lib and eventlet, blocks whole process
 * pure python lib, and eventlet, eventlet does its thing
 * go for a C lib and dispatch calls via thread pool

* go with a pure C protocol lib which explicitly uses `python patch-able`
  I/O functions (maybe other primitives too, like threading, mutexes, sleep ...)

* go with a pure C protocol lib where the python part explicitly calls
  `decode` and `encode`; the C part just does CPU-intensive operations
  and never calls I/O primitives.

 We have a few problems:
 * performance sucks, we have to fork lots of nova-conductors and api nodes
 * need to support python2.7 and 3.4, but its not currently possible
 with the lib we use?
 * want to pick a lib that we can fix when there are issues, and work to
 improve
 
 It sounds like:
 * currently do the first one, it sucks, forking nova-conductor helps
 * seems we are thinking the second one might work, we sure get py3.4 +
 py2.7 support
 * the last will mean more work, but its likely to be more performant
 * worried we are picking a unsupported lib with little future
 
 I am leaning towards us moving to making DB calls with a thread pool
 and some fast C based library, so we get the 'best' performance.
 
 Is that a crazy thing to be thinking? What am I missing here?

Using the python socket from C code:
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

It is also possible to implement a MySQL driver as just a protocol parser,
and then you are free to use your favorite event-based I/O strategy
(direct epoll usage) even without eventlet (or similar).

The issue with ultramysql is that it does not implement
the `standard` python DB API, so you would need to add an extra wrapper for
SQLAlchemy.
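A minimal `sans-I/O` style sketch of that split (illustrative only, not a
real driver): the parser just consumes bytes and yields MySQL wire packets
(3-byte little-endian length, 1-byte sequence id, payload), while the caller
owns the socket, so eventlet, direct epoll or plain threads can all drive it.

```python
class PacketParser:
    """MySQL wire packets: 3-byte LE length, 1-byte sequence id, payload."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        # Pure CPU work: just buffer bytes; the caller does all the I/O.
        self._buf += data

    def packets(self):
        while len(self._buf) >= 4:
            length = int.from_bytes(self._buf[:3], "little")
            if len(self._buf) < 4 + length:
                return                       # incomplete: caller reads more
            seq = self._buf[3]
            payload = self._buf[4:4 + length]
            self._buf = self._buf[4 + length:]
            yield seq, payload

parser = PacketParser()
parser.feed(b"\x05\x00\x00\x00hel")          # a partial packet arrives...
parser.feed(b"lo")                           # ...then the rest
print(list(parser.packets()))                # [(0, b'hello')]
```

Because the parser never touches a socket, there is nothing to monkey-patch
and nothing that can block holding the GIL.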

 
 Thanks,
 John
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 



Re: [openstack-dev] [nova] Service group foundations and features

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: John Garbutt j...@johngarbutt.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Saturday, May 9, 2015 1:18:48 PM
 Subject: Re: [openstack-dev] [nova] Service group foundations and features
 
 On 7 May 2015 at 22:52, Joshua Harlow harlo...@outlook.com wrote:
  Hi all,
 
  In seeing the following:
 
  - https://review.openstack.org/#/c/169836/
  - https://review.openstack.org/#/c/163274/
  - https://review.openstack.org/#/c/138607/
 
  Vilobh and I are starting to come to the conclusion that the service group
  layers in nova really need to be cleaned up (without adding more features
  that only work in one driver), or removed or other... Spec[0] has
  interesting findings on this:
 
  A summary/highlights:
 
  * The zookeeper service driver in nova has probably been broken for 1 or
  more releases, due to eventlet attributes that are gone that it via
  evzookeeper[1] library was using. Evzookeeper only works for eventlet 
  0.17.1. Please refer to [0] for details.
  * The memcache service driver really only uses memcache for a tiny piece of
  the service liveness information (and does a database service table scan to
  get the list of services). Please refer to [0] for details.
  * Nova-manage service disable (CLI admin api) does interact with the
  service
  group layer for the 'is_up'[3] API (but it also does a database service
  table scan[4] to get the list of services, so this is inconsistent with the
  service group driver API 'get_all'[2] view on what is enabled/disabled).
  Please refer to [9][10] for nova manage service enable disable for details.
* Nova service delete (REST api) seems to follow a similar broken pattern
  (it also avoids calling into the service group layer to delete a service,
  which means it only works with the database layer[5], and therefore is
  inconsistent with the service group 'get_all'[2] API).
 
  ^^ Doing the above makes both disable/delete agnostic about other backends
  available that may/might manage service group data for example zookeeper,
  memcache, redis etc... Please refer [6][7] for details. Ideally the API
  should follow the model used in [8] so that the extension, admin interface
  as well as the API interface use the same servicegroup interface which
  should be *fully* responsible for managing services. Doing so we will have
  a
  consistent view of services data, liveness, disabled/enabled and so-on...
 
  So with no disrespect to the authors of 169836 and 163274 (or anyone else
  involved), I am wondering if we can put a request in to figure out how to
  get the foundation of the service group concepts stabilized (or other...)
  before adding more features (that only work with the DB layer).
 
  What is the path to request some kind of larger coordination effort by the
  nova folks to fix the service group layers (and the concepts that are not
  disjoint/don't work across them) before continuing to add features on-top
  of
  a 'shakey' foundation?
 
  If I could propose something it would probably work out like the following:
 
  Step 0: Figure out if the service group API + layer(s) should be
  maintained/tweaked at all (nova-core decides?)
 
  If maintain it:
 
   - Have an agreement that nova service extension, admin
  interface(nova-manage) and API go through a common path for
  update/delete/read.
* This common path should likely be the servicegroup API so as to have a
  consistent view of data and that also helps nova to add different
  data-stores (keeping the services data in a DB and getting numerous updates
  about liveliness every few seconds of N number of compute where N is pretty
  high can be detrimental to Nova's performance)
   - At the same time allow 163274 to be worked on (since it fixes a
   edge-case
  that was asked about in the initial addition of the delete API in its
  initial code commit @ https://review.openstack.org/#/c/39998/)
   - Delay 169836 until the above two/three are fixed (and stabilized); it's
  down concept (and all other usages of services that are hitting a database
  mentioned above) will need to go through the same service group foundation
  that is currently being skipped.
 
  Else:
- Discard 138607 and start removing the service group code (and just use
  the DB for all the things).
- Allow 163274 and 138607 (since those would be additions on-top of the
DB
  layer that will be preserved).
 
  Thoughts?
 
 I wonder about this approach:
 
 * I think we need to go back and document what we want from the
 service group concept.
 * Then we look at the best approach to implement that concept.
 * Then look at the best way to get to a happy place from where we are now,
 ** Noting we will need live upgrade for (at least) the most widely
 used drivers
 
 Does that make any sense?
 
 Things that pop into my head, include:
 * The operators have been asking questions like: Should new services
 not 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Monday, May 11, 2015 9:07:13 PM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 
 
 On 5/11/15 2:02 PM, Attila Fazekas wrote:
 
  Not just with local database connections,
  the 10G network itself also fast. Is is possible you spend more time even
  on
  the kernel side tcp/ip stack (and the context switch..) (Not in physical
  I/O wait)
  than in the actual work on the DB side. (Check netperf TCP_RR)
 
  The scary part of a blocking I/O call is when you have two
  python thread (or green thread) and one of them is holding a DB lock the
  other
  is waiting for the same lock in a native blocking I/O syscall.
 that's a database deadlock and whether you use eventlet, threads,
 asycnio or even just two transactions in a single-threaded script, that
 can happen regardless.  if your two eventlet non blocking greenlets
 are waiting forever for a deadlock,  you're just as deadlocked as if you
 have OS threads.
 
 
  If you do a read(2) in native code, the python itself might not be able to
  preempt it
  Your transaction might be finished with `DB Lock wait timeout`,
  with 30 sec of doing nothing, instead of scheduling to the another python
  thread,
  which would be able to release the lock.
 
 
 Here's the you're losing me part because Python threads are OS
 threads, so Python isn't directly involved trying to preempt anything,
 unless you're referring to the effect of the GIL locking up the
 program.   However, it's pretty easy to make two threads in Python hit a
 database and do a deadlock against each other, and the rest of the
 program's threads continue to run just fine; in a DB deadlock situation
 you are blocked on IO and IO releases the GIL.
 
 If you can illustrate a test script that demonstrates the actual failing
 of OS threads that does not occur greenlets here, that would make it
 immediately apparent what it is you're getting at here.


http://www.fpaste.org/220824/raw/

I just put together a hello world C example and a hello world threading
example, and replaced the print with sleep(3).

When I use the sleep(3) from Python, the 5-thread program runs in ~3
seconds; when I use the sleep(3) from native code, it runs in ~15 sec.

So yes, it is very likely a GIL lock wait related issue,
when the native code is not assisting.
 
Do you need a DB example, using the MySQL C driver
and waiting in an actual I/O primitive ?

Green threads will not help here.

If I imported the Python time.sleep from the C code, it might help.

Using a pure Python driver helps to avoid this kind of issue,
but in that case you have the `CPython is slow` issue.
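Since the fpaste link will expire, an equivalent reproduction can be
sketched with ctypes (assuming Linux/glibc): `ctypes.PyDLL` does not release
the GIL around foreign calls, which mimics a C extension that never calls
Py_BEGIN_ALLOW_THREADS, while Python's own time.sleep does release it.

```python
import ctypes
import ctypes.util
import threading
import time

# PyDLL keeps the GIL held during the foreign call; CDLL would release it.
libc = ctypes.PyDLL(ctypes.util.find_library("c"))

def timed(target, n=5):
    # Run `target` in n threads and return the total wall-clock time.
    threads = [threading.Thread(target=target) for _ in range(n)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

cooperative = timed(lambda: time.sleep(1))  # releases the GIL: ~1 sec total
serialized = timed(lambda: libc.sleep(1))   # GIL held: sleeps run one by one
print(round(cooperative), round(serialized))
```

With 5 threads the GIL-holding variant takes roughly 5x longer, which is the
same effect the sleep(3)-from-C example above shows.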

 
 
  [1]
  http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
 
 
 
 
 
 
 
 
 
 



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Monday, May 11, 2015 4:44:58 PM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 
 
 On 5/11/15 9:58 AM, Attila Fazekas wrote:
 
 
 
  - Original Message -
  From: John Garbutt j...@johngarbutt.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Cc: Dan Smith d...@danplanet.com
  Sent: Saturday, May 9, 2015 12:45:26 PM
  Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
  On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:
  On 4/30/15 11:16 AM, Dan Smith wrote:
  There is an open discussion to replace mysql-python with PyMySQL, but
  PyMySQL has worse performance:
 
  https://wiki.openstack.org/wiki/PyMySQL_evaluation
  My major concern with not moving to something different (i.e. not based
  on the C library) is the threading problem. Especially as we move in the
  direction of cellsv2 in nova, not blocking the process while waiting for
  a reply from mysql is going to be critical. Further, I think that we're
  likely to get back a lot of performance from a supports-eventlet
  database connection because of the parallelism that conductor currently
  can only provide in exchange for the footprint of forking into lots of
  workers.
 
  If we're going to move, shouldn't we be looking at something that
  supports our threading model?
  yes, but at the same time, we should change our threading model at the
  level
  of where APIs are accessed to refer to a database, at the very least
  using
  a
  threadpool behind eventlet.   CRUD-oriented database access is faster
  using
  traditional threads, even in Python, than using an eventlet-like system
  or
  using explicit async.  The tests at
  http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
  show this.With traditional threads, we can stay on the C-based MySQL
  APIs and take full advantage of their speed.
  Sorry to go back in time, I wanted to go back to an important point.
 
  It seems we have three possible approaches:
  * C lib and eventlet, blocks whole process
  * pure python lib, and eventlet, eventlet does its thing
  * go for a C lib and dispatch calls via thread pool
  * go with pure C protocol lib, which explicitly using `python patch-able`
 I/O function (Maybe others like.: threading, mutex, sleep ..)
 
  * go with pure C protocol lib and the python part explicitly call
 for `decode` and `encode`, the C part just do CPU intensive operations,
 and it never calls for I/O primitives .
 
  We have a few problems:
  * performance sucks, we have to fork lots of nova-conductors and api nodes
  * need to support python2.7 and 3.4, but its not currently possible
  with the lib we use?
  * want to pick a lib that we can fix when there are issues, and work to
  improve
 
  It sounds like:
  * currently do the first one, it sucks, forking nova-conductor helps
  * seems we are thinking the second one might work, we sure get py3.4 +
  py2.7 support
  * the last will mean more work, but its likely to be more performant
  * worried we are picking a unsupported lib with little future
 
  I am leaning towards us moving to making DB calls with a thread pool
  and some fast C based library, so we get the 'best' performance.
 
  Is that a crazy thing to be thinking? What am I missing here?
  Using the python socket from C code:
  https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100
 
  Also possible to implement a mysql driver just as a protocol parser,
  and you are free to use you favorite event based I/O strategy (direct epoll
  usage)
  even without eventlet (or similar).
 
  The issue with ultramysql, it does not implements
  the `standard` python DB API, so you would need to add an extra wrapper to
  SQLAlchemy.
 
 This driver appears to have seen its last commit about a year ago, that
 doesn't even implement the standard DBAPI (which is already a red
 flag).   There is apparently a separately released (!) DBAPI-compat
 wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no
 releases in two years. If this wrapper is indeed compatible with
 MySQLdb then it would run in SQLAlchemy without changes (though I'd be
 extremely surprised if it passes our test suite).
 
 How would using these obscure libraries be any preferable than running
 Nova API functions within the thread-pooling facilities already included
 with eventlet ?Keeping in mind that I've now done the work [1]
 to show that there is no performance gain to be had for all the trouble
 we go through to use eventlet/gevent/asyncio with local database
 connections.

Not just with local database connections,
the 10G network itself is also fast. It is possible you spend more time even
on the kernel-side TCP/IP stack (and the context switches) (not in physical
I/O wait) than in the actual work on the DB side. (Check netperf TCP_RR)

Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in DevStack [was: Status of the nova-network to Neutron migration work]

2015-04-28 Thread Attila Fazekas
You can tcpdump the OVS ports as usual.

Please keep in mind an OVS bridge does not have a single port that carries
all traffic.
OVS does MAC learning by default, so you may not see `learned` unicast
traffic on a random trunk port. You MAY see BUM traffic, but much of that
can also be suppressed by neutron-ml2-ovs; AFAIK that is not enabled by
default.

OVS behaves like a real switch, and real switches do not have 5 Tbit/sec
monitoring ports either :(
If you need to tcpdump a port which is not visible in userspace by default
(internal patch links), you should do port mirroring. [1]

Usually you do not need to dump the traffic.
What you should do as basic troubleshooting is check the tags on the ports
(`ovsdb-client dump` shows everything, excluding the OpenFlow rules).

Hopefully the root cause is fixed, but you should check that a port is not a
trunk when it needs to be tagged.

Neutron also dedicates vlan 4095 on br-int as a dead vlan.
If you have a port in it, it can mean a misconfiguration,
a message lost in the void, or that something exceptional happened.

If you really need to redirect exceptional `out of band` traffic to a
special port or to an external service (controller), it would be a more
complex thing than just doing the mirroring.

[1] http://www.yet.org/2014/09/openvswitch-troubleshooting/
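For example, a minimal mirroring session on a DevStack-style br-int could
look like this (the interface name `tap-mirror` and mirror name `dbg` are
just illustrative):

```shell
# Check the VLAN tags on the ports first (an untagged port shows tag: []):
sudo ovs-vsctl list port | grep -E '^(name|tag)'

# Mirror all br-int traffic to a scratch interface so tcpdump can see it:
sudo ip link add tap-mirror type dummy
sudo ip link set tap-mirror up
sudo ovs-vsctl add-port br-int tap-mirror
sudo ovs-vsctl -- --id=@p get port tap-mirror \
    -- --id=@m create mirror name=dbg select-all=true output-port=@p \
    -- set bridge br-int mirrors=@m
sudo tcpdump -nne -i tap-mirror

# Clean up when done:
sudo ovs-vsctl clear bridge br-int mirrors
sudo ovs-vsctl del-port br-int tap-mirror
sudo ip link del tap-mirror
```

`select-all=true` copies every frame on the bridge; use select-src-port /
select-dst-port instead if you only care about one port.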

PS.:
OVS does not generate ICMP packets in many cases where a real `L3` switch
would,
that's why MTU size differences cause issues and require extra care at
configuration time when OVS is used with tunneling. (OVS can also be used
with vlans)

Probably this has caused the most headaches for many users.

PS2.:
Somewhere I read that OVS had PMTUD support, but it was removed because
it was not conforming to the standard.
It just does silent packet drops :(
 


- Original Message -
 From: Jeremy Stanley fu...@yuggoth.org
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Tuesday, April 21, 2015 5:00:24 PM
 Subject: Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in 
 DevStack [was: Status of the nova-network
 to Neutron migration work]
 
 On 2015-04-21 03:19:04 -0400 (-0400), Attila Fazekas wrote:
 [...]
  IMHO the OVS is less complex than netfilter (iptables, *tables),
  if someone able to deal with reading the netfilter rules he should
  be able to deal with OVS as well.
 
 In a simple DevStack setup, you really have that many
 iptables/ebtables rules?
 
  OVS has debugging tools for internal operations, I guess you are
  looking for something else. I do not have any `good debugging`
  tool for net-filter either.
 [...]
 
 Complexity of connecting tcpdump to the bridge was the primary
 concern here (convenient means of debugging network problems when
 you're using OVS, less tools for debugging OVS itself though it can
 come down to that at times as well). Also ebtables can easily be
 configured to log every frame it blocks, forwards or rewrites
 (presumably so can the OVS flow handler? but how?).
 --
 Jeremy Stanley
 
 



Re: [openstack-dev] [openstack][nova] Does anyone use Zookeeper, Memcache Nova ServiceGroup Driver ?

2015-04-28 Thread Attila Fazekas
How many compute nodes do you want to manage ?

If it is less than ~1000, you do not need to care.
If you have more, just use an SSD with a good write IOPS value.

MySQL can actually be fast with enough memory and a good SSD.
Even faster than [1].

ZooKeeper as a technology is good; the current nova driver is not. Not
recommended.
The current memcache driver does a lot of TCP ping-pong for every node;
it can be slower than the SQL driver.

IMHO at high compute node counts you would face scheduler latency issues
sooner than servicegroup driver issues. (It is not O(log N) :()

The servicegroup drivers were introduced to eliminate 100 updates/sec at
1000 hosts, but they caused all services to be fetched from the DB even
when the given code path needs only the alive services.


[1] 
http://www.percona.com/blog/2013/10/18/innodb-scalability-issues-tables-without-primary-keys/
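Back-of-envelope for the 100 updates/sec figure above, assuming the usual
defaults (report_interval=10s heartbeats, service_down_time=60s liveness
timeout; both values are assumptions here):

```python
report_interval = 10       # seconds between heartbeats per service
service_down_time = 60     # seconds without an update before "dead"
hosts = 1000

# With the DB driver every heartbeat is a row UPDATE:
db_updates_per_sec = hosts / report_interval
print(db_updates_per_sec)  # 100.0 updates/sec at 1000 hosts
```

This is the write load the alternative drivers were meant to move out of
the database.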

- Original Message -
 From: Vilobh Meshram vilobhmeshram.openst...@gmail.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org, OpenStack
 Mailing List (not for usage questions) openst...@lists.openstack.org
 Sent: Tuesday, April 28, 2015 1:21:58 AM
 Subject: [openstack-dev] [openstack][nova] Does anyone use Zookeeper, 
 Memcache Nova ServiceGroup Driver ?
 
 Hi,
 
 Does anyone use Zookeeper[1], Memcache[2] Nova ServiceGroup Driver ?
 
 If yes how has been your experience with it. It was noticed that most of the
 deployment try to use the default Database driver[3]. Any experiences with
 Zookeeper, Memcache driver will be helpful.
 
 -Vilobh
 
 [1]
 https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/zk.py
 [2]
 https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/mc.py
 [3]
 https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py
 
 



Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in DevStack [was: Status of the nova-network to Neutron migration work]

2015-04-21 Thread Attila Fazekas




- Original Message -
 From: Jeremy Stanley fu...@yuggoth.org
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, April 17, 2015 9:35:07 PM
 Subject: Re: [openstack-dev] [Nova][Neutron] Linuxbridge as the default in 
 DevStack [was: Status of the nova-network
 to Neutron migration work]
 
 On 2015-04-17 11:49:23 -0700 (-0700), Kevin Benton wrote:
  I definitely understand that. But what is the major complaint from
  operators? I understood that quote to imply it was around
  Neutron's model of self-service networking.
 
 My takeaway from Tom's message was that there was a concern about
 complexity in all forms (not just of the API but also due to the
 lack of maturity, documentation and debuggability of the underlying
 technology), and that the self-service networking model was simply
 one example of that. Perhaps I was reading between the lines too
 much because of prior threads on both the operators and developers
 mailing lists. Anyway, I'm sure Tom will clarify what he meant if
 necessary.
 

IMHO OVS is less complex than netfilter (iptables, *tables);
if someone is able to deal with reading netfilter rules,
they should be able to deal with OVS as well.

OVS has debugging tools for internal operations; I guess you are looking
for something else.
I do not have any `good debugging` tool for netfilter either.

The way openstack/neutron/devstack uses OVS by default is simpler
than what most small (non-OpenStack-related) OVS examples try to explain.

I kind of agree with the lack-of-documentation part.
Documentation which explains how to use OVS
the same way Neutron does would be helpful for newcomers.

  If the main reason the remaining Nova-net operators don't want to
  use Neutron is due to the fact that they don't want to deal with
  the Neutron API, swapping some implementation defaults isn't
  really going to get us anywhere on that front.
 
 This is where I think the subthread has definitely wandered off
 topic too. Swapping implementation defaults in DevStack because it's
 quicker and easier to get running on the typical
 all-in-one/single-node setup and faster to debug problems with
 (particularly when you're trying to work on non-network-related bits
 and just need to observe the network communication between your
 services) doesn't seem like it should have a lot to do with the
 recommended default configuration for a large production deployment.
 One size definitely does not fit all.
 
  It's an important distinction because it determines what
  actionable items we can take (e.g. what Salvatore mentioned in his
  email about defaults). Does that make sense?
 
 It makes sense in the context of the Neutron/Nova network parity
 topic, but not so much in the context of the DevStack default
 settings topic. DevStack needs a simple default that just works, and
 doesn't need the kitchen sink. You can turn on more complex options
 as you need to test them out. In some ways this has parallels to the
 complexity concerns the operator community has over Neutron and OVS,
 but I think they're still relatively distinct topics.
 --
 Jeremy Stanley
 
 



Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-17 Thread Attila Fazekas




- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, April 17, 2015 9:46:12 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Hi, Attila,
 
 only address the issue of agent status/liveness management is not enough for
 Neutron scalability. The concurrent dynamic load impact on large scale ( for
 example 100k managed nodes with the dynamic load like security group rule
 update, routers_updated, etc ) should also be taken into account too. So
 even if is agent status/liveness management improved in Neutron, that
 doesn't mean the scalability issue totally being addressed.
 

This story is not about the heartbeat.
https://bugs.launchpad.net/neutron/+bug/1438159

What I am looking for is managing a lot of nodes with minimal `controller`
resources.

The actual rate of required system changes per second (for example,
regarding vm boot) is relatively low, even if you have many nodes and vms.
- Consider the instances' average lifetime -

The `bug` is about the resources the agents are related to and query many
times.
BTW: I am thinking about several alternatives and other variants.

In the neutron case a `system change` can affect multiple agents,
like a security group rule change.

It seems possible to have all agents `query` a resource only once,
and be notified of any subsequent change `for free`. (IP, sec group rule,
new neighbor)

This is the scenario where message brokers can shine and scale,
and it also offloads a lot of work from the DB.
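A minimal sketch of that agent-side pattern (names are illustrative, not
Neutron code): fetch once, apply notifications that already carry the new
state, and fall back to a full sync only when the notification stream is
known to be unreliable (queue or connection loss):

```python
class ResourceCache:
    def __init__(self, fetch_all):
        self._fetch_all = fetch_all   # the expensive server/DB-side query
        self._items = {}
        self._synced = False

    def full_sync(self):
        self._items = dict(self._fetch_all())
        self._synced = True

    def on_notification(self, res_id, payload):
        # The notification carries the new state, so no follow-up
        # AMQP -> SQL round trip is needed.
        if not self._synced:
            self.full_sync()          # we may have missed earlier updates
        if payload is None:
            self._items.pop(res_id, None)   # resource was deleted
        else:
            self._items[res_id] = payload

    def on_connection_lost(self):
        self._synced = False          # next event forces a full sync

cache = ResourceCache(lambda: {"sg-1": {"rules": []}}.items())
cache.full_sync()
cache.on_notification("sg-1", {"rules": ["allow tcp/22"]})
print(cache._items["sg-1"])           # {'rules': ['allow tcp/22']}
```

The broker only has to fan the payload out; the DB is queried once per
(re)sync instead of once per notification.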


 And on the other hand, Nova already supports several segregation concepts,
 for example, Cells, Availability Zone... If there are 100k nodes to be
 managed by one OpenStack instances, it's impossible to work without hardware
 resources segregation. It's weird to put agent liveness manager in
 availability zone(AZ in short) 1, but all managed agents in AZ 2. If AZ 1 is
 power off, then all agents in AZ2 lost management.
 

 The benchmark is already here for scalability test report for million ports
 scalability of Neutron 
 http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
 
 The cascading may be not perfect, but at least it provides a feasible way if
 we really want scalability.
 
 I am also working to evolve OpenStack to a world no need to worry about
 OpenStack Scalability Issue based on cascading:
 
 Tenant level virtual OpenStack service over hybrid or federated or multiple
 OpenStack based clouds:
 
 There are lots of OpenStack based clouds, each tenant will be allocated with
 one cascading OpenStack as the virtual OpenStack service, and single
 OpenStack API endpoint served for this tenant. The tenant's resources can be
 distributed or dynamically scaled to multi-OpenStack based clouds, these
 clouds may be federated with KeyStone, or using shared KeyStone, or  even
 some OpenStack clouds built in AWS or Azure, or VMWare vSphere.

 
 Under this deployment scenario, unlimited scalability in a cloud can be
 achieved, no unified cascading layer, tenant level resources orchestration
 among multi-OpenStack clouds fully distributed(even geographically). The
 database and load for one casacding OpenStack is very very small, easy for
 disaster recovery or backup. Multiple tenant may share one cascading
 OpenStack to reduce resource waste, but the principle is to keep the
 cascading OpenStack as thin as possible.

 You can find the information here:
 https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case
 
 Best Regards
 Chaoyi Huang ( joehuang )
 
 -Original Message-
 From: Attila Fazekas [mailto:afaze...@redhat.com]
 Sent: Thursday, April 16, 2015 3:06 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 
 
 - Original Message -
  From: joehuang joehu...@huawei.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Sent: Sunday, April 12, 2015 3:46:24 AM
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
  
  
  
  As Kevin talking about agents, I want to remind that in TCP/IP stack,
  port ( not Neutron Port ) is a two bytes field, i.e. port ranges from
  0 ~ 65535, supports maximum 64k port number.
  
  
  
   above 100k managed node  means more than 100k L2 agents/L3
  agents... will be alive under Neutron.
  
  
  
  Want to know the detail design how to support 99.9% possibility for
  scaling Neutron in this way, and PoC and test would be a good support for
  this idea.
  
 
 Would you consider something as PoC which uses the technology in similar way,
 with a similar port - security problem, but with a lower level API than
 neutron using currently ?
 
 Is it an acceptable flaw:
 If you kill -9 the q-svc 1 times

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-16 Thread Attila Fazekas




- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 3:46:24 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 As Kevin talking about agents, I want to remind that in TCP/IP stack, port (
 not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535,
 supports maximum 64k port number.
 
 
 
  above 100k managed node  means more than 100k L2 agents/L3 agents... will
 be alive under Neutron.
 
 
 
 Want to know the detail design how to support 99.9% possibility for scaling
 Neutron in this way, and PoC and test would be a good support for this idea.
 

Would you consider as a PoC something which uses the technology in a similar
way, with a similar port-security problem, but with a lower-level API
than Neutron currently uses?

Is it an acceptable flaw:
If you kill -9 the q-svc 1 times at the `right` millisecond, the rabbitmq
memory usage increases by ~1MiB ? (Rabbit usually eats ~10GiB under pressure)
The memory can be freed without a broker restart; it also gets freed on
agent restart.


 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [blak...@gmail.com]
 Sent: 11 April 2015 12:34
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Which periodic updates did you have in mind to eliminate? One of the few
 remaining ones I can think of is sync_routers but it would be great if you
 can enumerate the ones you observed because eliminating overhead in agents
 is something I've been working on as well.
 
 One of the most common is the heartbeat from each agent. However, I don't
 think we can't eliminate them because they are used to determine if the
 agents are still alive for scheduling purposes. Did you have something else
 in mind to determine if an agent is alive?
 
 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas  afaze...@redhat.com 
 wrote:
 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 The problem is openstack using the right tools SQL/AMQP/(zk),
 but in a wrong way.
 
 For example.:
 Periodic updates can be avoided almost in all cases
 
 The new data can be pushed to the agent just when it needed.
 The agent can know when the AMQP connection become unreliable (queue or
 connection loose),
 and needs to do full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159
 
 Also the agents when gets some notification, they start asking for details
 via the
 AMQP - SQL. Why they do not know it already or get it with the notification
 ?
 
 
 - Original Message -
  From: Neil Jerram  neil.jer...@metaswitch.com 
  To: OpenStack Development Mailing List (not for usage questions) 
  openstack-dev@lists.openstack.org 
  Sent: Thursday, April 9, 2015 5:01:45 PM
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
  
  Hi Joe,
  
  Many thanks for your reply!
  
  On 09/04/15 03:34, joehuang wrote:
   Hi, Neil,
   
   From theoretic, Neutron is like a broadcast domain, for example,
   enforcement of DVR and security group has to touch each regarding host
   where there is VM of this project resides. Even using SDN controller, the
   touch to regarding host is inevitable. If there are plenty of physical
   hosts, for example, 10k, inside one Neutron, it's very hard to overcome
   the broadcast storm issue under concurrent operation, that's the
   bottleneck for scalability of Neutron.
  
  I think I understand that in general terms - but can you be more
  specific about the broadcast storm? Is there one particular message
  exchange that involves broadcasting? Is it only from the server to
  agents, or are there 'broadcasts' in other directions as well?
  
  (I presume you are talking about control plane messages here, i.e.
  between Neutron components. Is that right? Obviously there can also be
  broadcast storm problems in the data plane - but I don't think that's
  what you are talking about here.)
  
   We need layered architecture in Neutron to solve the broadcast domain
   bottleneck of scalability. The test report from OpenStack cascading shows
   that through layered architecture Neutron cascading, Neutron can
   supports up to million level ports and 100k level physical hosts. You can
   find the report here:
   http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
  
  Many thanks, I will take a look at this.
  
   Neutron cascading also brings extra benefit: One cascading Neutron can

Re: [openstack-dev] [all] QPID incompatible with python 3 and untested in gate -- what to do?

2015-04-16 Thread Attila Fazekas




- Original Message -
 From: Ken Giusti kgiu...@gmail.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, April 16, 2015 4:47:50 PM
 Subject: Re: [openstack-dev] [all] QPID incompatible with python 3 and 
 untested in gate -- what to do?
 
 On Wed, Apr 15, 2015 at 8:18 PM, Joshua Harlow harlo...@outlook.com wrote:
  Ken Giusti wrote:
 
  On Wed, Apr 15, 2015 at 1:33 PM, Doug Hellmannd...@doughellmann.com
  wrote:
 
  Excerpts from Ken Giusti's message of 2015-04-15 09:31:18 -0400:
 
  On Tue, Apr 14, 2015 at 6:23 PM, Joshua Harlowharlo...@outlook.com
  wrote:
 
  Ken Giusti wrote:
 
  Just to be clear: you're asking specifically about the 0-10 based
  impl_qpid.py driver, correct?   This is the driver that is used for
  the qpid:// transport (aka rpc_backend).
 
  I ask because I'm maintaining the AMQP 1.0 driver (transport
  amqp://) that can also be used with qpidd.
 
  However, the AMQP 1.0 driver isn't yet Python 3 compatible due to its
  dependency on Proton, which has yet to be ported to python 3 - though
  that's currently being worked on [1].
 
  I'm planning on porting the AMQP 1.0 driver once the dependent
  libraries are available.
 
  [1]: https://issues.apache.org/jira/browse/PROTON-490
 
 
  What's the expected date on this as it appears this also blocks python
  3
  work as well... Seems like that hasn't been updated since nov 2014
  which
  doesn't inspire that much confidence (especially for what appears to be
  mostly small patches).
 
  Good point.  I reached out to the bug owner.  He got it 'mostly
  working' but got hung up on porting the proton unit tests.   I've
  offered to help this along and he's good with that.  I'll make this a
  priority to move this along.
 
  In terms of availability - proton tends to do releases about every 4-6
  months.  They just released 0.9, so the earliest availability would be
  in that 4-6 month window (assuming that should be enough time to
  complete the work).   Then there's the time it will take for the
  various distros to pick it up...
 
  so, definitely not 'real soon now'. :(
 
  This seems like a case where if we can get the libs we need to a point
  where they install via pip, we can let the distros catch up instead of
  waiting for them.
 
 
  Sadly just the python wrappers are available via pip.  Its C extension
  requires that the native proton shared library (libqpid-proton) is
  available.   To date we've relied on the distro to provide that
  library.
 
 
  How does that (c extension) work with eventlet? Does it?
 
 
 I haven't experienced any issues in my testing.
 
 To be clear - the libqpid-proton library is non-blocking and
 non-threading.  It's simply an protocol processing engine - the driver
 hands it raw network data and messages magically pop out (and vise
 versa).
 
  All I/O, blocking, threading etc is done in the python driver itself.
 I suspect there's nothing eventlet needs to do that requires
 overloading functionality provided by the binary proton library, but
 my knowledge of eventlet is pretty slim.
 

Usually, to make a C I/O library eventlet friendly you need to use Python
sockets. It is possible to explicitly use a Python socket from the C code.
For example:
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

If the driver uses Python sockets and just passes the data to the C code as
you said, it is fine!

 
 
  Similarly, if we have *an* approach for Python 3 on oslo.messaging, that
  means the library isn't blocking us from testing applications with
  Python 3. If some of the drivers lag, their test jobs may need to be
  removed or disabled if the apps start testing under Python 3.
 
  Doug
 
 
  __
  OpenStack Development Mailing List (not for usage questions)
  Unsubscribe:
  openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
 
 
  __
  OpenStack Development Mailing List (not for usage questions)
  Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
 --
 Ken Giusti  (kgiu...@gmail.com)
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] 答复: [neutron] Neutron scaling datapoints?

2015-04-14 Thread Attila Fazekas




- Original Message -
 From: Wangbibo wangb...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Monday, April 13, 2015 10:51:39 AM
 Subject: [openstack-dev] 答复:  [neutron] Neutron scaling datapoints?
 
 
 
 Hi Kevin,
 
 
 
 Totally agree with you that heartbeat from each agent is something that we
 cannot eliminate currently. Agent status depends on it, and further
 scheduler and HA depends on agent status.
 

Actually we could eliminate it for the q-agt:
the q-agt can be monitored by the n-cpu, and
the n-cpu should also change its status to dead when the q-agt dies.

So Neutron could reuse the aliveness data from the n-cpu aliveness.

Sooner or later I will suggest a direct
connection between n-cpu and q-agt anyway.

--

It is also possible to implement is_up by a dummy message send:

- Every agent has to have an auto-delete queue, which is consumed only by the agent.

- is_up can use the default exchange and publish a message
  with the `immediate` flag.
  If the broker does not refuse it, the target system is alive.

https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.publish

This method has the same issue as the current memcached driver:
each is_up is a TCP request/response, which consumes too much
time and too many resources when you `list` 100k nodes.

---

Actually the recommended method is:

Have a service which:
 - is HA (3(+) nodes)
 - is really able to use multiple threads (not cpython)
 - does not do a real state change when the service state did not change
 - bases availability on the TCP connection health, which is checked either by
   - frequent TCP keep-alive packets managed by the kernel
   - an application-level payload
 - notifies the interested parties about service state changes only
   when a state change actually happened

For ex.: ZooKeeper with ephemeral znodes.

Have a second service:
 - which is subscribed to the first one (it can use tooz)
 - which reconciles the service state changes with the database (add an is_alive
   field to the table)
 - you can run multiple instances for HA;
   they do the same DB change with a small delay, so there is no split-brain
   issue, but you should not run more than 3 instances.

Benefits of this approach compared to other zk-based approaches:

- It does not require every API worker to keep all service data in memory!
- It does not require every worker to cross-reference the DB data with
  something else (especially in list context).
- Selecting only the dead or alive nodes will be simple and efficient.

Cons:

- ~0.001 DB UPDATE/sec expected at 100k nodes (nothing :))
- An additional service component, but it actually saves memory.

PS.:
The zk has one other advantage compared to mc or the current db driver:
 faster state change detection and reporting.
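The key property of the proposed second service is that it only touches the DB on real transitions. A minimal in-memory sketch of that idea follows; everything here (the class name, the event shape, the `db_writes` stand-in list) is hypothetical illustration — in a real deployment the events would come from ZooKeeper/tooz group-membership watches and the writes would be UPDATEs on the agents table:

```python
class AlivenessReconciler:
    """Sketch of the proposed second service: it receives membership
    change events (e.g. from ZooKeeper ephemeral znodes via tooz) and
    issues a DB write only when an agent's state actually changed."""

    def __init__(self):
        self.known = {}       # agent id -> last is_alive state we persisted
        self.db_writes = []   # stand-in for: UPDATE agents SET is_alive = ...

    def on_event(self, agent_id, is_alive):
        if self.known.get(agent_id) == is_alive:
            return False      # no real state change -> no DB traffic
        self.known[agent_id] = is_alive
        self.db_writes.append((agent_id, is_alive))
        return True

r = AlivenessReconciler()
events = [("ovs-agent-1", True), ("ovs-agent-1", True),   # duplicate join
          ("ovs-agent-1", False),                          # session expired
          ("ovs-agent-1", True)]                           # reconnected
applied = [r.on_event(a, s) for a, s in events]
print(applied)        # [True, False, True, True]
print(r.db_writes)    # only the three real transitions reach the DB
```

Because heartbeats never reach the DB at all, the expected UPDATE rate depends only on how often agents actually die or come back — which is where the "~0.001 DB UPDATE/sec at 100k nodes" estimate comes from.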
 
 
 I proposed a Liberty spec for introducing open framework/pluggable agent
 status drivers.[1][2] It allows us to use some other 3 rd party backend to
 monitor agent status, such as zookeeper, memcached. Meanwhile, it guarantees
 backward compatibility so that users could still use db-based status
 monitoring mechanism as their default choice.
 
 
 
 Base on that, we may do further optimization on issues Attila and you
 mentioned. Thanks.
 
 
 
 [1] BP -
 https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers
 
 [2] Liberty Spec proposed - https://review.openstack.org/#/c/168921/
 
 
 
 Best,
 
 Robin
 
 
 
 
 
 
 
 
 
 
 发件人 : Kevin Benton [mailto:blak...@gmail.com]
 发送时间 : 2015 年 4 月 11 日 12:35
 收件人 : OpenStack Development Mailing List (not for usage questions)
 主题 : Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 
 
 Which periodic updates did you have in mind to eliminate? One of the few
 remaining ones I can think of is sync_routers but it would be great if you
 can enumerate the ones you observed because eliminating overhead in agents
 is something I've been working on as well.
 
 
 
 
 
 One of the most common is the heartbeat from each agent. However, I don't
 think we can't eliminate them because they are used to determine if the
 agents are still alive for scheduling purposes. Did you have something else
 in mind to determine if an agent is alive?
 
 
 
 
 
 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas  afaze...@redhat.com 
 wrote:
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 The problem is openstack using the right tools SQL/AMQP/(zk),
 but in a wrong way.
 
 For example.:
 Periodic updates can be avoided almost in all cases
 
 The new data can be pushed to the agent just when it needed.
 The agent can know when the AMQP connection become unreliable (queue or
 connection loose),
 and needs to do full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159
 
 Also the agents when gets some notification, they start asking

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread Attila Fazekas




- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 1:20:48 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 Hi, Kevin,
 
 
 
 I assumed that all agents are connected to same IP address of RabbitMQ, then
 the connection will exceed the port ranges limitation.
 
https://news.ycombinator.com/item?id=1571300

TCP connections are identified by the (src ip, src port, dest ip, dest port) 
tuple.

The server doesn't need multiple IPs to handle  65535 connections. All the 
server connections to a given IP are to the same port. For a given client, the 
unique key for an http connection is (client-ip, PORT, server-ip, 80). The only 
number that can vary is PORT, and that's a value on the client. So, the client 
is limited to 65535 connections to the server. But, a second client could also 
have another 65K connections to the same server-ip:port.

 
 For a RabbitMQ cluster, for sure the client can connect to any one of member
 in the cluster, but in this case, the client has to be designed in fail-safe
 manner: the client should be aware of the cluster member failure, and
 reconnect to other survive member. No such mechnism has been implemented
 yet.
 
 
 
 Other way is to use LVS or DNS based like load balancer, or something else.
 If you put one load balancer ahead of a cluster, then we have to take care
 of the port number limitation, there are so many agents will require
 connection concurrently, 100k level, and the requests can not be rejected.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [blak...@gmail.com]
 Sent: 12 April 2015 9:59
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 The TCP/IP stack keeps track of connections as a combination of IP + TCP
 port. The two byte port limit doesn't matter unless all of the agents are
 connecting from the same IP address, which shouldn't be the case unless
 compute nodes connect to the rabbitmq server via one IP address running port
 address translation.
 
 Either way, the agents don't connect directly to the Neutron server, they
 connect to the rabbit MQ cluster. Since as many Neutron server processes can
 be launched as necessary, the bottlenecks will likely show up at the
 messaging or DB layer.
 
 On Sat, Apr 11, 2015 at 6:46 PM, joehuang  joehu...@huawei.com  wrote:
 
 
 
 
 
 As Kevin talking about agents, I want to remind that in TCP/IP stack, port (
 not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535,
 supports maximum 64k port number.
 
 
 
  above 100k managed node  means more than 100k L2 agents/L3 agents... will
 be alive under Neutron.
 
 
 
 Want to know the detail design how to support 99.9% possibility for scaling
 Neutron in this way, and PoC and test would be a good support for this idea.
 
 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [ blak...@gmail.com ]
 Sent: 11 April 2015 12:34
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Which periodic updates did you have in mind to eliminate? One of the few
 remaining ones I can think of is sync_routers but it would be great if you
 can enumerate the ones you observed because eliminating overhead in agents
 is something I've been working on as well.
 
 One of the most common is the heartbeat from each agent. However, I don't
 think we can't eliminate them because they are used to determine if the
 agents are still alive for scheduling purposes. Did you have something else
 in mind to determine if an agent is alive?
 
 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas  afaze...@redhat.com 
 wrote:
 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 The problem is openstack using the right tools SQL/AMQP/(zk),
 but in a wrong way.
 
 For example.:
 Periodic updates can be avoided almost in all cases
 
 The new data can be pushed to the agent just when it needed.
 The agent can know when the AMQP connection become unreliable (queue or
 connection loose),
 and needs to do full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159
 
 Also the agents when gets some notification, they start asking for details
 via the
 AMQP - SQL. Why they do not know it already or get it with the notification
 ?
 
 
 - Original Message -
  From: Neil Jerram  neil.jer...@metaswitch.com 
  To: OpenStack Development Mailing List (not for usage questions

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread Attila Fazekas




- Original Message -
 From: Kevin Benton blak...@gmail.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 4:17:29 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 So IIUC tooz would be handling the liveness detection for the agents. That
 would be nice to get ride of that logic in Neutron and just register
 callbacks for rescheduling the dead.
 
 Where does it store that state, does it persist timestamps to the DB like
 Neutron does? If so, how would that scale better? If not, who does a given
 node ask to know if an agent is online or offline when making a scheduling
 decision?
 
You might find the solution proposed in this bug interesting:
https://bugs.launchpad.net/nova/+bug/1437199

 However, before (what I assume is) the large code change to implement tooz, I
 would like to quantify that the heartbeats are actually a bottleneck. When I
 was doing some profiling of them on the master branch a few months ago,
 processing a heartbeat took an order of magnitude less time (50ms) than the
 'sync routers' task of the l3 agent (~300ms). A few query optimizations
 might buy us a lot more headroom before we have to fall back to large
 refactors.
 Kevin Benton wrote:
 
 
 
 One of the most common is the heartbeat from each agent. However, I
 don't think we can't eliminate them because they are used to determine
 if the agents are still alive for scheduling purposes. Did you have
 something else in mind to determine if an agent is alive?
 
 Put each agent in a tooz[1] group; have each agent periodically heartbeat[2],
 have whoever needs to schedule read the active members of that group (or use
 [3] to get notified via a callback), profit...
 
 Pick from your favorite (supporting) driver at:
 
 http://docs.openstack.org/developer/tooz/compatibility.html
 
 [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping
 [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
 
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][database][quotas] reservations table ??

2015-04-13 Thread Attila Fazekas




- Original Message -
 From: Kevin L. Mitchell kevin.mitch...@rackspace.com
 To: openstack-dev@lists.openstack.org
 Sent: Friday, April 10, 2015 5:47:26 PM
 Subject: Re: [openstack-dev] [nova][database][quotas] reservations table ??
 
 On Fri, 2015-04-10 at 02:38 -0400, Attila Fazekas wrote:
  I noticed the nova DB has reservations table with an expire field (+24h)
  and a periodic task
  in the scheduler (60 sec) for expire the otherwise not deleted records [2].
  
  Both the table and the observed operations are strange.
  
  What this table and its operations are trying to solve ?
  Why does it needed ?
  Why this solution was chosen ?
 
 It might help to know that this is reservations for the quota system.
 The basic reason that this exists is because of parallelism: say the
 user makes a request to boot a new instance, and that new instance would
 fill their quota.  Nova begins processing the request, but while it's
 doing so, the user makes a second (or third, fourth, fifth, etc.)
 request.  With a reservation, we can count the first request against
 their quota and reject the extra requests; without a reservation, we
 have no way of knowing that nova is already processing a request, and so
 could allow the user to vastly exceed their quota.
 
Just the very existence of the `expire` makes the solution very suspicious.

As I see it, the operations do not ensure parallel-safe quota enforcement
at resource creation, and they are based on stale data. (wireshark)

They are based on data originating from a different transaction,
even without SELECT .. WITH SHARED LOCK.

When moving the delta to/from reservations, the service puts a lock
(SELECT .. FOR UPDATE) on all quota_usages rows of the same tenant;
this is the only safety mechanism I saw.
Alone it is not enough.

No quota-related table is touched in the same transaction
in which the instance state changes (or the instance is created). :(

---
The reservations table is not really needed.

What is really needed is doing the quota_usages changes
and the resource state changes in the same transaction!

Transactions are all-or-nothing constructs;
nothing can happen which needs any `expire` thing.

The transaction needs to ensure it really does the state change.
That can mean just reading it with SELECT .. FOR UPDATE
for an existing record (for ex.: an instance).

The transaction also needs to ensure the quota check happened
based on non-stale data - SELECT .. WITH SHARED LOCK for
- quota limit queries
- calculating the actual number of things, or just reading the
  values from quota_usages

In most cases, the quota check and update can be merged into a single UPDATE
statement, and it can happen fully on the DB side, without the service actually
fetching any quota-related information.

A mysql UPDATE statement with the right expressions and sub-queries
automatically places the minimum required locks and does the update only when
needed.

The number of changed rows returned by the UPDATE
can indicate whether the quota was successfully allocated (passed the check)
or not.

When it is not successful, just ROLLBACK and tell the user something about
the `Out of Quota` issue.

It is recommended to put the quota check close to the end of the transaction,
in order to minimize the lock hold time on the quota_usages table.

At the end we will not lock quota_usages twice (as we do now),
we do not leave behind 4 virtually deleted rows in a `bonus` table,
we do not use +1 extra transaction and +8 extra UPDATEs per instance create,
and consistency is ensured.
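The check-and-consume-in-one-UPDATE pattern can be sketched in a few lines. This is an illustrative sketch only: sqlite stands in for mysql, and the table/column names (quota_usages, in_use, hard_limit) are loosely modeled on nova's schema, not taken from nova code:

```python
import sqlite3

# sqlite stands in for mysql here; schema is a hypothetical miniature.
db = sqlite3.connect(":memory:", isolation_level=None)  # manage txns by hand
db.executescript("""
    CREATE TABLE quota_usages (project_id TEXT PRIMARY KEY,
                               in_use INTEGER, hard_limit INTEGER);
    CREATE TABLE instances (id INTEGER PRIMARY KEY, project_id TEXT);
    INSERT INTO quota_usages VALUES ('demo', 0, 2);
""")

def boot_instance(project_id):
    """Consume quota and create the resource in ONE transaction.

    The conditional UPDATE both checks and takes the quota; its rowcount
    tells us whether the check passed, so no reservations table and no
    `expire` cleanup are needed.
    """
    cur = db.cursor()
    cur.execute("BEGIN IMMEDIATE")
    cur.execute("""UPDATE quota_usages
                   SET in_use = in_use + 1
                   WHERE project_id = ? AND in_use + 1 <= hard_limit""",
                (project_id,))
    if cur.rowcount != 1:              # quota check failed
        cur.execute("ROLLBACK")
        return False
    cur.execute("INSERT INTO instances (project_id) VALUES (?)",
                (project_id,))
    cur.execute("COMMIT")              # usage + resource change together
    return True

results = [boot_instance("demo") for _ in range(3)]
print(results)  # the third boot is rejected: [True, True, False]
```

Because the quota consumption and the instance INSERT commit or roll back together, there is nothing left over to expire and nothing to reconcile later.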


  PS.:
  Is the uuid in the table referenced by anything?
 
 Once the operation that allocated the reservation completes, it either
 rolls back the reservation (in the case of failure) or it commits the
 reservation (updating a cache quota usages table).  This involves
 updating the reservation table to delete the reservation, and a UUID
 helps match up the specific row.  (Or rows; most operations involve more
 than one quota and thus more than one row.)  The expiration logic is to
 deal with the case that the operation never completed because nova
 crashed in the middle, and provides a stop-gap measure to ensure that
 the usage isn't counted against the user forever.

Just to confirm: the UUID exists only in the reservations table,
and temporarily in one worker's memory?


 --
 Kevin L. Mitchell kevin.mitch...@rackspace.com
 Rackspace
 
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


PS.:
The `Refresh` is also a strange thing in this context.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin

Re: [openstack-dev] [nova] if by archived you mean, wipes out your tables completely, then sure, it works fine

2015-03-16 Thread Attila Fazekas
Hi Mike,

The point was, there is no real need or real use case for archiving the db
the way nova-manage does.

What is the exact use case? Auditing? Accounting?

* Keystone allows permanent delete; if you need to do auditing, the user
  accounts would probably be the primary target for saving.

* logs+elasticsearch (or just grep) and ceilometer+mongodb are designed to
  help with `archiving` and to keep the things you actually need.

* After one year you can have ~100M deleted server instance records
  in the shadow tables (+ the related rows); what to do with them? Truncate?
  If you have proper indexes on the main tables, the deleted records mostly
  just consume disk space; otherwise they also cause serious performance
  issues.

If anybody would like to keep the deleted things in SQL for whatever reason,
he very likely wants to do it in a different database instance on a different
server; it is also likely he would like to do some transformation (OLAP)
instead of attacking the production DB with full table scans while also
invalidating the `Buffer Pool` content.

The feature as it is does not make sense even after fixing the existing bugs.
I do not know what its actual use case would be; even if there is one, it is
probably not the best approach.

My suggestion is to just nuke it,
and come up with a `simple` script which archives the old records to /dev/null:
$ nova-manage db flush 7d
This would delete the soft-deleted records in small chunks (like token-flush).

(or just stop doing soft-delete.)


- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: Attila Fazekas afaze...@redhat.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, March 13, 2015 5:04:21 PM
 Subject: Re: [openstack-dev] [nova] if by archived you mean,wipes 
 out your tables completely, then sure, it
 works fine
 
 
 
 Attila Fazekas afaze...@redhat.com wrote:
 
  The archiving has issues since very long time [1],
  something like this [2] is expected to replace it.
 
 
 yeah I was thinking of just rewriting the archive routine in Nova to be
 reasonable, but I can build this routine into Oslo.db as well as a generic
 “move rows with criteria X into tables”. Archiving as it is is mostly
 useless if it isn’t considering dependencies between tables
 (https://bugs.launchpad.net/nova/+bug/1183523) so the correct approach would
 need to consider tables and potentially rows in terms of foreign key
 dependency. This is what the unit of work was built to handle. Though I’m
 not sure I can make this a generic ORM play since we want to be able to
 delete “only N” rows, and it would probably be nice for the system to not
 spend its time reading in the entire DB if it is only tasked with a few
 dozen rows, so it might need to implement its own mini-unit-of-work system
 that works against the same paradigm but specific to this use case.
 
 The simplest case is that we address the archival of tables in order of
 foreign key dependency. However, that has two issues in the “generic” sense.
 One is that there can be cycles between tables, or a table that refers to
 itself has a cycle to itself. So in those cases the archival on a “sort the
 tables” basis needs to be broken into a “sort the rows” basis. This is what
 SQLAlchemy’s unit of work does and I’d adapt that here.
 
 The other possible, but probably unlikely, issue is that to address this
 “generically”, if a row “Table A row 1” is referred to by a “Table B row 2”,
 it might not be assumable that it is safe to remove “Table B Row 2” and
 *not* “Table A row 1”. The application may rely on both of these rows being
 present, and the SQLAlchemy pattern where this is the case is the so-called
 “joined table inheritance” case. But the “joined table inheritance” pattern
 is actually not very easy to adapt to the “shadow” model so I doubt anyone
 is doing that.

IMHO we should forget about solving how to move them safely to a different
table; the issue is how to delete them in relatively small transactions
(~100 instances + referenced/related records) without causing full table scans
or reference violation issues.

keystone token-flush also has logic to do the delete in smaller chunks,
in order not to stall regular processing for a long time or hit DB replication
limit issues. keystone targets 1000 row deletes per transaction with mysql;
in some cases the actual deleted row number differs.

PS.:
Adding indexes on the deleted_at fields is acceptable.
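Such a chunked flush can be sketched as follows. This is an illustrative sketch only: sqlite stands in for mysql, the `instances` table with its `deleted_at` column is a hypothetical miniature, and the chunk size is shrunk from ~1000 to 3 just for demonstration:

```python
import datetime
import sqlite3

CHUNK = 3  # keystone targets ~1000 per transaction; tiny here for demo

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE instances
              (id INTEGER PRIMARY KEY, deleted_at TEXT)""")
# index on deleted_at, as suggested above, so the sub-select is cheap
db.execute("CREATE INDEX ix_instances_deleted_at ON instances (deleted_at)")
now = datetime.datetime(2015, 3, 16)
rows = [(i, (now - datetime.timedelta(days=d)).isoformat())
        for i, d in enumerate([1, 3, 8, 9, 10, 30])]
db.executemany("INSERT INTO instances VALUES (?, ?)", rows)
db.commit()

def flush(older_than_days):
    """Delete soft-deleted rows older than the cutoff, CHUNK rows per
    transaction, so every transaction stays short and replication-friendly."""
    cutoff = (now - datetime.timedelta(days=older_than_days)).isoformat()
    total = 0
    while True:
        cur = db.execute(
            """DELETE FROM instances WHERE id IN
               (SELECT id FROM instances
                WHERE deleted_at < ? LIMIT ?)""",
            (cutoff, CHUNK))
        db.commit()                  # one small transaction per chunk
        total += cur.rowcount
        if cur.rowcount < CHUNK:     # last (possibly partial) chunk
            return total

print(flush(7))  # rows soft-deleted 8+ days ago: 4
```

Each chunk is its own short transaction, so regular traffic and replication never stall for long, and the `deleted_at` index keeps the sub-select from scanning the whole table.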

  The archiving just move trash to the other side of the desk,
  usually just permanently deleting everything what is deleted
  for more than 7 day is better for everyone.
  
  For now, maybe just wiping out the shadow tables and the existing
  nova-mange
  functionality is better choice. [3]
  
  [1] https://bugs.launchpad.net/nova/+bug/1305892
  [2] https://blueprints.launchpad.net/nova/+spec/db-purge-engine
  [3]
  
  - Original Message

Re: [openstack-dev] [qa][tempest] Service tag blueprint incomplete

2015-03-16 Thread Attila Fazekas




- Original Message -
 From: Rohan Kanade openst...@rohankanade.com
 To: openstack-dev@lists.openstack.org
 Sent: Monday, March 16, 2015 1:13:12 PM
 Subject: [openstack-dev] [qa][tempest] Service tag blueprint incomplete
 
 Hi,
 
 I could find some tests in tempest are still not tagged with services as per
 blueprint  https://blueprints.launchpad.net/tempest/+spec/add-service-tags
 
 
 eg: .tempest.api.compute.test_live_block_migration:test_iscsi_volume (should
 have volume tag)
 
 I have started adding tags where appropriate
 
 https://review.openstack.org/#/c/164634/
 
 
 Please correct me if above observation is wrong.

The identity _service_ refers to keystone, that test is not really 
related more to keystone, than any other swift test
 when the auth backed is keystone.

Implicitly in practice almost all test is using keystone,
this it is not mentioned explicitly everywhere. The tests in the `identity` 
directory are considered taged, by the test selection logic.

Also we does not mention `image` when booting a nova server.

The service tags main expected usage is on the scenarios,
or when certain api has explicit feature regarding to other services.

The mentioned swift test case grants WORLD readability,
it is definitely AAA related thing, but in this case not really keystone 
related.

If we would like to distinguish the AAA related test we might need a different 
tag,
or we need to redefine the meaning of the `identity` tag. 

PS.:
AAA = Auth* =  Authentication, Authorization, and Accounting
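The tag mechanism itself is small: a decorator attaches service names to a test function, and the selection logic matches on them. A standalone sketch (illustrative names only, not tempest's actual implementation):

```python
# Minimal sketch of a service-tag decorator, loosely modelled on
# tempest's @test.services(); all names here are illustrative only.
def services(*svcs):
    def wrap(fn):
        fn._services = set(svcs)
        return fn
    return wrap

@services("compute", "volume")
def test_attach_volume():
    pass

# A runner can then select only the tests touching a given service:
selected = [t for t in (test_attach_volume,)
            if "volume" in getattr(t, "_services", set())]
print([t.__name__ for t in selected])  # ['test_attach_volume']
```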
 
 Regards,
 Rohan Kanade
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 



Re: [openstack-dev] [nova] if by archived you mean, wipes out your tables completely, then sure, it works fine

2015-03-13 Thread Attila Fazekas
The archiving has had issues for a very long time [1];
something like this [2] is expected to replace it.

The archiving just moves trash to the other side of the desk;
usually just permanently deleting everything that has been deleted
for more than 7 days is better for everyone.

For now, maybe just wiping out the shadow tables and the existing nova-manage
functionality is a better choice. [3]

[1] https://bugs.launchpad.net/nova/+bug/1305892
[2] https://blueprints.launchpad.net/nova/+spec/db-purge-engine
[3] https://bugs.launchpad.net/nova/+bug/1426873
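A purge along those lines, permanently deleting anything soft-deleted for more than 7 days, could look roughly like this (hypothetical table and column names, sketched with sqlite for brevity):

```python
import sqlite3
import time

SEVEN_DAYS = 7 * 24 * 3600

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, "
             "deleted INTEGER NOT NULL DEFAULT 0, deleted_at REAL)")
now = time.time()
conn.executemany("INSERT INTO instances VALUES (?, ?, ?)",
                 [(1, 0, None),                 # live row
                  (2, 2, now - 8 * 24 * 3600),  # soft-deleted 8 days ago
                  (3, 3, now - 3600)])          # soft-deleted an hour ago
# Purge: drop rows soft-deleted for more than 7 days, keep the rest.
cur = conn.execute(
    "DELETE FROM instances WHERE deleted != 0 AND deleted_at < ?",
    (now - SEVEN_DAYS,))
conn.commit()
print(cur.rowcount)  # 1 -- only the 8-day-old row is gone
```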

- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, March 13, 2015 12:29:55 AM
 Subject: [openstack-dev] [nova] if by archived you mean,wipes out your 
 tables completely, then sure, it works
 fine
 
 Hello Nova -
 
 Not sure if I’m just staring at this for too long, or if
 archive_deleted_rows_for_table() is just not something we ever use.
 Because it looks like it’s really, really broken very disastrously, and I’m
 wondering if I’m just missing something in front of me.
 
 Let’s look at what it does!
 
 First, archive_deleted_rows() calls it with a table name. These names are
 taken by collecting every single table name from nova.db.sqlalchemy.models.
 
 Then, the function uses table reflection (that is, doesn’t look in the model
 at all, just goes right to the database) to load the table definitions:
 
 table = Table(tablename, metadata, autoload=True)
 shadow_tablename = _SHADOW_TABLE_PREFIX + tablename
 rows_archived = 0
 try:
 shadow_table = Table(shadow_tablename, metadata, autoload=True)
 except NoSuchTableError:
 # No corresponding shadow table; skip it.
 return rows_archived
 
 this is pretty heavy handed and wasteful from an efficiency point of view,
 and I’d like to fix this too, but let’s go with it. Now we have the two
 tables.
 
 Then we do this:
 
 deleted_column = table.c.deleted
 query_insert = sql.select([table],
   deleted_column != deleted_column.default).\
   order_by(column).limit(max_rows)
 query_delete = sql.select([column],
   deleted_column != deleted_column.default).\
   order_by(column).limit(max_rows)
 
 We make some SELECT statements that we’re going to use to find “soft
 deleted” rows, and these will be embedded into an INSERT
 and a DELETE. It is trying to make a statement like “SELECT .. FROM
 table WHERE deleted != deleted_default”, so that it finds rows where
 “deleted” has been changed to something, e.g. the row was
 soft deleted.
 
 But what’s the value of “deleted_default” ?   Remember, all this
 table knows is what the database just told us about it, because it only
 uses reflection.  Let’s see what the “deleted” column in a table like
 instance_types looks like:
 
 MariaDB [nova] show create table instance_types;
 | instance_types | CREATE TABLE `instance_types` (
   `created_at` datetime DEFAULT NULL,
 
   …  [omitted] ...
 
   `deleted` int(11) DEFAULT NULL,
 )
 
 The default that we get for this column is NULL. That is very interesting!
 Because, if we look at the *Python-side value of deleted*, we see something
 that is quite the opposite of NULL, e.g. a thing that is most certainly not
 null:
 
 class SoftDeleteMixin(object):
 deleted_at = Column(DateTime)
 deleted = Column(Integer, default=0)
 
 See that zero there? That’s a ***Python-side default***. It is **not the
 server default**!! You will **not** get it from reflection, the database has
 no clue about it (oddly enough, this entire subject matter is fully
 documented in SQLAlchemy’s documentation, and guess what, the docs are free!
 Read them all you like, I won’t ask for a dime, no questions asked!).
 
 So, all of our INSERTS **will** put a zero, not NULL, into that column.
 Let’s look in instance_types and see:
 
 MariaDB [nova] select id, name, deleted from instance_types;
 ++---+-+
 | id | name  | deleted |
 ++---+-+
 |  3 | m1.large  |   0 |
 |  1 | m1.medium |   0 |
 |  7 | m1.micro  |   0 |
 |  6 | m1.nano   |   0 |
 |  5 | m1.small  |   0 |
 |  2 | m1.tiny   |   0 |
 |  4 | m1.xlarge |   0 |
 ++---+-+
 7 rows in set (0.00 sec)
 
 No NULLs.  The value of non-deleted rows is zero.
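The distinction can be checked directly in a few lines (a standalone sketch using sqlite and the current SQLAlchemy reflection spelling, `autoload_with`, rather than the `autoload=True` in the quoted code; not the actual nova models):

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Thing(Base):
    __tablename__ = "things"
    id = Column(Integer, primary_key=True)
    deleted = Column(Integer, default=0)  # Python-side default only

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Reflect the table straight from the database, as the archiver does:
reflected = Table("things", MetaData(), autoload_with=engine)
print(reflected.c.deleted.server_default)  # None -- the DB never saw the 0
```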
 
 What does this all mean?
 
 It means, when this archival routine runs, it runs queries like this:
 
 INSERT INTO shadow_quota_usages SELECT quota_usages.created_at,
 quota_usages.updated_at, quota_usages.deleted_at, quota_usages.id,
 quota_usages.project_id, quota_usages.resource, quota_usages.in_use,
 quota_usages.reserved, quota_usages.until_refresh, quota_usages.deleted,
 quota_usages.user_id
 FROM quota_usages
 WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id
  LIMIT ? OFFSET ?
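The consequence can be reproduced in miniature (hypothetical schema, sketched with sqlite): the reflected default is NULL while every row stores 0, so "deleted != reflected default" renders as `deleted IS NOT NULL` and matches every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Server-side default is NULL, mirroring the reflected nova schema:
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, "
             "deleted INTEGER DEFAULT NULL)")
# ORM-style inserts always write the Python-side default of 0, never NULL:
conn.executemany("INSERT INTO t (id, deleted) VALUES (?, 0)",
                 [(i,) for i in range(5)])
# The archiver's comparison against the reflected default becomes IS NOT NULL:
matched = conn.execute(
    "SELECT COUNT(*) FROM t WHERE deleted IS NOT NULL").fetchone()[0]
print(matched)  # 5 -- every live row looks soft-deleted, so all get archived
```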
 

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-10 Thread Attila Fazekas




- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: openstack-dev@lists.openstack.org
 Sent: Wednesday, March 4, 2015 9:22:43 PM
 Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
 supported in nova-scheduler
 
 On 03/04/2015 01:51 AM, Attila Fazekas wrote:
  Hi,
 
  I wonder what is the planned future of the scheduling.
 
  The scheduler does a lot of high field number query,
  which is CPU expensive when you are using sqlalchemy-orm.
  Does anyone tried to switch those operations to sqlalchemy-core ?
 
 Actually, the scheduler does virtually no SQLAlchemy ORM queries. Almost
 all database access is serialized from the nova-scheduler through the
 nova-conductor service via the nova.objects remoting framework.
 

It does not help you.

  The scheduler does lot of thing in the application, like filtering
  what can be done on the DB level more efficiently. Why it is not done
  on the DB side ?
 
 That's a pretty big generalization. Many filters (check out NUMA
 configuration, host aggregate extra_specs matching, any of the JSON
 filters, etc) don't lend themselves to SQL column-based sorting and
 filtering.
 

What a basic SQL query can do
and what the limit of SQL is are two different things.
Even if you do not move everything to the DB side,
the dataset the application needs to deal with could be limited.
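For example, instead of fetching every host row and filtering in Python, even a trivial WHERE clause lets the database discard the obviously unsuitable hosts first (standalone sketch with sqlite and an illustrative schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, "
             "free_ram INTEGER, free_disk INTEGER)")
conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)",
                 [(1, 512, 10), (2, 8192, 200), (3, 4096, 50)])
# Push the cheap capacity filters down to the database; only the
# surviving candidates reach the Python-side (NUMA, JSON, ...) filters.
rows = conn.execute(
    "SELECT id FROM hosts WHERE free_ram >= ? AND free_disk >= ? "
    "ORDER BY id", (2048, 40)).fetchall()
print([r[0] for r in rows])  # [2, 3]
```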

  There are use cases when the scheduler would need to know even more data,
  Is there a plan for keeping `everything` in all schedulers process memory
  up-to-date ?
  (Maybe zookeeper)
 
 Zookeeper has nothing to do with scheduling decisions -- only whether or
 not a compute node's service descriptor is active or not. The end goal
 (after splitting the Nova scheduler out into Gantt hopefully at the
 start of the L release cycle) is to have the Gantt database be more
 optimized to contain the resource usage amounts of all resources
 consumed in the entire cloud, and to use partitioning/sharding to scale
 the scheduler subsystem, instead of having each scheduler process handle
 requests for all resources in the cloud (or cell...)
 
What the current optional usage of zookeeper is,
and what it could be used for, are very different things.
Resource tracking is possible.

  The opposite way would be to move most operation into the DB side,
  since the DB already knows everything.
  (stored procedures ?)
 
 See above. This assumes that the data the scheduler is iterating over is
 well-structured and consistent, and that is a false assumption.

With stored procedures you can do almost anything,
and in many cases it is more readable than a complex query.

 
 Best,
 -jay
 
  Best Regards,
  Attila
 
 
  - Original Message -
  From: Rui Chen chenrui.m...@gmail.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Sent: Wednesday, March 4, 2015 4:51:07 AM
  Subject: [openstack-dev] [nova] blueprint about multiple workers supported
 in nova-scheduler
 
  Hi all,
 
  I want to make it easy to launch a bunch of scheduler processes on a host,
  multiple scheduler workers will make use of multiple processors of host
  and
  enhance the performance of nova-scheduler.
 
  I had registered a blueprint and commit a patch to implement it.
  https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
 
  This patch had applied in our performance environment and pass some test
  cases, like: concurrent booting multiple instances, currently we didn't
  find
  inconsistent issue.
 
  IMO, nova-scheduler should been scaled horizontally on easily way, the
  multiple workers should been supported as an out of box feature.
 
  Please feel free to discuss this feature, thanks.
 
  Best Regards
 
 
 
 
 
 
 



Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-10 Thread Attila Fazekas




- Original Message -
 From: Nikola Đipanov ndipa...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Tuesday, March 10, 2015 10:53:01 AM
 Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
 supported in nova-scheduler
 
 On 03/06/2015 03:19 PM, Attila Fazekas wrote:
  Looks like we need some kind of _per compute node_ mutex in the critical
  section,
  multiple scheduler MAY be able to schedule to two compute node at same
  time,
  but not for scheduling to the same compute node.
  
  If we don't want to introduce another required component or
  reinvent the wheel there are some possible trick with the existing globally
  visible
   components like the RDBMS.
  
  `Randomized` destination choose is recommended in most of the possible
  solutions,
  alternatives are much more complex.
  
  One SQL example:
  
   * Add `sched_cnt`, default=0, Integer field; to a hypervisors related table.
  
   When the scheduler picks one (or multiple) node, he needs to verify whether the
  node(s) are
  still good before sending the message to the n-cpu.
  
  It can be done by re-reading the ONLY the picked hypervisor(s) related
  data.
  with `LOCK IN SHARE MODE`.
  If the destination hyper-visors still OK:
  
  Increase the sched_cnt value exactly by 1,
  test is the UPDATE really update the required number of rows,
  the WHERE part needs to contain the previous value.
  
  You also need to update the resource usage on the hypervisor,
   by the expected cost of the new vms.
  
  If at least one selected node was ok, the transaction can be COMMITed.
  If you were able to COMMIT the transaction, the relevant messages
   can be sent.
  
   The whole process needs to be repeated with the items which did not pass
  the
  post verification.
  
  If a message sending failed, `act like` migrating the vm to another host.
  
  If multiple scheduler tries to pick multiple different host in different
  order,
  it can lead to a DEADLOCK situation.
  Solution: Try to have all scheduler to acquire to Shared RW locks in the
  same order,
  at the end.
  
  Galera multi-writer (Active-Active) implication:
  As always, retry on deadlock.
  
  n-sch + n-cpu crash at the same time:
  * If the scheduling is not finished properly, it might be fixed manually,
  or we need to solve which still alive scheduler instance is
  responsible for fixing the particular scheduling..
  
 
 So if I am reading the above correctly - you are basically proposing to
 move claims to the scheduler (we would atomically check if there were
 changes since the time we picked the host with the UPDATE .. WHERE using
 LOCK IN SHARE MODE (assuming REPEATABLE READS is the used isolation
 level) and then updating the usage, a.k.a doing the claim in the same
 transaction.
 
 The issue here is that we still have a window between sending the
 message, and the message getting picked up by the compute host (or
 timing out) or the instance outright failing, so for sure we will need
 to ack/nack the claim in some way on the compute side.
 
 I believe something like this has come up before under the umbrella term
 of moving claims to the scheduler, and was discussed in some detail on
 the latest Nova mid-cycle meetup, but only artifacts I could find were a
 few lines on this etherpad Sylvain pointed me to [1] that I am copying here:
 

 
 * White board the scheduler service interface
  ** note: this design won't change the existing way/logic of reconciling
 nova db != hypervisor view
  ** gantt should just return claim ids, not entire claim objects
  ** claims are acked as being in use via the resource tracker updates
 from nova-compute
  ** we still need scheduler retries for exceptional situations (admins
 doing things outside openstack, hardware changes / failures)
  ** retry logic in conductor? probably a separate item/spec
 
 
 As you can see - not much to go on (but that is material for a separate
 thread that I may start soon).

In my example, the resource needs to be considered as used before we get
anything back from the compute.
The resource can be `freed` during error handling,
hopefully by migrating to another node.
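The compare-and-swap step described above can be sketched on a single node (illustrative schema, sketched with sqlite; a real deployment would do this inside the scheduler's DB transaction):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hypervisors (id INTEGER PRIMARY KEY, "
             "sched_cnt INTEGER NOT NULL DEFAULT 0, "
             "free_ram INTEGER NOT NULL)")
conn.execute("INSERT INTO hypervisors VALUES (1, 0, 4096)")

def try_claim(conn, host_id, seen_cnt, ram_cost):
    # Optimistic claim: succeeds only if nobody re-scheduled to this host
    # since we read it (sched_cnt unchanged) and the resources still fit.
    cur = conn.execute(
        "UPDATE hypervisors SET sched_cnt = sched_cnt + 1, "
        "free_ram = free_ram - ? "
        "WHERE id = ? AND sched_cnt = ? AND free_ram >= ?",
        (ram_cost, host_id, seen_cnt, ram_cost))
    conn.commit()
    return cur.rowcount == 1

first = try_claim(conn, 1, 0, 1024)   # first scheduler wins
second = try_claim(conn, 1, 0, 1024)  # stale sched_cnt: must re-read and retry
print(first, second)  # True False
```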
 
 The problem I have with this particular approach is that while it claims
 to fix some of the races (and probably does) it does so by 1) turning
 the current scheduling mechanism on it's head 2) and not providing any
 thought into the trade-offs that it will make. For example, we may get
 more correct scheduling in the general case and the correctness will not
 be affected by the number of workers, but how does the fact that we now
 do locking DB access on every request fare against the retry mechanism
 for some of the more common usage patterns. What is the increased
 overhead of calling back to he scheduler to confirm the claim? In the
 end - how do we even measure that we are going in the right direction
 with the new design.
 
 I personally think that different workloads will have different needs
 from the scheduler

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-10 Thread Attila Fazekas




- Original Message -
 From: Attila Fazekas afaze...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Tuesday, March 10, 2015 12:48:00 PM
 Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
 supported in nova-scheduler
 
 
 
 
 
 - Original Message -
  From: Nikola Đipanov ndipa...@redhat.com
  To: openstack-dev@lists.openstack.org
  Sent: Tuesday, March 10, 2015 10:53:01 AM
  Subject: Re: [openstack-dev] [nova] blueprint about multiple workers
  supported in nova-scheduler
  
  On 03/06/2015 03:19 PM, Attila Fazekas wrote:
   Looks like we need some kind of _per compute node_ mutex in the critical
   section,
   multiple scheduler MAY be able to schedule to two compute node at same
   time,
   but not for scheduling to the same compute node.
   
   If we don't want to introduce another required component or
   reinvent the wheel there are some possible trick with the existing
   globally
   visible
    components like the RDBMS.
   
   `Randomized` destination choose is recommended in most of the possible
   solutions,
   alternatives are much more complex.
   
   One SQL example:
   
    * Add `sched_cnt`, default=0, Integer field; to a hypervisors related
   table.
   
    When the scheduler picks one (or multiple) node, he needs to verify whether
   the
   node(s) are
   still good before sending the message to the n-cpu.
   
   It can be done by re-reading the ONLY the picked hypervisor(s) related
   data.
   with `LOCK IN SHARE MODE`.
   If the destination hyper-visors still OK:
   
   Increase the sched_cnt value exactly by 1,
   test is the UPDATE really update the required number of rows,
   the WHERE part needs to contain the previous value.
   
   You also need to update the resource usage on the hypervisor,
by the expected cost of the new vms.
   
   If at least one selected node was ok, the transaction can be COMMITed.
   If you were able to COMMIT the transaction, the relevant messages
can be sent.
   
   The whole process needs to be repeated with the items which did not
    pass
   the
   post verification.
   
   If a message sending failed, `act like` migrating the vm to another host.
   
   If multiple scheduler tries to pick multiple different host in different
   order,
   it can lead to a DEADLOCK situation.
   Solution: Try to have all scheduler to acquire to Shared RW locks in the
   same order,
   at the end.
   
   Galera multi-writer (Active-Active) implication:
   As always, retry on deadlock.
   
   n-sch + n-cpu crash at the same time:
   * If the scheduling is not finished properly, it might be fixed manually,
   or we need to solve which still alive scheduler instance is
   responsible for fixing the particular scheduling..
   
  
  So if I am reading the above correctly - you are basically proposing to
  move claims to the scheduler (we would atomically check if there were
  changes since the time we picked the host with the UPDATE .. WHERE using
  LOCK IN SHARE MODE (assuming REPEATABLE READS is the used isolation
  level) and then updating the usage, a.k.a doing the claim in the same
  transaction.
  
  The issue here is that we still have a window between sending the
  message, and the message getting picked up by the compute host (or
  timing out) or the instance outright failing, so for sure we will need
  to ack/nack the claim in some way on the compute side.
  
  I believe something like this has come up before under the umbrella term
  of moving claims to the scheduler, and was discussed in some detail on
  the latest Nova mid-cycle meetup, but only artifacts I could find were a
  few lines on this etherpad Sylvain pointed me to [1] that I am copying
  here:
  
 
  
  * White board the scheduler service interface
   ** note: this design won't change the existing way/logic of reconciling
  nova db != hypervisor view
   ** gantt should just return claim ids, not entire claim objects
   ** claims are acked as being in use via the resource tracker updates
  from nova-compute
   ** we still need scheduler retries for exceptional situations (admins
  doing things outside openstack, hardware changes / failures)
   ** retry logic in conductor? probably a separate item/spec
  
  
  As you can see - not much to go on (but that is material for a separate
  thread that I may start soon).
 
 In my example, the resource needs to be considered as used before we get
 anything back from the compute.
 The resource can be `freed` during error handling,
 hopefully by migrating to another node.
  
  The problem I have with this particular approach is that while it claims
  to fix some of the races (and probably does) it does so by 1) turning
  the current scheduling mechanism on it's head 2) and not providing any
  thought into the trade-offs that it will make. For example, we may get
  more correct scheduling in the general case and the correctness will not
  be affected

Re: [openstack-dev] [nova][api] Microversions. And why do we need API extensions for new API functionality?

2015-03-09 Thread Attila Fazekas




- Original Message -
 From: Christopher Yeoh cbky...@gmail.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Monday, March 9, 2015 1:04:15 PM
 Subject: Re: [openstack-dev] [nova][api] Microversions. And why do we need 
 API extensions for new API functionality?
 
 
 
 On Mon, Mar 9, 2015 at 10:08 PM, John Garbutt  j...@johngarbutt.com  wrote:
 
 
 Hi,
 
 I think I agree with Jay here, but let me explain...
 
 On 8 March 2015 at 12:10, Alex Xu  sou...@gmail.com  wrote:
  Thanks for Jay point this out! If we have agreement on this and document
  it,
  that will be great for guiding developer how to add new API.
 
 +1
 
 Please could you submit a dev ref for this?
 
 We can argue on the review, a bit like this one:
 https://github.com/openstack/nova/blob/master/doc/source/devref/policy_enforcement.rst
 
  For modularity, we need define what should be in a separated module(it is
  extension now.) There are three cases:
  
  1. Add new resource
  This is totally worth to put in a separated module.
 
 +1
 
  2. Add new sub-resource
  like server-tags, I prefer to put in a separated module, I don't think
  put another 100 lines code in the servers.py is good choice.
 
 -1
 
 I hate the idea of show instance extension code for version 2.4 living
 separately to the rest of the instance show logic, when it really
 doesn't have to.
 
 It feels too heavyweight in its current form.
 
 
 If the only thing server-tags did was to add a parameter then we wouldn't
 need a new extension,
 but its not, it adds another resource with associated actions
 
 
 Maybe we need a more modular way of expressing the extension within
 the same file?
 
 
 I think servers.py is simply to big. Its much harder to read and debug than
 any other plugin just because of its size - or
 maybe I just need a 50 monitor :) I'd rather ensure functionality common
 server-tags and the API is kept together rather than
 spread through servers.py
 
No, it isn't.
It is below 2k lines. I usually use low level tools even for python related
debugging. For ex.: strace, gdb..
With the extensions I get a lot of files which may or may not be involved.
This causes me additional headache, because it is more difficult to see which
file is involved. After an strace I usually know what the mistake is; I just
need to find it in the code.
I do not like having to open more than 3 files after I see what went wrong.
In some cases I use gdb, just to get python stack traces right before the
first incorrect step is detected; in other cases git grep is sufficient.

Actually, for me the extensions increase the required number of monitors,
and in some cases I also need to use more complicated approaches.
I tried a lot of python profiler tools as well, but there is no single
version that wins in all cases; extra custom hacks are required in many
cases to get something close to what I want.

 
  3. extend attributes and methods for a existed resource
  like add new attributes for servers, we can choice one of existed module
  to put it in. Just like this patch https://review.openstack.org/#/c/155853/
 
 +1
 
 I wish it was easier to read, but I hope thats fixable long term.
 
  2015-03-08 8:31 GMT+08:00 Jay Pipes  jaypi...@gmail.com :
  Now that microversions have been introduced to the Nova API (meaning we
  can now have novaclient request, say, version 2.3 of the Nova API using
  the
  special X-OpenStack-Nova-API-Version HTTP header), is there any good
  reason
  to require API extensions at all for *new* functionality.
 
 As above, a new resource probably should get a new plugins/v3 module
 right?
 
 It feels (at worst) borderline in the os-server-tags case, due to the
 extra actions.
 
  What is the point of creating a new plugin/API extension for this new
  functionality? Why can't we just modify the
  nova/api/openstack/compute/server.py Controller.show() method and decorate
  it with a 2.4 microversion that adds a tags attribute to the returned
  server dictionary?
  
  Similarly, new microversion API functionality should live in a module, as
  a top-level (or subcollection) Controller in /nova/api/openstack/compute/,
  and should not be in the /nova/api/openstack/compute/plugins/ directory.
  Why? Because it's not a plugin.
 
 Everything is a plugin in v3, no more distinction between core vs
 plugin. It needs renaming really.
 
 It should look just like servers, I guess, which is a top level item:
 https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/plugins/v3/servers.py
 
  Why are we continuing to use these awkward, messy, and cumbersome API
  extensions?
 
 We certainly should never be forced to add an extension to advertise
 new functionality anymore.
 
 Its a big reason why I want to see the API micro-versions succeed.
 
 Yep, there is I think no reason except to support /extensions for now and I
 don't really think its worth having
 two entry points, one for modules which will appear in /extensions and 

Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming features (was: [nova] blueprint about multiple workers)

2015-03-09 Thread Attila Fazekas




- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: Attila Fazekas afaze...@redhat.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, March 6, 2015 2:20:45 AM
 Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming 
 features (was: [nova] blueprint about
 multiple workers)
 
 
 
 Attila Fazekas afaze...@redhat.com wrote:
 
  I see lot of improvements,
  but cPython is still cPython.
  
  When you benchmarking query related things, please try to
  get the actual data from the returned objects
 
 that goes without saying. I’ve been benching SQLAlchemy and DBAPIs for many
 years. New performance improvements tend to be the priority for pretty much
 every major release.
 
  and try to do
  something with data what is not expected to be optimized out even by
  a smarter compiler.
 
 Well I tend to favor breaking out the different elements into individual
 tests here, though I guess if you’re trying to trick a JIT then the more
 composed versions may be more relevant. For example, I could already tell
 you that the AttributeDict thing would perform terribly without having to
 mix it up with the DB access. __getattr__ is a poor performer (learned that
 in SQLAlchemy 0.1 about 9 years ago).
Equivalent things are also slower in perl.
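Mike's point about `__getattr__` is easy to demonstrate in isolation: the hook only fires after normal attribute lookup has already failed, so every read pays for the miss plus a dict access (standalone sketch):

```python
import timeit

class AttrDict(dict):
    # __getattr__ only fires after normal attribute lookup has failed,
    # so every read pays for the failed lookup plus the dict access.
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

class Plain:
    __slots__ = ("x",)
    def __init__(self):
        self.x = 1

a, p = AttrDict(x=1), Plain()
t_attrdict = timeit.timeit(lambda: a.x, number=200_000)
t_plain = timeit.timeit(lambda: p.x, number=200_000)
print(t_attrdict > t_plain)  # the fallback path is consistently slower
```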
 
  Here is my play script and several numbers:
  http://www.fpaste.org/193999/25585380/raw/
  Is there any faster ORM way for the same op?
 
 Absolutely, as I’ve been saying for months all the way back in my wiki entry
 on forward, query for individual columns, also skip the session.rollback()
 and do a close() instead (the transaction is still rolled back, we just skip
 the bookkeeping we don’t need).  You get the nice attribute access
 pattern too:

The script will probably be extended with explicit transaction management;
I agree my close / rollback usage is bad and ugly.
Also, thanks for the URL usage fix.

 
 http://www.fpaste.org/194098/56040781/
 
 def query_sqla_cols(self):
  """SQLAlchemy yield(100) named tuples"""
 session = self.Session()
 start = time.time()
 summary = 0
 for obj in session.query(
 Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
 summary += obj.id + obj.A + obj.B + obj.C
 session.rollback()
 end = time.time()
 return [end-start, summary]
 
 def query_sqla_cols_a3(self):
  """SQLAlchemy yield(100) named tuples 3*access"""
 session = self.Session()
 start = time.time()
 summary = 0
 for obj in session.query(
 Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
 summary += obj.id + obj.A + obj.B + obj.C
 summary += obj.id + obj.A + obj.B + obj.C
 summary += obj.id + obj.A + obj.B + obj.C
 session.rollback()
 end = time.time()
 return [end-start, summary/3]
 
 
 Here’s that:
 
 0 SQLAlchemy yield(100) named tuples: time: 0.635045 (data [18356026L])
 1 SQLAlchemy yield(100) named tuples: time: 0.630911 (data [18356026L])
 2 SQLAlchemy yield(100) named tuples: time: 0.641687 (data [18356026L])
 0 SQLAlchemy yield(100) named tuples 3*access: time: 0.807285 (data
 [18356026L])
 1 SQLAlchemy yield(100) named tuples 3*access: time: 0.814160 (data
 [18356026L])
 2 SQLAlchemy yield(100) named tuples 3*access: time: 0.829011 (data
 [18356026L])
 
 compared to the fastest Core test:
 
 0 SQlAlchemy core simple: time: 0.707205 (data [18356026L])
 1 SQlAlchemy core simple: time: 0.702223 (data [18356026L])
 2 SQlAlchemy core simple: time: 0.708816 (data [18356026L])
 
 
 This is using 1.0’s named tuple which is faster than the one in 0.9. As I
 discussed in the migration notes I linked, over here
 http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#new-keyedtuple-implementation-dramatically-faster
 is where I discuss how I came up with that named tuple approach.
 
 In 0.9, the tuples are much slower (but still faster than straight entities):
 
 0 SQLAlchemy yield(100) named tuples: time: 1.083882 (data [18356026L])
 1 SQLAlchemy yield(100) named tuples: time: 1.097783 (data [18356026L])
 2 SQLAlchemy yield(100) named tuples: time: 1.113621 (data [18356026L])
 0 SQLAlchemy yield(100) named tuples 3*access: time: 1.204280 (data
 [18356026L])
 1 SQLAlchemy yield(100) named tuples 3*access: time: 1.245768 (data
 [18356026L])
 2 SQLAlchemy yield(100) named tuples 3*access: time: 1.258327 (data
 [18356026L])
 
 Also note that the difference in full object fetches for 0.9 vs. 1.0 are
 quite different:
 
 0.9.8:
 
 0 SQLAlchemy yield(100): time: 2.802273 (data [18356026L])
 1 SQLAlchemy yield(100): time: 2.778059 (data [18356026L])
 2 SQLAlchemy yield(100): time: 2.841441 (data [18356026L])
 
 1.0:
 
 0 SQLAlchemy yield(100): time: 2.019153 (data [18356026L])
 1 SQLAlchemy yield(100

[openstack-dev] [Tempest] isolation default config change notification

2015-03-09 Thread Attila Fazekas
Hi All,

This is a follow up on [1].
Running the full tempest test-suite in parallel without the
allow_tenant_isolation=True setting can cause random, not too obvious
failures, which have caused a lot of issues for tempest newcomers.

There are special use cases when you might want to disable it,
for example when you would like to run just several test cases for
benchmarking, when you know it is safe for sure, and you do not want to
include account creation related times in the result.

Now, the other case when you might want to disable this feature is when
you are running tempest without an admin account. This is expected to change
with the upcoming `test accounts` [2], where allow_tenant_isolation=True
is expected to be the recommended configuration.
 
Best Regards,
Attila

[1] https://review.openstack.org/#/c/157052/
[2] https://blueprints.launchpad.net/tempest/+spec/test-accounts



Re: [openstack-dev] [nova][api] Microversions. And why do we need API extensions for new API functionality?

2015-03-09 Thread Attila Fazekas
I agree with Jay.

The extension layer is also expensive in CPU usage,
and it also makes it more difficult to troubleshoot issues.
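Jay's alternative below, gating the new attribute on the requested microversion instead of wrapping it in an extension, is conceptually tiny (illustrative sketch, not nova's actual controller or decorator machinery):

```python
# Illustrative sketch of a microversion-gated response attribute;
# not nova's actual dispatch code.
class ServersController:
    def show(self, req_version, server_id):
        server = {"id": server_id, "name": "vm-1"}
        if req_version >= (2, 4):      # microversion 2.4 adds tags
            server["tags"] = ["prod", "web"]
        return {"server": server}

c = ServersController()
old = c.show((2, 3), "abc123")["server"]
new = c.show((2, 4), "abc123")["server"]
print("tags" in old, "tags" in new)  # False True
```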


- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, 
 Sergey Nikitin
 sniki...@mirantis.com
 Sent: Sunday, March 8, 2015 1:31:34 AM
 Subject: [openstack-dev] [nova][api] Microversions. And why do we need API 
 extensions for new API functionality?
 
 Hi Stackers,
 
 Now that microversions have been introduced to the Nova API (meaning we
 can now have novaclient request, say, version 2.3 of the Nova API using
 the special X-OpenStack-Nova-API-Version HTTP header), is there any good
 reason to require API extensions at all for *new* functionality.
 
 Sergey Nikitin is currently in the process of code review for the final
 patch that adds server instance tagging to the Nova API:
 
 https://review.openstack.org/#/c/128940
 
 Unfortunately, for some reason I really don't understand, Sergey is
 being required to create an API extension called os-server-tags in
 order to add the server tag functionality to the API. The patch
 implements the 2.4 Nova API microversion, though, as you can see from
 this part of the patch:
 
 https://review.openstack.org/#/c/128940/43/nova/api/openstack/compute/plugins/v3/server_tags.py
 
 What is the point of creating a new plugin/API extension for this new
 functionality? Why can't we just modify the
 nova/api/openstack/compute/server.py Controller.show() method and
 decorate it with a 2.4 microversion that adds a tags attribute to the
 returned server dictionary?
 
 https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/servers.py#L369
 
 Because we're using an API extension for this new server tags
 functionality, we are instead having the extension extend the server
 dictionary with an os-server-tags:tags key containing the list of
 string tags.
 
 This is ugly and pointless. We don't need to use API extensions any more
 for this stuff.
 
 A client knows that server tags are supported by the 2.4 API
 microversion. If the client requests the 2.4+ API, then we should just
 include the tags attribute in the server dictionary.
 
 Similarly, new microversion API functionality should live in a module,
 as a top-level (or subcollection) Controller in
 /nova/api/openstack/compute/, and should not be in the
 /nova/api/openstack/compute/plugins/ directory. Why? Because it's not a
 plugin.
 
 Why are we continuing to use these awkward, messy, and cumbersome API
 extensions?
 
 Please, I am begging the Nova core team. Let us stop this madness. No
 more API extensions.
 
 Best,
 -jay
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-06 Thread Attila Fazekas
Looks like we need some kind of _per compute node_ mutex in the critical 
section:
multiple schedulers MAY be able to schedule to two compute nodes at the same time,
but not to the same compute node.

If we don't want to introduce another required component or
reinvent the wheel, there are some possible tricks with the existing globally 
visible
components, like the RDBMS.

A `randomized` destination choice is recommended in most of the possible 
solutions;
the alternatives are much more complex.

One SQL example:

* Add a `sched_cnt` Integer field (default=0) to a hypervisor-related table.

When the scheduler picks one (or multiple) node(s), it needs to verify that the 
node(s) are 
still good before sending the message to the n-cpu.

It can be done by re-reading ONLY the picked hypervisor(s)' related data
with `LOCK IN SHARE MODE`.
If the destination hypervisors are still OK:

Increase the sched_cnt value by exactly 1, and
test whether the UPDATE really updated the required number of rows;
the WHERE part needs to contain the previous value.

You also need to update the resource usage on the hypervisor
 by the expected cost of the new VMs.

If at least one selected node was OK, the transaction can be COMMITted.
If you were able to COMMIT the transaction, the relevant messages 
 can be sent.

The whole process needs to be repeated with the items which did not pass the
post-verification.

If a message sending failed, `act like` migrating the vm to another host.

If multiple schedulers try to pick multiple different hosts in a different order,
it can lead to a DEADLOCK situation.
Solution: have all schedulers acquire the shared RW locks in the same 
order,
at the end.

Galera multi-writer (Active-Active) implication:
As always, retry on deadlock. 

n-sch + n-cpu crash at the same time:
* If the scheduling did not finish properly, it might be fixed manually,
or we need to decide which still-alive scheduler instance is 
responsible for fixing the particular scheduling.
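The per-host compare-and-swap above can be sketched like this, using the stdlib
sqlite3 driver purely for illustration (a real deployment would target
MySQL/Galera; the table and column names are made up):

```python
import sqlite3

# Illustration only: `sched_cnt` is bumped by exactly 1, and the WHERE
# clause carries the previously seen value, so a concurrent scheduler
# that raced us makes our UPDATE match zero rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hypervisors (id INTEGER PRIMARY KEY,"
             " free_ram INTEGER, sched_cnt INTEGER DEFAULT 0)")
conn.execute("INSERT INTO hypervisors (id, free_ram) VALUES (1, 4096)")
conn.commit()

def try_claim(conn, host_id, ram_cost):
    """Re-read the picked hypervisor and update it only if it is unchanged."""
    free_ram, seen_cnt = conn.execute(
        "SELECT free_ram, sched_cnt FROM hypervisors WHERE id = ?",
        (host_id,)).fetchone()
    if free_ram < ram_cost:
        return False  # node is no longer good; repeat with another host
    cur = conn.execute(
        "UPDATE hypervisors SET free_ram = free_ram - ?, sched_cnt = sched_cnt + 1"
        " WHERE id = ? AND sched_cnt = ?",  # WHERE contains the previous value
        (ram_cost, host_id, seen_cnt))
    if cur.rowcount != 1:
        conn.rollback()  # another scheduler won the race; retry elsewhere
        return False
    conn.commit()  # only after the COMMIT may the message go out to n-cpu
    return True

ok = try_claim(conn, 1, 1024)
```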


- Original Message -
 From: Nikola Đipanov ndipa...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Friday, March 6, 2015 10:29:52 AM
 Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
 supported in nova-scheduler
 
 On 03/06/2015 01:56 AM, Rui Chen wrote:
  Thank you very much for in-depth discussion about this topic, @Nikola
  and @Sylvain.
  
  I agree that we should solve the technical debt firstly, and then make
  the scheduler better.
  
 
 That was not necessarily my point.
 
 I would be happy to see work on how to make the scheduler less volatile
 when run in parallel, but the solution must acknowledge the eventually
 (or never really) consistent nature of the data scheduler has to operate
 on (in it's current design - there is also the possibility of offering
 an alternative design).
 
 I'd say that fixing the technical debt that is aimed at splitting the
 scheduler out of Nova is a mostly orthogonal effort.
 
 There have been several proposals in the past for how to make the
 scheduler horizontally scalable and improve it's performance. One that I
 remember from the Atlanta summit time-frame was the work done by Boris
 and his team [1] (they actually did some profiling and based their work
 on the bottlenecks they found). There are also some nice ideas in the
 bug lifeless filed [2] since this behaviour particularly impacts ironic.
 
 N.
 
 [1] https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
 [2] https://bugs.launchpad.net/nova/+bug/1341420
 
 
  Best Regards.
  
  2015-03-05 21:12 GMT+08:00 Sylvain Bauza sba...@redhat.com
  mailto:sba...@redhat.com:
  
  
  Le 05/03/2015 13:00, Nikola Đipanov a écrit :
  
  On 03/04/2015 09:23 AM, Sylvain Bauza wrote:
  
  Le 04/03/2015 04:51, Rui Chen a écrit :
  
  Hi all,
  
  I want to make it easy to launch a bunch of scheduler
  processes on a
  host, multiple scheduler workers will make use of
  multiple processors
  of host and enhance the performance of nova-scheduler.
  
  I had registered a blueprint and commit a patch to
  implement it.
  
  https://blueprints.launchpad.__net/nova/+spec/scheduler-__multiple-workers-support
  
  https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
  
  This patch had applied in our performance environment
  and pass some
  test cases, like: concurrent booting multiple instances,
  currently we
  didn't find inconsistent issue.
  
  IMO, nova-scheduler should been scaled horizontally on
  easily way, the
  multiple workers should been supported as an out of box
  feature.
  
  Please feel free to discuss this feature, thanks.
  
  
  

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-06 Thread Attila Fazekas




- Original Message -
 From: Attila Fazekas afaze...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, March 6, 2015 4:19:18 PM
 Subject: Re: [openstack-dev] [nova] blueprint about multiple workers 
 supported in nova-scheduler
 
 Looks like we need some kind of _per compute node_ mutex in the critical
 section,
 multiple scheduler MAY be able to schedule to two compute node at same time,
 but not for scheduling to the same compute node.
 
 If we don't want to introduce another required component or
 reinvent the wheel there are some possible trick with the existing globally
 visible
 components like with the RDMS.
 
 `Randomized` destination choose is recommended in most of the possible
 solutions,
 alternatives are much more complex.
 
 One SQL example:
 
 * Add `sched_cnt`, defaul=0, Integer field; to a hypervisors related table.
 
 When the scheduler picks one (or multiple) node, he needs to verify is the
 node(s) are
 still good before sending the message to the n-cpu.
 
 It can be done by re-reading the ONLY the picked hypervisor(s) related data.
 with `LOCK IN SHARE MODE`.
 If the destination hyper-visors still OK:
 
 Increase the sched_cnt value exactly by 1,
 test is the UPDATE really update the required number of rows,
 the WHERE part needs to contain the previous value.

This part is very likely not needed if all a scheduler needs is
to update the (any) same field regarding the same host, and the schedulers
acquire the RW lock for reading before they upgrade it to a WRITE lock.

Another strategy might consider pre-acquiring the write lock only,
but the write intent is not certain before we re-read and verify the data.  
 
 
 You also need to update the resource usage on the hypervisor,
  by the expected cost of the new vms.
 
 If at least one selected node was ok, the transaction can be COMMITed.
 If you were able to COMMIT the transaction, the relevant messages
  can be sent.
 
 The whole process needs to be repeated with the items which did not passed
 the
 post verification.
 
 If a message sending failed, `act like` migrating the vm to another host.
 
 If multiple scheduler tries to pick multiple different host in different
 order,
 it can lead to a DEADLOCK situation.
 Solution: Try to have all scheduler to acquire to Shared RW locks in the same
 order,
 at the end.
 
 Galera multi-writer (Active-Active) implication:
 As always, retry on deadlock.
 
 n-sch + n-cpu crash at the same time:
 * If the scheduling is not finished properly, it might be fixed manually,
 or we need to solve which still alive scheduler instance is
 responsible for fixing the particular scheduling..
 
 
 - Original Message -
  From: Nikola Đipanov ndipa...@redhat.com
  To: openstack-dev@lists.openstack.org
  Sent: Friday, March 6, 2015 10:29:52 AM
  Subject: Re: [openstack-dev] [nova] blueprint about multiple workers
  supported in nova-scheduler
  
  On 03/06/2015 01:56 AM, Rui Chen wrote:
   Thank you very much for in-depth discussion about this topic, @Nikola
   and @Sylvain.
   
   I agree that we should solve the technical debt firstly, and then make
   the scheduler better.
   
  
  That was not necessarily my point.
  
  I would be happy to see work on how to make the scheduler less volatile
  when run in parallel, but the solution must acknowledge the eventually
  (or never really) consistent nature of the data scheduler has to operate
  on (in it's current design - there is also the possibility of offering
  an alternative design).
  
  I'd say that fixing the technical debt that is aimed at splitting the
  scheduler out of Nova is a mostly orthogonal effort.
  
  There have been several proposals in the past for how to make the
  scheduler horizontally scalable and improve it's performance. One that I
  remember from the Atlanta summit time-frame was the work done by Boris
  and his team [1] (they actually did some profiling and based their work
  on the bottlenecks they found). There are also some nice ideas in the
  bug lifeless filed [2] since this behaviour particularly impacts ironic.
  
  N.
  
  [1] https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
  [2] https://bugs.launchpad.net/nova/+bug/1341420
  
  
   Best Regards.
   
   2015-03-05 21:12 GMT+08:00 Sylvain Bauza sba...@redhat.com
   mailto:sba...@redhat.com:
   
   
   Le 05/03/2015 13:00, Nikola Đipanov a écrit :
   
   On 03/04/2015 09:23 AM, Sylvain Bauza wrote:
   
   Le 04/03/2015 04:51, Rui Chen a écrit :
   
   Hi all,
   
   I want to make it easy to launch a bunch of scheduler
   processes on a
   host, multiple scheduler workers will make use of
   multiple processors
   of host and enhance the performance of nova-scheduler.
   
   I had registered a blueprint and commit a patch to
   implement

Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-06 Thread Attila Fazekas
Can you check whether this patch does the right thing [1]:

[1] https://review.openstack.org/#/c/112523/6

- Original Message -
 From: Fredy Neeser fredy.nee...@solnet.ch
 To: openstack-dev@lists.openstack.org
 Sent: Friday, March 6, 2015 6:01:08 PM
 Subject: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes:   
 Avoiding the MTU pitfalls
 
 Hello world
 
 I recently created a VXLAN test setup with single-NIC compute nodes
 (using OpenStack Juno on Fedora 20), conciously ignoring the OpenStack
 advice of using nodes with at least 2 NICs ;-) .
 
 The fact that both native and encapsulated traffic needs to pass through
 the same NIC does create some interesting challenges, but finally I got
 it working cleanly, staying clear of MTU pitfalls ...
 
 I documented my findings here:
 
[1]
 http://blog.systemathic.ch/2015/03/06/openstack-vxlan-with-single-nic-compute-nodes/
[2]
 http://blog.systemathic.ch/2015/03/05/openstack-mtu-pitfalls-with-tunnels/
 
 For those interested in single-NIC setups, I'm curious what you think
 about [1]  (a small patch is needed to add VLAN awareness to the
 qg-XXX Neutron gateway ports).
 
 
 While catching up with Neutron changes for OpenStack Kilo, I came across
 the in-progress work on MTU selection and advertisement:
 
[3]  Spec:
 https://github.com/openstack/neutron-specs/blob/master/specs/kilo/mtu-selection-and-advertisement.rst
[4]  Patch review:  https://review.openstack.org/#/c/153733/
[5]  Spec update:  https://review.openstack.org/#/c/159146/
 
 Seems like [1] eliminates some additional MTU pitfalls that are not
 addressed by [3-5].
 
 But I think it would be nice if we could achieve [1] while coordinating
 with the MTU selection and advertisement work [3-5].
 
 Thoughts?
 
 Cheers,
 - Fredy
 
 Fredy (Freddie) Neeser
 http://blog.systeMathic.ch
 
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming features (was: [nova] blueprint about multiple workers)

2015-03-05 Thread Attila Fazekas
I see a lot of improvements,
but CPython is still CPython.

When you are benchmarking query-related things, please try to
get the actual data from the returned objects and try to do
something with the data that is not expected to be optimized out, even by
a smarter compiler.

Here is my play script and several numbers:
http://www.fpaste.org/193999/25585380/raw/
Is there any faster ORM way for the same op?

It looks like it is still worth converting the results to dicts
when you access the data multiple times.

A dict is also the typical input type for the json serializers. 

The plain dict is good enough if you do not want to manage
which part changed, especially when you are not planning to `save` it.
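What I mean, sketched with the stdlib sqlite3 driver (not the fpaste script
itself; the table is made up): fetch the rows, actually consume the data so
nothing can be optimized away, and convert to plain dicts once when the data
is accessed multiple times:

```python
import sqlite3

# Made-up table; the point is only the access pattern.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO t (data) VALUES (?)",
                 [("x" * 64,) for _ in range(1000)])

rows = conn.execute("SELECT id, data FROM t").fetchall()

# Actually touch the payload so the work cannot be optimized away.
total = sum(len(row["data"]) for row in rows)

# Convert once to plain dicts when the data is accessed multiple times;
# a dict is also what json serializers typically want.
dicts = [{key: row[key] for key in row.keys()} for row in rows]
total2 = sum(len(d["data"]) for d in dicts)
```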

- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Wednesday, March 4, 2015 11:30:49 PM
 Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming  
 features (was: [nova] blueprint about
 multiple workers)
 
 
 
 Mike Bayer mba...@redhat.com wrote:
 
  
  
  Attila Fazekas afaze...@redhat.com wrote:
  
  Hi,
  
  I wonder what is the planned future of the scheduling.
  
  The scheduler does a lot of high field number query,
  which is CPU expensive when you are using sqlalchemy-orm.
  Does anyone tried to switch those operations to sqlalchemy-core ?
  
  An upcoming feature in SQLAlchemy 1.0 will remove the vast majority of CPU
  overhead from the query side of SQLAlchemy ORM by caching all the work done
 
 Just to keep the Openstack community of what’s upcoming, here’s some more
 detail
 on some of the new SQLAlchemy performance features, which are based on the
 goals I first set up last summer at
 https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy.
 
 As 1.0 features a lot of new styles of doing things that are primarily in
 the name of performance, in order to help categorize and document these
 techniques, 1.0 includes a performance suite in examples/ which features a
 comprehensive collection of common database idioms run under timing and
 function-count profiling. These idioms are broken into major categories like
 “short selects”, “large resultsets”, “bulk inserts”, and serve not only as a
 way to compare the relative performance of different techniques, but also as
 a way to provide example code categorized into use cases that illustrate the
 variety of ways to achieve that case, including the tradeoffs for each,
 across Core and ORM. So in this case, we can see what the “baked” query
 looks like in the “short_selects” suite, which times how long it takes to
 perform 1 queries, each of which return one object or row:
 
 https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/short_selects.py?at=ticket_3054#cl-73
 
 The results of this suite look like the following:
 
 test_orm_query : test a straight ORM query of the full entity. (1
 iterations); total time 7.363434 sec
 test_orm_query_cols_only : test an ORM query of only the entity columns.
 (1 iterations); total time 6.509266 sec
 test_baked_query : test a baked query of the full entity. (1 iterations);
 total time 1.999689 sec
 test_baked_query_cols_only : test a baked query of only the entity columns.
 (1 iterations); total time 1.990916 sec
 test_core_new_stmt_each_time : test core, creating a new statement each time.
 (1 iterations); total time 3.842871 sec
 test_core_reuse_stmt : test core, reusing the same statement (but recompiling
 each time). (1 iterations); total time 2.806590 sec
 test_core_reuse_stmt_compiled_cache : test core, reusing the same statement +
 compiled cache. (1 iterations); total time 0.659902 sec
 
 Where above, “test_orm” and “test_baked” are both using the ORM API
 exclusively. We can see that the “baked” approach, returning column tuples
 is almost twice as fast as a naive Core approach, that is, one which
 constructs select() objects each time and does not attempt to use any
 compilation caching.
 
 For the use case of fetching large numbers of rows, we can look at the
 large_resultsets suite
 (https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/large_resultsets.py?at=ticket_3054).
 This suite illustrates a single query which fetches 500K rows. The “Baked”
 approach isn’t relevant here as we are only emitting a query once, however
 the approach we use to fetch rows is significant. Here we can see that
 ORM-based “tuple” approaches are very close in speed to the fetching of rows
 using Core directly. We also have a comparison of Core against raw DBAPI
 access, where we see very little speed improvement; an example where we
 create a very simple object for each DBAPI row fetched is also present to
 illustrate how quickly even the most minimal Python function overhead adds
 up when we do something 500K times.
 
 test_orm_full_objects_list : Load fully

Re: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

2015-03-04 Thread Attila Fazekas
Hi,

I wonder what the planned future of the scheduling is.

The scheduler does a lot of high-field-count queries,
which are CPU-expensive when you are using sqlalchemy-orm.
Has anyone tried to switch those operations to sqlalchemy-core?

The scheduler does a lot of things in the application, like filtering, 
which could be done on the DB level more efficiently. Why is it not done
on the DB side? 

There are use cases where the scheduler would need to know even more data.
Is there a plan for keeping `everything` in every scheduler's process memory 
up-to-date?
(Maybe ZooKeeper)

The opposite way would be to move most operations to the DB side,
since the DB already knows everything. 
(Stored procedures?)
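What I mean by pushing the filtering to the DB side, as a small sqlite3 sketch
(the table and column names are made up):

```python
import sqlite3

# Made-up hosts table; the point is that the WHERE clause (and even a
# randomized pick) runs in the DB, instead of fetching every row and
# filtering it in Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT, free_ram INTEGER, free_vcpus INTEGER)")
conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)",
                 [("h1", 8192, 4), ("h2", 1024, 8), ("h3", 4096, 2)])

# Only viable candidates ever leave the database.
candidates = [name for (name,) in conn.execute(
    "SELECT name FROM hosts WHERE free_ram >= ? AND free_vcpus >= ?"
    " ORDER BY RANDOM()", (2048, 2))]
```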

Best Regards,
Attila


- Original Message -
 From: Rui Chen chenrui.m...@gmail.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Wednesday, March 4, 2015 4:51:07 AM
 Subject: [openstack-dev] [nova] blueprint about multiple workers supported
 in nova-scheduler
 
 Hi all,
 
 I want to make it easy to launch a bunch of scheduler processes on a host,
 multiple scheduler workers will make use of multiple processors of host and
 enhance the performance of nova-scheduler.
 
 I had registered a blueprint and commit a patch to implement it.
 https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
 
 This patch had applied in our performance environment and pass some test
 cases, like: concurrent booting multiple instances, currently we didn't find
 inconsistent issue.
 
 IMO, nova-scheduler should been scaled horizontally on easily way, the
 multiple workers should been supported as an out of box feature.
 
 Please feel free to discuss this feature, thanks.
 
 Best Regards
 
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-12 Thread Attila Fazekas


- Original Message -
From: Attila Fazekas afaze...@redhat.com
To: Jay Pipes jaypi...@gmail.com
Cc: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org, Pavel Kholkin pkhol...@mirantis.com
Sent: Thursday, February 12, 2015 11:52:39 AM
Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
should know about Galera





- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: Attila Fazekas afaze...@redhat.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org, Pavel
 Kholkin pkhol...@mirantis.com
 Sent: Wednesday, February 11, 2015 9:52:55 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 02/11/2015 06:34 AM, Attila Fazekas wrote:
  - Original Message -
  From: Jay Pipes jaypi...@gmail.com
  To: Attila Fazekas afaze...@redhat.com
  Cc: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org, Pavel
  Kholkin pkhol...@mirantis.com
  Sent: Tuesday, February 10, 2015 7:32:11 PM
  Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
  should know about Galera
 
  On 02/10/2015 06:28 AM, Attila Fazekas wrote:
  - Original Message -
  From: Jay Pipes jaypi...@gmail.com
  To: Attila Fazekas afaze...@redhat.com, OpenStack Development
  Mailing
  List (not for usage questions)
  openstack-dev@lists.openstack.org
  Cc: Pavel Kholkin pkhol...@mirantis.com
  Sent: Monday, February 9, 2015 7:15:10 PM
  Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things
  everybody
  should know about Galera
 
  On 02/09/2015 01:02 PM, Attila Fazekas wrote:
  I do not see why not to use `FOR UPDATE` even with multi-writer or
  Is the retry/swap way really solves anything here.
  snip
  Am I missed something ?
 
  Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
  that are needed to support SELECT FOR UPDATE statements across multiple
  cluster nodes.
 
  Galere does not replicates the row-level locks created by UPDATE/INSERT
  ...
  So what to do with the UPDATE?
 
  No, Galera replicates the write sets (binary log segments) for
  UPDATE/INSERT/DELETE statements -- the things that actually
  change/add/remove records in DB tables. No locks are replicated, ever.
 
  Galera does not do any replication at UPDATE/INSERT/DELETE time.
 
  $ mysql
  use test;
  CREATE TABLE test (id integer PRIMARY KEY AUTO_INCREMENT, data CHAR(64));
 
  $(echo 'use test; BEGIN;'; while true ; do echo 'INSERT INTO test(data)
  VALUES (test);'; done )  | mysql
 
  The writer1 is busy, the other nodes did not noticed anything about the
  above pending
  transaction, for them this transaction does not exists as long as you do
  not call a COMMIT.
 
  Any kind of DML/DQL you issue without a COMMIT does not happened in the
  other nodes perspective.
 
  Replication happens at COMMIT time if the `write sets` is not empty.
 
 We're going in circles here. I was just pointing out that SELECT ... FOR
 UPDATE will never replicate anything. INSERT/UPDATE/DELETE statements
 will cause a write-set to be replicated (yes, upon COMMIT of the
 containing transaction).
 
 Please see my repeated statements in this thread and others that the
 compare-and-swap technique is dependent on issuing *separate*
 transactions for each SELECT and UPDATE statement...
 
  When a transaction wins a voting, the other nodes rollbacks all transaction
  which had a local conflicting row lock.
 
 A SELECT statement in a separate transaction does not ever trigger a
 ROLLBACK, nor will an UPDATE statement that does not match any rows.
 That is IMO how increased throughput is achieved in the compare-and-swap
 technique versus the SELECT FOR UPDATE technique.
 
Yes, I mentioned this approach in one bug [0].

But the related changes on review actually work as I said [1][2][3],
and the SELECT is not in a separate, dedicated transaction.


[0] https://bugs.launchpad.net/neutron/+bug/1410854 [sorry I sent a wrong link 
before]
[1] https://review.openstack.org/#/c/143837/
[2] https://review.openstack.org/#/c/153558/
[3] https://review.openstack.org/#/c/149261/

 -jay
 
 -jay
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-12 Thread Attila Fazekas




- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: Attila Fazekas afaze...@redhat.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org, Pavel
 Kholkin pkhol...@mirantis.com
 Sent: Wednesday, February 11, 2015 9:52:55 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 02/11/2015 06:34 AM, Attila Fazekas wrote:
  - Original Message -
  From: Jay Pipes jaypi...@gmail.com
  To: Attila Fazekas afaze...@redhat.com
  Cc: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org, Pavel
  Kholkin pkhol...@mirantis.com
  Sent: Tuesday, February 10, 2015 7:32:11 PM
  Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
  should know about Galera
 
  On 02/10/2015 06:28 AM, Attila Fazekas wrote:
  - Original Message -
  From: Jay Pipes jaypi...@gmail.com
  To: Attila Fazekas afaze...@redhat.com, OpenStack Development
  Mailing
  List (not for usage questions)
  openstack-dev@lists.openstack.org
  Cc: Pavel Kholkin pkhol...@mirantis.com
  Sent: Monday, February 9, 2015 7:15:10 PM
  Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things
  everybody
  should know about Galera
 
  On 02/09/2015 01:02 PM, Attila Fazekas wrote:
  I do not see why not to use `FOR UPDATE` even with multi-writer or
  Is the retry/swap way really solves anything here.
  snip
  Am I missed something ?
 
  Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
  that are needed to support SELECT FOR UPDATE statements across multiple
  cluster nodes.
 
  Galere does not replicates the row-level locks created by UPDATE/INSERT
  ...
  So what to do with the UPDATE?
 
  No, Galera replicates the write sets (binary log segments) for
  UPDATE/INSERT/DELETE statements -- the things that actually
  change/add/remove records in DB tables. No locks are replicated, ever.
 
  Galera does not do any replication at UPDATE/INSERT/DELETE time.
 
  $ mysql
  use test;
  CREATE TABLE test (id integer PRIMARY KEY AUTO_INCREMENT, data CHAR(64));
 
  $(echo 'use test; BEGIN;'; while true ; do echo 'INSERT INTO test(data)
  VALUES (test);'; done )  | mysql
 
  The writer1 is busy, the other nodes did not noticed anything about the
  above pending
  transaction, for them this transaction does not exists as long as you do
  not call a COMMIT.
 
  Any kind of DML/DQL you issue without a COMMIT does not happened in the
  other nodes perspective.
 
  Replication happens at COMMIT time if the `write sets` is not empty.
 
 We're going in circles here. I was just pointing out that SELECT ... FOR
 UPDATE will never replicate anything. INSERT/UPDATE/DELETE statements
 will cause a write-set to be replicated (yes, upon COMMIT of the
 containing transaction).
 
 Please see my repeated statements in this thread and others that the
 compare-and-swap technique is dependent on issuing *separate*
 transactions for each SELECT and UPDATE statement...
 
  When a transaction wins a voting, the other nodes rollbacks all transaction
  which had a local conflicting row lock.
 
 A SELECT statement in a separate transaction does not ever trigger a
 ROLLBACK, nor will an UPDATE statement that does not match any rows.
 That is IMO how increased throughput is achieved in the compare-and-swap
 technique versus the SELECT FOR UPDATE technique.
 
Yes, I mentioned this approach in one bug [0].

But the related changes on review actually work as I said [1][2][3],
and the SELECT is not in a separate, dedicated transaction.


[0] https://blueprints.launchpad.net/nova/+spec/lock-free-quota-management
[1] https://review.openstack.org/#/c/143837/
[2] https://review.openstack.org/#/c/153558/
[3] https://review.openstack.org/#/c/149261/

 -jay
 
 -jay
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-11 Thread Attila Fazekas




- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: Attila Fazekas afaze...@redhat.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org, Pavel
 Kholkin pkhol...@mirantis.com
 Sent: Tuesday, February 10, 2015 7:32:11 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 02/10/2015 06:28 AM, Attila Fazekas wrote:
  - Original Message -
  From: Jay Pipes jaypi...@gmail.com
  To: Attila Fazekas afaze...@redhat.com, OpenStack Development Mailing
  List (not for usage questions)
  openstack-dev@lists.openstack.org
  Cc: Pavel Kholkin pkhol...@mirantis.com
  Sent: Monday, February 9, 2015 7:15:10 PM
  Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
  should know about Galera
 
  On 02/09/2015 01:02 PM, Attila Fazekas wrote:
  I do not see why not to use `FOR UPDATE` even with multi-writer or
  Is the retry/swap way really solves anything here.
  snip
  Am I missed something ?
 
  Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
  that are needed to support SELECT FOR UPDATE statements across multiple
  cluster nodes.
 
  Galere does not replicates the row-level locks created by UPDATE/INSERT ...
  So what to do with the UPDATE?
 
 No, Galera replicates the write sets (binary log segments) for
 UPDATE/INSERT/DELETE statements -- the things that actually
 change/add/remove records in DB tables. No locks are replicated, ever.

Galera does not do any replication at UPDATE/INSERT/DELETE time. 

$ mysql
use test;
CREATE TABLE test (id integer PRIMARY KEY AUTO_INCREMENT, data CHAR(64));

$(echo 'use test; BEGIN;'; while true ; do echo 'INSERT INTO test(data) VALUES ("test");'; done ) | mysql

Writer1 is busy, but the other nodes have not noticed anything about the above
pending transaction; for them this transaction does not exist as long as you do
not issue a COMMIT.

Any DML/DQL you issue without a COMMIT has not happened from the other
nodes' perspective.

Replication happens at COMMIT time, if the write set is not empty.

When a transaction wins the certification vote, the other nodes roll back all
transactions that held a conflicting local row lock.


  Why should I handle the FOR UPDATE differently?
 
 Because SELECT FOR UPDATE doesn't change any rows, and therefore does
 not trigger any replication event in Galera.

What matters is whether the full transaction changed any row at COMMIT time or
not. The DML statements themselves do not start replication, just as `SELECT
FOR UPDATE` does not.


 See here:
 
 http://www.percona.com/blog/2014/09/11/openstack-users-shed-light-on-percona-xtradb-cluster-deadlock-issues/
 
 -jay
 
  https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
 
  Best,
  -jay
 
 



Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-10 Thread Attila Fazekas




- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: openstack-dev@lists.openstack.org
 Sent: Monday, February 9, 2015 9:36:45 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 02/09/2015 03:10 PM, Clint Byrum wrote:
  Excerpts from Jay Pipes's message of 2015-02-09 10:15:10 -0800:
  On 02/09/2015 01:02 PM, Attila Fazekas wrote:
  I do not see why not to use `FOR UPDATE` even with multi-writer or
  Is the retry/swap way really solves anything here.
  snip
  Am I missed something ?
 
  Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
  that are needed to support SELECT FOR UPDATE statements across multiple
  cluster nodes.
 
  https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
 
  Attila acknowledged that. What Attila was saying was that by using it
  with Galera, the box that is doing the FOR UPDATE locks will simply fail
  upon commit because a conflicting commit has already happened and arrived
  from the node that accepted the write. Further what Attila is saying is
  that this means there is not such an obvious advantage to the CAS method,
  since the rollback and the # updated rows == 0 are effectively equivalent
  at this point, seeing as the prior commit has already arrived and thus
  will not need to wait to fail certification and be rolled back.
 
 No, that is not correct. In the case of the CAS technique, the frequency
 of rollbacks due to certification failure is demonstrably less than when
 using SELECT FOR UPDATE and relying on the certification timeout error
 to signal a deadlock.
 
  I am not entirely certain that is true though, as I think what will
  happen in sequential order is:
 
  writer1: UPDATE books SET genre = 'Scifi' WHERE genre = 'sciencefiction';
  writer1: -- send in-progress update to cluster
  writer2: SELECT FOR UPDATE books WHERE id=3;
  writer1: COMMIT
  writer1: -- try to certify commit in cluster
  ** Here is where I stop knowing for sure what happens **
  writer2: certifies writer1's transaction or blocks?
 
 It will certify writer1's transaction. It will only block another thread
 hitting writer2 requesting write locks or write-intent read locks on the
 same records.
 
  writer2: UPDATE books SET genre = 'sciencefiction' WHERE id=3;
  writer2: COMMIT -- One of them is rolled back.
 

The other transaction can be rolled back before you do an actual commit:
writer1: BEGIN
writer2: BEGIN
writer1: update test set val=42 where id=1;
writer2: update test set val=42 where id=1;
writer1: COMMIT
writer2: show variables;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting 
transaction

As you can see, the second transaction failed without issuing a COMMIT, after
the first one committed. You could send anything to MySQL on writer2 at this
point; even invalid statements return `Deadlock`.

  So, at that point where I'm not sure (please some Galera expert tell
  me):
 
  If what happens is as I suggest, writer1's transaction is certified,
  then that just means the lock sticks around blocking stuff on writer2,
  but that the data is updated and it is certain that writer2's commit will
  be rolled back. However, if it blocks waiting on the lock to resolve,
  then I'm at a loss to determine which transaction would be rolled back,
  but I am thinking that it makes sense that the transaction from writer2
  would be rolled back, because the commit is later.
 
 That is correct. writer2's transaction would be rolled back. The
 difference is that the CAS method would NOT trigger a ROLLBACK. It would
 instead return 0 rows affected, because the UPDATE statement would
 instead look like this:
 
 UPDATE books SET genre = 'sciencefiction' WHERE id = 3 AND genre = 'SciFi';
 
 And the return of 0 rows affected would trigger a simple retry of the
 read and then update attempt on writer2 instead of dealing with ROLLBACK
 semantics on the transaction.
 
 Note that in the CAS method, the SELECT statement and the UPDATE are in
 completely different transactions. This is a very important thing to
 keep in mind.
 
  All this to say that usually the reason for SELECT FOR UPDATE is not
  to only do an update (the transactional semantics handle that), but
  also to prevent the old row from being seen again, which, as Jay says,
  it cannot do.  So I believe you are both correct:
 
  * Attila, yes I think you're right that CAS is not any more efficient
  at replacing SELECT FOR UPDATE from a blocking standpoint.
 
 It is more efficient because there are far fewer ROLLBACKs of
 transactions occurring in the system.
 
 If you look at a slow query log (with a 0 slow query time) for a MySQL
 Galera server in a multi-write cluster during a run of Tempest or Rally,
 you will notice that the number of ROLLBACK statements is extraordinary.
 AFAICR, when Peter Boros and I benchmarked a Rally launch and delete 10K
 VM run, we saw nearly 11% of *total* queries executed
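The compare-and-swap flow described above — a plain read in one transaction, then a conditional UPDATE whose affected-row count is checked in another — can be sketched roughly as below. This is only an illustration: it uses Python's stdlib sqlite3 in place of a real MySQL/Galera connection, and the `set_genre_cas` helper is hypothetical, not code from any OpenStack project.

```python
import sqlite3

# sqlite3 stands in for a MySQL/Galera connection; the table mirrors the
# books/genre example used in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, genre TEXT)")
conn.execute("INSERT INTO books (id, genre) VALUES (3, 'SciFi')")
conn.commit()

def set_genre_cas(conn, book_id, new_genre, max_retries=5):
    """Compare-and-swap update: read and update in separate transactions."""
    for _ in range(max_retries):
        # Step 1: plain SELECT -- no write locks taken, no replication event.
        (current,) = conn.execute(
            "SELECT genre FROM books WHERE id = ?", (book_id,)).fetchone()
        # Step 2: conditional UPDATE repeating the value we just read.
        cur = conn.execute(
            "UPDATE books SET genre = ? WHERE id = ? AND genre = ?",
            (new_genre, book_id, current))
        conn.commit()
        if cur.rowcount == 1:
            return True  # the swap succeeded
        # rowcount == 0: a concurrent writer changed the row; re-read and
        # retry instead of dealing with ROLLBACK semantics.
    return False

print(set_genre_cas(conn, 3, "sciencefiction"))  # True
```

The key point is the rowcount check: a lost race shows up as 0 affected rows, not as a deadlock error.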

Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-10 Thread Attila Fazekas




- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: Attila Fazekas afaze...@redhat.com, OpenStack Development Mailing 
 List (not for usage questions)
 openstack-dev@lists.openstack.org
 Cc: Pavel Kholkin pkhol...@mirantis.com
 Sent: Monday, February 9, 2015 7:15:10 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 02/09/2015 01:02 PM, Attila Fazekas wrote:
  I do not see why not to use `FOR UPDATE` even with multi-writer or
  Is the retry/swap way really solves anything here.
 snip
  Am I missed something ?
 
 Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
 that are needed to support SELECT FOR UPDATE statements across multiple
 cluster nodes.
 

Galera does not replicate the row-level locks created by UPDATE/INSERT ...
So what to do with the UPDATE?

Why should I handle the FOR UPDATE differently?

 https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
 
 Best,
 -jay
 



Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-09 Thread Attila Fazekas




- Original Message -
 From: Jay Pipes jaypi...@gmail.com
 To: openstack-dev@lists.openstack.org, Pavel Kholkin pkhol...@mirantis.com
 Sent: Wednesday, February 4, 2015 8:04:10 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 02/04/2015 12:05 PM, Sahid Orentino Ferdjaoui wrote:
  On Wed, Feb 04, 2015 at 04:30:32PM +, Matthew Booth wrote:
  I've spent a few hours today reading about Galera, a clustering solution
  for MySQL. Galera provides multi-master 'virtually synchronous'
  replication between multiple mysql nodes. i.e. I can create a cluster of
  3 mysql dbs and read and write from any of them with certain consistency
  guarantees.
 
  I am no expert[1], but this is a TL;DR of a couple of things which I
  didn't know, but feel I should have done. The semantics are important to
  application design, which is why we should all be aware of them.
 
 
  * Commit will fail if there is a replication conflict
 
  foo is a table with a single field, which is its primary key.
 
  A: start transaction;
  B: start transaction;
  A: insert into foo values(1);
  B: insert into foo values(1); -- 'regular' DB would block here, and
 report an error on A's commit
  A: commit; -- success
  B: commit; -- KABOOM
 
  Confusingly, Galera will report a 'deadlock' to node B, despite this not
  being a deadlock by any definition I'm familiar with.
 
 It is a failure to certify the writeset, which bubbles up as an InnoDB
 deadlock error. See my article here:
 
 http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/
 
 Which explains this.

I do not see why not to use `FOR UPDATE` even with multi-writer, or whether
the retry/swap way really solves anything here.

Using 'FOR UPDATE' with the 'repeatable read' isolation level still seems more
efficient, and it has several advantages:

* The SELECT with 'FOR UPDATE' will read the committed version, so you do not
  really need to worry about when the transaction actually started. You will
  get fresh data before you reach the actual UPDATE.

* In the article, the example query will not return the new version of the
  data in the same transaction even if you are retrying, so you need to
  restart the transaction anyway.

  When you are using the 'FOR UPDATE' way, if any other transaction
  successfully commits a conflicting row on any other Galera writer, your
  pending transaction will be rolled back at your next statement, WITHOUT
  spending any time certifying that transaction. From this perspective,
  checking the affected-row count after the UPDATE (`compare and swap`) or
  handling an exception does not make any difference.

* Using FOR UPDATE in a Galera transaction (multi-writer) is no more evil than
  using UPDATE; a concurrent commit invalidates both of them in the same way
  (DBDeadlock).

* With just a `single writer`, 'FOR UPDATE' does not let other threads do
  useless work while wasting resources.

* The swap way can also be rolled back by Galera almost anywhere (DBDeadlock).
  In the end, the swap way looks like it just replaced the exception handling
  with a return-code check + manual transaction restart.

Am I missing something?

  Yes ! and if I can add more information and I hope I do not make
  mistake I think it's a know issue which comes from MySQL, that is why
  we have a decorator to do a retry and so handle this case here:
 
 
  http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n177
 
 It's not an issue with MySQL. It's an issue with any database code that
 is highly contentious.
 
 Almost all highly distributed or concurrent applications need to handle
 deadlock issues, and the most common way to handle deadlock issues on
 database records is using a retry technique. There's nothing new about
 that with Galera.
 
 The issue with our use of the @_retry_on_deadlock decorator is *not*
 that the retry decorator is not needed, but rather it is used too
 frequently. The compare-and-swap technique I describe in the article
 above dramatically* reduces the number of deadlocks that occur (and need
 to be handled by the @_retry_on_deadlock decorator) and dramatically
 reduces the contention over critical database sections.
 
 Best,
 -jay
 
 * My colleague Pavel Kholkin is putting together the results of a
 benchmark run that compares the compare-and-swap method with the raw
 @_retry_on_deadlock decorator method. Spoiler: the compare-and-swap
 method cuts the runtime of the benchmark by almost *half*.
 
  Essentially, anywhere that a regular DB would block, Galera will not
  block transactions on different nodes. Instead, it will cause one of the
  transactions to fail on commit. This is still ACID, but the semantics
  are quite different.
 
  The impact of this is that code which makes correct use of locking may
  still fail with a 'deadlock'. The solution to this is to either fail the
  

Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-05 Thread Attila Fazekas




- Original Message -
 From: Matthew Booth mbo...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Thursday, February 5, 2015 12:32:33 PM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 05/02/15 11:01, Attila Fazekas wrote:
  I have a question related to deadlock handling as well.
  
  Why the DBDeadlock exception is not caught generally for all api/rpc
  request ?
  
  The mysql recommendation regarding to Deadlocks [1]:
  Normally, you must write your applications so that they are always
   prepared to re-issue a transaction if it gets rolled back because of a
   deadlock.
 
 This is evil imho, although it may well be pragmatic. A deadlock (a real
 deadlock, that is) occurs because of a preventable bug in code. It
 occurs because 2 transactions have attempted to take multiple locks in a
 different order. Getting this right is hard, but it is achievable. The
 solution to real deadlocks is to fix the bugs.

 
 Galera 'deadlocks' on the other hand are not deadlocks, despite being
 reported as such (sounds as though this is due to an implementation
 quirk?). They don't involve 2 transactions holding mutual locks, and
 there is never any doubt about how to proceed. They involve 2
 transactions holding the same lock, and 1 of them committed first. In a
 real deadlock they wouldn't get as far as commit. This isn't any kind of
 bug: it's normal behaviour in this environment and you just have to
 handle it.

  Now the services are just handling the DBDeadlock in several places.
  We have some logstash hits for other places even without galera.
 
 I haven't had much success with logstash. Could you post a query which
 would return these? This would be extremely interesting.

Just use this:
message: DBDeadlock

If you would like to exclude the lock wait timeout ones:
message: Deadlock found when trying to get lock


  Instead of throwing 503 to the end user, the request could be repeated
  `silently`.
  
  The users would be able repeat the request himself,
  so the automated repeat should not cause unexpected new problem.
 
 Good point: we could argue 'no worse than now', even if it's buggy.
 
  The retry limit might be configurable, the exception needs to be watched
  before
  anything sent to the db on behalf of the transaction or request.
  
  Considering all request handler as potential deadlock thrower seams much
  easier than,
  deciding case by case.
 
 Well this happens at the transaction level, and we don't quite have a
 1:1 request:transaction relationship. We're moving towards it, but
 potentially long running requests will always have to use multiple
 transactions.
 
 However, I take your point. I think retry on transaction failure is
 something which would benefit from standard handling in a library.
 
 Matt
 --
 Matthew Booth
 Red Hat Engineering, Virtualisation Team
 
 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
 


Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

2015-02-05 Thread Attila Fazekas
I have a question related to deadlock handling as well.

Why is the DBDeadlock exception not caught generally for all API/RPC requests?

The mysql recommendation regarding to Deadlocks [1]:
Normally, you must write your applications so that they are always 
 prepared to re-issue a transaction if it gets rolled back because of a 
deadlock.

Currently the services handle DBDeadlock only in certain places,
and we have logstash hits for other places even without Galera.

Instead of returning a 503 to the end user, the request could be repeated
`silently`.

The users would be able to repeat the request themselves,
so an automated repeat should not cause unexpected new problems.

The retry limit could be configurable; the exception needs to be caught before
anything is sent to the DB on behalf of the transaction or request.

Considering every request handler as a potential deadlock thrower seems much
easier than deciding case by case.

[1] http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html
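A generic retry wrapper along these lines might look like the sketch below. It is only an outline under stated assumptions: `DBDeadlock` here is a local stand-in for the real exception class, and `retry_on_deadlock` is a hypothetical name, not the existing nova `_retry_on_deadlock` decorator.

```python
import functools
import time

class DBDeadlock(Exception):
    """Stand-in for the real DB deadlock exception."""

def retry_on_deadlock(max_retries=3, delay=0.05):
    """Sketch of a generic request-level retry, as proposed above.

    The wrapped handler must not have sent anything to the DB before it is
    called (i.e. it owns its whole transaction), so re-issuing it is safe.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # give up: let the 503 propagate
                    time.sleep(delay * (attempt + 1))  # simple backoff
        return wrapper
    return decorator

calls = {"n": 0}

@retry_on_deadlock()
def flaky_handler():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DBDeadlock()  # simulate a Galera certification failure
    return "ok"

print(flaky_handler())  # ok
```

The configurable retry limit is the `max_retries` argument; anything beyond it re-raises, so the caller still sees the error when the contention does not go away.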

- Original Message -
 From: Matthew Booth mbo...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Thursday, February 5, 2015 10:36:55 AM
 Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody 
 should know about Galera
 
 On 04/02/15 17:05, Sahid Orentino Ferdjaoui wrote:
  * Commit will fail if there is a replication conflict
 
  foo is a table with a single field, which is its primary key.
 
  A: start transaction;
  B: start transaction;
  A: insert into foo values(1);
  B: insert into foo values(1); -- 'regular' DB would block here, and
report an error on A's commit
  A: commit; -- success
  B: commit; -- KABOOM
 
  Confusingly, Galera will report a 'deadlock' to node B, despite this not
  being a deadlock by any definition I'm familiar with.
  
  Yes ! and if I can add more information and I hope I do not make
  mistake I think it's a know issue which comes from MySQL, that is why
  we have a decorator to do a retry and so handle this case here:
  

  http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n177
 
 Right, and that remains a significant source of confusion and
 obfuscation in the db api. Our db code is littered with races and
 potential actual deadlocks, but only some functions are decorated. Are
 they decorated because of real deadlocks, or because of Galera lock
 contention? The solutions to those 2 problems are very different! Also,
 hunting deadlocks is hard enough work. Adding the possibility that they
 might not even be there is just evil.
 
 Incidentally, we're currently looking to replace this stuff with some
 new code in oslo.db, which is why I'm looking at it.
 
 Matt
 --
 Matthew Booth
 Red Hat Engineering, Virtualisation Team
 
 Phone: +442070094448 (UK)
 GPG ID:  D33C3490
 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
 


Re: [openstack-dev] [fedora] Re: upstream f21 devstack test

2015-01-25 Thread Attila Fazekas
I have tried the old 'vmlinuz-3.17.4-301.fc21.x86_64' kernel in my env;
with this version the volume-attachment-related tests are failing, but only
within the test case, so I do not see the secondary network failures.

In my env with '3.17.8-300.fc21.x86_64' everything passes with nnet;
I would say the 3.17.4-301.fc21.x86_64 kernel is buggy.

On the gate vm the new kernel (3.17.8-300.fc21.x86_64) was installed before
the boot, but the boot manager config still picks the old kernel. I tried to
switch to the new kernel in `your vm`, but the machine failed to reboot; maybe
I misconfigured the extlinux.conf, or we have some environment-specific issue.
I lost your on-hold vm. :(

Looks like https://bugs.launchpad.net/nova/+bug/1353939 was always triggered,
i.e. the vm failed to delete, which left wrong iptables rules behind, which
caused several subsequent ssh test failures whenever a test used the same
fixed IP as test_rescued_vm_detach_volume.

Tempest could be stricter and fail the test suite at tearDownClass when the vm
moves to ERROR state on delete.


- Original Message -
 From: Attila Fazekas afaze...@redhat.com
 To: Ian Wienand iwien...@redhat.com
 Cc: Alvaro Lopez Ortega, Jeremy Stanley, Sean Dague s...@dague.net, Dean Troyer
 dtro...@gmail.com, OpenStack Development Mailing List fu...@yuggoth.org
 Sent: Thursday, January 22, 2015 18:16:01
 Subject: [fedora] Re: upstream f21 devstack test
 
 
 
 - Original Message -
  From: Attila Fazekas afaze...@redhat.com
  To: Ian Wienand iwien...@redhat.com
  Cc: Alvaro Lopez Ortega aort...@redhat.com, Jeremy Stanley
  fu...@yuggoth.org, Sean Dague s...@dague.net,
  dean Troyer dtro...@gmail.com, OpenStack Development Mailing List (not
  for usage questions)
  openstack-dev@lists.openstack.org
  Sent: Monday, January 19, 2015 18:02:17
  Subject: Re: upstream f21 devstack test
  
  Per request moving this thread to the openstack-dev list.
  
  I was not able to reproduce the issue so far either on the
  vm you pointed me or in any of my VMs.
  
  Several things I observed on `your` machine:
  1. The installed kernel is newer then the actually used (No known related
  issue)
 
 strace on libvirt does not want to terminate properly on Ctrl+C;
 probably this is not the only misbehavior related to processes.
 
 The kernel version and hyper-visor type might be relevant to the
 'Exception during message handling: Failed to terminate process 32495 with
 SIGKILL: Device or resource busy'
 
 According to the strace the signal was sent, and the process was killed,
 but it is zombie until the strace not killed.
 
 
  2. On the First tempest (run logs are collected [0]) lp#1353939 was
  triggered, but not related
 I was wrong.
 This was related. An exception during instance delete can leave iptables
 rules behind, so the correct security group rules will not be applied.
 
 In the other jenkins jobs this situation is rare.
 
 On `your` vm 'tox -eall test_rescued_vm_detach_volume' triggers the issue
 almost always; in other envs I was not able to reproduce it so far.
 
  3. After tried to reproduce the use many-many times I hit lp#1411525, the
  patch
 which introduced is already reverted.
  4. Once I saw 'Returning 400 to user: No nw_info cache associated with
  instance' what I haven't
 seen with nova network for a long time.  (once in 100 run)
  5. I see many annoying iscsi related logging, It also does not related to
  the
  connection issue,
 IMHO the tgtadm can be considered as DEPRECATED thing, and we should
 switch to lioadm.
  
  So far, No Log entry found in connection to connection issue
   which would worth to search on logstash.
  
  The nova network log is not sufficient to figure out the actual netfilter
  state at any moment.
  According the log it should have update the chains with something, but who
  knows..
  
  With the ssh connection issues you can do very few things as post-mortem
  analyses.
  Tempest normally deletes the related resources, so less evidences
  remaining.
  If the issue is reproducible some cases enough to alter the test to do not
  destroy evidences,
  but very frequently some kind of real debugger required.
  
  Several suspected thing:
  * The vm was able to acquire address via dhcp - successful boot, has L2
  connectivity.
  * No evidence found for a dead qemu, no special libvirt operation requested
  before failure.
  * nnet claims it added the floating ip to the br100
  * L3 issue / security group rules ?..
  
  The basic network debug was removed form tempest[1]. I would like to
  recommend to revert that change
  in order to have an idea at least the interfaces and netfilter was or
  wasn't
  in a good shape [1].
  
 Full tempest runs were required to reproduce the issue, and reverting [1] to
 see what really happened.
 
 test_rescued_vm_detach_volume + any ssh test can be sufficient to reproduce
 the issue.
 
  I also created a vm with enabled firewalld (normally

Re: [openstack-dev] upstream f21 devstack test

2015-01-19 Thread Attila Fazekas
Per request moving this thread to the openstack-dev list.

I was not able to reproduce the issue so far, either on the
vm you pointed me to or in any of my VMs.

Several things I observed on `your` machine:
1. The installed kernel is newer than the one actually used (no known related
   issue).
2. On the first tempest run (logs are collected [0]) lp#1353939 was triggered,
   but not related.
3. After trying to reproduce the issue many, many times I hit lp#1411525; the
   patch which introduced it is already reverted.
4. Once I saw 'Returning 400 to user: No nw_info cache associated with
   instance', which I haven't seen with nova network for a long time (once in
   ~100 runs).
5. I see a lot of annoying iSCSI-related logging. It is also not related to
   the connection issue; IMHO tgtadm can be considered a DEPRECATED thing, and
   we should switch to lioadm.

So far, no log entry related to the connection issue has been found
which would be worth searching for on logstash.

The nova network log is not sufficient to figure out the actual netfilter
state at any given moment. According to the log it should have updated the
chains with something, but who knows...

With the ssh connection issues you can do very little as post-mortem
analysis. Tempest normally deletes the related resources, so less evidence
remains. If the issue is reproducible, in some cases it is enough to alter the
test to not destroy the evidence, but very frequently some kind of real
debugger is required.

Several suspected things:
* The vm was able to acquire an address via dhcp - successful boot, has L2
connectivity.
* No evidence found for a dead qemu; no special libvirt operation was
requested before the failure.
* nnet claims it added the floating ip to the br100.
* L3 issue / security group rules?

The basic network debugging was removed from tempest [1]. I would like to
recommend reverting that change in order to have an idea whether at least the
interfaces and netfilter were or weren't in good shape [1].

I also created a vm with firewalld enabled (normally it is not in my devstack
setups); the 3 mentioned test cases work fine even after running these tests
for hours. However, '/var/log/firewalld' contains COMMAND_FAILED entries, as
on `your` vm.

I will try to run more full tempest+nnet@F21 jobs in my env to have a bigger
sample for the success rate.

So far I have reproduced 0 ssh failures,
so I will scan the logs [0] again more carefully on `your` machine;
maybe I missed something, maybe those tests interfered with something less
obvious.

I'll check the other gate f21 logs (~100 jobs/week) to see whether anything
happened when the issue started and/or whether the issue still exists.


So, I have nothing useful at the moment, but I have not given up.

[0] 
http://logs.openstack.org/87/139287/14/check/check-tempest-dsvm-f21/5f3d210/console.html.gz
[1] https://review.openstack.org/#/c/140531/


PS:
F21's HAProxy is more sensitive to services which stop listening,
and the load will not be evenly balanced.
For a working F21 neutron job a better listener is required:
https://review.openstack.org/#/c/146039/ .
 


- Original Message -
 From: Ian Wienand iwien...@redhat.com
 To: Attila Fazekas afaze...@redhat.com
 Cc: Alvaro Lopez Ortega aort...@redhat.com, Jeremy Stanley 
 fu...@yuggoth.org, Sean Dague s...@dague.net,
 dean Troyer dtro...@gmail.com
 Sent: Friday, January 16, 2015 5:24:38 AM
 Subject: upstream f21 devstack test
 
 Hi Attila,
 
 I don't know if you've seen, but upstream f21 testing is happening for
 devstack jobs.  As an experimental job I was getting good runs, but in
 the last day and a bit, all runs have started failing.
 
 The failing tests are varied; a small sample I pulled:
 
 [1]
 tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_compute_with_volumes
 [2]
 tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern[compute,image,network]
 [3]
 tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance[compute,image,network]
 
 The common thread is that they can't ssh to the cirros instance
 started up.
 
 So far I can not replicate this locally.  I know there were some
 firewalld/neutron issues, but this is not a neutron job.
 
 Unfortunately, I'm about to head out the door on PTO until 2015-01-27.
 I don't like the idea of this being broken while I don't have time to
 look at it, so I'm hoping you can help out.
 
 There is a failing f21 machine on hold at
 
  jenk...@xx.yy.zz.qq
Sanitized.
 
 I've attached a private key that should let you log in.  This
 particular run failed in [4]:
 
  
 tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_compute_with_volumes
  
 tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,image,network,volume]
 
 Sorry I haven't got very far in debugging this.  Nothing obviously
 jumped out at me in the logs, but I only had a brief look.  I'm hoping
 as the best tempest guy I know you can find some time to take a look
 at this in my absence :)
 
 Thanks,
 
 -i

Re: [openstack-dev] [QA][Tempest] Proposing Ghanshyam Mann for Tempest Core

2014-11-26 Thread Attila Fazekas
+1

- Original Message -
From: Marc Koderer m...@koderer.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Sent: Wednesday, November 26, 2014 7:58:06 AM
Subject: Re: [openstack-dev] [QA][Tempest] Proposing Ghanshyam Mann for Tempest 
Core

+1 

Am 22.11.2014 um 15:51 schrieb Andrea Frittoli  andrea.fritt...@gmail.com : 





+1 
On 21 Nov 2014 18:25, Ken1 Ohmichi  ken1ohmi...@gmail.com  wrote: 


+1 :-) 

Sent from my iPod 

On 2014/11/22, at 7:56, Christopher Yeoh  cbky...@gmail.com  wrote: 

 +1 
 
 Sent from my iPad 
 
 On 22 Nov 2014, at 4:56 am, Matthew Treinish  mtrein...@kortar.org  wrote: 
 
 
 Hi Everyone, 
 
 I'd like to propose we add Ghanshyam Mann (gmann) to the tempest core team. 
 Over 
 the past couple of cycles Ghanshyam has been actively engaged in the Tempest 
 community. Ghanshyam has had one of the highest review counts on Tempest for 
 the past cycle, and he has consistently been providing reviews that have 
 been 
 of consistently high quality that show insight into both the project 
 internals 
 and it's future direction. I feel that Ghanshyam will make an excellent 
 addition 
 to the core team. 
 
 As per the usual, if the current Tempest core team members would please 
 vote +1 or -1(veto) to the nomination when you get a chance. We'll keep 
 the polls open for 5 days or until everyone has voted. 
 
 Thanks, 
 
 Matt Treinish 
 
 References: 
 
 https://review.openstack.org/#/q/reviewer:%22Ghanshyam+Mann+%253Cghanshyam.mann%2540nectechnologies.in%253E%22,n,z
  
 
 http://stackalytics.com/?user_id=ghanshyammannmetric=marks 
 
 ___ 
 OpenStack-dev mailing list 
 OpenStack-dev@lists.openstack.org 
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 
 


[openstack-dev] [nova][cinder][qa] Volume attachment not visible on the guest

2014-08-18 Thread Attila Fazekas
Hi All,

I am having a `little` trouble with volume attachment stability.

The test_stamp_pattern test has been skipped for a long time; you
can see in [1] what would happen if it were enabled now.

There is a workaround-style way to enable that test [2].

I suspected the ACPI hot-plug event is not detected by the kernel
during some phases of boot, for example after the first PCI scan
but before PCI hot-plug is initialized.

Does the above blind spot really exist?

If yes, is this something that needs to be handled by the init system,
or does the kernel need to ensure all devices are discovered before calling init?

A long time ago I had trouble reproducing the above issue,
but now I was able to confirm that a PCI rescan can solve it:
'echo 1 > /sys/bus/pci/rescan' (run over ssh in the guest)
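For illustration only (not code from nova or tempest), here is a minimal
sketch of the poll-then-rescan pattern a test could use while waiting for an
attached disk to show up in the guest. `list_devices` and `rescan` are
hypothetical callables; in practice rescan would be something like
ssh guest 'echo 1 > /sys/bus/pci/rescan'.

```python
import time


def wait_for_device(list_devices, name, rescan=None, timeout=30.0, interval=0.5):
    """Poll list_devices() until `name` appears.

    If the device has not shown up by half the timeout, fire the PCI
    rescan workaround once and keep polling.  Returns True if the
    device appeared, False otherwise.
    """
    deadline = time.monotonic() + timeout
    rescanned = False
    while time.monotonic() < deadline:
        if name in list_devices():
            return True
        if rescan and not rescanned and deadline - time.monotonic() < timeout / 2:
            rescan()  # e.g. ssh guest 'echo 1 > /sys/bus/pci/rescan'
            rescanned = True
        time.sleep(interval)
    return name in list_devices()
```

With a rescan hook supplied this masks the suspected hot-plug blind spot;
without one it simply times out, which is roughly what the skipped test does today.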

Recently we found `another type` of volume attachment issue
when booting from volume [3].

Here I would expect the PCI device to be ready before the VM
actually starts, but according to the console log, the disk
device is missing.

When I am booting from an iSCSI volume, is the virtual device
guaranteed by nova/cinder/libvirt/qemu/whatever to be present
at the first PCI scan?

Is there anything that can delay the device/disk appearance?

Best Regards,
Attila

[1] https://review.openstack.org/#/c/52740/
[2] https://review.openstack.org/#/c/62886/
[3] https://bugs.launchpad.net/nova/+bug/1357677



Re: [openstack-dev] [QA] Proposed Changes to Tempest Core

2014-07-25 Thread Attila Fazekas
+1


- Original Message -
 From: Matthew Treinish mtrein...@kortar.org
 To: openstack-dev@lists.openstack.org
 Sent: Tuesday, July 22, 2014 12:34:28 AM
 Subject: [openstack-dev] [QA] Proposed Changes to Tempest Core
 
 
 Hi Everyone,
 
 I would like to propose 2 changes to the Tempest core team:
 
 First, I'd like to nominate Andrea Frittoli to the Tempest core team. Over
 the past cycle Andrea has steadily become more actively engaged in the
 Tempest community. Besides his code contributions around refactoring
 Tempest's authentication and credentials code, he has been providing
 reviews of consistently high quality that show insight into both the
 project internals and its future direction. In addition he has been active
 in the qa-specs repo, both providing reviews and spec proposals, which has
 been very helpful as we've been adjusting to using the new process.
 Keeping in mind that becoming a member of the core team is about earning
 the trust of the current core team members through communication and
 quality reviews, not simply a matter of review numbers, I feel that Andrea
 will make an excellent addition to the team.
 
 As per the usual, if the current Tempest core team members would please
 vote +1 or -1(veto) to the nomination when you get a chance. We'll keep
 the polls open for 5 days or until everyone has voted.
 
 References:
 
 https://review.openstack.org/#/q/reviewer:%22Andrea+Frittoli+%22,n,z
 
 http://stackalytics.com/?user_id=andrea-frittolimetric=marksmodule=qa-group
 
 
 The second change that I'm proposing today is to remove Giulio Fidente
 from the core team. He asked to be removed from the core team a few weeks
 back because he is no longer able to dedicate the required time to Tempest
 reviews. So if there are no objections to this I will remove him from the
 core team in a few days. Sorry to see you leave the team Giulio...
 
 
 Thanks,
 
 Matt Treinish
 


Re: [openstack-dev] [qa] Proposals for Tempest core

2013-11-21 Thread Attila Fazekas
+1 for both!



- Original Message -
 From: Sean Dague s...@dague.net
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, November 15, 2013 2:38:27 PM
 Subject: [openstack-dev] [qa] Proposals for Tempest core
 
 It's post summit time, so time to evaluate our current core group for
 Tempest. There are a few community members that I'd like to nominate for
 Tempest core, as I've found their review feedback over the last few
 months to be invaluable. Tempest core folks, please +1 or -1 as you feel
 appropriate:
 
 Masayuki Igawa
 
 His review history is here -
 https://review.openstack.org/#/q/reviewer:masayuki.igawa%2540gmail.com+project:openstack/tempest,n,z
 
 Ken'ichi Ohmichi
 
 His review history is here -
 https://review.openstack.org/#/q/reviewer:ken1ohmichi%2540gmail.com+project:openstack/tempest,n,z
 
 They have both been actively engaged in the Tempest community, and have
 been actively contributing to both Tempest and OpenStack integrated
 projects, working hard to both enhance test coverage, and fix the issues
 found in the projects themselves. This has been hugely beneficial to
 OpenStack as a whole.
 
 At the same time, it's also time, I think, to remove Jay Pipes from
 tempest-core. Jay's not had much time for reviews of late, and it's
 important that membership of the core review team remains a working title,
 earned by actively reviewing code.
 
 With this change Tempest core would end up no longer being majority
 North American, or even majority English as a first language (that kind of
 excites me). In line with this, there will be another mailing list thread
 about changing our weekly meeting time to make it more friendly to our
 APAC contributors.
 
   -Sean
 
 --
 Sean Dague
 http://dague.net
 
 