Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-22 Thread Mike Bayer



On 05/22/2017 05:39 AM, Matthew Booth wrote:


There are also a couple of optimisations to make which I won't bother 
with up front. Dan suggested in his CellsV2 talk that we would only 
query cells where the user actually has instances. If we find users tend 
to clump in a small number of cells this would be a significant 
optimisation, although the overhead on the api node for a query 
returning no rows is probably very little. Also, I think you mentioned 
that there's an option to tell SQLA not to batch-process rows, but that 
it is less efficient for total throughput? I suspect there would be a 
point at which we'd want that. 


It's the yield_per() option and I think you should use it up front, just 
so it's there and we can hit any issues it might cause (shouldn't be any 
provided no eager loading is used).  Have it yield on about 50 rows at a 
time.  The pymysql driver these days I think does not actually buffer 
the rows but 50 is very little anyway.
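
To make the batching suggestion concrete, here is a minimal sketch of consuming a result set 50 rows at a time. It uses only the stdlib sqlite3 DB-API cursor and fetchmany() as a stand-in for the idea; it is not SQLAlchemy's yield_per() implementation, and the table, its columns and the iter_rows helper are hypothetical names made up for illustration.

```python
import sqlite3


def iter_rows(cursor, batch_size=50):
    """Yield rows one at a time, pulling them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        for row in batch:
            yield row


# Hypothetical in-memory table standing in for one cell's instances table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT, created_at INTEGER)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?)",
    [("uuid-%03d" % i, i) for i in range(120)],
)

cur = conn.execute("SELECT uuid, created_at FROM instances ORDER BY created_at")
rows = iter_rows(cur)
# Taking 10 rows triggers a single fetchmany(50) call on the cursor.
first_page = [next(rows) for _ in range(10)]
```

With SQLAlchemy's ORM the equivalent knob would be query.yield_per(50); the sketch only shows why a small batch keeps the per-cell cost low when the caller stops early.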





If there's a reasonable way to calculate a tipping point, that might give
us some additional life.

Bear in mind that the principal advantages to not using Searchlight are:

* It is simpler to implement
* It is simpler to manage
* It will return accurate results

Following the principle of 'as simple as possible, but no simpler', I 
think there's enormous benefit to this much simpler approach for anybody 
who doesn't need a more complex approach. However, while it reduces the 
urgency of something like the Searchlight solution, I expect there are 
going to be deployments which need that.



Moreover, instance operations (create, delete) happen in
parallel with the pagination/sort query, some cells may not
respond in time, the network connection may break, and so on;
many abnormal cases may happen. How to deal with abnormal query
responses from some cells is also an important factor to
consider.


Aside: For a query operation, what's the better user experience when a 
single cell is failing:


1. The whole query fails.
2. The user gets incomplete results.

Either of these is simple to implement. Incomplete results would also 
be logged as an ERROR, but I can't think of any way to also 
return to the user that there's a problem with the data we returned 
without throwing an error.


Thoughts?

Matt



Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-22 Thread Sean Dague
On 05/22/2017 05:39 AM, Matthew Booth wrote:
> Aside: For a query operation, what's the better user experience when a
> single cell is failing:
> 
> 1. The whole query fails.
> 2. The user gets incomplete results.
> 
> Either of these is simple to implement. Incomplete results would also
> be logged as an ERROR, but I can't think of any way to also
> return to the user that there's a problem with the data we returned
> without throwing an error.

The rough plan of record was to abuse HTTP 206 as an indicator that
something is missing in the result set, and return the best information we
can reconstruct from the top level database.

In the filtered case, that means some stuff might silently get dropped.
In the all_instances / paginated case, you would get everything for the
project_id of your token, just some returned servers would only have
a server uuid.

We could also put a microversion in place so that something more
specific about server list status (all sources reported) was there.

No one expects a 500 error on server list, so we definitely don't want
to give that to people.
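
A minimal sketch of that plan, under stated assumptions: the function name, the shape of its inputs and the response body are all hypothetical and not nova's actual code. A cell whose query failed is represented by None, and its servers are filled in with uuid-only records taken from what the top level database knows.

```python
def build_server_list_response(cell_results, mapped_uuids):
    """Return (status, body) for a server list spanning several cells.

    cell_results: cell name -> list of server dicts, or None if the cell
                  query failed.
    mapped_uuids: cell name -> instance uuids known to the top level DB.
    """
    servers = []
    partial = False
    for cell, rows in cell_results.items():
        if rows is None:
            # Cell is down: fall back to uuid-only records.
            partial = True
            servers.extend({"id": uuid} for uuid in mapped_uuids.get(cell, []))
        else:
            servers.extend(rows)
    # 206 signals "something is missing" instead of failing the request.
    status = 206 if partial else 200
    return status, {"servers": servers}


status, body = build_server_list_response(
    {"cell1": [{"id": "u1", "name": "web"}], "cell2": None},
    {"cell1": ["u1"], "cell2": ["u2", "u3"]},
)
```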

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-22 Thread Belmiro Moreira

Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-22 Thread Matthew Booth

Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-19 Thread Mike Bayer



On 05/19/2017 02:46 AM, joehuang wrote:

Supporting sort and pagination together will be the biggest challenge. It 
depends on how many cells are involved in the query: with 3 or 5 it may be 
OK, since you can search each cell and cache the data. But what about 20, 
50 or more, and how much data will be cached?



I've talked to Matthew in Boston and I am also a little concerned about 
this.  The approach involves trying to fetch just the smallest number 
of records possible from each backend, merging them as they come in, and 
then discarding the rest (unfetched) once there's enough for a page. 
But there is latency around invoking the query before any results are 
received, and the database driver really wants to send out all the rows 
as well, not to mention that the ORM (with configurability) wants to 
convert the whole set of rows received to objects; all of this has overhead.
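
The merge-and-stop idea described above can be sketched with the standard library. This is only an illustration under assumptions, not the POC's code: the list_page helper and the (created_at, uuid) row shape are made up, and each per-cell iterator is assumed to already be sorted by the same key.

```python
import heapq
from itertools import islice


def list_page(cell_results, sort_key, limit):
    """Merge per-cell iterators that are each already sorted by sort_key.

    heapq.merge is lazy, so only about as many rows as the page needs
    (plus one lookahead row per cell) are pulled from the iterators.
    """
    merged = heapq.merge(*cell_results, key=sort_key)
    return list(islice(merged, limit))


# Hypothetical data: (created_at, uuid) rows, each cell sorted by created_at.
cell1 = iter([(1, "a"), (4, "d"), (7, "g")])
cell2 = iter([(2, "b"), (5, "e"), (8, "h")])
cell3 = iter([(3, "c"), (6, "f"), (9, "i")])

page = list_page([cell1, cell2, cell3], sort_key=lambda row: row[0], limit=4)
```

The sketch says nothing about the latency and row-conversion overhead raised above; it only shows that the merge itself does not need complete result sets.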


To at least handle the problem of 50 connections that have all executed 
a statement and are waiting on results, parallelizing that means there 
needs to be a thread pool, greenlet pool, or explicit non-blocking 
approach put in place.  The "thread pool" would be the approach that's 
possible, which with eventlet monkeypatching transparently becomes a 
greenlet pool.  But that's where this starts getting a little intense 
for something you want to do in the context of "a web request".   So I 
think the DB-based solution here is feasible but I'm a little skeptical 
of it at higher scale.   Usually, the search engine would be something 
pluggable, like, "SQL" or "searchlight".
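
As a rough sketch of the thread pool approach, assuming a hypothetical query_cell function that stands in for running the list query against one cell database:

```python
from concurrent.futures import ThreadPoolExecutor


def query_cell(cell_name):
    """Hypothetical stand-in for running the list query against one cell DB."""
    return [(cell_name, n) for n in range(3)]


def query_all_cells(cell_names, max_workers=10):
    """Issue the per-cell queries concurrently and collect results by cell."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(query_cell, name) for name in cell_names}
        # result() blocks until that cell has answered (or re-raises its error).
        return {name: fut.result() for name, fut in futures.items()}


results = query_all_cells(["cell%d" % i for i in range(50)])
```

The sketch uses plain OS threads; the eventlet monkeypatching case mentioned above is not demonstrated here.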









Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-19 Thread Dean Troyer
On Thu, May 18, 2017 at 4:21 PM, Matt Riedemann  wrote:
> Since sorting instances across cells is the main issue, it was also
> suggested that we allow a config option to disable sorting in the API. It
> was stated this would be without a microversion, and filtering/paging would
> still be supported. I'm personally skeptical about how this could be
> considered interoperable or discoverable for API users, and would need more
> thought and input from users like Monty Taylor and Clark Boylan.

Please please please make that config option discoverable, do not
propagate that silent config option pattern any more.  Please.

This is totally a microversion-required situation in my view as the
API will behave differently and clients will need to do the sorting
locally if that is what they require.  Doing it locally is (usually)
fine, but we need to know.

Now the question of how to actually do this?  If we had some
side-channel to return results metadata then this config change would
be discoverable after-the-fact, which in this case would be acceptable
as the condition checking happens after (at least some of) results are
returned anyway.

dt

-- 

Dean Troyer
dtro...@gmail.com



Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-19 Thread Matt Riedemann

On 5/19/2017 1:46 AM, joehuang wrote:

Supporting sort and pagination together will be the biggest challenge. It 
depends on how many cells are involved in the query: with 3 or 5 it may be 
OK, since you can search each cell and cache the data. But what about 20, 
50 or more, and how much data will be cached?

Moreover, instance operations (create, delete) happen in parallel with the 
pagination/sort query, some cells may not respond in time, the network 
connection may break, and so on; many abnormal cases may happen. How to deal 
with abnormal query responses from some cells is also an important factor to 
consider.


I think we've always stated that paging and sorting is not guaranteed to 
be perfect. With paging the marker is the last instance uuid in the last 
page, and if you create a new instance before querying for the next 
page, you might not find that new instance in the results. I don't think 
integrating searchlight is going to fix that as there is still latency 
in getting the new instance.create event results to searchlight so it's 
indexed.
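
A toy illustration of the marker behaviour, under simplifying assumptions: the get_page helper is hypothetical, the "instances" are just uuid strings, and the sort order is plain string order rather than a real sort key.

```python
def get_page(instances, marker=None, limit=2):
    """Return up to `limit` instances after the one whose uuid is `marker`.

    `instances` is a list of uuids in the sort order used by the query.
    """
    start = 0
    if marker is not None:
        start = instances.index(marker) + 1
    return instances[start:start + limit]


instances = ["a", "c", "e", "g"]       # sorted listing at the first request
page1 = get_page(instances)            # the marker for the next page is page1[-1]
instances = sorted(instances + ["b"])  # "b" is created between the two requests
page2 = get_page(instances, marker=page1[-1])
seen = page1 + page2                   # "b" sorts before the marker, so it is missed
```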




It's not a good idea to support pagination and sort at the same time (it may 
not provide exactly the result the end user wants) if Searchlight is not 
integrated.


As noted above, I don't see how Searchlight is going to fix the 
"instance created while in the middle of paging" issue. Searchlight may 
increase the performance of querying a large number of instances across 
dozens of cells, yes, that was the point in going down this path in the 
first place.




In fact, in Tricircle, when ports are queried from a Neutron where the 
Tricircle central plugin is installed, the central plugin does a similar 
query across the local Neutrons' ports, and does not support pagination and 
sort together.


Doesn't that break the contract on the networking API if paging/sorting 
isn't supported when using Tricircle but it is supported when using 
Neutron's networking API directly? It's my understanding that Tricircle 
(and Cascading before it) are proxies to separate OpenStack deployments, 
which can be at various versions (maybe one deployment is mitaka, others 
are newton). But I would expect that the end user facing API is 
compatible with the native APIs, or is that not the case - and users 
understand that when using Tricircle / Cascading? If so, then how do 
libraries/SDKs and CLIs like openstackclient work with Tricircle?


The point of what we're trying to do in nova is expose the same API and 
honor it regardless of whether or not you're using a single cell or 10 
cells - it should be transparent to the end user of the cloud.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-19 Thread joehuang
Supporting sort and pagination together will be the biggest challenge. It 
depends on how many cells are involved in the query: with 3 or 5 it may be 
OK, since you can search each cell and cache the data. But what about 20, 
50 or more, and how much data will be cached?

Moreover, instance operations (create, delete) happen in parallel with the 
pagination/sort query, some cells may not respond in time, the network 
connection may break, and so on; many abnormal cases may happen. How to deal 
with abnormal query responses from some cells is also an important factor to 
consider.

It's not a good idea to support pagination and sort at the same time (it may 
not provide exactly the result the end user wants) if Searchlight is not 
integrated.

In fact, in Tricircle, when ports are queried from a Neutron where the 
Tricircle central plugin is installed, the central plugin does a similar 
query across the local Neutrons' ports, and does not support pagination and 
sort together.

Best Regards
Chaoyi Huang (joehuang)



[openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-18 Thread Matt Riedemann

Hi everyone,

After previous summits where we had vertical tracks for Nova sessions I 
would provide a recap for each session.


The Forum in Boston was a bit different, so here I'm only attempting to 
recap the Forum sessions that I ran. Dan Smith led a session on Cells 
v2, John Garbutt led several sessions on the VM and Baremetal platform 
concept, and Sean Dague led sessions on hierarchical quotas and API 
microversions, and I'm going to leave recaps for those sessions to them.


I'll do these one at a time in separate emails.


Using Searchlight to list instances across cells in nova-api


The etherpad for this session is here [1]. The goal for this session was 
to explain the problem and proposed plan from the spec [2] to the 
operators in the room and get feedback.


Polling the room we found that not many people are deploying Searchlight 
but most everyone was using ElasticSearch.


An immediate concern that came up was the complexity involved with 
integrating Searchlight, especially around issues with latency for state 
changes and questioning how this does not redo the top-level cells v1 
sync issue. It admittedly does to an extent, but we don't have all of 
the weird side code paths with cells v1 and it should be self-healing. 
Kris Lindgren noted that the instance.usage.exists periodic notification 
from the computes hammers their notification bus; we suggested he report 
a bug so we can fix that.


It was also noted that if data is corrupted in ElasticSearch or is out 
of sync, you could re-sync that from nova to searchlight. However, 
searchlight syncs up with nova via the compute REST API, and if the 
compute REST API is using searchlight in the backend, you end up getting 
into an infinite loop of broken. This could probably be fixed with 
bypass query options in the compute API, but it's not a fun problem.


It was also suggested that we store a minimal set of data about 
instances in the top-level nova API database's instance_mappings table, 
where all we have today is the uuid. Anything that is set in the API 
would probably be OK for this, but operators in the room noted that they 
frequently need to filter instances by an IP, which is set in the 
compute. So this option turns into a slippery slope, and is potentially 
not inter-operable across clouds.


Matt Booth is also skeptical that we can't have a multi-cell query 
perform well, and he's proposed a POC here [3]. If that works out, then 
it defeats the main purpose for using Searchlight for listing instances 
in the compute API.


Since sorting instances across cells is the main issue, it was also 
suggested that we allow a config option to disable sorting in the API. 
It was stated this would be without a microversion, and filtering/paging 
would still be supported. I'm personally skeptical about how this could 
be considered interoperable or discoverable for API users, and would need 
more thought and input from users like Monty Taylor and Clark Boylan.


Next steps are going to be fleshing out Matt Booth's POC for efficiently 
listing instances across cells. I think we can still continue working on 
the versioned notifications changes we're making for searchlight as 
those are useful on their own. And we should still work on enabling 
searchlight in the nova-next CI job so we can get an idea for how the 
versioned notifications are working for a consumer. However, any major 
development for actually integrating searchlight into Nova is probably 
on hold at the moment until we know how Matt's POC works.


[1] https://etherpad.openstack.org/p/BOS-forum-using-searchlight-to-list-instances
[2] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html
[3] https://review.openstack.org/#/c/463618/

--

Thanks,

Matt
