Hi Matt,

If by "incomplete results" you mean retrieving the instance UUIDs (from the cell_api) for the cells that failed to answer, I would prefer incomplete results to a failed operation.
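To make the "incomplete results" option concrete, here is a minimal sketch. It assumes a hypothetical per-cell query callable and a hypothetical mapping from cell name to the instance UUIDs the API database holds for that cell, standing in for instance_mappings. A cell that raises is logged as an ERROR, as Matthew proposes below, and its instances come back as UUID-only entries with a made-up "UNKNOWN" status so callers can tell them apart. None of these names come from Nova.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

LOG = logging.getLogger(__name__)


def list_instances(
    cells: Dict[str, Callable[[], List[dict]]],
    mapped_uuids: Dict[str, List[str]],
) -> List[dict]:
    """List instances from every cell, degrading instead of failing.

    cells: cell name -> callable returning full instance records for that
        cell (hypothetical stand-in for a per-cell database query).
    mapped_uuids: cell name -> instance UUIDs the API database knows for
        that cell (hypothetical stand-in for instance_mappings).
    """
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        # Query all cells concurrently.
        futures = {name: pool.submit(query) for name, query in cells.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result())
            except Exception as exc:
                # The cell failed to answer: log it and fall back to the
                # UUIDs recorded in the API database, flagged as incomplete.
                LOG.error("Cell %s failed to answer: %s", name, exc)
                results.extend(
                    {"uuid": uuid, "cell": name, "status": "UNKNOWN"}
                    for uuid in mapped_uuids.get(name, [])
                )
    return results
```

The other option Matthew lists, failing the whole query, amounts to letting the exception propagate instead of catching it.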
Belmiro

On Mon, May 22, 2017 at 11:39 AM, Matthew Booth <mbo...@redhat.com> wrote:
> On 19 May 2017 at 20:07, Mike Bayer <mba...@redhat.com> wrote:
>
>> On 05/19/2017 02:46 AM, joehuang wrote:
>>
>>> Supporting sort and pagination together will be the biggest challenge: it depends on how many cells are involved in the query. 3 or 5 may be OK: you can search each cell and cache the data. But what about 20, 50 or more, and how much data will be cached?
>>
>> I've talked to Matthew in Boston and I am also a little concerned about this. The approach involves trying to fetch just the smallest number of records possible from each backend, merging them as they come in, and then discarding the rest (unfetched) once there's enough for a page. But there is latency around invoking a query before any results are received, and the database driver really wants to send out all the rows as well, not to mention that the ORM (depending on configuration) wants to convert the whole set of received rows to objects; all of this has overhead.
>
> There was always going to come a point where there are too many cells for this approach to be viable. After our chat, I now think that point is considerably lower than I thought before, as I didn't appreciate that the ORM is also doing its own batching.
>
>> To at least handle the problem of 50 connections that have all executed a statement and are waiting on results, parallelizing means there needs to be a thread pool, greenlet pool, or explicit non-blocking approach in place. The thread pool would be the feasible approach, and with eventlet monkeypatching it transparently becomes a greenlet pool. But that's where this starts getting a little intense for something you want to do in the context of "a web request". So I think the DB-based solution here is feasible, but I'm a little skeptical of it at higher scale.
>> Usually, the search engine would be something pluggable, like "SQL" or "searchlight".
>
> I'm not overly concerned about the threading aspect. I understood from our chat that the remote query overhead (being the only part we can actually parallelise anyway) is incurred entirely before returning the first row from SQLA. My plan is simply to fetch the first row of each query using concurrent.futures to allow all the remote queries to run in parallel, and all subsequent rows with blocking IO in the main thread. This will be relatively uncomplicated, and after the initial queries have run it won't involve a whole lot of thread switching.
>
> There are also a couple of optimisations to make which I won't bother with up front. Dan suggested in his CellsV2 talk that we would only query cells where the user actually has instances. If we find users tend to clump in a small number of cells this would be a significant optimisation, although the overhead on the API node for a query returning no rows is probably very small. Also, I think you mentioned that there's an option to tell SQLA not to batch-process rows, but that it is less efficient for total throughput? I suspect there would be a point at which we'd want that. If there's a reasonable way to calculate a tipping point, that might give us some additional life.
>
> Bear in mind that the principal advantages to not using Searchlight are:
>
> * It is simpler to implement
> * It is simpler to manage
> * It will return accurate results
>
> Following the principle of 'as simple as possible, but no simpler', I think there's enormous benefit to this much simpler approach for anybody who doesn't need a more complex approach. However, while it reduces the urgency of something like the Searchlight solution, I expect there are going to be deployments which need that.
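The plan described here (the first row of each cell's query fetched concurrently with concurrent.futures, the remaining rows read with blocking IO in the calling thread, results merged and cut off once a page is full) can be sketched roughly as follows. This is a minimal illustration, not the POC in [3]: each cell query is assumed to be a generator already sorted by the requested key, so its remote work only starts on the first next() call; list_page and _first_row are hypothetical names, and markers for later pages are left out.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, List, Optional, Sequence


def _first_row(rows: Iterator[Any]) -> Optional[Any]:
    # Runs in a worker thread: this is where the per-cell query latency
    # is paid, so all cells pay it in parallel.
    return next(rows, None)


def list_page(
    cell_queries: Sequence[Callable[[], Iterator[Any]]],
    sort_key: Callable[[Any], Any],
    limit: int,
) -> List[Any]:
    """Return the first `limit` rows across all cells in sort_key order.

    Each callable stands in for a per-cell query (hypothetical) returning
    a generator of rows already sorted by sort_key, e.g. via ORDER BY.
    """
    streams = [query() for query in cell_queries]
    # Fetch the first row of every cell concurrently ...
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        firsts = list(pool.map(_first_row, streams))
    # ... then read the remaining rows lazily, with blocking IO, here.
    heads = [
        itertools.chain([first], rest)
        for first, rest in zip(firsts, streams)
        if first is not None  # skip cells that returned no rows
    ]
    merged = heapq.merge(*heads, key=sort_key)
    # Stop as soon as the page is full; the merge reads at most one row
    # ahead per cell, so the remaining rows are never pulled.
    return list(itertools.islice(merged, limit))
```

Following Mike's point, under eventlet monkeypatching the thread pool here would transparently become a greenlet pool. Dan's optimisation of querying only cells where the user has instances would simply shrink cell_queries before the call.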
>
>
>>> Moreover, instance operations (create, delete) run in parallel with the pagination/sort query, some cells may not respond in time, network connections may break, and so on; many abnormal cases may happen. How to deal with abnormal query responses from some cells is also a major factor to be considered.
>
> Aside: For a query operation, what's the better user experience when a single cell is failing:
>
> 1. The whole query fails.
> 2. The user gets incomplete results.
>
> Either of these is simple to implement. Incomplete results would additionally be logged as an ERROR, but I can't think of any way to also signal to the user that there's a problem with the data we returned without throwing an error.
>
> Thoughts?
>
> Matt
>
>>> It's not a good idea to support pagination and sort at the same time (it may not provide exactly the result the end user wants) if Searchlight is not integrated.
>>>
>>> In fact, in Tricircle, when querying ports from a Neutron where the Tricircle central plugin is installed, the central plugin does a similar query across the local Neutron ports, and does not support pagination and sort together.
>>>
>>> Best Regards
>>> Chaoyi Huang (joehuang)
>>>
>>> ________________________________________
>>> From: Matt Riedemann [mriede...@gmail.com]
>>> Sent: 19 May 2017 5:21
>>> To: openstack-...@lists.openstack.org
>>> Subject: [openstack-dev] [nova] Boston Forum session recap - searchlight integration
>>>
>>> Hi everyone,
>>>
>>> After previous summits, where we had vertical tracks for Nova sessions, I would provide a recap for each session.
>>>
>>> The Forum in Boston was a bit different, so here I'm only attempting to recap the Forum sessions that I ran. Dan Smith led a session on Cells v2, John Garbutt led several sessions on the VM and Baremetal platform concept, and Sean Dague led sessions on hierarchical quotas and API microversions; I'm going to leave recaps for those sessions to them.
>>>
>>> I'll do these one at a time in separate emails.
>>>
>>>
>>> Using Searchlight to list instances across cells in nova-api
>>> ------------------------------------------------------------
>>>
>>> The etherpad for this session is here [1]. The goal for this session was to explain the problem and proposed plan from the spec [2] to the operators in the room and get feedback.
>>>
>>> Polling the room, we found that not many people are deploying Searchlight, but almost everyone is using ElasticSearch.
>>>
>>> An immediate concern that came up was the complexity involved with integrating Searchlight, especially around latency for state changes, and the question of whether this redoes the top-level cells v1 sync issue. It admittedly does to an extent, but we don't have all of the weird side code paths that cells v1 has, and it should be self-healing. Kris Lindgren noted that the instance.usage.exists periodic notification from the computes hammers their notification bus; we suggested he report a bug so we can fix that.
>>>
>>> It was also noted that if data in ElasticSearch is corrupted or out of sync, you could re-sync it from nova to Searchlight. However, Searchlight syncs with nova via the compute REST API, so if the compute REST API is using Searchlight in the backend, you end up in an infinite loop of broken. This could probably be fixed with bypass query options in the compute API, but it's not a fun problem.
>>>
>>> It was also suggested that we store a minimal set of data about instances in the top-level nova API database's instance_mappings table, where all we have today is the uuid. Anything that is set in the API would probably be OK for this, but operators in the room noted that they frequently need to filter instances by IP, which is set in the compute. So this option turns into a slippery slope, and is potentially not interoperable across clouds.
>>>
>>> Matt Booth is also skeptical of the claim that a multi-cell query can't perform well, and he's proposed a POC here [3]. If that works out, it defeats the main purpose of using Searchlight for listing instances in the compute API.
>>>
>>> Since sorting instances across cells is the main issue, it was also suggested that we allow a config option to disable sorting in the API. It was stated this would be done without a microversion, and filtering/paging would still be supported. I'm personally skeptical about how this could be considered interoperable or discoverable for API users, and it would need more thought and input from users like Monty Taylor and Clark Boylan.
>>>
>>> Next steps are going to be fleshing out Matt Booth's POC for efficiently listing instances across cells. I think we can still continue working on the versioned notifications changes we're making for Searchlight, as those are useful on their own. And we should still work on enabling Searchlight in the nova-next CI job so we can get an idea of how the versioned notifications work for a consumer. However, any major development for actually integrating Searchlight into Nova is probably on hold at the moment until we know how Matt's POC works out.
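The recap does not say how paging would work with sorting disabled. One hypothetical way to see why dropping the sort helps: without a global order, cells can be walked one after another using a marker of (cell index, last UUID returned), so a page only queries the cells it actually needs, whereas a sorted page needs a candidate row from every cell. The per-cell query interface, the marker format and list_unsorted_page are all assumptions for illustration, not anything proposed in the session.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical per-cell query: (uuid to resume after, or None; max rows)
# -> rows in that cell's own stable order, e.g. by primary key.
CellQuery = Callable[[Optional[str], int], List[dict]]
# Hypothetical marker: (index of the cell, last uuid returned from it).
Marker = Tuple[int, str]


def list_unsorted_page(
    cells: Sequence[CellQuery], limit: int, marker: Optional[Marker] = None
) -> Tuple[List[dict], Optional[Marker]]:
    """Fill one page by walking cells in a fixed order, with no global sort.

    Returns the page and the marker for the next page (None once every
    cell has been exhausted).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    start_cell, after = marker if marker is not None else (0, None)
    page: List[dict] = []
    next_marker: Optional[Marker] = None
    for index in range(start_cell, len(cells)):
        rows = cells[index](after, limit - len(page))
        page.extend(rows)
        if rows:
            next_marker = (index, rows[-1]["uuid"])
        if len(page) >= limit:
            # Page is full: cells after this one are not queried at all.
            return page, next_marker
        after = None  # cells after the marker's cell start from the beginning
    return page, None
```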
>>>
>>> [1] https://etherpad.openstack.org/p/BOS-forum-using-searchlight-to-list-instances
>>> [2] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html
>>> [3] https://review.openstack.org/#/c/463618/
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
>
> Phone: +442070094448 (UK)
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators