Re: [DISCUSS] - QueryIndex selection

Tommaso Teofili Tue, 27 May 2014 06:52:17 -0700

2014-05-27 11:21 GMT+02:00 Davide Giannella <[email protected]>:


> On 26/05/2014 09:25, Tommaso Teofili wrote:
> > ...
> > Also the efficiency is not evaluated on a "cost model": each QueryIndex
> > implementation can return an arbitrary, different number. On one hand this
> > is ok, as it allows very index-specific constraints to be taken into
> > account; on the other hand, whoever writes a new QueryIndex implementation
> > has to look into every other query index implementation to understand
> > (and design) if / when their index is picked up. Even with the already
> > existing indexes it's not easy to say upfront which one will be selected
> > (e.g. for debugging purposes).
>
> I don't know if by using the IndexPlans it will be possible to say
> beforehand which index will be picked up. It really depends on the query
> engine logic, and if there are other indexes that could perform faster,
> why not choose them?
>

For full text queries, for example, one may be interested in having a higher
recall (more documents matching the query), which may lead to a slightly
slower query execution / higher cost evaluation. If we only ever select the
fastest index, that use case cannot be addressed.
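
To make the recall point concrete, here is a minimal sketch in Java of a
selector that does not decide on cost alone. Everything in it is hypothetical
(the Candidate type, the preferFullText switch and the select method are
illustration only, not Oak API); it just shows that a configurable policy can
let a more expensive full text index win over a cheaper one.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class IndexSelectionSketch {

    // Hypothetical candidate description; not an Oak type.
    static final class Candidate {
        final String name;
        final double cost;
        final boolean fullText;

        Candidate(String name, double cost, boolean fullText) {
            this.name = name;
            this.cost = cost;
            this.fullText = fullText;
        }
    }

    // With preferFullText set, the cheapest full text candidate wins even if a
    // non full text candidate is cheaper; otherwise the lowest cost wins.
    static Optional<Candidate> select(List<Candidate> candidates, boolean preferFullText) {
        Comparator<Candidate> byCost = Comparator.comparingDouble(c -> c.cost);
        if (preferFullText) {
            Optional<Candidate> fullText =
                    candidates.stream().filter(c -> c.fullText).min(byCost);
            if (fullText.isPresent()) {
                return fullText;
            }
        }
        return candidates.stream().min(byCost);
    }
}
```

Whether such a switch should live in the query, in the index definition or in
the query engine configuration is left open here.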


>
> > With the AdvancedQueryIndex, if I understood it correctly (I just had a
> > look at it on Friday), a QueryIndex is selected upon its IndexPlan, which
> > is supposed to address better both the cost (as it explicitly exposes the
> > cost per execution, cost per entry and estimated entry count metrics) and
> > the query index capability to handle a certain query (e.g. this is used
> > for ordered property index).
> > However, at the moment, only the OrderedPropertyIndex is using it so I
> > think it'd be good to decide if we want to go further with the
> > AdvancedQueryIndex also for the other QueryIndex implementations (and get
> > rid for example of the FullTextQueryIndex interface as it seems useless
> > to me) or not.
> >
> > One final question on query index selection: should we always select the
> > fastest index?
> > Especially for full text ones this should be in some way configurable.
>
> Yes we discussed off-line last time. It seems that it would be good for
> the query engine to expose an API in which the client application could
> state: run with this index. Something like
> queryengine.forceIndex("path/to/oak:QueryIndexDefinition").  If
> specified the query engine will know about it and skip the index
> selection by forcing it.
>
> Definitely useful for debugging, as well as in edge cases where the
> client application knows that the query has to be run with a
> specific index for one reason or another.
> > ...
> > As discussed also offline last week with some other folks maybe one
> > further metric to be taken into consideration for the index selection is
> > if the index is synchronous or not
> >
> The current index plan of the AdvancedQueryIndex exposes the sync/async
> aspect via IndexPlan.isDelayed(). Whether that is taken into account yet,
> I don't know :)
>

ok
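
As a rough illustration of the plan-based selection discussed in the quoted
text, the sketch below combines the three cost metrics, the sync/async flag
and a forced index. It is not Oak code: the Plan class is a stand-in for
whatever an IndexPlan exposes, the cost formula (cost per execution plus cost
per entry times estimated entry count) is an assumed way of combining the
metrics, and forcedIndexName / allowDelayed are hypothetical parameters.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class PlanSelectionSketch {

    // Hypothetical stand-in for the metrics an index plan exposes.
    static final class Plan {
        final String indexName;
        final double costPerExecution;
        final double costPerEntry;
        final long estimatedEntryCount;
        final boolean delayed; // true for an asynchronous index

        Plan(String indexName, double costPerExecution, double costPerEntry,
             long estimatedEntryCount, boolean delayed) {
            this.indexName = indexName;
            this.costPerExecution = costPerExecution;
            this.costPerEntry = costPerEntry;
            this.estimatedEntryCount = estimatedEntryCount;
            this.delayed = delayed;
        }
    }

    // Assumed cost model: fixed cost plus per-entry cost for every expected entry.
    static double estimatedCost(Plan p) {
        return p.costPerExecution + p.costPerEntry * p.estimatedEntryCount;
    }

    // A forced index bypasses selection entirely; otherwise the cheapest plan
    // wins, optionally ignoring delayed (asynchronous) indexes.
    static Optional<Plan> select(List<Plan> plans, String forcedIndexName, boolean allowDelayed) {
        if (forcedIndexName != null) {
            return plans.stream()
                    .filter(p -> p.indexName.equals(forcedIndexName))
                    .findFirst();
        }
        return plans.stream()
                .filter(p -> allowDelayed || !p.delayed)
                .min(Comparator.comparingDouble(PlanSelectionSketch::estimatedCost));
    }
}
```

With a single comparable number per plan it also becomes easier to say
upfront which index would be picked, which was the debugging concern above.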


>
> Another aspect of improvement we were discussing is on the query engine
> side. As index selection and planning are expensive, if the query engine
> is asked to execute a query with bindings[0] it could cache the selected
> index, either for a fixed amount of time or according to some other logic.
>
> (0) http://goo.gl/LDQw1Y
>
> Last one: as we were discussing the possibility of serving queries
> for "all" the properties from Lucene/Solr, one evaluation metric could
> be that when the same property is served by both a sync property index
> and a Lucene one, the first should be chosen as it would be local.
>

right.
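
For the caching idea quoted above (reuse the selected index for a query with
bindings for a fixed amount of time), a minimal sketch could look like the
following. The class, its method names and the injected clock are all
assumptions made for illustration; nothing here reflects how the query engine
is actually structured.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class SelectedIndexCacheSketch {

    // Cached decision plus the time at which it stops being valid.
    private static final class Entry {
        final String indexName;
        final long expiresAtMillis;

        Entry(String indexName, long expiresAtMillis) {
            this.indexName = indexName;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested

    SelectedIndexCacheSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Keyed by the statement text (bind values excluded), so the same query
    // with different bindings reuses the decision until the entry expires.
    String indexFor(String statement, Function<String, String> selector) {
        long now = clock.getAsLong();
        Entry entry = cache.get(statement);
        if (entry == null || entry.expiresAtMillis <= now) {
            entry = new Entry(selector.apply(statement), now + ttlMillis);
            cache.put(statement, entry);
        }
        return entry.indexName;
    }
}
```

A time-based expiry is only the simplest option; invalidating on index
definition changes would be one of the "other logic" alternatives.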

Thanks,
Tommaso


>
> D.
>
>
>
