[ 
https://issues.apache.org/jira/browse/OAK-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13579087#comment-13579087
 ] 

Tommaso Teofili commented on OAK-622:
-------------------------------------

Something like this:
{code}
public interface QueryIndex {

    /**
     * Start a query by applying a given execution plan.
     *
     * @param plan the {@link ExecutionPlan} chosen
     * @param rootState root state of the current repository snapshot
     * @return a cursor to iterate over the result
     */
    Cursor query(ExecutionPlan plan, NodeState rootState);

    /**
     * Get the available execution plans for the given filter.
     *
     * @param filter the filter
     * @param rootState root state of the current repository snapshot
     * @return the available execution plans
     */
    List<ExecutionPlan> getPlans(Filter filter, NodeState rootState);

    /**
     * Get the unique index name.
     *
     * @return the index name
     */
    String getIndexName();

}
{code}

where the _ExecutionPlan_ may contain all the information needed (cost, result 
sorting, advanced full-text coordinates) to decide which plan should be 
executed.
Such information may then need dedicated interfaces, such as Cost:
{code}
public interface Cost {

    // estimated number of nodes the query would return
    int getEstimatedNodesCount();

    // cost to run the query
    double getQueryCost();

    // cost to retrieve each node
    double getNodeRetrievalCost();

}
{code}
However, I'm not sure it's actually a good idea to include the estimated node 
count as a cost measure, as calculating it may be more expensive than running 
the actual query (in most common implementations, estimating the number of 
nodes is only possible by running the same query while asking for no 
results).
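
To make the trade-off concrete, here is a minimal sketch of how the three cost figures could be combined when picking a plan. Everything in it is an assumption for illustration only: _SimpleCost_, _SimplePlan_, _PlanSelector_ and the formula (fixed query cost plus per-node cost times the number of nodes to read) are hypothetical and not part of the Oak API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical value holder mirroring the three getters of the proposed Cost interface.
final class SimpleCost {
    final int estimatedNodesCount;   // estimated number of matching nodes
    final double queryCost;          // fixed cost to run the query
    final double nodeRetrievalCost;  // cost per retrieved node

    SimpleCost(int estimatedNodesCount, double queryCost, double nodeRetrievalCost) {
        this.estimatedNodesCount = estimatedNodesCount;
        this.queryCost = queryCost;
        this.nodeRetrievalCost = nodeRetrievalCost;
    }
}

// Hypothetical plan: the index offering it, its cost, and whether results come back sorted.
final class SimplePlan {
    final String indexName;
    final SimpleCost cost;
    final boolean sorted;

    SimplePlan(String indexName, SimpleCost cost, boolean sorted) {
        this.indexName = indexName;
        this.cost = cost;
        this.sorted = sorted;
    }
}

final class PlanSelector {
    private PlanSelector() {
    }

    // Assumed formula: fixed cost + per-node cost * nodes actually read.
    // nodesToRead lets the caller optimize for the first few nodes instead of throughput.
    static double estimate(SimpleCost cost, int nodesToRead) {
        int nodes = Math.min(nodesToRead, cost.estimatedNodesCount);
        return cost.queryCost + cost.nodeRetrievalCost * nodes;
    }

    // Returns the plan with the lowest estimate, or empty if there are no plans.
    static Optional<SimplePlan> cheapest(List<SimplePlan> plans, int nodesToRead) {
        return plans.stream()
                .min(Comparator.comparingDouble(p -> estimate(p.cost, nodesToRead)));
    }
}
```

With one plan that is cheap to start but expensive per node and another that is the opposite, the first would win when only a few nodes are read and the second when the whole result is read. Note that this still relies on the estimated node count, so the concern above applies.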

Side note: one thing we could also do is introduce the concept of a cost model 
(e.g. _costmodel = cost.getQueryCost()*0.5+cost.getNodeRetrievalCost()*0.1_), 
together with such cost coordinates, so that all the indexes can be evaluated 
at query time with the same (pluggable) cost model, which yields a numeric 
value computed consistently from the given coordinates.
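
A rough sketch of what such a pluggable cost model might look like, assuming a weighted sum as in the example formula. The names _CostCoordinates_, _CostModel_ and _WeightedCostModel_ are hypothetical and only serve the illustration; the weights 0.5 and 0.1 are just the ones from the example, not tuned values.

```java
// Hypothetical holder for the two cost coordinates used by the example formula.
final class CostCoordinates {
    final double queryCost;
    final double nodeRetrievalCost;

    CostCoordinates(double queryCost, double nodeRetrievalCost) {
        this.queryCost = queryCost;
        this.nodeRetrievalCost = nodeRetrievalCost;
    }
}

// Pluggable cost model: maps cost coordinates to a single comparable number.
interface CostModel {
    double evaluate(CostCoordinates cost);
}

// Weighted sum of the coordinates, following the shape of the example formula.
final class WeightedCostModel implements CostModel {
    private final double queryWeight;
    private final double retrievalWeight;

    WeightedCostModel(double queryWeight, double retrievalWeight) {
        this.queryWeight = queryWeight;
        this.retrievalWeight = retrievalWeight;
    }

    @Override
    public double evaluate(CostCoordinates cost) {
        return cost.queryCost * queryWeight
                + cost.nodeRetrievalCost * retrievalWeight;
    }
}
```

Since _CostModel_ has a single method, a different model could be plugged in as a lambda (e.g. _c -> c.queryCost_) while every index is still scored through the same code path.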

                
> Improve QueryIndex interface
> ----------------------------
>
>                 Key: OAK-622
>                 URL: https://issues.apache.org/jira/browse/OAK-622
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Minor
>
> The current QueryIndex interface is quite simple, but doesn't address some of 
> the required features and more advanced optimizations that are possible:
> - For fulltext queries, it doesn't address the case where the index 
> implementation has a different understanding of the fulltext condition than 
> what is described in the JCR spec (the basic features).
> - For queries with "order by" it would be good to know if the index supports 
> returning the data in sorted order, and if yes, how much slower that would be 
> (if it is slower). So an index might have multiple strategies with different 
> costs.
> - It's quite easy to misunderstand what getCost is supposed to do exactly. 
> The new API should have a clearer solution here.
> - Even if the query doesn't have "order by", the index might return the data 
> in a sorted way, which might help improving query performance (using a merge 
> join)
> - The cost is currently a single value, it might be better to estimate the 
> number of nodes, the cost to run a query, and the cost per node. That way we 
> could optimize to quickly return the first few nodes (versus optimize for 
> throughput).

