Re: [DISCUSS] - QueryIndex selection

2014-06-28 Thread Michael Marth
Hi, I looked a bit into how MongoDB selects indexes (query plans) and think we could take some inspiration. So, the way MongoDB does it afaiu:
* query gets parsed into Abstract Syntax Tree (so that parameters can get stripped out)
* the first time this query is performed then the query is execu

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Jukka Zitting
Hi, On Thu, Jun 26, 2014 at 2:55 AM, Davide Giannella wrote: > Can't we do the ACL check lazily? Instead of the query engine looping > through the nodes and check, if there's no need of doing so already (IE > sorting), why not returning the set and then filter out the ACLs while > the user load t

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Jukka Zitting
Hi, On Thu, Jun 26, 2014 at 4:10 AM, Angela Schreiber wrote: > however, please be aware that one key feature of oak (compared to > jackrabbit which only allowed permission evaluation by path) is that > it always needs to be clear if the target for the permission evaluation > is a node or a proper

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Thomas Mueller
Hi, >Can't we do the ACL check lazily? That's what we do right now. Regards, Thomas

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Angela Schreiber
hi jukka this is not quite true, as i will explain below. first i would strongly recommend not to rely on the current implementation. if we have the requirement to evaluate permissions based on the path we may extend the permissionprovider which IMO is the key API for these cases; not the treepe

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Davide Giannella
On 25/06/2014 16:48, Jukka Zitting wrote: > The TreePermission interface is the key API here, and the way we've > designed it requires loading the nodes being accessed (see the > getChildPermission method). The current implementation of the API > actually *doesn't* strictly require the loading of t

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Jukka Zitting
Hi, On Wed, Jun 25, 2014 at 10:16 AM, Thomas Mueller wrote: > Yes, we would need to use a different access control API. The ability to > check whether a session has access to a path/node/property, without > actually loading the node from the storage backend. Maybe that API is > already there? Th

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Thomas Mueller
Hi, >But getting >to that point may be a bit tricky, especially because of access >control. Yes, we would need to use a different access control API. The ability to check whether a session has access to a path/node/property, without actually loading the node from the storage backend. Maybe that A

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 4:23 PM, Thomas Mueller wrote: > Sorry, sure, the condition is verified again. But this might be an > in-memory operation. The index may return the property value for each > entry as part of running the query (QueryIndex - Cursor - IndexRow). I > think the index implem

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, >>>It's more than access control. The query engine needs to double-check >>>the constraints of the query for each matching path before passing >>>that node to the client (see the constraint.evaluate() call in [1]). I >>>don't see any easy way to avoid that step without major refactoring. >> >>

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller wrote: >>It's more than access control. The query engine needs to double-check >>the constraints of the query for each matching path before passing >>that node to the client (see the constraint.evaluate() call in [1]). I >>don't see any easy way

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, >It's more than access control. The query engine needs to double-check >the constraints of the query for each matching path before passing >that node to the client (see the constraint.evaluate() call in [1]). I >don't see any easy way to avoid that step without major refactoring. If there is

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 11:18 AM, Thomas Mueller wrote: >>Sure, but we don't use a covered index. > > Yes, we are not there yet. The node is currently loaded to check access > rights, but that's an implementation detail of access control part. And > it's not needed for the admin. If (when) th

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, >The problem with that assumption is that typically a single disk read >to the index would return n paths, whereas loading those n nodes might >well take n more disk reads. Ideally, the cost returned of the index would reflect that. For single-property indexes (all property indexes are single

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 3:30 AM, Thomas Mueller wrote: >>Right. I don't believe the cost of the index lookup is significant (at >>least in the asymptotic sense) compared to the overall cost of >>executing a query. > > Sorry, I don't understand. The cost of the index lookup *is* significant >

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, >should we just return the number of estimated entries for the cost? For Lucene, the property index, the ordered index, and the node type index: yes. For Solr, the cost per index lookup (not per entry) is probably a bit higher, because there is a network round trip. Especially if Solr is rem
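To illustrate the distinction made in this message between a per-lookup cost and a per-entry cost, here is a minimal sketch. It is not Oak code: the class IndexCostSketch, the Estimate type, its fields, and the numbers are all hypothetical and chosen only for illustration. It assumes the total cost is a fixed per-lookup overhead plus a per-entry cost multiplied by the estimated number of entries, and that the candidate with the lowest total wins.

```java
import java.util.Arrays;
import java.util.List;

public class IndexCostSketch {

    /** Hypothetical per-index estimate; this is not an Oak type. */
    static final class Estimate {
        final String name;
        final double costPerLookup;  // fixed overhead per lookup, e.g. a network round trip
        final double costPerEntry;   // cost of reading one returned entry
        final long estimatedEntries; // number of entries the index expects to return

        Estimate(String name, double costPerLookup, double costPerEntry, long estimatedEntries) {
            this.name = name;
            this.costPerLookup = costPerLookup;
            this.costPerEntry = costPerEntry;
            this.estimatedEntries = estimatedEntries;
        }

        /** Assumed cost model: fixed overhead plus a linear per-entry term. */
        double totalCost() {
            return costPerLookup + costPerEntry * estimatedEntries;
        }
    }

    /** Returns the candidate with the lowest total cost, or null if the list is empty. */
    static Estimate cheapest(List<Estimate> candidates) {
        Estimate best = null;
        for (Estimate e : candidates) {
            if (best == null || e.totalCost() < best.totalCost()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up numbers: the remote index pays a higher fixed overhead
        // but expects to return fewer entries.
        List<Estimate> candidates = Arrays.asList(
                new Estimate("local-property", 1.0, 1.0, 500),
                new Estimate("remote-solr", 50.0, 1.0, 100));
        System.out.println(cheapest(candidates).name);
    }
}
```

With a model like this, a higher fixed overhead only matters when the estimated entry counts are close; for large result sets the per-entry term dominates.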

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Jukka Zitting
Hi, On Wed, Jun 18, 2014 at 11:31 AM, Tommaso Teofili wrote: > 2014-06-18 16:02 GMT+02:00 Jukka Zitting : >> On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili >> wrote: >> > should we just return the number of estimated entries for the cost? >> >> Yes, that's what I think the contract should be.

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
Hi, 2014-06-18 16:02 GMT+02:00 Jukka Zitting : > Hi, > > On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili > wrote: > > should we just return the number of estimated entries for the cost? > > Yes, that's what I think the contract should be. > ok, that's different from what Thomas suggests, right

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
Hi, 2014-06-18 13:44 GMT+02:00 Thomas Mueller : > Hi, > > >>QueryIndex.getCost > > >my doubt is what > >this heuristic function to estimate the "traversed entries" should look > >like in general > > Relational databases typically know the number of entries in the index > (total indexed entries),

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
ok, thanks Davide for the pointers. Regards, Tommaso 2014-06-18 13:36 GMT+02:00 Davide Giannella : > On 18/06/2014 10:26, Tommaso Teofili wrote: > > it would be ok for me to either deprecate it or improve the semantics > > of the cost calculation (e.g. explicitly introduce other metrics to be >

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Jukka Zitting
Hi, On Wed, Jun 18, 2014 at 7:44 AM, Thomas Mueller wrote: >>My other concern on this point is that it's not granted, in my opinion, >>that the index returning less entries would be the faster. > > Yes, it's not that much about less entries or more entries, it's about > lower or higher cost. If t

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Jukka Zitting
Hi, On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili wrote: > should we just return the number of estimated entries for the cost? Yes, that's what I think the contract should be. > My other concern on this point is that it's not granted, in my opinion, > that the index returning less entries wo

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Thomas Mueller
Hi, >>QueryIndex.getCost >my doubt is what >this heuristic function to estimate the "traversed entries" should look >like in general Relational databases typically know the number of entries in the index (total indexed entries), plus the selectivity of a column. See also http://www.akadia.com/se
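The relational-database approach mentioned in this message (total indexed entries plus column selectivity) can be sketched roughly as follows. This is an illustration only, not Oak or any database's actual estimator: the class and method names are hypothetical, and it assumes an equality condition on a column whose values are uniformly distributed, so that selectivity is approximated as 1 / (number of distinct values).

```java
public class SelectivitySketch {

    /**
     * Approximate selectivity of an equality condition, assuming values are
     * uniformly distributed over the distinct values (an assumption, not a fact
     * about any particular index).
     */
    static double selectivity(long distinctValues) {
        if (distinctValues <= 0) {
            throw new IllegalArgumentException("distinctValues must be positive");
        }
        return 1.0 / distinctValues;
    }

    /** Estimated number of matching entries: total indexed entries times selectivity. */
    static long estimatedEntries(long totalIndexedEntries, long distinctValues) {
        return Math.round(totalIndexedEntries * selectivity(distinctValues));
    }

    public static void main(String[] args) {
        // Made-up numbers: one million indexed entries, one thousand distinct values.
        System.out.println(estimatedEntries(1_000_000L, 1_000L));
    }
}
```

An index could feed an estimate like this into its cost, but a real implementation would need statistics on the actual value distribution, since skewed data makes the uniform assumption unreliable.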

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Davide Giannella
On 18/06/2014 10:26, Tommaso Teofili wrote: > it would be ok for me to either deprecate it or improve the semantics > of the cost calculation (e.g. explicitly introduce other metrics to be > taken into account in the cost calculation: local / remote index, With the IndexPlan.isDelayed() we instruc

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
2014-06-04 9:36 GMT+02:00 Thomas Mueller : > Hi, > > QueryIndex.getCost: this is actually quite well documented (see the > Javadocs). But the implementations might not fully follow the contract :-) > this is probably just my opinion but the contract is not very clear; to me finding "the worst-cas

Re: [DISCUSS] - QueryIndex selection

2014-06-04 Thread Thomas Mueller
>> >>We could let the >> user decide if using an asynchronous index is OK or not. > >Another option is if there is no synch index available but an asynch >index is available then QueryEngine should use that instead of >resorting to traversal. Well, this is the current behavior. The query engine do

Re: [DISCUSS] - QueryIndex selection

2014-06-04 Thread Chetan Mehrotra
On Wed, Jun 4, 2014 at 1:06 PM, Thomas Mueller wrote: > We could let the > user decide if using an asynchronous index is OK or not. Another option is if there is no synch index available but an asynch index is available then QueryEngine should use that instead of resorting to traversal. This shou

Re: [DISCUSS] - QueryIndex selection

2014-06-04 Thread Thomas Mueller
Hi, QueryIndex.getCost: this is actually quite well documented (see the Javadocs). But the implementations might not fully follow the contract :-) But anyway, I think it's better to deprecate it and use AdvancedQueryIndex, as it has more features (especially important for ordered indexes

Re: [DISCUSS] - QueryIndex selection

2014-05-27 Thread Tommaso Teofili
2014-05-27 11:21 GMT+02:00 Davide Giannella : > On 26/05/2014 09:25, Tommaso Teofili wrote: > > ... > > Also the efficiency is not evaluated on a "cost model", each QueryIndex > > implementation can return an arbitrary different number; on one hand this > > is ok as it allows to take very index sp

Re: [DISCUSS] - QueryIndex selection

2014-05-27 Thread Davide Giannella
On 26/05/2014 09:25, Tommaso Teofili wrote: > ... > Also the efficiency is not evaluated on a "cost model", each QueryIndex > implementation can return an arbitrary different number; on one hand this > is ok as it allows to take very index specific constraint into account: on > the other hand if on

[DISCUSS] - QueryIndex selection

2014-05-26 Thread Tommaso Teofili
Hi all, I'd like to start discussing how we may improve / simplify the current way of selecting a query index to use for a certain query. In the QueryIndex interface we have the plain old getCost method which selects the index returning the lowest cost for the given query but, recently, also an Advan