Re: Name index

2014-06-23 Thread Davide Giannella
On 20/06/2014 18:06, Jukka Zitting wrote: Hi, Here's an idea for an index structure (for now somewhat specific to SegmentMK) for speeding up node name and property existence queries. ... INDEX UPDATES The index would be maintained by a normal index editor that for each added/removed

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, should we just return the number of estimated entries for the cost? For Lucene, the property index, the ordered index, and the node type index: yes. For Solr, the cost per index lookup (not per entry) is probably a bit higher, because there is a network round trip. Especially if Solr is

Re: Name index

2014-06-23 Thread Thomas Mueller
Hi, What if a node contains millions of (direct) child nodes, how would one do an efficient lookup? We have quite a few (property) indexes; what would be the storage overhead? (I think it would be quite significant with about 100 property indexes.) for speeding up node name and property

Re: Adding a timer in commons

2014-06-23 Thread Michael Dürig
+1 in general. However, - although it results in nice code on the client side, I'm a bit reluctant about putting all the code into the instance initialiser. - how about reusing org.apache.jackrabbit.oak.stats.Clock instead of using Guava's Stopwatch? If necessary we could still implement

Re: Name index

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 3:07 AM, Davide Giannella giannella.dav...@gmail.com wrote: What concerns me most is the update part. AFAIU doing a node count is not that cheap, so I guess you were thinking of something around getCount(MAX) and, if the count == max, doing some estimation around what

Re: Name index

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 3:46 AM, Thomas Mueller muel...@adobe.com wrote: What if a node contains millions of (direct) child nodes, how would one do an efficient lookup? The index structure contains the names of the matching subtrees, which allows us to avoid iterating over all the child

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 3:30 AM, Thomas Mueller muel...@adobe.com wrote: Right. I don't believe the cost of the index lookup is significant (at least in the asymptotic sense) compared to the overall cost of executing a query. Sorry, I don't understand. The cost of the index lookup *is*

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, The problem with that assumption is that typically a single disk read to the index would return n paths, whereas loading those n nodes might well take n more disk reads. Ideally, the cost returned by the index would reflect that. For single-property indexes (all property indexes are single

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 11:18 AM, Thomas Mueller muel...@adobe.com wrote: Sure, but we don't use a covered index. Yes, we are not there yet. The node is currently loaded to check access rights, but that's an implementation detail of the access control part. And it's not needed for the admin.

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, It's more than access control. The query engine needs to double-check the constraints of the query for each matching path before passing that node to the client (see the constraint.evaluate() call in [1]). I don't see any easy way to avoid that step without major refactoring. If there is no

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller muel...@adobe.com wrote: It's more than access control. The query engine needs to double-check the constraints of the query for each matching path before passing that node to the client (see the constraint.evaluate() call in [1]). I don't see

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, It's more than access control. The query engine needs to double-check the constraints of the query for each matching path before passing that node to the client (see the constraint.evaluate() call in [1]). I don't see any easy way to avoid that step without major refactoring. If there is no