On 20/06/2014 18:06, Jukka Zitting wrote:
Hi,
Here's an idea for an index structure (for now somewhat specific to
SegmentMK) for speeding up node name and property existence queries.
...
INDEX UPDATES
The index would be maintained by a normal index editor that for each
added/removed
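The index-update idea above can be sketched as a simplified, standalone structure (this is not the actual Oak index editor API; class and method names here are illustrative): a map from each property name to the set of paths where that name occurs, updated as properties are added and removed.

```java
import java.util.*;

// Simplified standalone sketch (not the real Oak editor interface): an
// index mapping each property name to the set of paths where it occurs,
// kept up to date as nodes/properties are added and removed.
public class NameIndexSketch {
    private final Map<String, Set<String>> byName = new HashMap<>();

    void propertyAdded(String path, String name) {
        byName.computeIfAbsent(name, k -> new TreeSet<>()).add(path);
    }

    void propertyRemoved(String path, String name) {
        Set<String> paths = byName.get(name);
        if (paths != null) {
            paths.remove(path);
            if (paths.isEmpty()) {
                byName.remove(name); // drop empty buckets
            }
        }
    }

    // Property existence query: all paths that have the given property.
    Set<String> lookup(String name) {
        return byName.getOrDefault(name, Collections.emptySet());
    }

    public static void main(String[] args) {
        NameIndexSketch index = new NameIndexSketch();
        index.propertyAdded("/content/a", "jcr:title");
        index.propertyAdded("/content/b", "jcr:title");
        index.propertyRemoved("/content/a", "jcr:title");
        System.out.println(index.lookup("jcr:title"));
    }
}
```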
Hi,
should we just return the number of estimated entries for the cost?
For Lucene, the property index, the ordered index, and the node type
index: yes.
For Solr, the cost per index lookup (not per entry) is probably a bit
higher, because there is a network round trip. Especially if Solr is
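The distinction between per-lookup and per-entry cost can be sketched as a tiny, hypothetical cost model (the names and numbers below are illustrative, not Oak's actual cost API): a remote index like Solr pays a fixed round-trip overhead on every lookup, so for small result sets the round trip dominates.

```java
// Hypothetical cost model (illustrative names, not Oak's actual API):
// total cost = fixed per-lookup overhead + per-entry cost * entries.
public class LookupCost {
    static double cost(double perLookupOverhead, double perEntryCost,
                       long estimatedEntries) {
        return perLookupOverhead + perEntryCost * estimatedEntries;
    }

    public static void main(String[] args) {
        // Local property index: no network round trip.
        System.out.println(cost(0.0, 1.0, 10));
        // Remote Solr index: same entries, plus a fixed round-trip cost.
        System.out.println(cost(50.0, 1.0, 10));
    }
}
```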
Hi,
What if a node contains millions of (direct) child nodes, how would one do
an efficient lookup?
We have quite a few (property) indexes; what would be the storage overhead?
(I think it would be quite significant with about 100 property indexes.)
for speeding up node name and property
+1 in general. However,
- although it results in nice code on the client side, I'm a bit
reluctant about putting all the code into the instance initialiser.
- how about reusing org.apache.jackrabbit.oak.stats.Clock instead of
using Guava's Stopwatch? If necessary we could still implement
Hi,
On Mon, Jun 23, 2014 at 3:07 AM, Davide Giannella
giannella.dav...@gmail.com wrote:
What concerns me most is the update part. AFAIU, doing a node count is
not that cheap, so I guess you were thinking of something like
getCount(MAX), and if the count == max, do some estimation around what
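The capped-count idea can be sketched as follows (names here are hypothetical, for illustration only): iterate over index entries up to a maximum; if the cap is reached, fall back to an estimate instead of paying for an exact count.

```java
import java.util.*;

// Sketch of the capped-count idea (hypothetical names): count entries up
// to a maximum; if the cap is hit, return an estimate instead of the
// (potentially expensive) exact count.
public class CappedCount {
    static long countOrEstimate(Iterable<String> entries, long max,
                                long fallbackEstimate) {
        long count = 0;
        for (String ignored : entries) {
            count++;
            if (count >= max) {
                // Cap reached: the true count may be far larger, so return
                // an estimate (e.g. from sampling or stored statistics).
                return Math.max(fallbackEstimate, max);
            }
        }
        return count; // exact count, cheap because it stayed under the cap
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList("/a", "/b", "/c");
        System.out.println(countOrEstimate(small, 100, 1000));
        List<String> large = Collections.nCopies(500, "/x");
        System.out.println(countOrEstimate(large, 100, 1000));
    }
}
```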
Hi,
On Mon, Jun 23, 2014 at 3:46 AM, Thomas Mueller muel...@adobe.com wrote:
What if a node contains millions of (direct) child nodes, how would one do
an efficient lookup?
The index structure contains the names of the matching subtrees, which
allows us to avoid iterating over all the child
Hi,
On Mon, Jun 23, 2014 at 3:30 AM, Thomas Mueller muel...@adobe.com wrote:
Right. I don't believe the cost of the index lookup is significant (at
least in the asymptotic sense) compared to the overall cost of
executing a query.
Sorry, I don't understand. The cost of the index lookup *is*
Hi,
The problem with that assumption is that typically a single disk read
to the index would return n paths, whereas loading those n nodes might
well take n more disk reads.
Ideally, the cost returned of the index would reflect that. For
single-property indexes (all property indexes are single
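A back-of-envelope sketch of the point above (illustrative numbers and names, not Oak's actual cost calculation): one disk read of the index can return many matching paths, but loading the matched nodes afterwards may cost roughly one read per node, so the node loads dominate unless the index is covering.

```java
// Back-of-envelope disk-read model (illustrative, not Oak's actual cost
// code): a covered index answers the query from the index alone; a
// non-covered one pays roughly one extra read per matched node.
public class QueryCost {
    static long diskReads(long indexReads, long matchedNodes,
                          boolean coveredIndex) {
        return coveredIndex ? indexReads : indexReads + matchedNodes;
    }

    public static void main(String[] args) {
        System.out.println(diskReads(1, 1000, false));
        System.out.println(diskReads(1, 1000, true));
    }
}
```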
Hi,
On Mon, Jun 23, 2014 at 11:18 AM, Thomas Mueller muel...@adobe.com wrote:
Sure, but we don't use a covered index.
Yes, we are not there yet. The node is currently loaded to check access
rights, but that's an implementation detail of the access control part. And
it's not needed for the admin.
Hi,
It's more than access control. The query engine needs to double-check
the constraints of the query for each matching path before passing
that node to the client (see the constraint.evaluate() call in [1]). I
don't see any easy way to avoid that step without major refactoring.
If there is no
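The double-check step described above can be sketched with hypothetical types (this is not the actual Oak query engine code): the index may return candidate paths whose nodes no longer satisfy the query, so each node is loaded and re-evaluated against the constraint before being returned to the client.

```java
import java.util.*;
import java.util.function.Predicate;

// Simplified sketch of the constraint double-check (hypothetical types,
// not the real Oak query engine): each candidate path from the index is
// loaded and re-tested before it reaches the client.
public class ConstraintRecheck {
    static List<String> filter(List<String> candidatePaths,
                               Map<String, Map<String, String>> nodes,
                               Predicate<Map<String, String>> constraint) {
        List<String> result = new ArrayList<>();
        for (String path : candidatePaths) {
            // This load is why per-path disk reads matter for query cost.
            Map<String, String> node = nodes.get(path);
            if (node != null && constraint.test(node)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> nodes = new HashMap<>();
        nodes.put("/a", Map.of("status", "published"));
        nodes.put("/b", Map.of("status", "draft")); // stale index entry
        List<String> candidates = Arrays.asList("/a", "/b");
        System.out.println(filter(candidates, nodes,
                n -> "published".equals(n.get("status"))));
    }
}
```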
Hi,
On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller muel...@adobe.com wrote:
It's more than access control. The query engine needs to double-check
the constraints of the query for each matching path before passing
that node to the client (see the constraint.evaluate() call in [1]). I
don't see