Hi,
I looked a bit into how MongoDB selects indexes (query plans) and think we
could take some inspiration.
So, the way MongoDB does it afaiu:
* the query gets parsed into an Abstract Syntax Tree (so that parameters can get
stripped out)
* the first time this query is performed then the query is execu
Hi,
On Thu, Jun 26, 2014 at 2:55 AM, Davide Giannella wrote:
> Can't we do the ACL check lazily? Instead of the query engine looping
> through the nodes and check, if there's no need of doing so already (IE
> sorting), why not returning the set and then filter out the ACLs while
> the user load t
Hi,
On Thu, Jun 26, 2014 at 4:10 AM, Angela Schreiber wrote:
> however, please be aware that one key feature of oak (compared to
> jackrabbit which only allowed permission evaluation by path) is that
> it always needs to be clear if the target for the permission evaluation
> is a node or a proper
Hi,
>Can't we do the ACL check lazily?
That's what we do right now.
Regards,
Thomas
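A minimal sketch of what a lazy ACL check can look like, assuming the index hands back an iterator of paths and the permission check is a simple predicate. The class LazyAclFilter, the method firstReadable and the canRead predicate are hypothetical names used for illustration only; they are not part of the Oak API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical helper, not part of Oak: checks read access only as the caller iterates. */
final class LazyAclFilter implements Iterator<String> {
    private final Iterator<String> source;    // paths as returned by an index (assumed non-null)
    private final Predicate<String> canRead;  // stand-in for a real permission check
    private String next;                      // look-ahead buffer; null means nothing buffered

    LazyAclFilter(Iterator<String> source, Predicate<String> canRead) {
        this.source = source;
        this.canRead = canRead;
    }

    @Override
    public boolean hasNext() {
        // Advance only until the next readable path is found, no further.
        while (next == null && source.hasNext()) {
            String candidate = source.next();
            if (canRead.test(candidate)) {
                next = candidate;
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
    }

    /** Collects at most {@code limit} readable paths, checking only as many entries as needed. */
    static List<String> firstReadable(Iterator<String> paths, Predicate<String> canRead, int limit) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new LazyAclFilter(paths, canRead);
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

With this shape, entries the client never reads are never checked. The sketch says nothing about whether the check itself has to load the node.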
hi jukka
this is not quite true, as i will explain below.
first i would strongly recommend not to rely on the current implementation.
if we have the requirement to evaluate permissions based on the path
we may extend the permissionprovider which IMO is the key API for these
cases; not the treepe
On 25/06/2014 16:48, Jukka Zitting wrote:
> The TreePermission interface is the key API here, and the way we've
> designed it requires loading the nodes being accessed (see the
> getChildPermission method). The current implementation of the API
> actually *doesn't* strictly require the loading of t
Hi,
On Wed, Jun 25, 2014 at 10:16 AM, Thomas Mueller wrote:
> Yes, we would need to use a different access control API. The ability to
> check whether a session has access to a path/node/property, without
> actually loading the node from the storage backend. Maybe that API is
> already there?
Th
Hi,
>But getting
>to that point may be a bit tricky, especially because of access
>control.
Yes, we would need to use a different access control API. The ability to
check whether a session has access to a path/node/property, without
actually loading the node from the storage backend. Maybe that A
Hi,
On Mon, Jun 23, 2014 at 4:23 PM, Thomas Mueller wrote:
> Sorry, sure, the condition is verified again. But this might be an
> in-memory operation. The index may return the property value for each
> entry as part of running the query (QueryIndex - Cursor - IndexRow). I
> think the index implem
Hi,
>>>It's more than access control. The query engine needs to double-check
>>>the constraints of the query for each matching path before passing
>>>that node to the client (see the constraint.evaluate() call in [1]). I
>>>don't see any easy way to avoid that step without major refactoring.
>>
>>
Hi,
On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller wrote:
>>It's more than access control. The query engine needs to double-check
>>the constraints of the query for each matching path before passing
>>that node to the client (see the constraint.evaluate() call in [1]). I
>>don't see any easy way
Hi,
>It's more than access control. The query engine needs to double-check
>the constraints of the query for each matching path before passing
>that node to the client (see the constraint.evaluate() call in [1]). I
>don't see any easy way to avoid that step without major refactoring.
If there is
Hi,
On Mon, Jun 23, 2014 at 11:18 AM, Thomas Mueller wrote:
>>Sure, but we don't use a covered index.
>
> Yes, we are not there yet. The node is currently loaded to check access
> rights, but that's an implementation detail of access control part. And
> it's not needed for the admin. If (when) th
Hi,
>The problem with that assumption is that typically a single disk read
>to the index would return n paths, whereas loading those n nodes might
>well take n more disk reads.
Ideally, the cost returned of the index would reflect that. For
single-property indexes (all property indexes are single
Hi,
On Mon, Jun 23, 2014 at 3:30 AM, Thomas Mueller wrote:
>>Right. I don't believe the cost of the index lookup is significant (at
>>least in the asymptotic sense) compared to the overall cost of
>>executing a query.
>
> Sorry, I don't understand. The cost of the index lookup *is* significant
>
Hi,
>should we just return the number of estimated entries for the cost?
For Lucene, the property index, the ordered index, and the node type
index: yes.
For Solr, the cost per index lookup (not per entry) is probably a bit
higher, because there is a network round trip. Especially if Solr is
rem
Hi,
On Wed, Jun 18, 2014 at 11:31 AM, Tommaso Teofili
wrote:
> 2014-06-18 16:02 GMT+02:00 Jukka Zitting :
>> On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili
>> wrote:
>> > should we just return the number of estimated entries for the cost?
>>
>> Yes, that's what I think the contract should be.
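A minimal sketch of the contract discussed here, assuming each candidate index reports its cost as an estimated number of entries and the one with the lowest estimate wins. The class CostBasedSelection, the method cheapest and the index names in the usage note are hypothetical; this is not the actual Oak query engine code.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration, not Oak code: pick the index with the lowest estimated entry count. */
final class CostBasedSelection {

    /**
     * @param estimatedEntries index name mapped to its estimated number of entries for the query
     * @return the name of the cheapest index, or empty if there is no usable candidate
     */
    static Optional<String> cheapest(Map<String, Double> estimatedEntries) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : estimatedEntries.entrySet()) {
            // An infinite estimate never beats the initial value, so such an index is never picked.
            if (e.getValue() < bestCost) {
                bestCost = e.getValue();
                best = e.getKey();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

For example, with estimates of 100000 for "traversal", 120 for "property" and 300 for "lucene", cheapest returns "property". A per-lookup overhead such as a network round trip would have to be added to the estimate before the comparison.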
Hi,
2014-06-18 16:02 GMT+02:00 Jukka Zitting :
> Hi,
>
> On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili
> wrote:
> > should we just return the number of estimated entries for the cost?
>
> Yes, that's what I think the contract should be.
>
ok, that's different from what Thomas suggests, right
Hi,
2014-06-18 13:44 GMT+02:00 Thomas Mueller :
> Hi,
>
> >>QueryIndex.getCost
>
> >my doubt is what
> >this heuristic function to estimate the "traversed entries" should look
> >like in general
>
> Relational databases typically know the number of entries in the index
> (total indexed entries),
ok, thanks Davide for the pointers.
Regards,
Tommaso
2014-06-18 13:36 GMT+02:00 Davide Giannella :
> On 18/06/2014 10:26, Tommaso Teofili wrote:
> > it would be ok for me to either deprecate it or improve the semantics
> > of the cost calculation (e.g. explicitly introduce other metrics to be
>
Hi,
On Wed, Jun 18, 2014 at 7:44 AM, Thomas Mueller wrote:
>>My other concern on this point is that it's not granted, in my opinion,
>>that the index returning less entries would be the faster.
>
> Yes, it's not that much about less entries or more entries, it's about
> lower or higher cost. If t
Hi,
On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili
wrote:
> should we just return the number of estimated entries for the cost?
Yes, that's what I think the contract should be.
> My other concern on this point is that it's not granted, in my opinion,
> that the index returning less entries wo
Hi,
>>QueryIndex.getCost
>my doubt is what
>this heuristic function to estimate the "traversed entries" should look
>like in general
Relational databases typically know the number of entries in the index
(total indexed entries), plus the selectivity of a column. See also
http://www.akadia.com/se
On 18/06/2014 10:26, Tommaso Teofili wrote:
> it would be ok for me to either deprecate it or improve the semantics
> of the cost calculation (e.g. explicitly introduce other metrics to be
> taken into account in the cost calculation: local / remote index,
With the IndexPlan.isDelayed() we instruc
2014-06-04 9:36 GMT+02:00 Thomas Mueller :
> Hi,
>
> QueryIndex.getCost: this is actually quite well documented (see the
> Javadocs). But the implementations might not fully follow the contract :-)
>
this is probably just my opinion but the contract is not very clear; to me
finding "the worst-cas
>>
>>We could let the
>> user decide if using an asynchronous index is OK or not.
>
>Another option is if there is no synch index available but an asynch
>index is available then QueryEngine should use that instead of
>resorting to traversal.
Well, this is the current behavior. The query engine do
On Wed, Jun 4, 2014 at 1:06 PM, Thomas Mueller wrote:
> We could let the
> user decide if using an asynchronous index is OK or not.
Another option is if there is no synch index available but an asynch
index is available then QueryEngine should use that instead of
resorting to traversal. This shou
Hi,
QueryIndex.getCost: this is actually quite well documented (see the
Javadocs). But the implementations might not fully follow the contract :-)
But anyway, I think it's better to deprecate it and use
AdvancedQueryIndex, as it has more features (especially important for
ordered indexes
2014-05-27 11:21 GMT+02:00 Davide Giannella :
> On 26/05/2014 09:25, Tommaso Teofili wrote:
> > ...
> > Also the efficiency is not evaluated on a "cost model", each QueryIndex
> > implementation can return an arbitrary different number; on one hand this
> > is ok as it allows to take very index sp
On 26/05/2014 09:25, Tommaso Teofili wrote:
> ...
> Also the efficiency is not evaluated on a "cost model", each QueryIndex
> implementation can return an arbitrary different number; on one hand this
> is ok as it allows to take very index specific constraint into account: on
> the other hand if on
Hi all,
I'd like to start discussing how we may improve / simplify the current way of
selecting a query engine to use for a certain query.
In the QueryIndex interface we have the plain old getCost method which
selects the index returning the lowest cost for the given query but,
recently, also an Advan