Re: Querying vs iterating

Julian Sedding Mon, 20 Jun 2016 07:01:45 -0700

Hi Roy

Yes, I would expect that you cannot measure any meaningful difference.
Using a query may be marginally faster, because it can traverse using
internal Oak APIs. On the other hand it may be slightly slower,
because of possible QueryEngine overhead.


Personally I would test whether it works sufficiently well with a
query, because it is less code.
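
For comparison, here is a minimal sketch of what the manual-iteration side might look like. It is plain Java on a stand-in tree, so it runs without a repository. TreeNode, PropertyFinder and all paths and property names are hypothetical and are not Sling, JCR or Oak APIs. In real code you would walk Resource or Node children instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Sling Resource / JCR Node; not a real API. */
final class TreeNode {
    final String path;
    final Map<String, Object> properties = new LinkedHashMap<>();
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String path) {
        this.path = path;
    }

    /** Adds a child with the given name and returns the new child. */
    TreeNode child(String name) {
        TreeNode c = new TreeNode(path + "/" + name);
        children.add(c);
        return c;
    }

    /** Sets a property on this node and returns this node. */
    TreeNode with(String name, Object value) {
        properties.put(name, value);
        return this;
    }
}

final class PropertyFinder {
    /**
     * Depth-first search for descendants of root that carry propertyName,
     * looking at most maxDepth levels below root. Returns their paths.
     */
    static List<String> findPathsWithProperty(TreeNode root, String propertyName, int maxDepth) {
        List<String> hits = new ArrayList<>();
        collect(root, propertyName, maxDepth, hits);
        return hits;
    }

    private static void collect(TreeNode node, String propertyName, int remainingDepth, List<String> hits) {
        if (remainingDepth <= 0) {
            return; // depth budget used up, do not descend further
        }
        for (TreeNode child : node.children) {
            if (child.properties.containsKey(propertyName)) {
                hits.add(child.path);
            }
            collect(child, propertyName, remainingDepth - 1, hits);
        }
    }
}
```

The query side would be a single JCR-SQL2 statement along the lines of SELECT * FROM [nt:base] AS n WHERE ISDESCENDANTNODE(n, '/content/page') AND n.[myProp] IS NOT NULL, with the path and the property name again being placeholders. That is the "less code" I mean, and for 10-20 nodes I would not expect the two to differ measurably.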

Note also that Sling Query
(https://sling.apache.org/documentation/bundles/sling-query.html)
allows you to express a query and choose traversal vs query as a
strategy. This may or may not help.

Regards
Julian


On Mon, Jun 20, 2016 at 3:52 PM, Roy Teeuwen <r...@teeuwen.be> wrote:
> Hey Julian,
>
> Ok cool. For me the context is querying within a page in AEM, so I am
> creating a query for one cq:Page node, which will usually cover at most
> 10-20 nodes.
> So what you are saying is that, performance-wise, it shouldn't really
> matter whether I traverse manually or run a query to check if a specific
> property name exists on the page, because behind the scenes the query
> will most likely traverse anyway, right?
>
> Thanks!
> Roy
>> On 20 Jun 2016, at 15:43, Julian Sedding <jsedd...@gmail.com> wrote:
>>
>> Hi Roy
>>
>> From your question ("hard to put an index on it") I assume that you are
>> running on an Oak repository. If that is incorrect, my answer does not
>> apply.
>>
>> Oak will always consider traversal as an alternative to existing
>> indexes. For most queries the cost of traversal is so high that an
>> index is chosen. However, if no suitable index exists (and
>> theoretically also if the traversal is cheaper than a lookup in a
>> matching index), it will do a traversal behind the scenes. Note that
>> traversal logs a warning every 10000 traversed nodes. So if you plan
>> to traverse more than that you should really consider creating an
>> index.
>>
>> In short: with Oak using a query on a small subtree should give you
>> what you want, even without an index.
>>
>> Regards
>> Julian
>>
>>
>> On Thu, Jun 16, 2016 at 4:44 PM, Steven Walters <kemu...@gmail.com> wrote:
>>> Hopefully other people chime in here. I've only had bad experiences
>>> with queries, to the point that I personally never use them - so I
>>> always end up iterating/navigating myself.
>>>
>>> Theoretically if you have a REALLY GOOD index then you may get
>>> similar performance, but if your index(es) are inefficient, then it's
>>> just wasted CPU cycles (you'd wish those CPU cycles were going to a
>>> good cause, but they're not).
>>>
>>> The transition of Sling (and AEM) from Jackrabbit 2.x to Oak made this
>>> experience worse with the awkward indexing policies/process in Oak,
>>> and the fact that Oak never seemed to use multiple indexes.
>>> Oak always seemed to calculate the cost of the entire query against
>>> all the available indexes and only choose the ONE best index.
>>> This sounds like a good idea in theory, but then most DBMS I've used
>>> in the past utilize ALL the indexes they can - not just one.
>>>
>>> So basically I guess this comes down to "If you have a good index (in
>>> that it can apply to ALL the conditions/attributes/properties of your
>>> query) then using a query should be fine, otherwise iterate yourself".
>>> Having any condition missing from the index can be fatal for
>>> performance. For example, lacking evaluatePathRestrictions = true is
>>> basically the death of the system if you have a lot of content.
>>>
>>> But really, I hope some other people with more positive experiences
>>> can provide some better advice.
>>>
>>> On Thu, Jun 16, 2016 at 11:08 PM, Roy Teeuwen <r...@teeuwen.be> wrote:
>>>> Ok, it would be handy to have an estimate of the approximate number /
>>>> levels of resources at which to choose iterating vs querying :).
>>>>
>>>> Greets
>>>> Roy
>>>>> On 16 Jun 2016, at 16:06, Steven Walters <kemu...@gmail.com> wrote:
>>>>>
>>>>> If you know there are that few resources, then I'd say iterating would
>>>>> perform better than XPath / JCR-SQL2 queries.
>>>>> This is primarily past experience speaking: queries have
>>>>> generally turned out (often MUCH) slower than directly iterating if you
>>>>> know what you're actually looking for.
>>>>>
>>>>>
>>>>> On Thu, Jun 16, 2016 at 10:28 PM, Roy Teeuwen <r...@teeuwen.be> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Let's say I have a resource with around 10-20 child/grand-child resources,
>>>>>> not going deeper than 3 levels max. What is the most performant way to
>>>>>> search for the child resources containing a specific property (the
>>>>>> property is configurable with OSGi, so hard to put an index on it)?
>>>>>> Iterating the child / grand-child resources until you find it, or making
>>>>>> an XPath/JCR-SQL2 query? When would one option start to be more performant
>>>>>> than the other?
>>>>>>
>>>>>> Thanks!
>>>>>> Roy
>>>>
>
