I have backported the fix for OAK-6333 to 1.2 branch also. Would be part of Oak 1.2.28 Chetan Mehrotra
On Wed, Aug 30, 2017 at 8:18 AM, Jason Bailey <jason.bai...@sas.com> wrote: > Yes, that's how I patched the problem. You've answered my question by > explaining that the index value is not the correct value and that issue is > being addressed. > > Thank you. > Jason > > -----Original Message----- > From: Chetan Mehrotra [mailto:chetan.mehro...@gmail.com] > Sent: Wednesday, August 30, 2017 10:56 AM > To: users@jackrabbit.apache.org > Subject: Re: slowness accessing lucene results > > EXTERNAL > >> "cost for lucene-property[/oak:index/lucene] is 1001.0", "property >> cost for cqMaster is 1584.0", > > Problem here is global fultext index /oak:index/lucene is getting picked up > instead of cqMaster. This has been fixed with OAK-6333 (not backported to 1.2 > yet). As a workaround you can set 'entryCount' > property to some high value such that this lucene index is not preferred > Chetan Mehrotra > > > On Wed, Aug 30, 2017 at 7:05 PM, Jason Bailey <jason.bai...@sas.com> wrote: >> Hi Chetan, >> >> For the lucene index provider we have the following configured >> Enabled: >> Copy on Read >> Copy on Write >> Open Index asynchronously >> Prefetch Index Files >> >> Thread pool is 20 >> Extracted text cache size is 0 >> Extracted text cache expiry 300 >> Boolean clause limit 1024 >> >> Additional notes: >> The timing is coming from a backup of our prod environment that I'm using >> for testing. Running the query in our production environment actually >> results in a response time that is closer to an hour. When we use a query >> that does not use Lucene and utilizes another index like cqMaster it is >> magnitudes faster, in the range of 400 milliseconds to count. >> >> I'm investigating the usage of ACL now. >> >> The following is the text from the explain query >> >> { >> "statement": "SELECT * FROM [nt:base] WHERE ISDESCENDANTNODE([/content]) >> AND [cq:master] IS NOT NULL", >> "language": "JCR-SQL2", >> "explain": { >> "logs": [ >> "cost using filter Filter(query=explain SELECT * FROM [nt:base] >> WHERE ISDESCENDANTNODE([/content]) AND [cq:master] IS NOT NULL, >> path=/content//*, property=[cq:master=[is not null]])", >> "cost for aggregate lucene is Infinity", >> "Evaluating plan with index definition Lucene Index : >> cqTag(/oak:index/cqTagLucene)", >> "Evaluating plan with index definition Lucene Index : >> workflow(/oak:index/workflowDataLucene)", >> "Evaluating plan with index definition Lucene Index : >> authorizables(/oak:index/authorizables)", >> "Evaluating plan with index definition Lucene Index : >> /oak:index/damAssetLucene", >> "Evaluating plan with index definition Lucene Index : >> tags(/oak:index/ntBaseLucene)", >> "Evaluating plan with index definition Lucene Index : >> /oak:index/lucene", >> "Evaluating plan with index definition Lucene Index : >> cq:Page(/oak:index/cqPageLucene)", >> "cost for lucene-property[/oak:index/lucene] is 1001.0", >> "cost for reference is Infinity", >> "getPlans(Filter, List<OrderEntry>, NodeState)", >> "getPlans() - filter: Filter(query=explain SELECT * FROM >> [nt:base] WHERE ISDESCENDANTNODE([/content]) AND [cq:master] IS NOT NULL, >> path=/content//*, property=[cq:master=[is not null]]) - ", >> "getPlans() - sortOrder: null - ", >> "getPlans() - rootState: { jcr:primaryType = rep:root, >> jcr:mixinTypes = [rep:RepoAccessControllable, rep:AccessControllable], >> sling:target = /index.html, sling:resourceType = sling:redirect, :async : { >> async-temp = [fe9031ca-3935-4a24-b9bd-f6a6e4d8d035, >> f3ec0fb4-6f32-488d-b751-be3179dd0a40], async = >> fe9031ca-3935-4a24-b9bd-f6a6e4d8d035, async-LastIndexedTo = >> 2017-08-24T14:57:29.853-04:00 }, tmp = { ... }, home = { ... }, oak:index = >> { ... }, conf = { ... }, jcr:system = { ... }, etc = { ... }, system = { ... >> }, content = { ... }, bin : { jcr:primaryType = nt:folder, jcr:mixinTypes = >> [mix:versionable], jcr:createdBy = admin, jcr:versionHistory = >> e2777f7a-8fc5-4f23-9fc0-4478f1f063f6, jcr:predecessors = [], jcr:created = >> 2016-01-18T15:28:50.055-05:00, jcr:baseVersion = >> 777eb8f9-e606-4f29-bd6a-205994c9cd64, jcr:isCheckedOut = false, jcr:uuid = >> 40ed38c9-cb13-41dc-8602-0249e2c1b4c2 }, rep:policy = { ... }, libs = { ... >> }, rep:repoPolicy = { ... }, var = { ... }, apps = { ... } } - ", >> "cost for ordered is Infinity", >> "cost for nodeType is Infinity", >> "property cost for cqMaster is 1584.0", >> "cost for property is 1584.0", >> "cost for traverse is 1.0588708E7" >> ], >> "plan": "[nt:base] as [nt:base] /* lucene:lucene(/oak:index/lucene) >> :ancestors:/content where (isdescendantnode([nt:base], [/content])) and >> ([nt:base].[cq:master] is not null) */" >> }, >> "heuristics": { >> "executeTime": 0, >> "getNodesTime": 3279, >> "count": 2949, >> "countTime": 349807, >> "totalTime": 353086 >> } >> } >> >> -Jason >> >> -----Original Message----- >> From: Chetan Mehrotra [mailto:chetan.mehro...@gmail.com] >> Sent: Wednesday, August 30, 2017 1:03 AM >> To: users@jackrabbit.apache.org >> Subject: Re: slowness accessing lucene results >> >> EXTERNAL >> >> The query can be slow due to multiple reasons >> >> 1. Copy-On-Write support is not enabled >> >> 2. Some of the JCR query constraints may not be getting mapped to >> Lucene index i.e. index definition does not have full coverage of >> query constraints causing quite a few paths to get filtered out. So >> actual number of rows accessed may be large but only few of them >> qualify all criteria at query engine level. This can be confirmed by >> query explanation output >> >> 3. Or lots of filtering happening due to access control. Does the >> performance remains same with admin user? >> >> Chetan Mehrotra >> >> >> On Mon, Aug 28, 2017 at 10:02 AM, Jason Bailey <jason.bai...@sas.com> wrote: >>> Hi all, >>> >>> I'm tracking down a problem I'm currently experiencing when my queries use >>> the lucene index on oak 1.2.18. Here are the figures that I get from the >>> acs commons tool that breaks down queries. >>> >>> "executeTime": 0 >>> "getNodesTime": 3279 >>> "count": 2949 >>> "countTime": 349807 >>> "totalTime": 353086 >>> >>> In essence, it's taking me about 6 minutes to iterate through less than >>> 3000 results. Does anyone have any ideas on root cause? >>> >>> -Jason