Hi devs, Vikas Saurabh in particular,
In OAK-7109 Vikas asked for some further information about the query we use in
our project, which requires facet counting. Here it is:
select s.[jcr:score], s.[jcr:path], [rep:facet(jcr:content/editorial/cq:tags)],
[rep:facet(jcr:content/maincategories/cq:tags)] from [cq:Page] as s where
(contains(s.[*],'news') and (isdescendantnode(s,'/content/mam/web/de/en') or
(s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/newslibrary') and
isdescendantnode(s,'/content/mam/web/gc/news/en') and
((s.[jcr:content/countries/selectedCountries] = 'true' and
s.[jcr:content/countries/cq:tags] in('web:system/countries/de')) or
(s.[jcr:content/countries/selectedCountries] = 'false' and
not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))) or
(s.[jcr:content/countries/selectedCountries] = 'true' and
s.[jcr:content/countries/cq:tags] is null))) or
(isdescendantnode(s,'/content/mam/web/gc/help/en') and
s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/helplibrary')) or
((isdescendantnode(s,'/content/mam/web/gc/partners/air/en') or
isdescendantnode(s,'/content/mam/web/gc/partners/non-air/en')) and
s.[jcr:content/sling:resourceType]
in('mam/web/pagetypes/partnerlibrary','mam/web/pagetypes/airline-partnerlibrary')
and ((s.[jcr:content/countries/selectedCountries] = 'true' and
s.[jcr:content/countries/cq:tags] in('web:system/countries/de')) or
(s.[jcr:content/countries/selectedCountries] = 'false' and
not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))) or
(s.[jcr:content/countries/selectedCountries] = 'true' and
s.[jcr:content/countries/cq:tags] is null))))) order by s.[jcr:score] desc
This results in the following execution plan:
[cq:Page] as [s] /* lucene:my_lucene(/oak:index/my_lucene) :fulltext:news
ordering:[{ propertyName : jcr:score, propertyType : UNDEFINED, order :
DESCENDING }] ft:("news") where contains([s].[*], 'news') */
The stored index definition is as follows:
/{jcr:primaryType = oak:QueryIndexDefinition, compatVersion = 2, :version = 2,
:source-path = /oak:index/Copy of cqPageLucene, costPerExecution = 0, type =
lucene, async = [async, nrt], evaluatePathRestrictions = true, excludedPaths =
[/var, /etc/replication, /etc/workflow/instances, /jcr:system], reindex = true,
reindexCount = 13}
aggregates{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page,
nt:file, cq:PageContent]}
nt:file{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
include0{jcr:primaryType = nt:unstructured, path = jcr:content,
:childOrder = []}
cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
include0{jcr:primaryType = nt:unstructured, relativeNode = true, path =
jcr:content, :childOrder = []}
cq:PageContent{jcr:primaryType = nt:unstructured, :childOrder =
[include0, include1, include2, include3]}
include3{jcr:primaryType = nt:unstructured, path = */*/*/*, :childOrder
= []}
include0{jcr:primaryType = nt:unstructured, path = *, :childOrder = []}
include1{jcr:primaryType = nt:unstructured, path = */*, :childOrder =
[]}
include2{jcr:primaryType = nt:unstructured, path = */*/*, :childOrder =
[]}
facets{jcr:primaryType = nt:unstructured, topChildren = 1000, secure =
false}
jcr:content{jcr:primaryType = nt:unstructured, multivalued = true}
editorial{jcr:primaryType = nt:unstructured, multivalued = true}
cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
maincategories{jcr:primaryType = nt:unstructured, multivalued = true}
cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
indexRules{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page]}
cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [properties]}
properties{jcr:primaryType = nt:unstructured, :childOrder =
[slingResourceType, editorialTags, mainCategoriesTags, jcrTitle,
jcrDescription, systemprops, props, selectedCountries, countryTags]}
mainCategoriesTags{jcr:primaryType = nt:unstructured, facets = true,
:source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of
editorialTags, propertyIndex = true, stored = true, name =
jcr:content/maincategories/cq:tags, :childOrder = []}
systemprops{jcr:primaryType = nt:unstructured, :source-path =
/oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of props, isRegexp
= true, name = ^(cq|jcr|sling):.+$, index = false, :childOrder = []}
jcrDescription{jcr:primaryType = nt:unstructured, nodeScopeIndex =
true, :source-path =
/oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of jcrTitle, name =
jcr:content/jcr:title, type = String, :childOrder = []}
jcrTitle{jcr:primaryType = nt:unstructured, nodeScopeIndex = true,
name = jcr:content/jcr:title, type = String, :childOrder = []}
countryTags{jcr:primaryType = nt:unstructured, propertyIndex = true,
name = jcr:content/countries/cq:tags, :childOrder = []}
props{jcr:primaryType = nt:unstructured, nodeScopeIndex = true,
analyzed = true, isRegexp = true, name = ^[^\/]*$, :childOrder = []}
selectedCountries{jcr:primaryType = nt:unstructured, propertyIndex =
true, name = jcr:content/countries/selectedCountries, :childOrder = []}
slingResourceType{jcr:primaryType = nt:unstructured, propertyIndex =
true, name = jcr:content/sling:resourceType, :childOrder = []}
editorialTags{jcr:primaryType = nt:unstructured, facets = true,
propertyIndex = true, stored = true, name = jcr:content/editorial/cq:tags,
:childOrder = []}
Thanks, Vikas, for your investigation so far. I agree with everything you
wrote: post-filtering for counting facets will probably be expensive. I don’t
know why not all constraints are passed to the index in this case. From what I
have seen, the deeply nested combination of disjunctions, conjunctions and path
constraints might be causing that. Unfortunately, this query expresses
business logic we agreed on with the customer, so it is not open to change.
In my naive assumption, the fulltext constraint would, if the query were split
into multiple queries, be part of each of the disjunctive statements (or
unions), and with that the queryNorm(q) according to
https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
will be the same for each of the queries. Property constraints and even path
constraints could potentially be given a boost of 0 so that they have no impact
on the score. In any case, from what I could observe in our tests, scores
coming from the same index are comparable across (similar) queries with the
same fulltext constraint but different property constraints.
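To illustrate the queryNorm argument, here is a minimal Python sketch. It is a
simplified model of the formula given in the TFIDFSimilarity documentation
(queryNorm = 1 / sqrt(sumOfSquaredWeights), with the query-level boost taken as
1), not Lucene's or Oak's actual code. The function name and all idf values are
made up for illustration.

```python
import math


def query_norm(clauses):
    """Simplified queryNorm: 1 / sqrt(sum of (idf * boost)^2 over all clauses).

    `clauses` is a list of (idf, boost) pairs. The idf values used below are
    hypothetical and do not come from a real index.
    """
    sum_of_squared_weights = sum((idf * boost) ** 2 for idf, boost in clauses)
    return 1.0 / math.sqrt(sum_of_squared_weights)


# Hypothetical fulltext clause for 'news' with the default boost of 1.0.
fulltext = (2.0, 1.0)

# Two split queries: same fulltext clause, different constraints boosted to 0.
query_a = [fulltext, (5.0, 0.0)]              # e.g. one path constraint
query_b = [fulltext, (3.0, 0.0), (4.0, 0.0)]  # e.g. two property constraints

norm_a = query_norm(query_a)
norm_b = query_norm(query_b)

# For comparison: the same constraint left at boost 1.0 changes the norm.
norm_unboosted = query_norm([fulltext, (5.0, 1.0)])

print(norm_a, norm_b, norm_unboosted)
```

In this model the zero-boosted clauses add nothing to the sum of squared
weights, so both split queries end up with the same norm. Whether Oak allows
setting such boosts on property or path constraints is a separate question that
this sketch does not answer.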
Cheers,
Dirk