Hi devs, Vikas Saurabh in particular, 

In OAK-7109 Vikas asked for more information about the query we use in our 
project, which requires facet counting. Here it is:

select s.[jcr:score], s.[jcr:path],
       [rep:facet(jcr:content/editorial/cq:tags)],
       [rep:facet(jcr:content/maincategories/cq:tags)]
from [cq:Page] as s
where (contains(s.[*],'news') and (
    isdescendantnode(s,'/content/mam/web/de/en')
    or (s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/newslibrary')
        and isdescendantnode(s,'/content/mam/web/gc/news/en')
        and ((s.[jcr:content/countries/selectedCountries] = 'true'
              and s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))
          or (s.[jcr:content/countries/selectedCountries] = 'false'
              and not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de')))
          or (s.[jcr:content/countries/selectedCountries] = 'true'
              and s.[jcr:content/countries/cq:tags] is null)))
    or (isdescendantnode(s,'/content/mam/web/gc/help/en')
        and s.[jcr:content/sling:resourceType] in('mam/web/pagetypes/helplibrary'))
    or ((isdescendantnode(s,'/content/mam/web/gc/partners/air/en')
         or isdescendantnode(s,'/content/mam/web/gc/partners/non-air/en'))
        and s.[jcr:content/sling:resourceType]
            in('mam/web/pagetypes/partnerlibrary','mam/web/pagetypes/airline-partnerlibrary')
        and ((s.[jcr:content/countries/selectedCountries] = 'true'
              and s.[jcr:content/countries/cq:tags] in('web:system/countries/de'))
          or (s.[jcr:content/countries/selectedCountries] = 'false'
              and not(s.[jcr:content/countries/cq:tags] in('web:system/countries/de')))
          or (s.[jcr:content/countries/selectedCountries] = 'true'
              and s.[jcr:content/countries/cq:tags] is null)))))
order by s.[jcr:score] desc

This results in the following execution plan:

[cq:Page] as [s] /* lucene:my_lucene(/oak:index/my_lucene) :fulltext:news 
ordering:[{ propertyName : jcr:score, propertyType : UNDEFINED, order : 
DESCENDING }] ft:("news") where contains([s].[*], 'news') */

The stored index definition is as follows:

/{jcr:primaryType = oak:QueryIndexDefinition, compatVersion = 2, :version = 2, 
:source-path = /oak:index/Copy of cqPageLucene, costPerExecution = 0, type = 
lucene, async = [async, nrt], evaluatePathRestrictions = true, excludedPaths = 
[/var, /etc/replication, /etc/workflow/instances, /jcr:system], reindex = true, 
reindexCount = 13}
    aggregates{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page, 
nt:file, cq:PageContent]}
      nt:file{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
        include0{jcr:primaryType = nt:unstructured, path = jcr:content, 
:childOrder = []}
      cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [include0]}
        include0{jcr:primaryType = nt:unstructured, relativeNode = true, path = 
jcr:content, :childOrder = []}
      cq:PageContent{jcr:primaryType = nt:unstructured, :childOrder = 
[include0, include1, include2, include3]}
        include3{jcr:primaryType = nt:unstructured, path = */*/*/*, :childOrder 
= []}
        include0{jcr:primaryType = nt:unstructured, path = *, :childOrder = []}
        include1{jcr:primaryType = nt:unstructured, path = */*, :childOrder = 
[]}
        include2{jcr:primaryType = nt:unstructured, path = */*/*, :childOrder = 
[]}
    facets{jcr:primaryType = nt:unstructured, topChildren = 1000, secure = 
false}
      jcr:content{jcr:primaryType = nt:unstructured, multivalued = true}
        editorial{jcr:primaryType = nt:unstructured, multivalued = true}
          cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
        maincategories{jcr:primaryType = nt:unstructured, multivalued = true}
          cq:tags{jcr:primaryType = nt:unstructured, multivalued = true}
    indexRules{jcr:primaryType = nt:unstructured, :childOrder = [cq:Page]}
      cq:Page{jcr:primaryType = nt:unstructured, :childOrder = [properties]}
        properties{jcr:primaryType = nt:unstructured, :childOrder = 
[slingResourceType, editorialTags, mainCategoriesTags, jcrTitle, 
jcrDescription, systemprops, props, selectedCountries, countryTags]}
          mainCategoriesTags{jcr:primaryType = nt:unstructured, facets = true, 
:source-path = /oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of 
editorialTags, propertyIndex = true, stored = true, name = 
jcr:content/maincategories/cq:tags, :childOrder = []}
          systemprops{jcr:primaryType = nt:unstructured, :source-path = 
/oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of props, isRegexp 
= true, name = ^(cq|jcr|sling):.+$, index = false, :childOrder = []}
          jcrDescription{jcr:primaryType = nt:unstructured, nodeScopeIndex = 
true, :source-path = 
/oak:index/mamcom_lucene/indexRules/cq:Page/properties/Copy of jcrTitle, name = 
jcr:content/jcr:title, type = String, :childOrder = []}
          jcrTitle{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, 
name = jcr:content/jcr:title, type = String, :childOrder = []}
          countryTags{jcr:primaryType = nt:unstructured, propertyIndex = true, 
name = jcr:content/countries/cq:tags, :childOrder = []}
          props{jcr:primaryType = nt:unstructured, nodeScopeIndex = true, 
analyzed = true, isRegexp = true, name = ^[^\/]*$, :childOrder = []}
          selectedCountries{jcr:primaryType = nt:unstructured, propertyIndex = 
true, name = jcr:content/countries/selectedCountries, :childOrder = []}
          slingResourceType{jcr:primaryType = nt:unstructured, propertyIndex = 
true, name = jcr:content/sling:resourceType, :childOrder = []}
          editorialTags{jcr:primaryType = nt:unstructured, facets = true, 
propertyIndex = true, stored = true, name = jcr:content/editorial/cq:tags, 
:childOrder = []}

Thanks, Vikas, for your investigation so far. I agree with everything you 
wrote: post-filtering to count facets will probably be expensive. I don't know 
why not all constraints are passed to the index in this case. From what I have 
seen, the deep nesting of disjunctions, conjunctions and path constraints 
might be the cause. Unfortunately, this query encodes business logic we agreed 
on with the customer, so it is not open to change.
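
To make the cost concern concrete, here is a minimal Python sketch of what 
post-filtered facet counting amounts to. It is not Oak code; the function name, 
the sample nodes and the tag values are made up for illustration. The point is 
only that every fulltext hit has to be examined before the counts are known:

```python
from collections import Counter

def facet_counts_with_post_filter(index_hits, matches_remaining_constraints, facet_property):
    """Count facet values over the hits that survive post-filtering.

    index_hits: iterable of dicts standing in for nodes returned by the index
                (here: everything that matched the fulltext constraint).
    matches_remaining_constraints: predicate for the constraints the index
                did not evaluate (path, property restrictions, ...).
    """
    counts = Counter()
    examined = 0
    for node in index_hits:
        examined += 1  # each hit must be loaded and checked
        if matches_remaining_constraints(node):
            counts.update(node.get(facet_property, []))
    return counts, examined

# Hypothetical sample data, not taken from the real repository.
FACET = "jcr:content/editorial/cq:tags"
hits = [
    {"path": "/content/mam/web/de/en/a", FACET: ["web:news", "web:offers"]},
    {"path": "/content/other/b", FACET: ["web:news"]},
    {"path": "/content/mam/web/de/en/c", FACET: ["web:offers"]},
]

def under_de_en(node):
    # Stand-in for isdescendantnode(s, '/content/mam/web/de/en')
    return node["path"].startswith("/content/mam/web/de/en/")

counts, examined = facet_counts_with_post_filter(hits, under_de_en, FACET)
print(dict(counts), "hits examined:", examined)
```

The work grows with the number of fulltext hits, not with the number of rows 
that finally match.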

My naive assumption is that, if the query is split into multiple queries, the 
fulltext constraint will be part of each of the disjunctive statements (or 
unions), and therefore queryNorm(q), as defined in 
https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
will be the same for each of the queries. Property constraints and even path 
constraints could potentially be boosted to 0 so that they have no impact on 
the score. In any case, from what I could observe in our tests, scores that 
come from the same index are comparable across (similar) queries with the same 
fulltext constraint but different property constraints.
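
The queryNorm part of this argument can be checked with a small Python sketch 
of the formula from the TFIDFSimilarity documentation, 
queryNorm(q) = 1 / sqrt(sumOfSquaredWeights). This is not Lucene or Oak code: 
the document frequencies and index size are invented, the query boost is taken 
as 1, and other score factors such as coord are not modelled.

```python
import math

def idf(doc_freq, num_docs):
    # Default TF-IDF idf as documented for Lucene 4.x:
    # 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(clauses, num_docs):
    """clauses: list of (doc_freq, boost) pairs, one per term clause."""
    sum_of_squared_weights = sum(
        (idf(doc_freq, num_docs) * boost) ** 2 for doc_freq, boost in clauses
    )
    return 1.0 / math.sqrt(sum_of_squared_weights)

NUM_DOCS = 10_000          # assumed index size
NEWS = (250, 1.0)          # fulltext term 'news', assumed docFreq, boost 1
PROPERTY_DF = 1_200        # assumed docFreq of a property constraint term

norm_fulltext_only = query_norm([NEWS], NUM_DOCS)
norm_zero_boost = query_norm([NEWS, (PROPERTY_DF, 0.0)], NUM_DOCS)
norm_unit_boost = query_norm([NEWS, (PROPERTY_DF, 1.0)], NUM_DOCS)

print(norm_fulltext_only, norm_zero_boost, norm_unit_boost)
```

Under these assumptions a clause boosted to 0 adds nothing to 
sumOfSquaredWeights, so queryNorm stays the same as for the fulltext-only 
query, while an unboosted property clause lowers it.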

Cheers,
Dirk
