[
https://issues.apache.org/jira/browse/OAK-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734510#comment-14734510
]
Chetan Mehrotra commented on OAK-3367:
--------------------------------------
Based on an internal discussion with [~tmueller] and [~teofili], we have the following options:
h4. Approach A - Index-time boost with boost information stored in payloads via a custom tokenizer
Implementation-wise, this feature is similar to what is [supported in Elasticsearch|http://jontai.me/blog/2012/10/lucene-scoring-and-elasticsearch-_all-field/].
*Work Required*
# Provide a custom tokenizer, similar to what [Elasticsearch has done|https://github.com/elastic/elasticsearch/issues/63]
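The payload idea can be sketched without Lucene. In this minimal sketch, every token that a property contributes to the node-level fulltext field carries that property's boost as a payload, and a match is scored with the highest boost found among its positions. The whitespace tokenizer, the 4-byte float encoding and the max-based scoring are illustrative assumptions, not what Elasticsearch or Oak actually implement, and all names are hypothetical.

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: tokens of the node-level fulltext field carry the boost
// of the property they came from as a 4-byte float payload.
class PayloadBoostSketch {

    // One token of the aggregated fulltext field together with its payload.
    static final class Token {
        final String term;
        final byte[] payload;

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    static float decodeBoost(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Naive lower-casing whitespace tokenizer that tags every token with the
    // boost of the property whose text is being tokenized.
    static List<Token> tokenize(String text, float boost) {
        List<Token> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(new Token(t, encodeBoost(boost)));
            }
        }
        return tokens;
    }

    // Score of a term: the highest payload boost among its matching positions,
    // or 0 if the term does not occur (one of several possible payload functions).
    static float score(List<Token> fulltext, String term) {
        float best = 0f;
        for (Token token : fulltext) {
            if (token.term.equals(term)) {
                best = Math.max(best, decodeBoost(token.payload));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Page A has the keyword in jcr:title (boost 2), page B only in jcr:description (boost 1).
        List<Token> pageA = new ArrayList<>(tokenize("Keyword in title", 2.0f));
        pageA.addAll(tokenize("Plain description", 1.0f));
        List<Token> pageB = new ArrayList<>(tokenize("Other title", 2.0f));
        pageB.addAll(tokenize("Keyword in description", 1.0f));
        System.out.println(score(pageA, "keyword")); // 2.0
        System.out.println(score(pageB, "keyword")); // 1.0
    }
}
{code}

Since the boost travels with each token instead of with the whole field, page A (keyword in the title) outranks page B (keyword in the description) although both matches sit in the same fulltext field, and the query stays a plain node-level search.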
*Pros*
# Simpler for the end user, who does not have to mention every such field in the query
*Cons*
# Index-time boosting has its drawbacks. See [here|https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html#index-boost]
# It mainly makes sense together with conditional indexing support, i.e. when a boost is not applied to every document. See [http://stackoverflow.com/a/9398823/1035417]:
{quote}
Index time field boosts are a way to express things like "this document's title
is worth twice as much as the title of most documents". Query time boosts are a
way to express "I care about matches on this clause of my query twice as much
as I do about matches on other clauses of my query".
Index time field boosts are worthless if you set them on every document.
{quote}
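The last sentence of the quote can be checked with made-up scores for one field (say, the title) across three documents. Multiplying every document's score by the same index-time boost leaves the order unchanged; only a boost that differs between documents, which is what the first sentence describes, can reorder the results. {{IndexBoostSketch}} is a hypothetical name.

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Illustration only: scores of a single field (e.g. the title) across documents.
class IndexBoostSketch {

    // Document indexes ordered by descending score.
    static int[] rank(double[] scores) {
        return IntStream.range(0, scores.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> scores[i]).reversed())
                .mapToInt(Integer::intValue)
                .toArray();
    }

    // Multiplies each document's field score by that document's index-time boost.
    static double[] applyBoosts(double[] scores, double[] boosts) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * boosts[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] titleScores = {0.4, 0.9, 0.1};
        // Same boost on every document: the relative order cannot change.
        int[] uniform = rank(applyBoosts(titleScores, new double[] {2, 2, 2}));
        // Boost only document 0: "this title is worth more than most titles".
        int[] selective = rank(applyBoosts(titleScores, new double[] {3, 1, 1}));
        System.out.println(Arrays.toString(rank(titleScores))); // [1, 0, 2]
        System.out.println(Arrays.toString(uniform));           // [1, 0, 2]
        System.out.println(Arrays.toString(selective));         // [0, 1, 2]
    }
}
{code}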
h4. Approach B - Index-time boost with a query involving multiple clauses
Compared to the approach above, Jackrabbit 2.x supported boosting in a different way (see [1] for details). It requires the user to phrase the query differently, i.e. to explicitly add OR clauses for the individual fields:
bq. Note: The boost in this case is respected only if a jcr:contains() is done on the corresponding property, for example jcr:contains(@jcr:title, 'find this'). If there is only a jcr:contains(., 'find this'), the boosts at indexing time have no effect.
{code}
/jcr:root/content/geometrixx-outdoors/en//element(*, cq:Page)
[
jcr:contains(@jcr:title, 'Keyword')
OR jcr:contains(@jcr:description, 'Keyword')
OR jcr:contains(., 'Keyword')
] order by @jcr:score descending
{code}
*Work Required*
# Boosting would be done on a per-field basis and applied at query time (suggested by Tommaso): {{LucenePropertyIndex}} can check whether a query clause targets a specific field and, if so, boost that clause based on the property definition.
# Alternatively, fix the editor to create the field with the boost level set
# In addition, we need to ensure that when the results of multiple OR clauses are combined, they are merge-sorted by jcr:score (this requires merging OAK-2944 to the branches)
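The merge step in the last item is a k-way merge: each OR clause already returns its hits ordered by {{jcr:score}}, so combining them only needs a priority queue over the heads of the per-clause lists. This is a minimal sketch, not the OAK-2944 code; {{ScoreMergeSketch}} and {{Hit}} are hypothetical, and keeping only the first (highest-scoring) occurrence of a path that matches several clauses is an assumed de-duplication policy.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: k-way merge of per-clause results, each already sorted
// by score in descending order.
class ScoreMergeSketch {

    static final class Hit {
        final String path;
        final double score;

        Hit(String path, double score) {
            this.path = path;
            this.score = score;
        }

        @Override
        public String toString() {
            return path + "=" + score;
        }
    }

    // Read position inside one clause's result list.
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit current() {
            return hits.get(pos);
        }
    }

    static List<Hit> merge(List<List<Hit>> clauseResults) {
        // Max-heap on the score of each cursor's current hit.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (a, b) -> Double.compare(b.current().score, a.current().score));
        for (List<Hit> result : clauseResults) {
            if (!result.isEmpty()) {
                heap.add(new Cursor(result));
            }
        }
        List<Hit> merged = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        while (!heap.isEmpty()) {
            Cursor cursor = heap.poll();
            Hit hit = cursor.current();
            // Assumed policy: a node matched by several clauses is emitted once,
            // at its highest score (the first time it is polled).
            if (seen.add(hit.path)) {
                merged.add(hit);
            }
            if (++cursor.pos < cursor.hits.size()) {
                heap.add(cursor);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> title = List.of(new Hit("/a", 0.9), new Hit("/b", 0.5));
        List<Hit> description = List.of(new Hit("/c", 0.7), new Hit("/a", 0.3));
        List<Hit> fulltext = List.of(new Hit("/d", 0.6));
        System.out.println(merge(List.of(title, description, fulltext)));
        // [/a=0.9, /c=0.7, /d=0.6, /b=0.5]
    }
}
{code}

Simply concatenating the clause results would list {{/b}} (0.5) ahead of {{/c}} (0.7), so a strong description match could end up below a weaker title match.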
*Pros*
# The boost logic is more apparent and can be changed without reindexing
# Behavior is compatible with JR2
*Cons*
# Queries need to be written in the way explained above
h4. Approach C - Query-time boost with the query expanded by {{LucenePropertyIndex}}
* The user still writes the normal query, i.e. a fulltext search on the node
* In the index configuration, the fields that need a special boost are marked with {{analyzed}} and {{nodeScopeIndex}} set to true, and a boost is specified. However, the boost is not applied at indexing time.
* On the query side, {{LucenePropertyIndex}} translates the node-level search into multiple OR clauses: a {{TermQuery}} for every configured field that has {{analyzed}} and {{nodeScopeIndex}} set to true, in addition to a {{TermQuery}} on the node-level fulltext field
This approach combines the best of both approaches above: query-time boosting, without requiring the user to rephrase the query.
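The expansion can be sketched as follows, under simplifying assumptions: the search term is already analyzed into a single token, the field names {{:fulltext}} (node level) and {{full:<property>}} (property level) are assumed to follow Oak's naming, and a node's score is modelled as just the sum of the boosts of the clauses it matches (no tf-idf, no norms). Clauses are printed in a Lucene-like notation for readability only; the class names are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Approach C: a node-level fulltext search is expanded
// into OR'ed term clauses whose boosts come from the property definitions.
class QueryExpansionSketch {

    // Assumed field names: node-level fulltext field and per-property prefix.
    static final String NODE_FULLTEXT_FIELD = ":fulltext";
    static final String PROPERTY_FULLTEXT_PREFIX = "full:";

    // The parts of a property definition that matter for the expansion.
    static final class PropertyDefinition {
        final String name;
        final boolean analyzed;
        final boolean nodeScopeIndex;
        final float boost;

        PropertyDefinition(String name, boolean analyzed, boolean nodeScopeIndex, float boost) {
            this.name = name;
            this.analyzed = analyzed;
            this.nodeScopeIndex = nodeScopeIndex;
            this.boost = boost;
        }
    }

    // One optional (OR) term clause of the expanded query.
    static final class Clause {
        final String field;
        final String term;
        final float boost;

        Clause(String field, String term, float boost) {
            this.field = field;
            this.term = term;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return field + ":" + term + "^" + boost;
        }
    }

    // Node-level clause plus one boosted clause per analyzed, node-scope-indexed property.
    static List<Clause> expand(String term, List<PropertyDefinition> definitions) {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause(NODE_FULLTEXT_FIELD, term, 1.0f));
        for (PropertyDefinition def : definitions) {
            if (def.analyzed && def.nodeScopeIndex) {
                clauses.add(new Clause(PROPERTY_FULLTEXT_PREFIX + def.name, term, def.boost));
            }
        }
        return clauses;
    }

    // Simplified score: sum of the boosts of the clauses whose field matched.
    static float score(List<Clause> clauses, Set<String> matchedFields) {
        float total = 0f;
        for (Clause c : clauses) {
            if (matchedFields.contains(c.field)) {
                total += c.boost;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<PropertyDefinition> defs = List.of(
                new PropertyDefinition("jcr:title", true, true, 2.0f),
                new PropertyDefinition("jcr:description", true, true, 1.0f),
                new PropertyDefinition("cq:tags", false, true, 1.0f)); // not analyzed: no clause
        List<Clause> query = expand("keyword", defs);
        System.out.println(query);
        // [:fulltext:keyword^1.0, full:jcr:title:keyword^2.0, full:jcr:description:keyword^1.0]
        System.out.println(score(query, Set.of(":fulltext", "full:jcr:title")));       // 3.0
        System.out.println(score(query, Set.of(":fulltext", "full:jcr:description"))); // 2.0
    }
}
{code}

A node with the keyword in its title matches both the node-level clause and the title clause and scores 3.0 in this model, while a node with the keyword only in its description scores 2.0, even though the user only wrote {{jcr:contains(., 'Keyword')}}.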
[1] https://helpx.adobe.com/experience-manager/kb/BoostInSearch.html
> Boosting fields not working as expected
> ---------------------------------------
>
> Key: OAK-3367
> URL: https://issues.apache.org/jira/browse/OAK-3367
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.3.6
>
>
> When boost support was added, the intention was to support a use case like
> {quote}
> For a fulltext search on a node where the fulltext content is derived from
> multiple fields, it should be possible to boost the text contributed by
> individual fields. That is, if the title field is boosted more than the
> description, the title part of the fulltext field should count for more than
> the description part.
> {quote}
> This would enable a user to perform a search like
> _/jcr:root/content/geometrixx-outdoors/en//element(*, cq:Page)\[jcr:contains(., 'Keyword')\]_
> and get results where pages having 'Keyword' in the title rank above those
> where 'Keyword' is found in the description.
> The current implementation just sets the boost while adding the field value
> to the fulltext field, with the intention that Lucene would use the boost as
> explained above. However, it does not work like that: the boost value gets
> multiplied with the boosts of the other fields, and hence boosting does not
> work as expected.
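The failure mode described in the issue can be reproduced with a deliberately simplified model. It assumes, as in Lucene 4.x, that the boosts of all field instances sharing one name in a document are multiplied into a single per-field factor, and it ignores term frequency, idf and length normalization; a substring check stands in for term matching, and the names are hypothetical.

{code:java}
import java.util.List;
import java.util.Locale;

// Simplified model of the reported bug: all values added to the node-level
// fulltext field share one boost, the product of their individual boosts.
class FieldBoostBugSketch {

    // One property value copied into the node-level fulltext field.
    static final class FieldValue {
        final String text;
        final float boost;

        FieldValue(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Cumulative boost of all instances sharing the field name (multiplied together).
    static float cumulativeBoost(List<FieldValue> values) {
        float product = 1.0f;
        for (FieldValue v : values) {
            product *= v.boost;
        }
        return product;
    }

    // 0 if the term is absent; otherwise the single per-field boost, no matter
    // which property's text contained the match.
    static float score(List<FieldValue> fulltext, String term) {
        boolean match = fulltext.stream()
                .anyMatch(v -> v.text.toLowerCase(Locale.ROOT).contains(term));
        return match ? cumulativeBoost(fulltext) : 0.0f;
    }

    public static void main(String[] args) {
        // Intended: title boosted 2.0, description 1.5.
        List<FieldValue> pageA = List.of(
                new FieldValue("Keyword in title", 2.0f),
                new FieldValue("Plain description", 1.5f));
        List<FieldValue> pageB = List.of(
                new FieldValue("Other title", 2.0f),
                new FieldValue("Keyword in description", 1.5f));
        System.out.println(score(pageA, "keyword")); // 3.0
        System.out.println(score(pageB, "keyword")); // 3.0
    }
}
{code}

Both pages score 3.0 (2.0 * 1.5): the keyword in the title and the keyword in the description get the same weight, because the product is attached to the whole fulltext field rather than to the part of the text each property contributed.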
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)