[
https://issues.apache.org/jira/browse/OAK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999387#comment-14999387
]
Vikas Saurabh commented on OAK-3276:
------------------------------------
I get the following test failures in oak-lucene after trivially switching to
{{StandardAnalyzer}}.
{noformat}
Failed tests:
testFulltext(org.apache.jackrabbit.oak.jcr.query.QueryFulltextTest):
expected:<[]> but was:<[/testroot/node3]>
testSpellcheckMultipleWords(org.apache.jackrabbit.oak.jcr.query.SpellcheckTest):
expected:<[[voting in ontario]]> but was:<[[]]>
testSpellcheckSql(org.apache.jackrabbit.oak.jcr.query.SpellcheckTest):
expected:<[hello[, hold]]> but was:<[hello[]]>
testSpellcheckXPath(org.apache.jackrabbit.oak.jcr.query.SpellcheckTest):
expected:<[hello[, hold]]> but was:<[hello[]]>
containsPathStrict(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexQueryTest):
Expected path /match_on_path not found, got []
containsPathStrictNum(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexQueryTest):
Expected path /match_on_path1234 not found, got []
analyzerWithStopWords(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexTest):
Result set size is different (..)
testTokens(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexTest):
expected:<[first, second]> but was:<[first_second]>
Tests in error:
sql1(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexQueryTest):
Results in target\oajopi.lucene.LuceneIndexQueryTest_sql1.txt don't match
expected results in
C:\Users\vsaurabh\Documents\Projects\CQ-misc\jackrabbit-oak\oak-lucene\target\test-classes\org\apache\jackrabbit\oak\query\sql1.txt;
compare the files for details; got=(..)
{noformat}
The failures fall into 2 categories:
* Stop words are removed by {{StandardAnalyzer}} but not by {{OakAnalyzer}}.
* {{OakAnalyzer}} uses a {{WordDelimiterFilter}}, which splits on {{:}}, {{_}}
and {{.}}, while {{StandardAnalyzer}} (internally {{StandardTokenizer}})
doesn't. The set of affected delimiters is possibly bigger than what the tests
here cover.
Stop word failures
||Test case||Error||Comment||
|QueryFulltextTest#testFulltext|expected:<\[]> but was:<\[/testroot/node3]>|stop word 'or'|
|SpellcheckTest#testSpellcheckMultipleWords|expected:<\[\[voting in ontario]]> but was:<\[\[]]>|stop word 'in'|
|LuceneIndexTest#analyzerWithStopWords|Result set size is different (..)|stop word 'was'|
WordDelimiterFilter failures
||Test case||Error||Comment||
|SpellcheckTest#testSpellcheckSql|expected:<\[hello\[, hold]]> but was:<\[hello\[]]>|delimiter {{:}}. 'hold' is getting suggested due to 'rep:hold'|
|SpellcheckTest#testSpellcheckXPath|expected:<\[hello\[, hold]]> but was:<\[hello\[]]>|delimiter {{:}}. 'hold' is getting suggested due to 'rep:hold'|
|LuceneIndexQueryTest#containsPathStrict|Expected path /match_on_path not found, got \[]|delimiter {{_}}|
|LuceneIndexQueryTest#containsPathStrictNum|Expected path /match_on_path1234 not found, got \[]|delimiter {{_}}|
|LuceneIndexTest#testTokens|expected:<\[first, second]> but was:<\[first_second]>|delimiter {{_}}|
|LuceneIndexQueryTest#sql1|Results don't match expected results in sql1.txt|delimiter {{.}}. 'jackrabbit' is picked as a spellcheck suggestion due to jackrabbit.apache.org being present on the namespaces node.|
The stop word failures can be trivially fixed by passing an EMPTY stop word set
to {{StandardAnalyzer}}'s constructor (I'm assuming that an explicit test case
against stop words implies that ootb we are not supposed to have any stop words).
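To make the stop word effect concrete, here is a minimal plain-Java sketch. It is not Lucene code: {{StopWordSketch}}, {{SAMPLE_STOP_WORDS}} and {{analyze}} are hypothetical names, and the stop set only holds the three words named in the table above. In Lucene itself the empty set would presumably be {{CharArraySet.EMPTY_SET}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified stand-in for stop word filtering; an illustration only, not Lucene code. */
public class StopWordSketch {

    // Assumption: only the three stop words that the failing tests ran into.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("or", "in", "was"));

    /** Lower-cases the text, splits it on whitespace and drops every term found in stopWords. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty() && !stopWords.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "voting in ontario";
        // With stop words, 'in' never reaches the index: [voting, ontario]
        System.out.println(analyze(text, SAMPLE_STOP_WORDS));
        // With an empty stop word set, every term is kept: [voting, in, ontario]
        System.out.println(analyze(text, Collections.<String>emptySet()));
    }
}
```

With the empty set, 'in' stays in the term list, which is what the spellcheck test for 'voting in ontario' relies on.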
For the word delimiting issues, I couldn't find a way to make
{{StandardAnalyzer}} use {{WordDelimiterFilter}}. Also, since we have explicit
test cases for delimiting on ':' and '_', I'm assuming that we are required to
delimit on those. '.' is undefined from the test case perspective, but it feels
like it should be used as a delimiter. [~teofili], how can we do this?
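As an illustration of the splitting behaviour the tests expect, here is a small plain-Java model. It is only a sketch: {{DelimiterSketch}} and {{splitOnDelimiters}} are hypothetical names, it covers just the three delimiters discussed above, and it does not reproduce the other rules the real {{WordDelimiterFilter}} applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified model of delimiter splitting; an illustration only, not Lucene code. */
public class DelimiterSketch {

    // Assumption: only ':', '_' and '.' are treated as delimiters here.
    private static final Pattern DELIMITERS = Pattern.compile("[:_.]");

    /** Splits one token on the delimiters and drops empty parts. */
    static List<String> splitOnDelimiters(String token) {
        List<String> parts = new ArrayList<>();
        for (String part : DELIMITERS.split(token)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Inputs taken from the failing tests listed above.
        System.out.println(splitOnDelimiters("first_second"));          // [first, second]
        System.out.println(splitOnDelimiters("rep:hold"));              // [rep, hold]
        System.out.println(splitOnDelimiters("jackrabbit.apache.org")); // [jackrabbit, apache, org]
    }
}
```

Without this step each input stays a single token, which matches the {{first_second}} result reported by testTokens.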
> Make StandardAnalyzer as the default search analyzer
> ----------------------------------------------------
>
> Key: OAK-3276
> URL: https://issues.apache.org/jira/browse/OAK-3276
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Satya Deep Maheshwari
> Assignee: Tommaso Teofili
> Fix For: 1.4
>
>
> Please vote on RTC for making
> org.apache.lucene.analysis.standard.StandardAnalyzer
> as the default OOTB analyzer. This analyzer is capable of handling
> surrogate characters unlike the current default analyzer.