LargeDocHighlighter - another span highlighter optimized for large documents
Key: LUCENE-1286
URL: https://issues.apache.org/jira/browse/LUCENE-1286
Project: Lucene - Java
Mike, yeah, I just got the test to fail a few more times and then pass once. I
*think* the expected vs. actual ("was") output was slightly different in different failed
runs, so it might be that random bit that's the culprit
java.nio.charset.Charset.defaultCharset().name() gives me "UTF-8" on my JVM:
$ ja
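To compare machines when chasing this kind of charset-dependent failure, a tiny standalone check can print the JVM's default charset. This is a minimal sketch and not part of the Lucene test; the class name `DefaultCharsetCheck` is hypothetical. It uses only `java.nio.charset.Charset.defaultCharset()` and the standard `file.encoding` system property.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Canonical name of the JVM's default charset (the value reported above as "UTF-8").
    // It can differ between JVMs and platforms, which is why it is worth printing.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // file.encoding is the system property commonly consulted alongside the default charset.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```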
I think the original use-case is in LUCENE-1224 where Hiroaki wrote this:
With current trunk NGramTokenFilter(min=2,max=4), I index "abcdef"
string into an index, but I can't query it with "abc". If I query with
"ab", I can get a hit result.
The reason is that the NGramTokenFilter generates bad
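For reference, the grams one would expect a 2..4 n-gram tokenizer to produce for "abcdef" can be enumerated in plain Java. This is a sketch, not Lucene's `NGramTokenFilter` (the class and method names are hypothetical), and it orders grams by start offset, which may differ from the filter's own emission order. The 3-gram "abc" is in the set, which is why a query for "abc" is expected to hit.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Returns every substring of `text` whose length is between minGram and
    // maxGram (inclusive), ordered by start offset and then by length.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("abcdef", 2, 4);
        System.out.println(grams);
        System.out.println("contains \"abc\": " + grams.contains("abc"));
    }
}
```

For "abcdef" with min=2 and max=4 this yields 12 grams: ab, abc, abcd, bc, bcd, bcde, cd, cde, cdef, de, def, ef.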
[
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596717#action_12596717
]
gsingers edited comment on LUCENE-1166 at 5/15/08 1:15 PM:
-
The conventional use of ngrams when searching is not to treat them as a
set but a sequence. Thus, for "foola" you could index the sequence
["_f", "fo", "oo", "ol", "la", "a_"], and then search for the phrase
["oo", "ol"] to find all occurrences of "ool". This is useful in
languages that use l
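The sequence idea above can be sketched outside Lucene: pad the word with "_", emit its bigrams in order, and treat a query as a phrase that must match consecutive grams. This is an illustration under stated assumptions only. The class and method names are hypothetical, "_" is the boundary marker taken from the example, and `Collections.indexOfSubList` stands in for a real phrase query. Queries shorter than two characters produce no bigrams and are not handled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BigramPhraseSketch {
    // Pads the word with '_' on both sides and returns its bigrams in order,
    // e.g. "foola" -> [_f, fo, oo, ol, la, a_].
    public static List<String> paddedBigrams(String word) {
        return bigrams("_" + word + "_");
    }

    // Unpadded bigrams of a query substring, e.g. "ool" -> [oo, ol].
    // Assumes s has at least two characters.
    public static List<String> bigrams(String s) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    // Phrase-style match: the query grams must occur at consecutive positions
    // in the document's gram sequence, not merely somewhere in it as a set.
    public static boolean phraseMatch(List<String> docGrams, List<String> queryGrams) {
        return Collections.indexOfSubList(docGrams, queryGrams) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc = paddedBigrams("foola");
        System.out.println(doc);
        System.out.println("ool: " + phraseMatch(doc, bigrams("ool")));
        System.out.println("olf: " + phraseMatch(doc, bigrams("olf")));
    }
}
```

Treating the grams as a sequence is what rules out false hits: "olf" shares the gram "ol" with "foola" but its grams are not consecutive there.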
[
https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597178#action_12597178
]
Otis Gospodnetic commented on LUCENE-1216:
--
Aha, that makes sense - thanks for cl
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597174#action_12597174
]
otis edited comment on LUCENE-1224 at 5/15/08 9:07 AM:
---
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597174#action_12597174
]
Otis Gospodnetic commented on LUCENE-1224:
--
Hiroaki:
I agree with Grant about uni
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597161#action_12597161
]
Grant Ingersoll commented on LUCENE-1224:
-
I think the right way is simply to
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597156#action_12597156
]
Hiroaki Kawai commented on LUCENE-1224:
---
About test code: I'm not going to say that
[
https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-1285:
Attachment: highlighter-test.patch
Test that exposes the problem. The posted patch makes the test
[
https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597130#action_12597130
]
Mark Miller commented on LUCENE-1285:
-
Nice catch and the fix looks great.
Thanks And
[
https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated LUCENE-1285:
--
Attachment: highlighter.patch
A patch to fix the issue.
> WeightedSpanTermExtractor i
Andrzej Bialecki (JIRA) wrote:
WeightedSpanTermExtractor doesn'
Key: LUCENE-1285
URL: https://issues.apache.org/jira/browse/LUCENE-1285
Project: Lucene - Java
Issue Type: Bug
Components: contrib/h
[
https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated LUCENE-1285:
--
Description:
Given a BooleanQuery with multiple clauses, if a term occurs both in a Sp
WeightedSpanTermExtractor doesn'
Key: LUCENE-1285
URL: https://issues.apache.org/jira/browse/LUCENE-1285
Project: Lucene - Java
Issue Type: Bug
Components: contrib/highlighter
Affects Versions:
See https://issues.apache.org/jira/browse/LUCENE-1224
Do people have an opinion on what positions ngrams should be output
at? For instance, given 1-grams on "abc fgh", these are currently
output as: a, b, c, f, g, h all with a position increment of 1. That
seems somewhat reasonable, but it
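The current scheme described above can be modeled in a few lines: split on whitespace, emit each word's 1-grams, and give every gram a position increment of 1. This is a plain-Java sketch, not Lucene's token stream code; the class name is hypothetical, and positions are counted from 0 as an assumed convention.

```java
import java.util.ArrayList;
import java.util.List;

public class GramPositionSketch {
    // Returns "gram@position" strings for the 1-grams of each whitespace-separated
    // word, with every gram using a position increment of 1.
    public static List<String> unigramsWithPositions(String text) {
        List<String> out = new ArrayList<>();
        int position = -1; // so the first increment of 1 yields position 0
        for (String word : text.trim().split("\\s+")) {
            for (int i = 0; i < word.length(); i++) {
                position += 1; // every gram advances the position by exactly 1
                out.add(word.charAt(i) + "@" + position);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unigramsWithPositions("abc fgh"));
    }
}
```

Under this scheme the last gram of one word (c@2) and the first gram of the next (f@3) are adjacent, exactly like grams inside a single word.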
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597112#action_12597112
]
DM Smith commented on LUCENE-1224:
--
My take as a user:
Maybe, I don't understand the app
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597107#action_12597107
]
Grant Ingersoll commented on LUCENE-1224:
-
{quote}Umm..., if you don't like indexi
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597098#action_12597098
]
Hiroaki Kawai commented on LUCENE-1224:
---
In my understanding,
--
The sequenc
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597094#action_12597094
]
Hiroaki Kawai commented on LUCENE-1224:
---
Umm..., if you don't like indexing and quer
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597080#action_12597080
]
Grant Ingersoll commented on LUCENE-1224:
-
FWIW, I also think we should address th
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597079#action_12597079
]
Grant Ingersoll commented on LUCENE-1224:
-
OK, let me change the comment. You can
[
https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597078#action_12597078
]
Michael McCandless commented on LUCENE-1282:
OK: running with -client prevents
[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597069#action_12597069
]
Hiroaki Kawai commented on LUCENE-1224:
---
Q: Why is it necessary to index
A: Because
[
https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597052#action_12597052
]
Michael McCandless commented on LUCENE-1282:
bq. In my 100% reproducible case
[
https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597048#action_12597048
]
Michael McCandless commented on LUCENE-1282:
{quote}
One of the classic proble
[
https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597042#action_12597042
]
Michael McCandless commented on LUCENE-1282:
bq. Has it been reported to Sun y
[
https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597036#action_12597036
]
Hiroaki Kawai commented on LUCENE-1216:
---
The reason for setWhitespaceDelimiter():
Of