[jira] Created: (LUCENE-1286) LargeDocHighlighter - another span highlighter optimized for large documents

2008-05-15 Thread Mark Miller (JIRA)
LargeDocHighlighter - another span highlighter optimized for large documents Key: LUCENE-1286 URL: https://issues.apache.org/jira/browse/LUCENE-1286 Project: Lucene - Java

Re: TestPayloads FAILED?

2008-05-15 Thread Otis Gospodnetic
Mike, yeah, I just got the test to fail a few more times and then pass once. I *think* the expected vs. was output was slightly different in different failed runs, so it might be that random bit that's the culprit. java.nio.charset.Charset.defaultCharset().name() gives me "UTF-8" on my JVM: $ ja

Re: NGrams and positions

2008-05-15 Thread Otis Gospodnetic
I think the original use-case is in LUCENE-1224 where Hiroaki wrote this: With current trunk NGramTokenFilter(min=2,max=4) , I index "abcdef" string into an index, but I can't query it with "abc". If I query with "ab", I can get a hit result. The reason is that the NGramTokenFilter generates bad

[jira] Issue Comment Edited: (LUCENE-1166) A tokenfilter to decompose compound words

2008-05-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596717#action_12596717 ] gsingers edited comment on LUCENE-1166 at 5/15/08 1:15 PM:

Re: NGrams and positions

2008-05-15 Thread Doug Cutting
The conventional use of ngrams when searching is not to treat them as a set but a sequence. Thus, for "foola" you could index the sequence ["_f", "fo", "oo", "ol", "la", "a_"], and then search for the phrase ["oo", "ol"] to find all occurrences of "ool". This is useful in languages that use l
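A minimal sketch of the sequence-based approach described in the message above, written in plain Python rather than against Lucene's API. The helper names `padded_bigrams` and `contains_phrase` are hypothetical and exist only for this illustration; the `_` boundary marker follows the example in the message.

```python
def padded_bigrams(word, pad="_"):
    """Return the bigram sequence of `word`, with a boundary marker on each side."""
    padded = pad + word + pad
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def contains_phrase(tokens, phrase):
    """Return True if `phrase` occurs as a contiguous, ordered run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


# Index "foola" as an ordered sequence of bigrams, not as an unordered set.
indexed = padded_bigrams("foola")
print(indexed)

# Searching for "ool" becomes a phrase search for its bigrams, in order.
print(contains_phrase(indexed, padded_bigrams("ool")[1:-1]))

# The same bigrams in the wrong order do not match, which a set would not detect.
print(contains_phrase(indexed, ["ol", "oo"]))
```

Treating the bigrams as a sequence is what lets the order of "oo" and "ol" matter; a set-based match would accept both queries.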

[jira] Commented: (LUCENE-1216) CharDelimiterTokenizer

2008-05-15 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597178#action_12597178 ] Otis Gospodnetic commented on LUCENE-1216: Aha, that makes sense - thanks for cl

[jira] Issue Comment Edited: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597174#action_12597174 ] otis edited comment on LUCENE-1224 at 5/15/08 9:07 AM:

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597174#action_12597174 ] Otis Gospodnetic commented on LUCENE-1224: Hiroaki: I agree with Grant about uni

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597161#action_12597161 ] Grant Ingersoll commented on LUCENE-1224: I think the right way is simply to

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597156#action_12597156 ] Hiroaki Kawai commented on LUCENE-1224: About test code: I'm not going to say that

[jira] Updated: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

2008-05-15 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1285: Attachment: highlighter-test.patch Test that exposes the problem. The posted patch makes the test

[jira] Commented: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

2008-05-15 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597130#action_12597130 ] Mark Miller commented on LUCENE-1285: Nice catch and the fix looks great. Thanks And

[jira] Updated: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

2008-05-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1285: Attachment: highlighter.patch A patch to fix the issue. > WeightedSpanTermExtractor i

Re: [jira] Created: (LUCENE-1285) WeightedSpanTermExtractor doesn'

2008-05-15 Thread Andrzej Bialecki
Andrzej Bialecki (JIRA) wrote: WeightedSpanTermExtractor doesn' Key: LUCENE-1285 URL: https://issues.apache.org/jira/browse/LUCENE-1285 Project: Lucene - Java Issue Type: Bug Components: contrib/h

[jira] Updated: (LUCENE-1285) WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

2008-05-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1285: Description: Given a BooleanQuery with multiple clauses, if a term occurs both in a Sp

[jira] Created: (LUCENE-1285) WeightedSpanTermExtractor doesn'

2008-05-15 Thread Andrzej Bialecki (JIRA)
WeightedSpanTermExtractor doesn' Key: LUCENE-1285 URL: https://issues.apache.org/jira/browse/LUCENE-1285 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Affects Versions:

NGrams and positions

2008-05-15 Thread Grant Ingersoll
See https://issues.apache.org/jira/browse/LUCENE-1224 Do people have an opinion on what positions ngrams should be output at? For instance, given 1-grams on "abc fgh", these are currently output as: a, b, c, f, g, h all with a position increment of 1. That seems somewhat reasonable, but it
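A rough sketch of the behaviour described in the message above, in plain Python rather than Lucene's token stream API. The function name `unigrams_with_increments` is hypothetical, and turning increments into absolute positions with a running sum starting at 1 is an assumption made only to show the effect.

```python
def unigrams_with_increments(text):
    """Yield (token, position_increment) for every non-whitespace character,
    each with a position increment of 1, as in the example in the message."""
    for word in text.split():
        for ch in word:
            yield ch, 1


tokens = list(unigrams_with_increments("abc fgh"))

# Turn the increments into absolute positions with a running sum
# (an illustrative convention, not a claim about Lucene's numbering).
positions = []
current = 0
for _, increment in tokens:
    current += increment
    positions.append(current)

for (token, _), position in zip(tokens, positions):
    print(token, position)
```

Under this scheme "c" and "f" land at consecutive positions even though they come from different words, which is one consequence worth weighing when deciding what positions n-grams should carry.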

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597112#action_12597112 ] DM Smith commented on LUCENE-1224: My take as a user: Maybe, I don't understand the app

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597107#action_12597107 ] Grant Ingersoll commented on LUCENE-1224: {quote}Umm..., if you don't like indexi

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597098#action_12597098 ] Hiroaki Kawai commented on LUCENE-1224: --- In my understanding, -- The sequenc

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597094#action_12597094 ] Hiroaki Kawai commented on LUCENE-1224: Umm..., if you don't like indexing and quer

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597080#action_12597080 ] Grant Ingersoll commented on LUCENE-1224: FWIW, I also think we should address th

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597079#action_12597079 ] Grant Ingersoll commented on LUCENE-1224: OK, let me change the comment. You can

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-05-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597078#action_12597078 ] Michael McCandless commented on LUCENE-1282: OK: running with -client prevents

[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-05-15 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597069#action_12597069 ] Hiroaki Kawai commented on LUCENE-1224: Q: Why it is necessary to index A: Because

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-05-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597052#action_12597052 ] Michael McCandless commented on LUCENE-1282: bq. In my 100% reproducible case

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-05-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597048#action_12597048 ] Michael McCandless commented on LUCENE-1282: {quote} One of the classic proble

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-05-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597042#action_12597042 ] Michael McCandless commented on LUCENE-1282: bq. Has it been reported to Sun y

[jira] Commented: (LUCENE-1216) CharDelimiterTokenizer

2008-05-15 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597036#action_12597036 ] Hiroaki Kawai commented on LUCENE-1216: The reason of setWhitespaceDelimiter(): Of