[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708479#comment-13708479 ]

Michael McCandless commented on LUCENE-4845:

bq. I guess, there should be an AnalyzingInfixLookupFactory in Solr as well?

I agree ... but this can be done separately.

Add AnalyzingInfixSuggester
---------------------------

Key: LUCENE-4845
URL: https://issues.apache.org/jira/browse/LUCENE-4845
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch

Our current suggester impls do prefix matching of the incoming text against all compiled suggestions, but in some cases it's useful to allow infix matching. E.g., Netflix does infix suggestions in their search box.

I did a straightforward impl, just using a normal Lucene index, and using PostingsHighlighter to highlight matching tokens in the suggestions.

I think this likely only works well when your suggestions have a strong prior ranking (weight input to build), e.g. Netflix knows the popularity of movies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
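The issue description contrasts prefix matching with infix matching and notes that infix results depend on a strong prior weight. The following is a minimal, self-contained sketch of that contrast, not the Lucene implementation: it assumes "infix" means the query may match the start of any token in a suggestion, and every name in it (InfixMatchSketch, Suggestion, lookup) is hypothetical. It uses plain Java collections rather than a Lucene index or PostingsHighlighter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: hypothetical names, not the AnalyzingInfixSuggester API.
final class InfixMatchSketch {

    /** A suggestion plus its prior weight (for example, popularity). */
    static final class Suggestion {
        final String text;
        final long weight;
        Suggestion(String text, long weight) { this.text = text; this.weight = weight; }
    }

    /** Prefix suggest: the query must match the start of the whole suggestion. */
    static boolean prefixMatch(String text, String query) {
        return text.toLowerCase(Locale.ROOT).startsWith(query.toLowerCase(Locale.ROOT));
    }

    /** Infix suggest (assumed semantics): the query may match the start of any token. */
    static boolean infixMatch(String text, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    /** Returns up to topN matching suggestion texts, highest weight first. */
    static List<String> lookup(List<Suggestion> all, String query, boolean infix, int topN) {
        List<Suggestion> hits = new ArrayList<>();
        for (Suggestion s : all) {
            boolean matches = infix ? infixMatch(s.text, query) : prefixMatch(s.text, query);
            if (matches) {
                hits.add(s);
            }
        }
        // With infix matching many more suggestions qualify, so the prior weight decides the order.
        hits.sort((a, b) -> Long.compare(b.weight, a.weight));
        List<String> out = new ArrayList<>();
        for (Suggestion s : hits.subList(0, Math.min(topN, hits.size()))) {
            out.add(s.text);
        }
        return out;
    }
}
```

In this sketch a query such as "bo" prefix-matches only suggestions that begin with "bo", while the infix variant also returns suggestions with a later token starting with "bo"; the weight is then the only signal left to order the larger candidate set.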
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708484#comment-13708484 ]

Shai Erera commented on LUCENE-4845:

Mike, will you still commit it to 4.4? I think that the branch was created prematurely as there's still no resolution on whether to release or not. And this feature is too isolated to cause any instability ... it'd be a pity to have to wait another 3-4 months to release it just because of technicalities...
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708618#comment-13708618 ]

Michael McCandless commented on LUCENE-4845:

bq. Mike, will you still commit it to 4.4?

OK, I'll commit shortly and backport to the 4.4 branch...
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708640#comment-13708640 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503340 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1503340 ]

LUCENE-4845: add AnalyzingInfixSuggester
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708674#comment-13708674 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503356 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1503356 ]

LUCENE-4845: add AnalyzingInfixSuggester
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708682#comment-13708682 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503359 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1503359 ]

LUCENE-4845: add AnalyzingInfixSuggester
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708937#comment-13708937 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503459 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1503459 ]

LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close()
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708939#comment-13708939 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503460 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1503460 ]

LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close()
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708934#comment-13708934 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503458 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1503458 ]

LUCENE-4845: close tmp directory; fix test to catch un-closed files; add missing suggester.close()
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708989#comment-13708989 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503477 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1503477 ]

LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476)
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708987#comment-13708987 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503476 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1503476 ]

LUCENE-4845: Maven and IntelliJ config
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708992#comment-13708992 ]

ASF subversion and git services commented on LUCENE-4845:

Commit 1503478 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1503478 ]

LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476)
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701814#comment-13701814 ]

Artem Lukanin commented on LUCENE-4845:

I guess, there should be an AnalyzingInfixLookupFactory in Solr as well?
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697170#comment-13697170 ]

Michael McCandless commented on LUCENE-4845:

I think the last patch is ready ... I'll commit this soon if there are no objections!
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667207#comment-13667207 ]

Shai Erera commented on LUCENE-4845:

Oops, I hit something on the keyboard while reading the issue and it just assigned it to me :).
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607539#comment-13607539 ]

Michael McCandless commented on LUCENE-4845:

bq. I think it's because your FreeDB has a lot more words than my place names?

I think so. Song titles are longer than place names :)

bq. But really there must be an infixing limit for relevance reasons alone.

I think the app can decide this.

bq. Why is it so bad, but the edge-ngrams limit ok?

I don't think either limit is OK! In the ideal world we wouldn't require such limits due to performance/RAM issues.

But no suggester is perfect; this is why we offer multiple options. These two approaches have different tradeoffs...
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607552#comment-13607552 ]

Robert Muir commented on LUCENE-4845:

{quote}
I don't think either limit is OK! In the ideal world we wouldn't require such limits due to performance/RAM issues.
{quote}

You still misunderstand me. I don't want the limit for performance/RAM reasons. I want it for relevance reasons. It just also gives better performance and memory for free.

This is a really simple thing to do, Mike. It's a win/win.

On the other hand, your edge-ngrams limit is completely different. When exceeded, it causes that suggester to work in linear time!
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606405#comment-13606405 ]

Robert Muir commented on LUCENE-4845:

This seems to not blow up for title-like fields: I did a quick test of geonames (8.3M place names, just using ID as the weight).

{noformat}
AnalyzingSuggester: 117444563 bytes, 74887ms build time
InfixingSuggester: 302127665 bytes, 125895ms build time
{noformat}

I think realistically an N limit can work well here. After such a limit, the infixing is pretty crazy anyway, and really infixing should punish the weight in some way since it's a very scary edit operation to do to the user.

Plus you get optional fuzziness and real phrasing works too :)
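The N limit and weight penalty suggested in the comment above could be sketched roughly as follows. This is a hypothetical illustration, not code from any patch attached to the issue: the class and method names are invented, the tokenization is plain whitespace splitting rather than an Analyzer, and the penalty (integer division of the weight by one plus the starting token position) is an assumption chosen only to show the idea of punishing later infix positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an "N limit" on infixing; all names are illustrative.
final class InfixLimitSketch {

    /** One entry to index: the suffix a query may prefix-match, and its adjusted weight. */
    static final class Entry {
        final String suffix;
        final long weight;
        Entry(String suffix, long weight) { this.suffix = suffix; this.weight = weight; }
    }

    /**
     * Emits the suffixes that start at the first maxInfix token positions
     * (position 0 is the full suggestion). Later positions get a lower weight;
     * the penalty formula is an assumption made for this sketch.
     */
    static List<Entry> expand(String suggestion, long weight, int maxInfix) {
        String[] tokens = suggestion.trim().split("\\s+");
        List<Entry> out = new ArrayList<>();
        for (int start = 0; start < tokens.length && start < maxInfix; start++) {
            String suffix = String.join(" ", Arrays.copyOfRange(tokens, start, tokens.length));
            out.add(new Entry(suffix, weight / (1 + start)));
        }
        return out;
    }
}
```

Under these assumptions, a four-token suggestion with a limit of 3 yields three entries instead of four, so the number of indexed entries per suggestion is bounded by the limit regardless of how long the suggestion is.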
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607085#comment-13607085 ]

Michael McCandless commented on LUCENE-4845:

I like this approach! (Add epsilon transitions after the automaton is built).

I managed to build the FreeDB suggest using this but ... it required a lot of RAM: it OOM'd at 14 GB heap but finished successfully at 20 GB heap. Took a longish time to build too, and made a biggish FST (more than 2X larger than the index):

* 2466 sec to build
* FST is 8.6 GB
* Prefix 2: 2527.5 lookups/sec
* Prefix 4: 1681.7 lookups/sec
* Prefix 6: 1948.3 lookups/sec
* Prefix 8: 2050.9 lookups/sec
* Prefix 10: 2076.0 lookups/sec

We should try the N prefix limit ... but I don't really like that. Maybe we should just offer both approaches ...
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607120#comment-13607120 ]

Robert Muir commented on LUCENE-4845:

{quote}
I managed to build the FreeDB suggest using this but ... it required a lot of RAM: it OOM'd at 14 GB heap but finished successfully at 20 GB heap. Took a longish time to build too, and made a biggish FST (more than 2X larger than the index):
{quote}

I think it's because your FreeDB has a lot more words than my place names? But really there must be an infixing limit for relevance reasons alone.

{quote}
We should try the N prefix limit ... but I don't really like that. Maybe we should just offer both approaches ...
{quote}

Why is it so bad, but the edge-ngrams limit ok?
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605769#comment-13605769 ]

Michael McCandless commented on LUCENE-4845:

bq. Wouldn't the straightforward impl be to put the suffixes of the suggestions into the FST?

I think so ... but then I worry about the FST blowing up. I guess if we limit how deep the infixing can work that would limit the FST size ... but I'd rather not have that limit.

We should definitely try it ... it should be a lot faster.

I wonder how we could get highlighting working with AnalyzingSuggester.
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605868#comment-13605868 ]

Robert Muir commented on LUCENE-4845:

{quote}
I think so ... but then I worry about the FST blowing up. I guess if we limit how deep the infixing can work that would limit the FST size ... but I'd rather not have that limit.
{quote}

But how is this any different than edge-ngrams up to a limit? With words of <= 4 chars, this suggester avoids the typical bad complexity you would get from an inverted index because the docids are pre-sorted in weight order, so it can early terminate. But as soon as you type that 5th character, it can blow up.

I'm not saying it's likely, but it can happen due to particulars of the content: for example, if you had place names and you typed "Shangh..." and this prefix matches millions and millions of terms.
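The early-termination argument in this comment can be illustrated with a toy model. The following is a minimal sketch in plain Java, not the patch's code and not the Lucene API: `EdgeNGramIndexSketch`, `maxGramChars` and the in-memory map are assumptions for illustration. It assumes the suggestions arrive already sorted by descending weight, so a doc ID doubles as a rank, and it deliberately refuses prefixes longer than the n-gram limit, since that is the case the comment says can blow up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model, not Lucene code: edge n-gram postings whose doc IDs follow weight order. */
final class EdgeNGramIndexSketch {
  private final List<String> docs; // docs.get(id); highest weight first (assumption)
  private final Map<String, List<Integer>> postings = new HashMap<>();
  private final int maxGramChars;

  EdgeNGramIndexSketch(List<String> docsByWeightDesc, int maxGramChars) {
    this.docs = docsByWeightDesc;
    this.maxGramChars = maxGramChars;
    for (int docId = 0; docId < docs.size(); docId++) {
      for (String token : docs.get(docId).toLowerCase(Locale.ROOT).split("\\s+")) {
        int limit = Math.min(maxGramChars, token.length());
        for (int len = 1; len <= limit; len++) {
          List<Integer> list =
              postings.computeIfAbsent(token.substring(0, len), k -> new ArrayList<>());
          // Doc IDs arrive in increasing order, so a duplicate can only be the last element.
          if (list.isEmpty() || list.get(list.size() - 1) != docId) {
            list.add(docId);
          }
        }
      }
    }
  }

  /** Returns up to n suggestions for a prefix of at most maxGramChars characters. */
  List<String> lookup(String prefix, int n) {
    String key = prefix.toLowerCase(Locale.ROOT);
    if (key.length() > maxGramChars) {
      // A longer prefix would have to be expanded into every matching term,
      // and the per-term postings merged; this sketch omits that path.
      throw new IllegalArgumentException("prefix longer than " + maxGramChars + " chars");
    }
    List<Integer> list = postings.getOrDefault(key, List.of());
    List<String> hits = new ArrayList<>();
    // Postings are in weight order, so the first n entries are the top n: stop there.
    for (int i = 0; i < Math.min(n, list.size()); i++) {
      hits.add(docs.get(list.get(i)));
    }
    return hits;
  }

  public static void main(String[] args) {
    EdgeNGramIndexSketch index = new EdgeNGramIndexSketch(
        List.of("New York", "Shanghai", "York", "Newcastle", "Shangri-La"), 4);
    System.out.println(index.lookup("new", 2));
  }
}
```

A short prefix maps to exactly one postings list, so the lookup cost is bounded by `n` regardless of how many suggestions match. Past the limit a single prefix can correspond to many distinct terms, which is the "Shangh..." scenario described above.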
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606089#comment-13606089 ]

Robert Muir commented on LUCENE-4845:

And this one is FuzzyAnalyzingInfixSuggester so you have to top that with your perf tests :)
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604774#comment-13604774 ]

Michael McCandless commented on LUCENE-4845:

This is an example of the infix suggestions:

!infixSuggest.png!
[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604777#comment-13604777 ]

Robert Muir commented on LUCENE-4845:

Wouldn't the straightforward impl be to put the suffixes of the suggestions into the FST? So for "this is a test" you also add "is a test", "a test", ...

I feel like this could be done with just a tokenfilter used only at build-time + analyzingsuggester, and would be more performant.
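The suffix expansion proposed in this comment can be sketched as follows. This is a minimal illustration in plain Java rather than an actual Lucene TokenFilter: `SuffixExpansion` and `wordSuffixes` are hypothetical names, and splitting on whitespace stands in for whatever analysis chain would really be used at build time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Lucene: expands a suggestion into its word-level suffixes. */
final class SuffixExpansion {

  /** Returns the suggestion followed by every suffix that starts at a later word boundary. */
  static List<String> wordSuffixes(String suggestion) {
    String[] words = suggestion.trim().split("\\s+");
    List<String> suffixes = new ArrayList<>();
    for (int start = 0; start < words.length; start++) {
      suffixes.add(String.join(" ", Arrays.copyOfRange(words, start, words.length)));
    }
    return suffixes;
  }

  public static void main(String[] args) {
    // Prints: this is a test / is a test / a test / test (one per line)
    for (String s : wordSuffixes("this is a test")) {
      System.out.println(s);
    }
  }
}
```

Each suffix would then be fed to a prefix-matching suggester as its own entry, so a prefix match on "a te" finds the full suggestion. A suggestion with k words produces k entries, which is why the FST size becomes the concern in the rest of the thread.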