[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708479#comment-13708479
 ] 

Michael McCandless commented on LUCENE-4845:


bq. I guess, there should be an AnalyzingInfixLookupFactory in Solr as well?

I agree ... but this can be done separately.

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.4

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch


 Our current suggester impls do prefix matching of the incoming text
 against all compiled suggestions, but in some cases it's useful to
 allow infix matching.  E.g., Netflix does infix suggestions in their
 search box.
 I did a straightforward impl, just using a normal Lucene index, and
 using PostingsHighlighter to highlight matching tokens in the
 suggestions.
 I think this likely only works well when your suggestions have a
 strong prior ranking (weight input to build), e.g. Netflix knows
 the popularity of movies.
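
To make the prefix/infix distinction concrete, here is a minimal sketch in Python. It is a toy model, not the patch: the implementation described above uses a normal Lucene index and PostingsHighlighter, whereas this version just scans a list in memory. The function names, the `<b>` markers, and the sample titles and weights are all hypothetical, and only single-word queries are handled.

```python
# Toy model only: names and data are hypothetical, and this is not Lucene code.

def prefix_suggest(entries, text, limit=5):
    """Match only suggestions whose full text starts with the typed text."""
    query = text.lower()
    hits = [(weight, s) for s, weight in entries if s.lower().startswith(query)]
    hits.sort(key=lambda h: -h[0])  # highest weight first
    return [s for _, s in hits[:limit]]


def infix_suggest(entries, text, limit=5):
    """Match suggestions in which any token starts with the typed text."""
    query = text.lower()
    hits = []
    for s, weight in entries:
        tokens = s.split()
        if any(t.lower().startswith(query) for t in tokens):
            # Mark the matching tokens, standing in for a highlighter.
            marked = " ".join(
                f"<b>{t}</b>" if t.lower().startswith(query) else t
                for t in tokens
            )
            hits.append((weight, marked))
    hits.sort(key=lambda h: -h[0])  # rank purely by the prior weight
    return [s for _, s in hits[:limit]]


# (suggestion, weight) pairs; the weights play the role of popularity.
MOVIES = [("The Dark Knight", 90), ("Knight and Day", 40), ("Dark City", 30)]
```

With this data, typing `kni` finds only "Knight and Day" by prefix, while infix matching also returns "The Dark Knight" and ranks it first because of its higher weight. Since the match position carries no signal here, the ordering depends entirely on the weights, which is why a strong prior ranking matters.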

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708484#comment-13708484
 ] 

Shai Erera commented on LUCENE-4845:


Mike, will you still commit it to 4.4? I think that the branch was created 
prematurely as there's still no resolution on whether to release or not. And 
this feature is too isolated to cause any instability ... it'd be a pity to 
have to wait another 3-4 months to release it just because of 
technicalities...




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708618#comment-13708618
 ] 

Michael McCandless commented on LUCENE-4845:


bq. Mike, will you still commit it to 4.4?

OK, I'll commit shortly and backport to the 4.4 branch...




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708640#comment-13708640
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503340 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1503340 ]

LUCENE-4845: add AnalyzingInfixSuggester




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708674#comment-13708674
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503356 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1503356 ]

LUCENE-4845: add AnalyzingInfixSuggester




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708682#comment-13708682
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503359 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1503359 ]

LUCENE-4845: add AnalyzingInfixSuggester




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708937#comment-13708937
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503459 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1503459 ]

LUCENE-4845: close tmp directory; fix test to catch un-closed files; add 
missing suggester.close()




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708939#comment-13708939
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503460 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1503460 ]

LUCENE-4845: close tmp directory; fix test to catch un-closed files; add 
missing suggester.close()




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708934#comment-13708934
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503458 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1503458 ]

LUCENE-4845: close tmp directory; fix test to catch un-closed files; add 
missing suggester.close()




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708989#comment-13708989
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503477 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1503477 ]

LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476)




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708987#comment-13708987
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503476 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1503476 ]

LUCENE-4845: Maven and IntelliJ config




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708992#comment-13708992
 ] 

ASF subversion and git services commented on LUCENE-4845:
-

Commit 1503478 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1503478 ]

LUCENE-4845: Maven and IntelliJ config (merged trunk r1503476)




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-08 Thread Artem Lukanin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701814#comment-13701814
 ] 

Artem Lukanin commented on LUCENE-4845:
---

I guess, there should be an AnalyzingInfixLookupFactory in Solr as well?




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-07-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697170#comment-13697170
 ] 

Michael McCandless commented on LUCENE-4845:


I think the last patch is ready ... I'll commit this soon if there are no 
objections!




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-05-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667207#comment-13667207
 ] 

Shai Erera commented on LUCENE-4845:


Oops, I hit something on the keyboard while reading the issue and it just 
assigned it to me :).

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.4

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch, LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607539#comment-13607539
 ] 

Michael McCandless commented on LUCENE-4845:


bq. I think it's because your FreeDB has a lot more words than my place names?

I think so.  Song titles are longer than place names :)

bq. But really there must be an infixing limit for relevance reasons alone.

I think the app can decide this.

bq. Why is it so bad, but the edge-ngrams limit ok?

I don't think either limit is OK!  In an ideal world we wouldn't require such 
limits due to performance/RAM issues.

But no suggester is perfect, which is why we offer multiple options.  These two 
approaches have different tradeoffs...

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607552#comment-13607552
 ] 

Robert Muir commented on LUCENE-4845:
-

{quote}
I don't think either limit is OK! In the ideal world we wouldn't require such 
limits due to performance/RAM issues.
{quote}

You still misunderstand me. I don't want the limit for performance/RAM reasons. 
I want it for relevance reasons. It just also gives better performance and 
memory for free. This is a really simple thing to do, Mike. It's a win/win.

On the other hand, your edge-ngrams limit is completely different. When 
exceeded, it causes that suggester to work in linear time!

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606405#comment-13606405
 ] 

Robert Muir commented on LUCENE-4845:
-

This doesn't seem to blow up for title-like fields.
I did a quick test on geonames (8.3M place names, just using the ID as the weight):

{noformat}
AnalyzingSuggester: 117444563 bytes, 74887ms build time
InfixingSuggester: 302127665 bytes, 125895ms build time
{noformat}
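
For scale, the two measurements above work out to roughly 2.6 times the size and 1.7 times the build time. The short calculation below only restates the figures from the {noformat} block; the variable names are made up for illustration.

```python
# Figures copied from the {noformat} block above; names are illustrative.
analyzing_bytes, analyzing_ms = 117_444_563, 74_887
infixing_bytes, infixing_ms = 302_127_665, 125_895

size_ratio = infixing_bytes / analyzing_bytes
time_ratio = infixing_ms / analyzing_ms

print(f"size:  {infixing_bytes / 2**20:.0f} MiB vs "
      f"{analyzing_bytes / 2**20:.0f} MiB ({size_ratio:.2f}x)")
print(f"build: {infixing_ms / 1000:.0f} s vs "
      f"{analyzing_ms / 1000:.0f} s ({time_ratio:.2f}x)")
```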

I think realistically an N limit can work well here. Beyond such a limit, the 
infixing is pretty crazy anyway, and really infixing should punish the weight 
in some way, since it's a very scary edit operation to do to the user.

Plus you get optional fuzziness, and real phrasing works too :)
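
A minimal sketch of what an N limit plus a weight penalty could look like, in Python. This is a toy model under stated assumptions, not code from any patch on this issue: `max_skip` stands in for the N limit, the multiplicative `penalty` per skipped token is an invented rule chosen only for illustration, and the place names and weights are made up.

```python
# Toy model only: the limit, the penalty rule, and the data are hypothetical.

def limited_infix_suggest(entries, text, max_skip=2, penalty=0.5, limit=5):
    """Infix matching that skips at most max_skip leading tokens.

    Each skipped token multiplies the weight by `penalty`, so a match deep
    inside a suggestion ranks below an equally weighted prefix match.
    """
    query = text.lower()
    hits = []
    for s, weight in entries:
        tokens = s.lower().split()
        for skipped, token in enumerate(tokens[: max_skip + 1]):
            if token.startswith(query):
                hits.append((weight * penalty ** skipped, s))
                break  # keep only the earliest (least penalised) match
    hits.sort(key=lambda h: -h[0])  # highest adjusted weight first
    return hits[:limit]


# (suggestion, weight) pairs.
PLACES = [("New York", 100), ("York", 60), ("City of New York", 100)]
```

With `max_skip=1`, "City of New York" is no longer suggested for `york`, and "New York" drops below "York" despite its higher raw weight. Whether that trade is right is exactly the relevance question being argued here.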

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607085#comment-13607085
 ] 

Michael McCandless commented on LUCENE-4845:


I like this approach!  (Add epsilon transitions after the automaton is built).

I managed to build the FreeDB suggest using this but ... it required a lot of 
RAM: it OOM'd at 14 GB heap but finished successfully at 20 GB heap.

Took a longish time to build too, and made a biggish FST (more than 2X larger 
than the index):

  * 2466 sec to build
  * FST is 8.6 GB
  * Prefix 2: 2527.5 lookups/sec
  * Prefix 4: 1681.7 lookups/sec
  * Prefix 6: 1948.3 lookups/sec
  * Prefix 8: 2050.9 lookups/sec
  * Prefix 10: 2076.0 lookups/sec

We should try the N prefix limit ... but I don't really like that.  Maybe we 
should just offer both approaches ...

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607120#comment-13607120
 ] 

Robert Muir commented on LUCENE-4845:
-

{quote}
I managed to build the FreeDB suggest using this but ... it required a lot of 
RAM: it OOM'd at 14 GB heap but finished successfully at 20 GB heap.

Took a longish time to build too, and made a biggish FST (more than 2X larger 
than the index):
{quote}

I think it's because your FreeDB has a lot more words than my place names?

But really there must be an infixing limit for relevance reasons alone.

{quote}
We should try the N prefix limit ... but I don't really like that. Maybe we 
should just offer both approaches ...
{quote}

Why is that limit so bad, but the edge-ngrams limit OK?


 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605769#comment-13605769
 ] 

Michael McCandless commented on LUCENE-4845:


bq. Wouldn't the straightforward impl be to put the suffixes of the suggestions 
into the FST?

I think so ... but then I worry about the FST blowing up.  I guess if we limit 
how deep the infixing can work that would limit the FST size ... but I'd 
rather not have that limit.

We should definitely try it ... it should be a lot faster.  I wonder how we 
could get highlighting working with AnalyzingSuggester.

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605868#comment-13605868
 ] 

Robert Muir commented on LUCENE-4845:
-

{quote}
I think so ... but then I worry about the FST blowing up. I guess if we limit 
how deep the infixing can work that would limit the FST size ... but I'd 
rather not have that limit.
{quote}

But how is this any different than edge-ngrams up to a limit?

With words of <= 4 chars, this suggester avoids the typical bad complexity you 
would get from an inverted index because the docids are pre-sorted in 
weight-order, so it can early terminate.

But as soon as you type that 5th character, it can blow up. I'm not saying it's 
likely, but it can happen due to particulars of the content, for example if you 
had place names and you typed Shangh... and this prefix matches millions and 
millions of terms.
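
To make the cost argument concrete, here is a minimal sketch of early termination over weight-sorted postings. It is not the code from the patch: the class name, the use of plain int arrays as posting lists, and the heap-based merge are all assumptions chosen for illustration. It assumes doc ids were assigned in descending weight order, so a lower doc id always means a higher weight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: top-N collection from posting lists sorted by doc id. */
final class WeightSortedPostings {

  /**
   * Returns up to n distinct doc ids in ascending order, merged from posting
   * lists that are each sorted ascending. Lower doc id means higher weight.
   */
  static List<Integer> topN(List<int[]> postings, int n) {
    // Heap entries are {docId, listIndex, offsetInList}, ordered by docId.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    // One heap entry per matching term: this setup cost grows with the
    // number of terms the prefix expands to, before any hit is collected.
    for (int i = 0; i < postings.size(); i++) {
      if (postings.get(i).length > 0) {
        heap.add(new int[] {postings.get(i)[0], i, 0});
      }
    }
    List<Integer> top = new ArrayList<>();
    // Early termination: stop as soon as n hits have been collected.
    while (!heap.isEmpty() && top.size() < n) {
      int[] head = heap.poll();
      // Skip a doc id already collected from another term's list.
      if (top.isEmpty() || top.get(top.size() - 1) != head[0]) {
        top.add(head[0]);
      }
      int[] list = postings.get(head[1]);
      int next = head[2] + 1;
      if (next < list.length) {
        heap.add(new int[] {list[next], head[1], next});
      }
    }
    return top;
  }
}
```

With a single posting list (a short prefix indexed as one term) the loop touches only the first n entries. With a prefix that expands to a very large number of terms, the merge must at least visit every list once, which is the blow-up described above.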


 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606089#comment-13606089
 ] 

Robert Muir commented on LUCENE-4845:
-

And this one is FuzzyAnalyzingInfixSuggester so you have to top that with your 
perf tests :)

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
 LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604774#comment-13604774
 ] 

Michael McCandless commented on LUCENE-4845:


This is an example of the infix suggestions: !infixSuggest.png!


 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch





[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604777#comment-13604777
 ] 

Robert Muir commented on LUCENE-4845:
-

Wouldn't the straightforward impl be to put the suffixes of the suggestions into 
the FST?

So for "this is a test" you also add "is a test", "a test", ...

I feel like this could be done with just a tokenfilter used only at build-time 
+ analyzingsuggester, and would be more performant.
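
The suffix expansion proposed in this comment can be sketched in a few lines. This is only an illustration of the idea, not the tokenfilter itself: the class name, the method, and the maxSuffixes cap are hypothetical, and whitespace splitting stands in for a real analyzer. Each returned string would be added to the suggester at build time with the weight of the original suggestion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Lucene: word-level suffixes of a suggestion. */
final class WordSuffixes {

  /**
   * Returns the suggestion itself followed by each suffix that starts at a
   * later word boundary, stopping after maxSuffixes extra entries.
   */
  static List<String> expand(String suggestion, int maxSuffixes) {
    // Whitespace splitting stands in for a real analyzer here.
    String[] words = suggestion.trim().split("\\s+");
    List<String> out = new ArrayList<>();
    for (int start = 0; start < words.length && start <= maxSuffixes; start++) {
      out.add(String.join(" ", Arrays.copyOfRange(words, start, words.length)));
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints the suggestion and its word-level suffixes, one per line.
    for (String s : expand("this is a test", Integer.MAX_VALUE)) {
      System.out.println(s);
    }
  }
}
```

Passing a small maxSuffixes bounds how deep into a suggestion a match may start, which in turn bounds how many extra entries each suggestion contributes.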

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch

