[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12610691#action_12610691 ] Otis Gospodnetic commented on SOLR-572: --- Here are 2 more bugs: 1) Search for: united states of America Suggests: united states oft America It looks like the SC doesn't check stopwords, and of is a stopword. Thus, it does not exist in the index, but oft does, so SC suggests oft and thinks of is misspelled. I think the SC component should check the list of stopwords, too, no? 2) Search for: united states of America Suggests: united states oftAmericaa The of-oft is described above. But note how SC suggested America-Americaa, but it didn't do that for america. This looks like case-sensitivity problem. Shouldn't the SC be case-insensitive? I can't produce a patch now (no src handy), so I'm hoping Grant or somebody else can do it based on this report. Spell Checker as a Search Component --- Key: SOLR-572 URL: https://issues.apache.org/jira/browse/SOLR-572 Project: Solr Issue Type: New Feature Components: spellchecker Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Grant Ingersoll Priority: Minor Fix For: 1.3 Attachments: solr-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch http://wiki.apache.org/solr/SpellCheckComponent Expose the Lucene contrib SpellChecker as a Search Component. 
Provide the following features: * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field * Give suggestions on a per-field basis * Given a multi-word query, give only one consistent suggestion * Process the query with the same analyzer specified for the source field and process each token separately * Allow the user to specify minimum length for a token (optional) Consistency criteria for a multi-word query can consist of the following: * Preserve the correct words in the original query as it is * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
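One way to address both of Otis's points is to normalize query tokens the way the field's analyzer would before they ever reach the spell checker. The following is a minimal sketch, assuming a plain whitespace split and a hard-coded, hypothetical stopword list; `SpellTokenFilter` and `tokensToCheck` are illustrative names, and a real fix would more likely run the field's own query analyzer (lowercasing plus stop filtering) instead of duplicating it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SpellTokenFilter {
    // Hypothetical stopword list; in practice it should match the list
    // used by the source field's analyzer.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    /** Lowercases each whitespace-separated token and drops stopwords. */
    static List<String> tokensToCheck(String query) {
        List<String> out = new ArrayList<>();
        for (String raw : query.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "of" is never looked up, and "America" is checked as "america".
        System.out.println(tokensToCheck("united states of America")); // [united, states, america]
    }
}
```

With this normalization, "of" cannot be flagged as misspelled, and "America" and "america" are looked up as the same term.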
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12610276#action_12610276 ] Grant Ingersoll commented on SOLR-572: -- Hi Bojan, thanks for the patch. I think it would be best to open a new issue for it. However, I'm not sure what is going on here. When I look at the Lucene code, it has this:
{code}
final int freq = (ir != null && field != null) ? ir.docFreq(new Term(field, word)) : 0;
final int goalFreq = (morePopular && ir != null && field != null) ? freq : 0;
// if the word exists in the real index and we don't care for word frequency, return the word itself
if (!morePopular && freq > 0) {
  return new String[] { word };
}
{code}
The comment says it all, so maybe something else is going wrong. At a minimum, your patch needs to account for the case where you want more popular suggestions even if the word exists.
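To make the guard Grant quotes concrete, here is a simplified sketch of the decision it encodes. It is a stand-in built on assumptions, not Lucene's code: a `Map` replaces `IndexReader.docFreq`, the caller supplies the candidate words (Lucene generates them from its n-gram spell index and ranks them by string distance), and `ExistingWordGuard`/`suggest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ExistingWordGuard {
    /**
     * Returns suggestions for word. docFreq stands in for IndexReader.docFreq;
     * candidates stand in for the n-gram lookup.
     */
    static List<String> suggest(String word, Map<String, Integer> docFreq,
                                boolean morePopular, List<String> candidates) {
        int freq = docFreq.getOrDefault(word, 0);
        List<String> out = new ArrayList<>();
        // Word is in the index and popularity doesn't matter: return it as-is.
        if (!morePopular && freq > 0) {
            out.add(word);
            return out;
        }
        int goalFreq = morePopular ? freq : 0;
        for (String candidate : candidates) {
            if (candidate.equals(word)) {
                continue; // never suggest the input word itself
            }
            int candFreq = docFreq.getOrDefault(candidate, 0);
            // Skip candidates absent from the index or, with morePopular,
            // less frequent than the input word.
            if (candFreq < 1 || candFreq < goalFreq) {
                continue;
            }
            out.add(candidate);
        }
        return out;
    }

    public static void main(String[] args) {
        // A stopword such as "of" never reaches the index, so its frequency is 0
        // and the guard cannot short-circuit.
        System.out.println(suggest("of", Map.of("oft", 3), false, List.of("oft"))); // [oft]
    }
}
```

The last call mirrors Otis's "of" -> "oft" report: the short-circuit only helps for words that are actually indexed.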
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12608487#action_12608487 ] Geoffrey Young commented on SOLR-572: - I'm seeing random weirdness in the collation results. The same query, shift-refreshed, sometimes yields (in JSON):
{noformat}
{
 "responseHeader":{
  "params":{
   "spellcheck":"true",
   "q":"redbull air show",
   "qf":"search-en",
   "spellcheck.collate":"true",
   "qt":"dismax",
   "wt":"json",
   "rows":"0"}},
 "response":{"numFound":0,"start":0,"docs":[]
 },
 "spellcheck":{
  "suggestions":[
   "redbull",[
    "numFound",1,
    "startOffset",0,
    "endOffset",7,
    "suggestion",["redbelly"]],
   "show",[
    "numFound",1,
    "startOffset",12,
    "endOffset",16,
    "suggestion",["shot"]],
   "collation","redbelly airshotw"]}}
{noformat}
Note the collation spacing and the extraneous 'w'. A refresh toggles between that and what you might expect:
{noformat}
"collation","redbelly air shot"]
{noformat}
--Geoff
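The broken collation is consistent with replacements being applied using offsets from the original query after an earlier replacement has already changed the string's length: "redbull" (7 chars) becomes "redbelly" (8 chars), so the later replacement at offsets 12-16 lands one character early and overwrites " sho" instead of "show". The sketch below reproduces that under this assumption; `NaiveCollation` and `Correction` are hypothetical names, not classes from the patch. It also suggests one plausible reason for the toggling: applying the same corrections in the opposite order yields the expected string.

```java
import java.util.Arrays;
import java.util.List;

final class NaiveCollation {
    /** A correction: offsets into the ORIGINAL query plus the replacement text. */
    static final class Correction {
        final int start;
        final int end;
        final String suggestion;

        Correction(int start, int end, String suggestion) {
            this.start = start;
            this.end = end;
            this.suggestion = suggestion;
        }
    }

    /** Buggy: reuses original offsets even after earlier replacements shifted the text. */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        for (Correction c : corrections) {
            sb.replace(c.start, c.end, c.suggestion);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Correction redbull = new Correction(0, 7, "redbelly");
        Correction show = new Correction(12, 16, "shot");
        System.out.println(collate("redbull air show", Arrays.asList(redbull, show))); // redbelly airshotw
        System.out.println(collate("redbull air show", Arrays.asList(show, redbull))); // redbelly air shot
    }
}
```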
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12608519#action_12608519 ] Grant Ingersoll commented on SOLR-572: -- Can you open a new issue to track this? Looks like a string replace issue on the offsets. We probably should do the collation a bit differently to make sure the words fit right. We'll probably have to right pad or something like that.
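Grant suggests right-padding. A different technique that avoids padding is to sort the corrections by start offset and carry a running length delta, so each replacement is shifted by however much the earlier ones grew or shrank the string (applying them right to left would work equally well). A minimal sketch, assuming non-overlapping corrections; `OffsetSafeCollation` and `Correction` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OffsetSafeCollation {
    /** A correction: offsets into the ORIGINAL query plus the replacement text. */
    static final class Correction {
        final int start;
        final int end;
        final String suggestion;

        Correction(int start, int end, String suggestion) {
            this.start = start;
            this.end = end;
            this.suggestion = suggestion;
        }
    }

    /** Applies non-overlapping corrections left to right, adjusting for length changes. */
    static String collate(String query, List<Correction> corrections) {
        List<Correction> sorted = new ArrayList<>(corrections);
        // Process left to right regardless of the order the suggestions arrived in.
        sorted.sort(Comparator.comparingInt((Correction c) -> c.start));
        StringBuilder sb = new StringBuilder(query);
        int delta = 0; // net length change from replacements made so far
        for (Correction c : sorted) {
            sb.replace(c.start + delta, c.end + delta, c.suggestion);
            delta += c.suggestion.length() - (c.end - c.start);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Correction redbull = new Correction(0, 7, "redbelly");
        Correction show = new Correction(12, 16, "shot");
        System.out.println(collate("redbull air show", Arrays.asList(redbull, show))); // redbelly air shot
        System.out.println(collate("redbull air show", Arrays.asList(show, redbull))); // redbelly air shot
    }
}
```

Because the output no longer depends on the order in which suggestions arrive, repeated requests should produce the same collation.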
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12608527#action_12608527 ] Sean Timm commented on SOLR-572: For what it is worth, here is the code that I used client-side before the collation feature was available. I haven't looked at how it is done in this patch. It has some nice features, such as delimiting the spelling correction (e.g., with HTML bold tags) and preserving the user's initial case on each word.
{code}
// Assumes suggestions are sorted by startOffset; Suggestion, userQuery and the
// _spellCheck* fields are defined elsewhere in the client.
StringBuilder buff = new StringBuilder();
StringBuilder rawBuff = new StringBuilder();
int last = 0;
String userStr = null;
// for each suggestion
for( Suggestion s : suggestions ) {
  // add part before the misspelling
  userStr = userQuery.substring( last, s.startOffset );
  buff.append( userStr );
  rawBuff.append( userStr );
  String suggestion = s.suggestion;
  if( _spellCheckPreserveUserCase ) {
    userStr = userQuery.substring( s.startOffset, s.endOffset );
    char[] userCh = userStr.toCharArray();
    boolean initialUpper = Character.isUpperCase( userCh[0] );
    boolean allUpper = true;
    for( char c : userCh ) {
      if( Character.isLowerCase( c ) ) {
        allUpper = false;
        break;
      }
    }
    if( allUpper ) {
      suggestion = suggestion.toUpperCase();
    } else if( initialUpper ) {
      userCh = suggestion.toCharArray();
      userCh[0] = Character.toUpperCase( userCh[0] );
      suggestion = new String( userCh );
    }
  }
  buff.append( _spellCheckStartHighlight ).append( suggestion )
      .append( _spellCheckEndHighlight );
  rawBuff.append( suggestion );
  last = s.endOffset;
}
// add part after all misspellings
userStr = userQuery.substring( last );
buff.append( userStr );
rawBuff.append( userStr );
if( log().isDebugEnabled() ) {
  log().debug( "Did you mean: " + buff );
  log().debug( "Did you mean link: " + rawBuff );
}
{code}
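Sean's case-preservation step can be pulled out into a small standalone helper, which also bears on Otis's case-sensitivity question: the spell checker can work on lowercased terms while the collation restores the user's casing afterwards. This is a sketch that follows the same rules as the client code above (input with no lowercase letters yields an all-uppercase suggestion; a leading capital is copied onto the suggestion); `CaseMatcher.matchCase` is a hypothetical name.

```java
import java.util.Locale;

final class CaseMatcher {
    /** Applies the casing pattern of userWord to suggestion, using Sean's rules. */
    static String matchCase(String userWord, String suggestion) {
        if (userWord.isEmpty() || suggestion.isEmpty()) {
            return suggestion;
        }
        boolean allUpper = true;
        for (char c : userWord.toCharArray()) {
            if (Character.isLowerCase(c)) {
                allUpper = false;
                break;
            }
        }
        if (allUpper) {
            return suggestion.toUpperCase(Locale.ROOT);
        }
        if (Character.isUpperCase(userWord.charAt(0))) {
            // char + String concatenates into a String
            return Character.toUpperCase(suggestion.charAt(0)) + suggestion.substring(1);
        }
        return suggestion;
    }

    public static void main(String[] args) {
        System.out.println(matchCase("Amerca", "america")); // America
        System.out.println(matchCase("NASAA", "nasa"));     // NASA
        System.out.println(matchCase("shwo", "show"));      // show
    }
}
```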
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12606719#action_12606719 ] Grant Ingersoll commented on SOLR-572: -- Because of the stupid way it gets initialized as a NamedListInitializerWhateverWhatever. I'm open to alternate suggestions on how to do it and take advantage of the resource loader, etc. Every time I go to do initialization stuff in Solr these days I pine for Spring, since we are basically re-inventing it, albeit not as nicely. -Grant -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12606634#action_12606634 ] Noble Paul commented on SOLR-572: - Why do we need to add the queryConverter definition outside of the spellcheck search component? Is it going to be used by any component other than this one?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605635#action_12605635 ] Shalin Shekhar Mangar commented on SOLR-572: A few questions/comments:
# Why is a WhitespaceTokenizer being used for tokenizing the value of the spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer if the index is being built from a Solr field?
# The above argument also applies to queryAnalyzerFieldType, which is being used for the QueryConverter.
# I see that we can specify our own query converter through the queryConverter section in solrconfig.xml. But the SpellCheckComponent uses SpellingQueryConverter directly instead of an interface. We should add a QueryConverter interface if this needs to be pluggable.
# If name is omitted from two dictionaries in solrconfig.xml, then both get named "Default" by the SolrSpellChecker#init method, and they overwrite each other in the spellCheckers map.
# How about building the index in the inform() method? I understand that users can build the index using spellcheck.build=true, and they can also use QuerySenderListener to build the index, but this limits the user to FSDirectory: if we use RAMDirectory and Solr is restarted, the QuerySenderListener never fires and the spell checker is left with no index. It's not a major inconvenience to always use FSDirectory, but then RAMDirectory doesn't bring much to the table.
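For the name-collision point, here is a minimal sketch of a registration step that fails fast on duplicates rather than letting the second dictionary silently replace the first. The class, the method, and the use of `Object` as a stand-in for the spell checker type are all hypothetical; the fallback name simply mirrors the "Default" reported above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class SpellCheckerRegistry {
    // Hypothetical fallback name, mirroring the "Default" reported in the comment.
    static final String DEFAULT_NAME = "Default";

    // Object stands in for the real spell checker type.
    private final Map<String, Object> spellCheckers = new LinkedHashMap<>();

    /** Registers a checker, failing fast instead of overwriting a same-named one. */
    void register(String configuredName, Object checker) {
        Objects.requireNonNull(checker, "checker");
        String name = (configuredName == null || configuredName.isEmpty())
            ? DEFAULT_NAME : configuredName;
        Object previous = spellCheckers.putIfAbsent(name, checker);
        if (previous != null) {
            throw new IllegalStateException("Duplicate spell checker name: " + name);
        }
    }

    int size() {
        return spellCheckers.size();
    }
}
```

An alternative design would be to generate unique names for unnamed dictionaries, but failing at startup makes the misconfiguration visible instead of hiding one dictionary.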
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605645#action_12605645 ] Grant Ingersoll commented on SOLR-572: --
{quote}
Why is a WhitespaceTokenizer being used for tokenizing the value for a spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer if the index is being built from a Solr field? The above argument also applies to queryAnalyzerFieldType which is being used for QueryConverter
{quote}
My understanding was that the sc.q parameter was already analyzed and ready to be checked, so all it needed was a conversion to tokens. As for queryAnalyzerFieldType, it assumes the implementation is the IndexBasedSpellChecker or some other field-based one that the SpellCheckComponent doesn't have access to. That is my reasoning for why it needs to be handled separately and explicitly, and why it isn't part of the spellchecker configuration.
{quote}
I see that we can specify our own query converter through the queryConverter section in solrconfig.xml. But the SpellCheckComponent uses SpellingQueryConverter directly instead of an interface. We should add a QueryConverter interface if this needs to be pluggable.
{quote}
I thought about making it an abstract base class, but in my mind it is really easy to override SpellingQueryConverter, and the component should know how to deal with it.
{quote}
If name is omitted from two dictionaries in solrconfig.xml then both get named as Default from the SolrSpellChecker#init method and they overwrite each other in the spellCheckers map
{quote}
Hmm, not good. I will fix.
{quote}
How about building the index in the inform() method? I understand that the users can build the index using spellcheck.build=true and they can also use QuerySenderListener to build the index but this limits the user to use FSDirectory because if we use RAMDirectory and solr is restarted, the QuerySenderListener never fires and spell checker is left with no index. It's not a major inconvenience to use FSDirectory always but then RAMDirectory doesn't bring much to the table.
{quote}
I think this gets back to our early discussions about it not working in inform() because we don't have the reader at that point, or something like that. I really don't know the right answer, but do feel free to try it out. I do think it belongs in inform(), but I'm not sure Solr is ready at that point. As for the QuerySenderListener, it seems like it should fire on a restart, but I admit I don't know a whole lot about that functionality.
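To illustrate the design choice Grant describes (an overridable base class rather than a separate interface), here is a sketch of a pluggable converter that turns the raw query into tokens together with their original offsets, which is what the collation step later relies on. `QueryConverterSketch`, `Token`, `WhitespaceConverter`, and `LowercasingConverter` are hypothetical names; this is not Solr's `SpellingQueryConverter` API, and splitting on `\S+` only approximates what a `WhitespaceTokenizer` produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryConverterSketch {
    /** A token and its [start, end) offsets in the original query. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /** Overridable base class: subclasses decide how a query becomes tokens. */
    abstract static class QueryConverter {
        abstract List<Token> convert(String query);
    }

    /** Splits on runs of non-whitespace and records each token's offsets. */
    static class WhitespaceConverter extends QueryConverter {
        private static final Pattern NON_SPACE = Pattern.compile("\\S+");

        @Override
        List<Token> convert(String query) {
            List<Token> tokens = new ArrayList<>();
            Matcher m = NON_SPACE.matcher(query);
            while (m.find()) {
                tokens.add(new Token(m.group(), m.start(), m.end()));
            }
            return tokens;
        }
    }

    /** Overriding is a small change: lowercase the text but keep the offsets. */
    static class LowercasingConverter extends WhitespaceConverter {
        @Override
        List<Token> convert(String query) {
            List<Token> out = new ArrayList<>();
            for (Token t : super.convert(query)) {
                out.add(new Token(t.text.toLowerCase(Locale.ROOT), t.start, t.end));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        QueryConverter converter = new WhitespaceConverter();
        System.out.println(converter.convert("redbull air show"));
        // [redbull[0,7), air[8,11), show[12,16)]
    }
}
```

The offsets match the startOffset/endOffset values in Geoff's JSON output for "redbull air show".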
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605293#action_12605293 ] Grant Ingersoll commented on SOLR-572: -- OK, I'd like to commit this tomorrow or Wednesday. I am going to open another issue to bring LUCENE-1297 into the configuration.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605367#action_12605367 ] Yonik Seeley commented on SOLR-572: --- For those who are just casually following this issue, is there a good summary of current input options and example output?
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
Grant created a wiki page at http://wiki.apache.org/solr/SpellCheckComponent which has some documentation on the configuration. I'll try to add more documentation when I try this out tomorrow.
On Tue, Jun 17, 2008 at 12:03 AM, Yonik Seeley (JIRA) [EMAIL PROTECTED] wrote: [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605367#action_12605367] Yonik Seeley commented on SOLR-572: --- For those who are just casually following this issue, is there a good summary of current input options and example output?
-- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604818#action_12604818 ] Grant Ingersoll commented on SOLR-572: --
{quote}
the spell checker component handling build/reload seems highly awkward to me. suggestion component really should just do that... and wrap the other operations as a /spellchecker/rebuild kinda thing and not even necessarily componentize those operations since they don't really necessarily need to be hooked together with other operations as a single request.
{quote}
I've thought about it a bit, too, as it bothers me as well, but I think the initialization, etc. gets a bit tricky, like all Solr initialization. Not sure what to do.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604830#action_12604830 ] Grant Ingersoll commented on SOLR-572: -- Sean, I see the issue and am working on it. Good catch. I'll have a patch shortly.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604617#action_12604617 ] Sean Timm commented on SOLR-572: It doesn't appear that you can get both extendedResults and a count > 1. With the URL below, I get one suggestion for each misspelled term regardless of the value of spellcheck.count. If I set spellcheck.extendedResults=false, then I get the requested three suggestions for each term.
{noformat}
/solr/spellCheckCompRH/?q=waz+designatd+two+bee+Arvil+25+bye+Pres.+it+waz&version=2.2&start=0&rows=2&indent=on&spellcheck=true&fl=title,url,id,categories,score&hl=on&hl.fl=body&qt=dismax&spellcheck.extendedResults=true&spellcheck.count=3
{noformat}
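To make the reproduction in Sean's report easier to vary, the request can be assembled programmatically with java.net.URLEncoder, joining parameters with "&". This is a minimal sketch that keeps only the spellcheck-related parameters from the report; the class and method names are hypothetical, and the /solr/spellCheckCompRH/ handler path is simply taken from the report.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr): rebuilds a reduced form of the
// request from the report, joining parameters with '&'.
class SpellcheckRequestSketch {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // spaces become '+'
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String build(String handlerPath, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(handlerPath).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&'); // parameter separator
            }
            first = false;
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    static Map<String, String> reportParams(boolean extendedResults) {
        Map<String, String> p = new LinkedHashMap<String, String>(); // keeps insertion order
        p.put("q", "waz designatd two bee Arvil 25 bye Pres. it waz");
        p.put("qt", "dismax");
        p.put("spellcheck", "true");
        p.put("spellcheck.extendedResults", Boolean.toString(extendedResults));
        p.put("spellcheck.count", "3");
        return p;
    }
}
```

Building the request once with reportParams(true) and once with reportParams(false) gives the two variants Sean compares; according to the report, only the second one returned three suggestions per term.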
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604639#action_12604639 ] Erik Hatcher commented on SOLR-572: --- the spell checker component handling build/reload seems highly awkward to me. suggestion component really should just do that... and wrap the other operations as a /spellchecker/rebuild kinda thing and not even necessarily componentize those operations since they don't really necessarily need to be hooked together with other operations as a single request. anyway, just the overloading of a component to do managerial operations seems awkward. food for thought. not a -1 kinda thing though.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604052#action_12604052 ] Grant Ingersoll commented on SOLR-572: -- {quote} I don't think it should give me that suggestion. If a word is in the dictionary it should not give any suggestions. Am I right? {quote} Possibly. I think it should give a better suggestion if one exists (i.e. a more frequent one), but otherwise, yes, it shouldn't give any suggestion. For your example, I would agree that it should not return a suggestion (assuming golf is in the dictionary). By contrast, suppose the index contains the words gilf and golf, with gilf having a frequency of 1 and golf a frequency of 10. If the user enters gilf, I think it is reasonable to assume that the suggestion should be golf, even though gilf exists. Not saying this is supported yet or anything, just laying out the case.
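The rule Grant lays out can be expressed as a small predicate. The sketch below assumes a plain in-memory map of term frequencies in place of the spelling index. It suggests a candidate when the input is absent from the dictionary, or when the candidate is strictly more frequent than the input. The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.Map;

// Hypothetical sketch (not from the patch): decide whether to offer `candidate`
// for `input`, given term frequencies from the spelling index.
class FrequencyAwareRule {

    private final Map<String, Integer> freq; // term -> frequency (stand-in for the index)

    FrequencyAwareRule(Map<String, Integer> freq) {
        this.freq = freq;
    }

    private int freqOf(String term) {
        Integer f = freq.get(term);
        return f == null ? 0 : f;
    }

    boolean shouldSuggest(String input, String candidate) {
        int inputFreq = freqOf(input);
        if (inputFreq == 0) {
            return true; // input not in the dictionary: an ordinary misspelling
        }
        // Input exists: only offer a strictly more frequent alternative.
        return freqOf(candidate) > inputFreq;
    }
}
```

With gilf=1 and golf=10, shouldSuggest("gilf", "golf") is true, while shouldSuggest("golf", "gilf") is false. This covers both halves of the case above.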
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604056#action_12604056 ] Oleg Gnatovskiy commented on SOLR-572: -- I think that lower frequency suggestions should be optional. Some users might only want to offer suggestions for misspelled words (words not in the dictionary). Would it be hard to check if the query term exists in the dictionary before returning a suggestion?
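The check Oleg asks about amounts to a dictionary lookup before suggestions are returned. Lucene's contrib SpellChecker has an exist(String) method that could presumably serve as that lookup. In this sketch, a Set stands in for the dictionary so that the example is self-contained. The onlyMisspelled flag is a hypothetical option and is not an existing Solr parameter.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: an opt-in "only suggest for words not in the dictionary" mode.
class MisspelledOnlyFilter {

    private final Set<String> dictionary;  // stand-in for the spelling index
    private final boolean onlyMisspelled;  // hypothetical option, not an existing Solr parameter

    MisspelledOnlyFilter(Set<String> dictionary, boolean onlyMisspelled) {
        this.dictionary = dictionary;
        this.onlyMisspelled = onlyMisspelled;
    }

    List<String> filter(String term, List<String> suggestions) {
        if (onlyMisspelled && dictionary.contains(term)) {
            return Collections.emptyList(); // correctly spelled: suppress suggestions
        }
        return suggestions;
    }
}
```

With the flag on, the pizza -> plaza suggestion from the earlier report would be suppressed, and pizzza -> pizza would still be returned. With the flag off, behavior is unchanged, which keeps the check optional as Oleg suggests.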
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12604062#action_12604062 ] Otis Gospodnetic commented on SOLR-572: --- I think the frequency awareness may be interesting. What happens if gilf has a frequency of 95K and golf a frequency of 100K? Do we need this to become an SCRH config setting expressed as a percentage? (e.g. show alternative word suggestions even if the input word exists in the index iff freq(input word)/freq(suggested word)*100 < N%?)
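Otis's percentage setting can be worked through numerically. This sketch assumes the comparison is "less than N%": the alternative is shown only when the input word is much rarer than the suggestion. The comparison operator was lost in the archived text, so that part is an assumption. The class name is hypothetical.

```java
// Hypothetical sketch of the percentage rule; the "<" comparison is an assumption.
class RatioThreshold {

    /** freq(input) / freq(suggestion) * 100 */
    static double ratioPercent(long inputFreq, long suggestionFreq) {
        return 100.0 * inputFreq / suggestionFreq;
    }

    /** True if the alternative should be shown for the given input word. */
    static boolean showAlternative(long inputFreq, long suggestionFreq, double thresholdPercent) {
        if (inputFreq == 0) {
            return true; // input not in the index: an ordinary misspelling
        }
        if (suggestionFreq == 0) {
            return false; // nothing to compare against
        }
        return ratioPercent(inputFreq, suggestionFreq) < thresholdPercent;
    }
}
```

With Otis's numbers the ratio is 95000 / 100000 * 100 = 95%, so with N = 50 gilf would be left alone. With Grant's earlier numbers (1 vs 10) the ratio is 10%, which is below 50%, so golf would be suggested.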
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603070#action_12603070 ] Oleg Gnatovskiy commented on SOLR-572: -- Do these latest patches require Lucene 2.4? Would it be better to stay with 2.3.1?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603075#action_12603075 ] Grant Ingersoll commented on SOLR-572: -- {quote} Do these latest patches require Lucene 2.4? Would it be better to stay with 2.3.1? {quote} They require what is checked into Solr's lib directory, which is Lucene's trunk as of yesterday. There are actually a few changes in Lucene's spell checker that I think are worth having in 2.4. Additionally, I think we will want LUCENE-1297 before we are through, which is probably another configuration item. However, that can be added later, unless Otis commits it fairly soon.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603220#action_12603220 ] Swarag Segu commented on SOLR-572: -- Hey guys. Installed the latest patch. Old problem is still there. For example if I do q=piza I get:
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pizzza">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">6</int>
      <arr name="suggestion">
        <str>pizza</str>
      </arr>
    </lst>
  </lst>
</lst>
Which is good. Then I do q=golf (golf is in the dictionary):
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="golf">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>roof</str>
      </arr>
    </lst>
  </lst>
</lst>
I don't think it should give me that suggestion. If a word is in the dictionary it should not give any suggestions. Am I right?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602645#action_12602645 ] Grant Ingersoll commented on SOLR-572: -- One thing I haven't quite settled in my mind is the use of the file-based spell checker. It seems to me that the use case for this is as an override, where one feels the index-based spelling is not correct. Is that right? Or am I missing something? If that is the case, shouldn't we allow the option, at least, of it truly acting as an override? Currently, the only way to get at it is by passing the dictionary name as the param. The only way I can see this being useful is if you are making several round trips to the server, which means you might as well be using a request handler and not a search component. Thoughts?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602651#action_12602651 ] Bojan Smid commented on SOLR-572: - A file-based spell checker would probably be used in cases where the Solr index is too small or too young. A user would compile a dictionary file (for instance, the UNIX words file) and use it as the dictionary.
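A words file of the kind Bojan describes is typically just one word per line, and Lucene's contrib spell checker includes a PlainTextDictionary class for that format. The sketch below uses only java.io rather than Lucene, so it stays self-contained. It reads such a file into an ordered set, skipping blank lines and duplicates. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: load a words file (one word per line, like the UNIX words file).
class WordsFileDictionary {

    static Set<String> load(Reader source) {
        Set<String> words = new LinkedHashSet<String>(); // keeps file order, drops duplicates
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word); // blank lines are skipped
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

In practice the Reader would wrap the dictionary file, for example a FileReader on the system words list. A StringReader works just as well for experiments.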
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602654#action_12602654 ] Grant Ingersoll commented on SOLR-572: -- {quote} File based spell checker would probably be used in cases when Solr index is too small or too young. So a user would compile a dictionary file (for instance, UNIX words file) and use it as a dictionary. {quote} But how is it useful to return results that aren't in the index? It's not like querying on them results in anything useful. Seems to me that in this case you just need to rebuild your dictionary on a regular basis. Or is it that people are using Solr as a spelling server? Now, I can see it as an override situation, i.e. one wishes to override certain results from the index-based one with ones that are known to be in the dictionary but are lower down.
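One way to read the override idea is to reorder the index-based suggestions so that those also found in the file-based dictionary move to the front, while the relative order within each group is preserved. This is only an interpretation of Grant's comment, not code from the patch. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "override" reading; not code from the patch.
class DictionaryOverride {

    /**
     * Moves index-based suggestions that also appear in the file-based
     * dictionary to the front, keeping relative order within each group.
     */
    static List<String> promote(List<String> indexSuggestions, Set<String> fileDictionary) {
        List<String> known = new ArrayList<String>();
        List<String> others = new ArrayList<String>();
        for (String s : indexSuggestions) {
            if (fileDictionary.contains(s)) {
                known.add(s);
            } else {
                others.add(s);
            }
        }
        known.addAll(others);
        return known;
    }
}
```

A stable partition like this never drops an index suggestion. It only changes which suggestion ends up first, which is the one a single-suggestion response would show.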
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602687#action_12602687 ] Grant Ingersoll commented on SOLR-572: -- Oleg, can you try specifying a field value anyway for your bug above? I think this is actually a bug in the Lucene spell checker. Namely, the docs say that the field value can be null, but it is trying to construct a Term, which requires a non-null field name. Just give it the name "word", perhaps.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602706#action_12602706 ] Shalin Shekhar Mangar commented on SOLR-572: Grant -- The exception is happening because the SpellCheckComponent always passes Solr's own IndexReader when calling the AbstractLuceneSpellChecker#getSuggestions method even when the underlying spell checker is a FileBasedSpellChecker. In that case, since a non-null IndexReader is passed onto Lucene, it tries to create a term on the null field name. That is when the NullPointerException comes up. Another problem will occur when using IndexBasedSpellChecker with an arbitrary Lucene index, because then, too, Solr's IndexReader would be passed to the Lucene SpellChecker instead of the actual index's reader. I think a possible solution can be to add another abstract method with the same signature as Lucene's SpellChecker to the AbstractLuceneSpellChecker and let each sub-class get suggestions on its own. That way FileBasedSpellChecker will pass the correct IndexReader or a null IndexReader into Lucene appropriately. The AbstractLuceneSpellChecker#getSuggestion will just call the underlying suggest method, get the String[] back and process as it does right now.
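Grant's reply asks which signature is meant, so the following is only one possible reading of Shalin's proposal. The base class no longer supplies Solr's IndexReader. Each subclass decides which reader, or null, is handed to a call modeled on the five-argument suggestSimilar(word, numSug, ir, field, morePopular) in Lucene's contrib SpellChecker. Every class here is a hypothetical stand-in: ReaderStub replaces IndexReader, and LuceneSpellCheckerStub replaces the Lucene SpellChecker. They are used so the sketch compiles without Lucene.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for Lucene's IndexReader so the sketch compiles without Lucene.
final class ReaderStub {
    final String name;
    ReaderStub(String name) { this.name = name; }
}

// Stand-in for Lucene's SpellChecker; records which reader it was handed.
final class LuceneSpellCheckerStub {
    ReaderStub lastReader;

    String[] suggestSimilar(String word, int numSug, ReaderStub ir, String field, boolean morePopular) {
        lastReader = ir;
        return new String[] { word }; // placeholder result: echoes the input
    }
}

// The base class keeps the shared post-processing; subclasses pick the reader.
abstract class AbstractSpellCheckerSketch {
    final LuceneSpellCheckerStub spellChecker = new LuceneSpellCheckerStub();

    protected abstract String[] suggest(String word, int numSug, String field, boolean morePopular);

    List<String> getSuggestions(String word, int count) {
        return Arrays.asList(suggest(word, count, "word", false));
    }
}

class FileBasedSketch extends AbstractSpellCheckerSketch {
    @Override
    protected String[] suggest(String word, int numSug, String field, boolean morePopular) {
        // A file dictionary has no source index, so Lucene gets a null reader.
        return spellChecker.suggestSimilar(word, numSug, null, field, morePopular);
    }
}

class IndexBasedSketch extends AbstractSpellCheckerSketch {
    private final ReaderStub sourceReader;

    IndexBasedSketch(ReaderStub sourceReader) {
        this.sourceReader = sourceReader;
    }

    @Override
    protected String[] suggest(String word, int numSug, String field, boolean morePopular) {
        // Use the reader of the index the dictionary was built from, not Solr's own.
        return spellChecker.suggestSimilar(word, numSug, sourceReader, field, morePopular);
    }
}
```

Under this reading, both bugs Shalin describes go away structurally. The file-based checker can no longer receive Solr's reader, and an index-based checker over an external index uses that index's reader.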
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602712#action_12602712 ] Otis Gospodnetic commented on SOLR-572: --- By collate you mean that the SCRH would not only return suggestions/corrections for individual tokens, but would also try to glue together an already corrected query string based on its suggestions? Example: Query: cogito ega sum. SCRH returns this correction: ega -> ergo. But it also tries to give you the whole thing corrected: cogito ergo sum. That? Sounds useful: less work for the client app, should the app developers decide that SCRH's collated suggestions are what they would have to do themselves anyway.
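Collation as Otis describes it can be built from data the component already returns. Each suggestion carries a startOffset and an endOffset, and the exclusive end offset is consistent with the pizzza example (0 to 6). The sketch below splices each token's top suggestion back into the original query string. It assumes that corrections are sorted by start offset and do not overlap. The names are hypothetical, and this is not the patch's collation code.

```java
import java.util.List;

// Hypothetical sketch of collation: splice each token's top suggestion back into
// the original query using the startOffset/endOffset values in the response.
class CollateSketch {

    static final class Correction {
        final int start;          // startOffset of the misspelled token
        final int end;            // endOffset (exclusive)
        final String replacement; // top suggestion for that token

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Corrections must be sorted by start offset and must not overlap. */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Correction c : corrections) {
            out.append(query, pos, c.start); // text before this token, unchanged
            out.append(c.replacement);
            pos = c.end;
        }
        out.append(query, pos, query.length()); // trailing text
        return out.toString();
    }
}
```

For the query "cogito ega sum", a single correction covering offsets 7 to 10 with the replacement "ergo" yields "cogito ergo sum". Untouched words, including spacing, are copied through exactly as typed.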
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602743#action_12602743 ]

Oleg Gnatovskiy commented on SOLR-572:

Hey guys. I installed the latest patch. The old problem is still there. For example, if I do q=pizzza I get:

    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="pizzza">
          <int name="numFound">1</int>
          <int name="startOffset">0</int>
          <int name="endOffset">6</int>
          <arr name="suggestion">
            <str>pizza</str>
          </arr>
        </lst>
      </lst>
    </lst>

Which is good. Then I do q=pizza (pizza is in the dictionary):

    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="pizza">
          <int name="numFound">1</int>
          <int name="startOffset">0</int>
          <int name="endOffset">5</int>
          <arr name="suggestion">
            <str>plaza</str>
          </arr>
        </lst>
      </lst>
    </lst>

I don't think it should give me that suggestion. If a word is in the dictionary, it should not give any suggestions.
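One way to get the behaviour Oleg expects is to test dictionary membership before looking for alternatives. The sketch below is self-contained and hypothetical: the `SkipKnownWords` class and its plain Levenshtein scan over an in-memory set stand in for the real spell index. Lucene's contrib `SpellChecker` does expose an `exist(String)` method that could play the role of the membership test. Without the `contains` guard, a `maxDistance` of 2 would let `pizza` pick up `plaza`, which is exactly two edits away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: only suggest corrections for tokens missing from the dictionary.
class SkipKnownWords {

    // Levenshtein edit distance, computed row by row.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last computed row is now in prev
        }
        return prev[b.length()];
    }

    // Returns nothing for a correctly spelled word; otherwise dictionary words
    // within maxDistance edits, in dictionary iteration order.
    static List<String> suggest(String word, Set<String> dictionary, int maxDistance) {
        List<String> out = new ArrayList<>();
        if (dictionary.contains(word)) {
            return out; // word is known: no suggestions
        }
        for (String candidate : dictionary) {
            if (editDistance(word, candidate) <= maxDistance) {
                out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new LinkedHashSet<>(Arrays.asList("pizza", "plaza", "pasta"));
        System.out.println(suggest("pizzza", dict, 1)); // [pizza]
        System.out.println(suggest("pizza", dict, 2));  // []
    }
}
```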
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602745#action_12602745 ]

Grant Ingersoll commented on SOLR-572:

{quote}
Grant - The exception is happening because the SpellCheckComponent always passes Solr's own IndexReader when calling the AbstractLuceneSpellChecker#getSuggestions method, even when the underlying spell checker is a FileBasedSpellChecker. In that case, since a non-null IndexReader is passed on to Lucene, it tries to create a term on the null field name. That is when the NullPointerException comes up.
{quote}

Yep, I think I fixed this piece. See also LUCENE-1299.

{quote}
I think a possible solution can be to add another abstract method with the same signature as Lucene's SpellChecker to the AbstractLuceneSpellChecker and let each sub-class get suggestions on its own. That way FileBasedSpellChecker will pass the correct IndexReader or a null IndexReader into Lucene appropriately. The AbstractLuceneSpellChecker#getSuggestions will just call the underlying suggest method, get the String[] back and process it as it does right now.
{quote}

Not sure I follow the solution (I understand the problem). Which signature are you suggesting?
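One possible reading of the quoted proposal is sketched below with hypothetical stand-in types, so it runs without Lucene. It is an interpretation, not what the patch does. The base class declares an abstract method shaped like Lucene's `SpellChecker.suggestSimilar(word, numSug, reader, field, morePopular)`, and each subclass decides which reader and field actually reach the engine. `ReaderStub` and `SpellEngineStub` are toys: the engine only mimics the failure mode described above (a reader supplied without a field) and uses a deliberately trivial notion of similarity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an index reader.
interface ReaderStub {}

// Toy stand-in for Lucene's suggestSimilar(word, numSug, reader, field, morePopular).
class SpellEngineStub {
    private final List<String> words;

    SpellEngineStub(List<String> words) { this.words = words; }

    String[] suggestSimilar(String word, int numSug, ReaderStub reader, String field, boolean morePopular) {
        if (reader != null && field == null) {
            throw new NullPointerException("field"); // analogous to building a Term with a null field
        }
        // Toy "similarity": any other word sharing the first letter.
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (out.size() < numSug && !w.equals(word) && w.charAt(0) == word.charAt(0)) out.add(w);
        }
        return out.toArray(new String[0]);
    }
}

// Shared processing stays in the base class; fetching raw suggestions is delegated.
abstract class SpellCheckerBaseSketch {
    protected final SpellEngineStub engine;
    protected final String field; // null when there is no source field

    SpellCheckerBaseSketch(SpellEngineStub engine, String field) {
        this.engine = engine;
        this.field = field;
    }

    protected abstract String[] suggestSimilar(String word, int numSug, ReaderStub reader,
                                               String field, boolean morePopular);

    List<String> getSuggestions(String word, int numSug, ReaderStub solrReader, boolean morePopular) {
        return Arrays.asList(suggestSimilar(word, numSug, solrReader, field, morePopular));
    }
}

// Index-based: forwards the reader and its configured field.
class IndexBasedSketch extends SpellCheckerBaseSketch {
    IndexBasedSketch(SpellEngineStub engine, String field) { super(engine, field); }

    @Override
    protected String[] suggestSimilar(String word, int numSug, ReaderStub reader,
                                      String field, boolean morePopular) {
        return engine.suggestSimilar(word, numSug, reader, field, morePopular);
    }
}

// File-based: ignores Solr's reader, so no field-less lookup is attempted.
class FileBasedSketch extends SpellCheckerBaseSketch {
    FileBasedSketch(SpellEngineStub engine) { super(engine, null); }

    @Override
    protected String[] suggestSimilar(String word, int numSug, ReaderStub reader,
                                      String field, boolean morePopular) {
        return engine.suggestSimilar(word, numSug, null, null, morePopular);
    }
}

class ProposalDemo {
    public static void main(String[] args) {
        SpellEngineStub engine = new SpellEngineStub(Arrays.asList("pizza", "plaza", "pasta"));
        ReaderStub solrReader = new ReaderStub() {};
        System.out.println(new FileBasedSketch(engine).getSuggestions("pizzza", 1, solrReader, false)); // [pizza]
    }
}
```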
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602828#action_12602828 ]

Mike Klaas commented on SOLR-572:

[quote]Another use case is where Solr is used with indices that are not indices for a narrow domain or that don't have nice, clean, short fields that can be used for populating the SC index. For example, if the index consists of a pile of web pages, I don't think I'd want to use their data (not even their titles) to populate the SC index. I'd really want just a plain dictionary-powered SCRH.[/quote]

It works great, actually. That way you get all the abbreviations, jargon, proper names, etc. Thresholding helps prevent most of the cruft from appearing in the index.
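Mike doesn't spell out how the thresholding works. A common approach, assumed here, is to admit a term into the spelling dictionary only if it appears in at least some fraction of the documents, so one-off typos and junk tokens drop out. The hypothetical `ThresholdDictionary.build` below shows that idea on raw strings with a naive lowercase tokenizer; an index-backed version would presumably read document frequencies from the index instead of re-tokenizing text.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: keep only terms whose document frequency reaches a threshold.
class ThresholdDictionary {

    // threshold is a fraction of documents in [0, 1], e.g. 0.5 = "in at least half the docs".
    static SortedSet<String> build(List<String> docs, double threshold) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            // Naive tokenizer: lowercase, split on non-word characters; count each term once per doc.
            Set<String> terms = new HashSet<>(Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\W+")));
            for (String term : terms) {
                if (!term.isEmpty()) {
                    docFreq.merge(term, 1, Integer::sum);
                }
            }
        }
        SortedSet<String> dictionary = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if ((double) e.getValue() / docs.size() >= threshold) {
                dictionary.add(e.getKey());
            }
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Lucene Solr", "Solr search", "solr lucene", "Solr serch");
        System.out.println(build(docs, 0.5)); // [lucene, solr]
    }
}
```

With a threshold of 0.5 the typo `serch` (one document out of four) is left out, while the legitimate but rarer `search` is dropped too; picking the threshold is a trade-off between noise and recall.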
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602856#action_12602856 ]

Swarag Segu commented on SOLR-572:

Hey Guys,

I installed the latest patch and it gives me compile errors:

    compile:
        [mkdir] Created dir: C:\Documents and Settings\Swarag Segu\workspace\solrSrc\build\core
        [javac] Compiling 324 source files to C:\Documents and Settings\Swarag Segu\workspace\solrSrc\build\core
        [javac] C:\Documents and Settings\Swarag Segu\workspace\solrSrc\src\java\org\apache\solr\spelling\FileBasedSpellChecker.java:97: cannot find symbol
        [javac] symbol  : variable MaxFieldLength
        [javac] location: class org.apache.lucene.index.IndexWriter
        [javac] true, IndexWriter.MaxFieldLength.UNLIMITED);
        [javac] ^
        [javac] C:\Documents and Settings\Swarag Segu\workspace\solrSrc\src\java\org\apache\solr\spelling\FileBasedSpellChecker.java:96: internal error; cannot instantiate org.apache.lucene.index.IndexWriter.<init> at org.apache.lucene.index.IndexWriter to ()
        [javac] IndexWriter writer = new IndexWriter(ramDir, fieldType.getAnalyzer(),
        [javac] ^
        [javac] Note: Some input files use or override a deprecated API.
        [javac] Note: Recompile with -Xlint:deprecation for details.
        [javac] Note: Some input files use unchecked or unsafe operations.
        [javac] Note: Recompile with -Xlint:unchecked for details.
        [javac] 2 errors

Am I missing something?

Thanks,
Swarag.
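The `cannot find symbol ... MaxFieldLength` error suggests that the Lucene jar on the compile classpath predates `IndexWriter.MaxFieldLength`, which the patch uses; that is an assumption, not something confirmed in this message. A quick way to see which API a given classpath offers is a reflective lookup, sketched below. The `LuceneApiCheck` class is hypothetical; `IndexWriter$MaxFieldLength` is the binary name of the nested class named in the error.

```java
// Hypothetical diagnostic: report whether a class (nested classes use '$')
// can be loaded from the current classpath.
class LuceneApiCheck {

    static boolean hasClass(String binaryName) {
        try {
            // initialize = false: resolve the class without running its static initializers.
            Class.forName(binaryName, false, LuceneApiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.index.IndexWriter$MaxFieldLength";
        System.out.println(name + " available: " + hasClass(name));
    }
}
```

Run it with the same Lucene jars that the Solr build compiles against; `false` would point at an outdated jar rather than at the patch itself.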
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
There are working patches available on the issue without the advanced features, and everyone is free to fix the current one. It's not like it is that far off from being able to have proper spellchecking, pluggability, and context information about where the mistakes are.

I frankly don't get what all the fuss is about. Is it that you disagree with the approach? That hasn't come across in the discussions, but if it is, say so. I thought we were working on it quite well together, had made some good progress, and are pretty darn close. I don't see that I've taken away any functionality that the original patch offers, but I did change it so that it fits a broader audience, namely those who are interested in other spell checkers and those who want info about where in the query the problem occurs. Which is what the comments suggest people are interested in, and also what I am interested in for 1.3.

And, I'm sorry, but I said I'd have to let it lie for a few days and then I would be back to it. Cut me some slack. I don't get paid to work on Solr full time. Is it truly that important that someone can't wait a few days for a patch on the trunk version for something they never had before? It ain't like we're talking some core bug here that has everyone broken. Besides, others are perfectly welcome to work on it in the meantime.

Sorry for the rant, but I am not going to be pressured into committing a patch that I don't think is ready, and one that I said I am going to be working on to see it through so that we all are happy.

-Grant

On Jun 4, 2008, at 1:14 AM, Noble Paul നോബിള് नोब्ळ् wrote:

> On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:
>> I will be back on it tomorrow and will see this through before 1.3 with the abstractions. In other words, -1 on cutting this off prematurely. :-)
>>
>> Since I don't think this is the only thing holding up 1.3, let's just play it out and get it right so all of us are happy.
>
> This feature may not be holding back the 1.3 release. The potential users of this issue are very much interested in a basic working version. They may be able to live without these advanced features. Maybe we can have another JIRA issue for enhancements, which may or may not go into 1.3 (depending on when it happens).
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
Hi Grant,

I did not intend to offend you or put pressure on you in any way. Please accept my apologies if I came off as rude. In fact, I've been having a lot of fun working with you and Bojan on this issue. We've definitely covered a lot of ground very fast.

I'm completely in favor of the goals for this piece. I was merely suggesting that, with the 1.3 release being a priority, we should go one step at a time: commit per the initial scope of this issue as written in its description, and then handle the enhancements in another issue. But I'm all for it if you want to add extra functionality within the same issue.

Once again, I'm deeply sorry if you found my comment offensive in any way.

Regards,
Shalin

On Wed, Jun 4, 2008 at 4:33 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
> There are working patches available on the issue without the advanced features, and everyone is free to fix the current one. [...]
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
Yeah, as an observer I sensed no bad intentions here. Anyhow, 1.3 is not scheduled yet; my guess is we are still at least a few weeks away from 1.3 (and if I had to bet, I'd bet on 1.3 being released close to the end of summer). Grant is very eager about this and will get it all in.

Case closed, I think. Nothing to see here, move along.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Shalin Shekhar Mangar [EMAIL PROTECTED]
To: solr-dev@lucene.apache.org
Sent: Wednesday, June 4, 2008 1:32:48 PM
Subject: Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

> Hi Grant, I did not intend to offend you or put pressure on you in any way. Please accept my apologies if I came off as rude. [...]
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
The current patch has been broken for some days now, and implementing correct query-parsing logic may take time to get right. Let's not aim for everything to get into the 1.3 release.

I would like to cut down the scope of this issue to an implementation that indexes files and Lucene indices (both Solr and arbitrary) and gives suggestions while using the correct analyzer for multi-word queries. Let's get a spell checker working and commit it. More enhancements, like abstractions for custom spellcheckers and query parsing, can be dealt with separately in another issue (in 1.3 or after).

Thoughts? If there is a general consensus, I can put up a new patch that should be good enough to go in.

On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) [EMAIL PROTECTED] wrote:

[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12601256#action_12601256]

Oleg Gnatovskiy commented on SOLR-572:

I installed the latest patch. Still getting a NPE. Here is my config:

    <searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
      <lst name="defaults">
        <!-- omp = Only More Popular -->
        <str name="spellcheck.onlyMorePopular">false</str>
        <!-- exr = Extended Results -->
        <str name="spellcheck.extendedResults">false</str>
        <!-- The number of suggestions to return -->
        <str name="spellcheck.count">1</str>
      </lst>
      <lst name="spellchecker">
        <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
        <str name="name">external</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="fieldType">text_ws</str>
        <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
      </lst>
    </searchComponent>

Here is the URL I am hitting:

    http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

Here is the error:

    HTTP Status 500 - null

    java.lang.NullPointerException
        at org.apache.lucene.index.Term.<init>(Term.java:39)
        at org.apache.lucene.index.Term.<init>(Term.java:36)
        at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
        at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
        at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

spelling.txt is in my solr/home/conf.
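The request in Oleg's report combines several `spellcheck.*` parameters in one query string, with each name=value pair separated by `&`. A small hypothetical helper (`SpellcheckRequestUrl`, not part of Solr) that assembles such a URL with `java.net.URLEncoder` might look like this; the parameter names come from his request, and the base URL assumes the example server on port 8983.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: joins URL-encoded parameters with '&'.
class SpellcheckRequestUrl {

    static String build(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Form-encodes a string as UTF-8 (a space becomes '+').
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // preserves insertion order
        params.put("q", "pizza");
        params.put("spellcheck", "true");
        params.put("spellcheck.dictionary", "external");
        params.put("spellcheck.build", "true");
        System.out.println(build("http://localhost:8983/solr/select/", params));
    }
}
```

Since `spellcheck.build=true` asks the component to build the dictionary, it presumably belongs on a one-off request rather than on every query.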
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
I'm +1 on getting the basic stuff done and committed for 1.3. If Grant is hot on getting the abstractions in for 1.3, he will do so, but I think it's OK to get this done in 2 parts:

1) core working and committed for 1.3
2) abstractions working and committed after 1.3, if Grant doesn't finish them before 1.3

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Shalin Shekhar Mangar [EMAIL PROTECTED]
To: solr-dev@lucene.apache.org
Sent: Tuesday, June 3, 2008 3:53:10 PM
Subject: Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

> The current patch has been broken for some days now, and implementing correct query-parsing logic may take time to get right. [...]
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
I will be back on it tomorrow and will see this through before 1.3 with the abstractions. In other words, -1 on cutting this off prematurely. :-) Since I don't think this is the only thing holding up 1.3, let's just play it out and get it right so all of us are happy.

-Grant

On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:

The current patch has been broken for some days now, and implementing correct query parsing logic may take time to get right. Let's not aim for everything to get into the 1.3 release. I would like to cut down the scope of this issue to an implementation that indexes files and Lucene indices (both Solr and arbitrary) and gives suggestions while using the correct analyzer for multi-word queries. Let's get a spell checker working and commit it. We can deal with further enhancements, like abstractions for custom spellcheckers and query parsing, in another issue that can be handled separately (in 1.3 or after). Thoughts? If there is general consensus, I can provide a new patch that should be good enough to go in.

On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) [EMAIL PROTECTED] wrote:

[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256 ] Oleg Gnatovskiy commented on SOLR-572: -- I installed the latest patch. Still getting an NPE.
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:

I will be back on it tomorrow and will see this through before 1.3 with the abstractions. In other words, -1 on cutting this off prematurely. :-) Since I don't think this is the only thing holding up 1.3, let's just play it out and get it right so all of us are happy.

This feature may not be holding back the 1.3 release. The potential users of this issue are very much interested in a basic working version, and they may be able to live without these advanced features. Maybe we can open another JIRA issue for enhancements, which may or may not go into 1.3 (depending on when it happens).
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601031#action_12601031 ] Noble Paul commented on SOLR-572: - We must consider committing a basic version of the spellchecker without the intelligent query parsing, etc. Most users' needs will be met. Adding enhancements later is not a bad idea (as long as we do not break backward compatibility).
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256 ] Oleg Gnatovskiy commented on SOLR-572: -- I installed the latest patch. Still getting an NPE. Here is my config:

<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
  </lst>
</searchComponent>

Here is the URL I am hitting:

http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

Here is the error:

HTTP Status 500 - null

java.lang.NullPointerException
    at org.apache.lucene.index.Term.<init>(Term.java:39)
    at org.apache.lucene.index.Term.<init>(Term.java:36)
    at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
    at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

spelling.txt is in my solr/home/conf.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600520#action_12600520 ] Shalin Shekhar Mangar commented on SOLR-572: Grant, unless I'm mistaken, the reason for adding the spellcheck.q parameter was to avoid the tedious query parsing logic that may be needed to extract spellcheckable terms from the q parameter. Do we really need to do this? All the extra things in the q parameter are usually added by the front end itself, aren't they?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600524#action_12600524 ] Grant Ingersoll commented on SOLR-572: --
{quote}
Grant, unless I'm mistaken, the reason for adding the spellcheck.q parameter was to avoid the tedious query parsing logic that may be needed to extract spellcheckable terms from the q parameter. Do we really need to do this? All the extra things in the q parameter are usually added by the front end itself, aren't they?
{quote}
Is that practical? How would an application even know how to generate spellcheck.q without parsing, etc.? I think the component should just work on the input query. I guess I hadn't really thought about the need for spellcheck.q before, but now that you put it in that light, I am not sure I see the need for it. I don't think all the extra things are necessarily added by the application; users can input range queries, etc. The point is, it all depends on the application. At any rate, it is trivial to override the SpellingQueryConverter to skip the original regex and just apply the analyzer to produce the tokens. I suppose we could offer two converters, one with the regex and one without, or it could just have a flag.
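Grant's "two converters or a flag" idea can be illustrated with a small, self-contained Java sketch. It is not Solr's SpellingQueryConverter and does not use its API. The class name QueryTermExtractor, the stripSyntax flag, and the regular expressions are hypothetical, and the syntax they strip (field prefixes, bracketed range clauses, AND/OR/NOT, +/- markers, quotes, parentheses) is an assumption about what a front end might send. When a spellcheck.q value is supplied, it is used as-is, matching Shalin's reading of that parameter; otherwise terms are pulled from q. A real converter would then run each term through the field's analyzer, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT Solr's SpellingQueryConverter API.
class QueryTermExtractor {
    // Range clauses such as price:[1 TO 10] or date:{a TO b} (assumed syntax to drop).
    private static final Pattern RANGE = Pattern.compile("\\S*[\\[{][^\\]}]*[\\]}]");
    // A field prefix such as "title:" in front of a term.
    private static final Pattern FIELD_PREFIX = Pattern.compile("\\b\\w+:");
    // Boolean operators: query syntax, not words to spell-check.
    private static final Pattern OPERATOR = Pattern.compile("AND|OR|NOT|&&|\\|\\|");

    private final boolean stripSyntax; // the "flag" variant: regex clean-up on or off

    QueryTermExtractor(boolean stripSyntax) {
        this.stripSyntax = stripSyntax;
    }

    /** Uses spellcheckQ verbatim when supplied; otherwise extracts candidate terms from q. */
    List<String> extract(String q, String spellcheckQ) {
        if (spellcheckQ != null && !spellcheckQ.trim().isEmpty()) {
            return split(spellcheckQ); // the application already supplied clean terms
        }
        if (q == null) {
            return new ArrayList<>();
        }
        String text = q;
        if (stripSyntax) {
            text = RANGE.matcher(text).replaceAll(" ");
            text = FIELD_PREFIX.matcher(text).replaceAll("");
            text = text.replaceAll("[+\\-\"()]", " "); // required/prohibited markers, quotes, grouping
        }
        List<String> terms = new ArrayList<>();
        for (String t : split(text)) {
            if (stripSyntax && OPERATOR.matcher(t).matches()) {
                continue; // drop AND/OR/NOT when cleaning up syntax
            }
            terms.add(t);
        }
        return terms;
    }

    /** Splits on runs of whitespace and drops empty tokens. */
    private static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, with the flag on, `title:pizza AND price:[1 TO 10] -brigge` yields `[pizza, brigge]`. With the flag off, every whitespace-separated token, including `title:pizza` and `AND`, is passed through. A handful of regexes cannot cover the full query syntax (escaped characters, nested groups), which is one argument for keeping the converter pluggable.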
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600563#action_12600563 ] Oleg Gnatovskiy commented on SOLR-572: -- I still have some issues. Here is my config:

<lst name="spellchecker">
  <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
  <str name="name">external</str>
  <str name="sourceLocation">/usr/local/apache/lucene/solr1home/conf/spellings.txt</str>
  <str name="field">word</str>
  <str name="characterEncoding">UTF-8</str>
  <!--<str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>-->
</lst>

But why do I need a field for a file-based dictionary? Also, is this the correct way to call the URL?

http://wil1devsch1.cs.tmcs:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.builld=true
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600582#action_12600582 ] Shalin Shekhar Mangar commented on SOLR-572: Oleg, you shouldn't need field for a file-based dictionary; fieldType is optional for a file-based dictionary. field is necessary only when you're using an IndexBasedSpellChecker. If you're running into a problem, it's a bug. Except for the double L in spellcheck.build in your URL, everything else looks OK.
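The requirements Shalin describes could be written down as a small startup check. This is a hypothetical Java sketch, not code from any SOLR-572 patch. The class SpellcheckerConfigCheck and its messages are invented, each spellchecker definition is modeled as a plain map of its `<str>` entries, and the check encodes only what this thread states: field is needed for IndexBasedSpellChecker, and fieldType is optional for a file-based dictionary. Flagging a FileBasedSpellChecker that has no sourceLocation is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical config check; NOT part of Solr. Encodes only the rules stated in this thread.
class SpellcheckerConfigCheck {
    static final String INDEX_BASED = "org.apache.solr.spelling.IndexBasedSpellChecker";
    static final String FILE_BASED = "org.apache.solr.spelling.FileBasedSpellChecker";

    /** Returns human-readable problems; an empty list means nothing was detected. */
    static List<String> check(Map<String, String> cfg) {
        List<String> problems = new ArrayList<>();
        String classname = cfg.get("classname");
        if (INDEX_BASED.equals(classname) && isBlank(cfg.get("field"))) {
            // Per the thread: field is required when the dictionary comes from an index.
            problems.add("IndexBasedSpellChecker needs a 'field' to read terms from");
        }
        if (FILE_BASED.equals(classname) && isBlank(cfg.get("sourceLocation"))) {
            // Assumption: a file-based dictionary is useless without a source file.
            problems.add("FileBasedSpellChecker needs a 'sourceLocation'");
        }
        // fieldType is deliberately not checked: the thread says it is optional for file-based.
        return problems;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

For instance, an IndexBasedSpellChecker entry that sets only classname, name, fieldType and indexDir would produce one problem message, while the file-based entries quoted in this thread, which all set sourceLocation, would pass.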
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600590#action_12600590 ] Oleg Gnatovskiy commented on SOLR-572: -- Here is what I am getting:

HTTP Status 500 - null

java.lang.NullPointerException
    at org.apache.lucene.index.Term.<init>(Term.java:39)
    at org.apache.lucene.index.Term.<init>(Term.java:36)
    at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:67)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600217#action_12600217 ] Oleg Gnatovskiy commented on SOLR-572: -- Did you guys change the required URL parameter structure? I am hitting the following URL:

http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=default

and I am getting a NullPointerException. The config is the one from the sample, and I am using the latest patch.
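A misspelled parameter name such as spellcheck.builld would presumably just be ignored rather than rejected, so a client-side check can help when a request behaves unexpectedly. The following Java sketch is hypothetical. SpellcheckParamCheck is invented, and its list of known names contains only the spellcheck parameters mentioned in this thread, so it is certainly incomplete. It parses the query string and reports any parameter whose name starts with "spellcheck" but is not in that list.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical client-side check; NOT part of Solr.
// KNOWN lists only the spellcheck parameters mentioned in this thread.
class SpellcheckParamCheck {
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "spellcheck", "spellcheck.q", "spellcheck.dictionary", "spellcheck.build",
            "spellcheck.count", "spellcheck.onlyMorePopular", "spellcheck.extendedResults"));

    /** Returns spellcheck* parameter names from the URL's query string that are not in KNOWN. */
    static List<String> unknownParams(String url) {
        List<String> unknown = new ArrayList<>();
        int qmark = url.indexOf('?');
        if (qmark < 0) {
            return unknown; // no query string at all
        }
        for (String pair : url.substring(qmark + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            if (name.startsWith("spellcheck") && !KNOWN.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available in the JDK
        }
    }
}
```

Run against the URL with spellcheck.builld=true, this returns `[spellcheck.builld]`. The URLs with spellcheck.dictionary=default or spellcheck.build=true return an empty list, so for those the problem lies elsewhere.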
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600269#action_12600269 ] Otis Gospodnetic commented on SOLR-572: --- I haven't applied/tried the latest patch yet, but maybe it's quicker/better to ask here. I'm wondering/worried about the case where the input is a multi-term query string and a subset (e.g. 2 of 5 terms) of the query terms is misspelled. For example, what happens when the query is:

london brigge is fallinge down (my 2-year-old's current hit)

In this case the suggestions should be:
# brigge = bridge
# fallinge = falling (or fall, more likely)

Is there something in the response that will allow the client to figure out the positioning of the spelling suggestions and piece together the ideal alternative query, in this case "london bridge is falling/fall down"? Ideally, the client could piece together the new query string so that it can, for example, italicize the misspelled words (see Google's DYM). If the current SCRH returns the final corrected string, e.g. "london bridge is falling down", the client has no easy/accurate way of figuring out what was changed, I think. If the SCRH returned some markup that told the client which word(s) changed, then the client could do something with those changed words, e.g.:

london bridge{was:brigge}

Or, if that has problems, maybe each word should be returned separately and sequentially:

<word=london/> <!-- unchanged -->
<word=brigge>bridge</word>

or maybe with offset info:

<word=london offset=0/> <!-- unchanged -->
<word=brigge offset=7>bridge</word>

Thoughts?
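Otis's per-word proposal can be sketched in Java. This is a hypothetical illustration, not the SCRH or SpellCheckComponent response format. It assumes per-word corrections are already available as a map from misspelled word to suggestion. It splits the query on whitespace while recording each word's 0-based character offset in the original string, then renders changed words in the {was:...} style from the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a per-word correction list; NOT an actual Solr response format.
class WordCorrections {
    /** One word of the original query, its 0-based character offset, and its suggestion. */
    static final class Word {
        final String original;
        final int offset;
        final String suggestion; // equals original when the word is unchanged

        Word(String original, int offset, String suggestion) {
            this.original = original;
            this.offset = offset;
            this.suggestion = suggestion;
        }

        boolean changed() {
            return !original.equals(suggestion);
        }
    }

    private static final Pattern TOKEN = Pattern.compile("\\S+");

    /** Splits the query on whitespace, keeping offsets, and looks each word up in the map. */
    static List<Word> correct(String query, Map<String, String> corrections) {
        List<Word> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String w = m.group();
            words.add(new Word(w, m.start(), corrections.getOrDefault(w, w)));
        }
        return words;
    }

    /** Renders the corrected query, marking changed words as suggestion{was:original}. */
    static String render(List<Word> words) {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.suggestion);
            if (w.changed()) {
                sb.append("{was:").append(w.original).append('}');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fixes = new HashMap<>();
        fixes.put("brigge", "bridge");
        fixes.put("fallinge", "falling");
        System.out.println(render(correct("london brigge is fallinge down", fixes)));
        // prints: london bridge{was:brigge} is falling{was:fallinge} down
    }
}
```

Because each Word keeps its offset and original text, a client could just as easily italicize the changed words or emit the `<word ...>` form instead of the {was:...} string.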
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600272#action_12600272 ] Oleg Gnatovskiy commented on SOLR-572: --
Hello. I am hitting http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=default&spellcheck.build=true when trying to build the dictionary. My config looks like this:
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="fieldType">text_ws</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>
  </lst>
</searchComponent>
And the NPE is:
SEVERE: java.lang.NullPointerException
    at org.apache.solr.util.HighFrequencyDictionary.<init>(HighFrequencyDictionary.java:48)
    at org.apache.solr.spelling.IndexBasedSpellChecker.loadLuceneDictionary(IndexBasedSpellChecker.java:103)
    at org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:84)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:133)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:132)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600275#action_12600275 ] Grant Ingersoll commented on SOLR-572: --
I'm working on it. Will have a new patch soon.
--
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600277#action_12600277 ] Oleg Gnatovskiy commented on SOLR-572: --
Is it an actual error, or was I missing something?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600279#action_12600279 ] Oleg Gnatovskiy commented on SOLR-572: --
In response to Otis: I don't think each word should be returned individually. In fact, it should probably return the entire phrase with the suggestions inserted. I believe that is what Google does.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600283#action_12600283 ] Grant Ingersoll commented on SOLR-572: --
All you see from Googs is their front end, so who knows what their spell checker does. I think we should return the words individually; the application is responsible for sewing the new string together, IMO.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600284#action_12600284 ] Oleg Gnatovskiy commented on SOLR-572: --
Should we return suggestions only for the misspelled words, or should we echo the correctly spelled ones as well?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600294#action_12600294 ] Otis Gospodnetic commented on SOLR-572: ---
Right, Google only shows you the final output, not what they do in the backend. But the fact that they italicize misspelled words tells us they have a mechanism that allows the front end to identify them. So I think our task here is to figure out the best/easiest way for the client to identify misspelled words and offer the alternative query to the end user. I think what I outlined above will do that for us:
* output all words sequentially
* mark the words that are misspelled; it may be best to return the original word plus the corrected word:
<word="london"/> <!-- unchanged -->
<word="brigge">bridge</word>
or maybe with offset info:
<word="london" offset="0"/> <!-- unchanged -->
<word="brigge" offset="7">bridge</word>
It's fine to *also* return the final corrected string that doesn't mark the corrected words in any way, and let the lazy clients just use that. Grant or Shalin, will either of you be adding this?
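If the component returns per-word results, the client-side sewing is short. The sketch below is hypothetical (the Word class and both methods are illustrative, not Solr API) and shows how one per-word response could yield both the plain corrected string for lazy clients and a Google-style display with the changed words italicized.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical client-side sketch; Word is not a Solr class.
class ClientCollationSketch {

    /** A word as returned by the server; correction is null when the word was left unchanged. */
    static final class Word {
        final String original;
        final String correction;

        Word(String original, String correction) {
            this.original = original;
            this.correction = correction;
        }
    }

    /** Plain corrected query, with no marking of what changed. */
    static String collate(List<Word> words) {
        StringJoiner out = new StringJoiner(" ");
        for (Word w : words) {
            out.add(w.correction != null ? w.correction : w.original);
        }
        return out.toString();
    }

    /** "Did you mean" HTML: corrected words are wrapped in <i>, all text is escaped. */
    static String didYouMeanHtml(List<Word> words) {
        StringJoiner out = new StringJoiner(" ");
        for (Word w : words) {
            out.add(w.correction != null ? "<i>" + escape(w.correction) + "</i>" : escape(w.original));
        }
        return out.toString();
    }

    /** Minimal HTML escaping for text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
                new Word("london", null), new Word("brigge", "bridge"), new Word("is", null),
                new Word("fallinge", "falling"), new Word("down", null));
        System.out.println(collate(words));        // london bridge is falling down
        System.out.println(didYouMeanHtml(words)); // london <i>bridge</i> is <i>falling</i> down
    }
}
```

Echoing the unchanged words as well (Oleg's question) is what makes this possible without the client re-tokenizing its own query.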
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600320#action_12600320 ] Grant Ingersoll commented on SOLR-572: --
{quote}
Grant or Shalin, will either of you be adding this?
{quote}
Yes, I am working on it.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600323#action_12600323 ] Oleg Gnatovskiy commented on SOLR-572: --
I am still confused about my NPE. Was that a config issue on my part, or was it a bug? From the way Grant said he was working on it, I assumed it was a bug :-)
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
On May 27, 2008, at 8:25 PM, Oleg Gnatovskiy (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600323#action_12600323 ]
> Oleg Gnatovskiy commented on SOLR-572:
> --
> I am still confused about my NPE. Was that a config issue on my part, or was it a bug? The way Grant said he was working on it, I assumed that it was a bug :-)

Sorry, I meant I was working on the token alignment issue. I will look at this, too, though.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600326#action_12600326 ] Grant Ingersoll commented on SOLR-572: --
Your field is null for your Lucene configuration. You need to specify:
<str name="field">fieldName</str>
You have fieldType instead.
-Grant
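Per Grant's diagnosis, the IndexBasedSpellChecker entry in Oleg's config had no field parameter, so a null field name reached HighFrequencyDictionary and surfaced as an NPE. A hypothetical fail-fast check could report the misconfiguration directly. Here a Map<String, String> stands in for the parsed <lst name="spellchecker"> entries, and SpellCheckerConfigCheck is an illustrative helper, not part of the patch.

```java
import java.util.Map;

// Hypothetical fail-fast check; not part of the SOLR-572 patch.
class SpellCheckerConfigCheck {

    static final String INDEX_BASED = "org.apache.solr.spelling.IndexBasedSpellChecker";

    /**
     * Rejects an index-based spellchecker entry that has no "field" parameter,
     * instead of letting a null field name fail later with a NullPointerException.
     */
    static void validate(Map<String, String> params) {
        if (!INDEX_BASED.equals(params.get("classname")) || params.get("field") != null) {
            return; // not index-based, or the field is configured
        }
        String name = params.getOrDefault("name", "(unnamed)");
        String hint = params.containsKey("fieldType") ? " (found 'fieldType'; did you mean 'field'?)" : "";
        throw new IllegalArgumentException("spellchecker '" + name + "' requires a 'field' parameter" + hint);
    }

    public static void main(String[] args) {
        // Mirrors the "default" entry from Oleg's config.
        Map<String, String> params = Map.of(
                "classname", INDEX_BASED,
                "name", "default",
                "fieldType", "text_ws",
                "indexDir", "/usr/local/apache/lucene/solr1home/solr/data/spellchecker");
        try {
            validate(params);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```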
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599403#action_12599403 ] Grant Ingersoll commented on SOLR-572: --
Is prepare() thread-safe for dictionary creation? It seems like there is a race condition on the construction of the dictionaries. I suppose we need a synchronized block in there.
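One way to close the race Grant mentions, sketched in plain Java rather than against the patch: keep the dictionaries in a ConcurrentHashMap and build them with computeIfAbsent, which applies the mapping function at most once per key while concurrent callers for the same key wait for it. LazyDictionaryRegistry and its builder function are hypothetical; a synchronized method around the build would be the simpler alternative, at the cost of serializing lookups for different dictionaries too.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical registry; D is whatever dictionary type the spell checker builds.
class LazyDictionaryRegistry<D> {

    private final ConcurrentHashMap<String, D> dictionaries = new ConcurrentHashMap<>();
    private final Function<String, D> builder; // must not modify this registry while building
    final AtomicInteger buildCount = new AtomicInteger(); // for demonstration only

    LazyDictionaryRegistry(Function<String, D> builder) {
        this.builder = builder;
    }

    /**
     * Returns the named dictionary, building it on first use. computeIfAbsent runs the
     * builder at most once per name; other threads asking for the same name block until
     * it finishes, so two requests cannot race to build the same dictionary.
     */
    D get(String name) {
        return dictionaries.computeIfAbsent(name, n -> {
            buildCount.incrementAndGet();
            return builder.apply(n);
        });
    }
}
```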
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599426#action_12599426 ] Grant Ingersoll commented on SOLR-572: --
Otis,
What's the use case behind:
{quote}
Oh, I see, you are reading field values from the index of the current core. I think that is fine, but wouldn't it also be good to be able to read field values from a vanilla Lucene index?
{quote}
Seems kind of strange based on what I know of index-based spelling, but I don't know everything about it.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599478#action_12599478 ] Grant Ingersoll commented on SOLR-572: --
Also included in that last patch is a (proposed) sample configuration:
{code}
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <lst name="dictionary">
      <str name="name">default</str>
      <str name="field">word</str>
      <str name="indexDir">c:/temp/spellindex</str>
    </lst>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <lst name="dictionary">
      <str name="name">external</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </lst>
</searchComponent>
{code}
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599494#action_12599494 ] Otis Gospodnetic commented on SOLR-572: ---
I'm still confused by some of the names in that config. indexDir looks like the path to the spellchecker index, but there is also spellcheckIndexDir. Is there a functional difference?
Regarding "wouldn't it also be good to be able to read field values from a vanilla Lucene index?": the use case is that not all source indices should have to be Solr indices. What if I have a vanilla Lucene index on the machine and I want the SCRH to build an SC index from that index's title field? That is, I want the functionality of SCRH, but I don't have my Lucene index under Solr. Is that doable?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599498#action_12599498 ] Otis Gospodnetic commented on SOLR-572: ---
I think the choice of appropriate suggestions should be left to the user of this service. If it's easily doable, let's make it possible and put information about frequencies in an appropriate place.
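If each suggestion carried a frequency, the choice could stay with the client as Otis suggests. A hypothetical sketch, assuming the frequency is something like the word's document frequency in the source field; the Suggestion class and the pick policy are illustrative only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: the server reports frequencies, the client decides.
class FrequencyAwareChoice {

    static final class Suggestion {
        final String word;
        final int frequency; // assumed: e.g. the word's document frequency in the source field

        Suggestion(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }
    }

    /** One possible client policy: the most frequent suggestion seen at least minFrequency times. */
    static Optional<String> pick(List<Suggestion> suggestions, int minFrequency) {
        return suggestions.stream()
                .filter(s -> s.frequency >= minFrequency)
                .max(Comparator.comparingInt((Suggestion s) -> s.frequency))
                .map(s -> s.word);
    }
}
```

A different client could ignore frequency entirely and prefer the closest spelling; the point of the design is that the policy lives outside the component.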
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599558#action_12599558 ] Otis Gospodnetic commented on SOLR-572: ---
Shalin/Grant: I think Bojan brings up some good questions: https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12598752#action_12598752
It looks like the call to SpellChecker.exist(...) really got lost:
$ curl --silent https://issues.apache.org/jira/secure/attachment/12382691/SOLR-572.patch | grep 'exist('
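The point of the lost exist(...) call is to skip words that are already in the dictionary, so correctly spelled words are preserved (one of the consistency criteria in the issue description). A minimal sketch, where a Set<String> stands in for the spell index's exist(word) lookup and the suggester function is a hypothetical placeholder for the real suggestion call.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; the Set stands in for the spell index's exist(word) lookup.
class ExistCheckSketch {

    /**
     * Maps each token that is NOT in the dictionary to its suggestions.
     * Tokens the dictionary already contains are skipped, so they are never "corrected".
     */
    static Map<String, List<String>> suggest(List<String> tokens, Set<String> dictionary,
                                             Function<String, List<String>> suggester) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                continue; // the exist(...) check: the word is known, leave it alone
            }
            result.put(token, suggester.apply(token));
        }
        return result;
    }
}
```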
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599166#action_12599166 ]
Grant Ingersoll commented on SOLR-572:
---
OK, I'm working on this. Some thoughts:

1. Why is the initialization done in prepare? Just to be a little lazier than in init?
2. In FieldSpellChecker, the getSuggestion method goes through and creates the suggestion map, but then the loop over the entry set at the end only uses the value. I think our response should return each correction together with the original token it corrects.
3. I'm working on the abstraction notion. The basic idea is to create something like an AbstractSpellChecker, with a LuceneSpellChecker implementation that handles the loading, etc. (as is currently done in SpellCheckComponent) and implements getSuggestion() like you have. The goal is to have a common response no matter which spell checker is used, so that we can plug and play spell checkers.

I hope to have a patch soon.
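A rough Java sketch of the abstraction described in points 2 and 3, assuming hypothetical names (SpellingResult, AbstractSpellChecker, MapSpellChecker): implementations only supply per-token lookup, while the shared base class always produces the same response shape, mapping each original token to its corrections. This is an illustration of the idea, not the design that was committed.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common response: original token -> suggested corrections.
final class SpellingResult {
    private final Map<String, List<String>> suggestions = new LinkedHashMap<>();

    void add(String originalToken, List<String> corrections) {
        suggestions.put(originalToken, List.copyOf(corrections));
    }

    Map<String, List<String>> getSuggestions() {
        return Collections.unmodifiableMap(suggestions);
    }
}

// Hypothetical base class; a Lucene-backed subclass would do its own loading.
abstract class AbstractSpellChecker {
    // Load or rebuild whatever index or dictionary backs this checker.
    abstract void build();

    // Corrections for a single token; empty when there is nothing to suggest.
    abstract List<String> suggestFor(String token, int count);

    // Shared logic, so every implementation returns the same response shape.
    final SpellingResult getSuggestions(List<String> tokens, int count) {
        SpellingResult result = new SpellingResult();
        for (String token : tokens) {
            List<String> corrections = suggestFor(token, count);
            if (!corrections.isEmpty()) {
                result.add(token, corrections);
            }
        }
        return result;
    }
}

// Toy implementation backed by a fixed map, only to show the plug-in contract.
class MapSpellChecker extends AbstractSpellChecker {
    private final Map<String, List<String>> table;

    MapSpellChecker(Map<String, List<String>> table) {
        this.table = table;
    }

    @Override
    void build() {
        // nothing to load for a fixed map
    }

    @Override
    List<String> suggestFor(String token, int count) {
        List<String> all = table.getOrDefault(token, List.of());
        return all.subList(0, Math.min(count, all.size()));
    }
}
```

Keeping getSuggestions final in the base class is one way to guarantee the "common response" regardless of which checker is plugged in.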
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599168#action_12599168 ]
Shalin Shekhar Mangar commented on SOLR-572:
---
Grant, please hold on a bit. I'm working on the patch too, and it has some refactorings which may make merging the two patches difficult. I'll post my patch in a few minutes and then you can take over.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599170#action_12599170 ]
Grant Ingersoll commented on SOLR-572:
---
OK. Kind of too late, but no worries, I will manage the merge.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599202#action_12599202 ]
Shalin Shekhar Mangar commented on SOLR-572:
---
Otis -- Sorry, I missed your post earlier. I can't think of a use case for adding frequency information to plain text files. The spell checker's utility comes from the fact that it can suggest keywords for which Solr can return documents, and that is possible only when the tokens (or synonyms) are present in the Solr index. Plain text dictionaries will be used to add common keywords which may not be in the Solr fields used for suggestions but may be present in huge fields which you don't want to add to the spell checker. For example, I may build my spell index only on vehicle brands but still want to include terms such as "cars", "manufacturer" and "make" from plain text files, since those may be present in my huge default search field. Since the intent is just to match some document with the given suggestion, frequency may not play a significant role here, IMHO. What do you think?

Bojan -- I think we should include an exists flag in the response. As for your point about queries with non-simple tokens, we can introduce another param like spellcheck.q, which the application can set to the simple query. End users almost never know that Solr is running behind the scenes; the Solr queries are constructed by the application itself, which can send the simple query this way.
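A small Java sketch of how an application might send the full query in q and only the plain words in the proposed spellcheck.q parameter. The SpellcheckRequest helper and the host URL are illustrative assumptions; the point is that every parameter is URL-encoded and joined with '&'.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class SpellcheckRequest {
    // Builds baseUrl?k1=v1&k2=v2..., URL-encoding each value (UTF-8).
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }
}
```

Passing a LinkedHashMap with q="toyata AND miles:[1 TO 1]", spellcheck=true and spellcheck.q="toyata" keeps the parameters in insertion order, and the range clause is encoded as miles%3A%5B1+TO+1%5D.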
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599222#action_12599222 ]
Otis Gospodnetic commented on SOLR-572:
---
Shalin -- I think you are right. I looked at SpellChecker again and see that the frequency in the main/searchable index is checked at suggest time, regardless of the source of the dictionary words (index or file), so frequency will be accounted for even when words are loaded from plain-text dictionary files. Unless I'm still missing something, that means onlyMorePopular *can* (or *should*!) be used even when words are loaded from plain-text dictionary files. No?
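A hedged Java sketch of the point above: because popularity is looked up in the main index at suggest time, an onlyMorePopular-style filter works the same whether the candidate words came from an index field or a text file. The docFreq function is a stand-in for that index lookup, and the "at least as frequent as the input" rule is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class MorePopularFilter {
    // docFreq stands in for a lookup against the main (searchable) index.
    static List<String> filter(String original, List<String> candidates,
                               ToIntFunction<String> docFreq, boolean onlyMorePopular) {
        int goalFreq = docFreq.applyAsInt(original);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            int freq = docFreq.applyAsInt(candidate);
            if (freq < 1) {
                continue; // not present in the main index at all
            }
            if (onlyMorePopular && freq < goalFreq) {
                continue; // less popular than what the user typed
            }
            kept.add(candidate);
        }
        return kept;
    }
}
```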
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598716#action_12598716 ]
Oleg Gnatovskiy commented on SOLR-572:
---
Hey guys, I am having trouble creating a file-based dictionary. The file looks like this:

  american
  mexican
  clothes
  shoes

and it is in my solr.home/conf directory. The solrConfig has the following:

  <searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
    <lst name="dictionary">
      <str name="name">external</str>
      <str name="type">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">/home/csweb/index</str>
    </lst>
  </searchComponent>

I hit it with the following URL:

  http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external

and I get the following stack trace:

  SEVERE: java.lang.NullPointerException
      at org.apache.lucene.search.spell.SpellChecker.indexDictionary(SpellChecker.java:321)
      at org.apache.solr.handler.component.SpellCheckComponent$FieldSpellChecker.init(SpellCheckComponent.java:391)
      at org.apache.solr.handler.component.SpellCheckComponent.loadExternalFileDictionary(SpellCheckComponent.java:204)
      at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:131)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:133)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:966)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
      at java.lang.Thread.run(Thread.java:619)

Any idea what I am doing wrong? Thanks!
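Bojan's reply further down indicates the NPE goes away once a field type is configured for the dictionary, which hints that the file-loading path assumes an analyzer is present. As a hedged illustration only (PlainTextDictionary is hypothetical, not the patch's code), a loader can read one word per line and fall back to simple lowercasing instead of dereferencing a missing analyzer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class PlainTextDictionary {
    // Reads one word per line, skipping blank lines. When no analysis is
    // configured (analyzer == null), lowercases instead of failing.
    static List<String> load(Reader source, UnaryOperator<String> analyzer) {
        UnaryOperator<String> analyze =
                analyzer != null ? analyzer : w -> w.toLowerCase(Locale.ROOT);
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(analyze.apply(word));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

To honour characterEncoding=UTF-8, the Reader could come from Files.newBufferedReader(path, StandardCharsets.UTF_8).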
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598728#action_12598728 ]
Bojan Smid commented on SOLR-572:
---
I already found the same problem, made a fix and sent it to Shalin; he will incorporate it into the next patch when it's ready. If you specify a field type for that dictionary (and that field type can be found in the Solr schema), you'll avoid the problem for now.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598727#action_12598727 ]
Otis Gospodnetic commented on SOLR-572:
---
Haven't looked at the code, but the first thing I'd try is using a full/absolute path to your dictionary file.
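A short Java sketch of why an absolute path sidesteps the question: absolute locations are used as-is, while relative ones depend on what they are resolved against. Resolving against the conf directory is an assumption here; the patch may resolve relative paths differently (for example against the working directory), and DictionaryPath is a hypothetical helper.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DictionaryPath {
    // Absolute sourceLocation values are used unchanged; relative ones are
    // resolved against the given conf directory (an assumption).
    static Path resolve(Path confDir, String sourceLocation) {
        Path p = Paths.get(sourceLocation);
        return p.isAbsolute() ? p : confDir.resolve(p).normalize();
    }
}
```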
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598733#action_12598733 ]
Otis Gospodnetic commented on SOLR-572:
---
Just got an idea. File-based dictionaries don't have word frequency information, so we use a fixed value instead (which means, e.g., that onlyMorePopular cannot be used). What if we (also) accepted plain-text file dictionaries that included word frequency information? e.g.

  ball,100
  boil,44
  bowl,77
  ...

I'm not looking at the sources now, but could we not feed this word frequency information into the Lucene SC, so it makes use of it when figuring out the top-N best words to suggest?

And how would we figure out the frequency of each word to begin with? I imagine we could have a tool/class that, given a path to a dictionary file with words and a path to a Lucene/Solr index, looks up each dictionary word's frequency in the given index and outputs word,freq for each word. This class could live in the Lucene SC, but could be used by SCRH when rebuilding the SC index, for example.

Does this sound useful and implementable?
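A minimal Java sketch of both halves of the proposal: parsing "word,freq" lines, and the helper tool that annotates a word list with frequencies. The class name is hypothetical, and docFreq is a stand-in for the per-word lookup against a Lucene/Solr index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

class WordFreqFile {
    // Parses lines such as "ball,100" into word -> frequency. Lines without a
    // comma or with a non-integer frequency are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String line : lines) {
            int comma = line.lastIndexOf(',');
            if (comma <= 0) {
                continue;
            }
            try {
                int freq = Integer.parseInt(line.substring(comma + 1).trim());
                freqs.put(line.substring(0, comma).trim(), freq);
            } catch (NumberFormatException e) {
                // not a "word,freq" line; ignore it
            }
        }
        return freqs;
    }

    // The proposed tool: look up each dictionary word's frequency (docFreq stands
    // in for the index lookup) and emit one "word,freq" line per word.
    static List<String> annotate(List<String> words, ToIntFunction<String> docFreq) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            out.add(word + "," + docFreq.applyAsInt(word));
        }
        return out;
    }
}
```

The output of annotate is itself valid input for parse, so the same file format serves both the generating tool and the spell checker.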
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598735#action_12598735 ]
Oleg Gnatovskiy commented on SOLR-572:
---
Do you mean adding something like <str name="field">word</str> to the definition for the file-based dictionary?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598738#action_12598738 ]
Bojan Smid commented on SOLR-572:
---
Oleg, that field is now called fieldType, so something like <str name="fieldType">word</str> should work for you as long as you have a fieldType named word defined in your schema.xml. Let me know if this works.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598752#action_12598752 ]
Bojan Smid commented on SOLR-572:
---
I noticed that when searching for a suggestion for a word which exists in the dictionary, SC returns some similar word instead of returning that same word. The old SCRH had a field exist which returned true if the word exists in the dictionary (so the client can treat it as a correct word that doesn't need a suggestion). We can't have exactly the same functionality here (since multi-word queries should be supported), but we can make SC return a field spellingCorrect when all words from the query exist in the dictionary. Otherwise, there is no way to know whether the spelling was correct or whether we should display a suggestion.

There is a method in Lucene's SC to check if a word exists in the index, so it's easy to check whether a word is correct. However, I'm also thinking of situations where we don't have just simple words in the query, for instance: toyata AND miles:[1 to 1]. We want to check just toyata in the index, and return the suggestion toyota AND miles:[1 to 1]. Other query types which might pose a problem are:
- fuzzy query
- wildcard query
- prefix query
...
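A toy Java sketch of the "check only the plain words" idea: split the query on whitespace while keeping range clauses together, leave operators and fielded, range, wildcard, prefix and fuzzy clauses untouched, and correct only the remaining terms. QueryTermRewriter is hypothetical and is not a real query parser; it ignores escaping, grouping and phrases (parts containing those characters are simply left alone) and collapses repeated whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

class QueryTermRewriter {
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT", "&&", "||");
    // Characters that mark a part as something other than a plain word.
    private static final String SPECIAL = ":*?~[]{}()\"\\^";

    // Splits on whitespace, but keeps [..] / {..} range clauses in one piece.
    static List<String> split(String query) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : query.toCharArray()) {
            if (c == '[' || c == '{') depth++;
            if (c == ']' || c == '}') depth = Math.max(0, depth - 1);
            if (Character.isWhitespace(c) && depth == 0) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) parts.add(current.toString());
        return parts;
    }

    // True for parts the spell checker should look at.
    static boolean isPlainTerm(String part) {
        if (OPERATORS.contains(part)) return false;
        for (char c : part.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) return false;
        }
        return true;
    }

    // Rewrites only plain terms, leaving the rest of the query as it was.
    static String rewrite(String query, UnaryOperator<String> correct) {
        List<String> out = new ArrayList<>();
        for (String part : split(query)) {
            out.add(isPlainTerm(part) ? correct.apply(part) : part);
        }
        return String.join(" ", out);
    }
}
```

With a corrector that maps toyata to toyota, "toyata AND miles:[1 TO 1]" becomes "toyota AND miles:[1 TO 1]", while "toyat* OR toyata~" is returned unchanged.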
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598801#action_12598801 ]
Oleg Gnatovskiy commented on SOLR-572:
---
Yes, I've actually run into that problem too. Do you think this is something you will be able to solve?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598835#action_12598835 ]
Bojan Smid commented on SOLR-572:
---
Sure, a quick fix can be done easily, but it probably wouldn't cover all possibilities, hence my post...
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598512#action_12598512 ]
Oleg Gnatovskiy commented on SOLR-572:
---
Hey guys, I created a dictionary index from the following XML file:

  <add>
    <doc>
      <field name="id">10</field>
      <field name="word">pizza</field>
    </doc>
    <doc>
      <field name="id">11</field>
      <field name="word">club</field>
    </doc>
    <doc>
      <field name="id">12</field>
      <field name="word">bar</field>
    </doc>
  </add>

My config is the following:

  <searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
    <lst name="dictionary">
      <str name="name">default</str>
      <str name="type">index</str>
      <str name="field">word</str>
      <!--<str name="indexDir">c:/temp/spellindex</str>-->
    </lst>
  </searchComponent>

and word is defined in schema.xml as:

  <field name="word" type="string" index="true" stored="true" required="false"/>

When I run a query with the following URL:

  http://localhost:8983/solr/select/?q=barr&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10

I get the following response:

  <lst name="spellcheck">
    <lst name="suggestions">
      <int name="numFound">1</int>
      <arr name="barr">
        <str>bar</str>
      </arr>
    </lst>
  </lst>

which is what I expect. However, with this URL:

  http://wil1devsch1.cs.tmcs:8983/solr/select/?q=bar&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10

where bar is correctly spelled, I get the following:

  <lst name="spellcheck">
    <lst name="suggestions">
      <int name="numFound">1</int>
      <arr name="bar">
        <str>barr</str>
      </arr>
    </lst>
  </lst>

Could you please tell me where the word barr is coming from, and why it is being suggested? Thanks!
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598533#action_12598533 ]
Oleg Gnatovskiy commented on SOLR-572:
---
Hey guys, please disregard my last comment; I had a configuration issue that caused the problem. I was just wondering if there is a way to get the suggestions not to echo the query when there are no suggestions. For example, a query where q=food probably should not return a suggestion of food.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598549#action_12598549 ] Shalin Shekhar Mangar commented on SOLR-572:

Oleg -- Thanks for trying out the patch. No, currently it does not signal when no suggestions are found; it just returns the query terms themselves. I'll add that feature.
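One possible shape for the "no suggestion found" signal, as a minimal Java sketch. It assumes the per-token results are available as a plain `Map` from each query token to its best suggestion; the class and method names are hypothetical and not part of the patch. Suggestions that merely echo the token are dropped, so an empty result tells the client there is nothing to correct.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the SOLR-572 patch): drops "suggestions" that merely
// echo the original query token, so an empty map means nothing better was found.
final class SuggestionFilter {
    private SuggestionFilter() {}

    /**
     * @param bestSuggestions each query token mapped to its best suggestion; the value may be
     *                        the token itself (an echo) or null when no candidate exists
     * @return only tokens with a genuinely different suggestion, in the original query order
     */
    static Map<String, String> realSuggestions(Map<String, String> bestSuggestions) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : bestSuggestions.entrySet()) {
            String suggestion = e.getValue();
            // Keep the entry only if a candidate exists and it differs from the token.
            if (suggestion != null && !suggestion.equals(e.getKey())) {
                result.put(e.getKey(), suggestion);
            }
        }
        return result;
    }
}
```

With this approach, q=food mapped to {food=food} yields an empty map instead of echoing "food" back.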
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597878#action_12597878 ] Shalin Shekhar Mangar commented on SOLR-572:

OK, onlyMorePopular and extendedResults will only be supported for dictionaries built from Solr fields.

Yes, the Lucene SpellChecker does create n-grams, but think about lowercasing, stemming, etc. All this analysis can potentially change the word that eventually gets n-grammed by Lucene.
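To make the point about analysis concrete, here is an illustrative Java sketch. `GramDemo` is a hypothetical class, not Lucene code, and the fixed gram size of 3 is an assumption for the demo, not a claim about the sizes Lucene's SpellChecker actually picks. It shows that "America" and "america" produce different gram sets unless an analysis step such as lowercasing runs before n-gramming.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: character n-grams similar in spirit to what a gram-based spell
// checker indexes. Gram size and the lowercasing "analysis" are demo assumptions.
final class GramDemo {
    private GramDemo() {}

    /** Returns the distinct character n-grams of word, in order of first occurrence. */
    static Set<String> grams(String word, int n) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Stand-in for field analysis: only lowercasing here; real analysis might also stem. */
    static String analyze(String word) {
        return word.toLowerCase(Locale.ROOT);
    }
}
```

Here grams("America", 3) gives Ame, mer, eri, ric, ica while grams("america", 3) gives ame, mer, eri, ric, ica; only after analyze() do the two sets match.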
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597913#action_12597913 ] Bojan Smid commented on SOLR-572: -

I would like to add support for different character encodings in file-based dictionaries (the current implementation uses the system's default encoding). I'm not sure how we'll synchronize your work with my fix. Can you let me know when and how I can start my work?
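A minimal Java sketch of what encoding support could look like, assuming the dictionary is a one-word-per-line text file and the charset comes from configuration. `WordListLoader` is a hypothetical name; the sketch uses only the JDK (`Files.readAllLines(Path, Charset)`) and does not claim how the patch wires this into Lucene.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader: reads a one-word-per-line dictionary with an explicit charset
// instead of relying on the platform default encoding.
final class WordListLoader {
    private WordListLoader() {}

    static List<String> load(Path file, Charset charset) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file, charset)) {
            String word = line.trim();
            if (!word.isEmpty()) { // skip blank lines
                words.add(word);
            }
        }
        return words;
    }
}
```

Note that `Files.readAllLines` throws an exception on bytes that are invalid in the given charset, which surfaces a misconfigured encoding instead of silently mangling words.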
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597917#action_12597917 ] Shalin Shekhar Mangar commented on SOLR-572:

Bojan -- I don't want to hold you up, so I've uploaded the current state of my work. Please go ahead with your changes; I can continue after you're done.

Another issue I noticed with the SCRH is that it accepts the accuracy as a request parameter and calls Lucene's SpellChecker.setAccuracy before getting the suggestion. However, this is not thread-safe, nor can we guarantee that the accuracy is actually enforced for the suggestion. Therefore, I think accuracy should only be configurable in solrconfig.xml and not as a request parameter.
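The thread-safety concern can be illustrated with a small Java sketch. `SpellerSettings` is a hypothetical class, not Lucene's API: the accuracy is read once (for example from solrconfig.xml) and kept in a final field, so one request cannot change it while another is computing suggestions, which is exactly what a shared setter called per request allows.

```java
// Hypothetical illustration, not Lucene's API: accuracy is fixed at construction,
// so concurrent requests all see the same value and none can overwrite it.
final class SpellerSettings {
    private final float accuracy;

    SpellerSettings(float accuracy) {
        if (accuracy < 0f || accuracy > 1f) {
            throw new IllegalArgumentException("accuracy must be in [0, 1]: " + accuracy);
        }
        this.accuracy = accuracy;
    }

    float accuracy() {
        return accuracy;
    }

    /** Keeps a candidate only if its similarity score reaches the configured accuracy. */
    boolean accept(float similarity) {
        return similarity >= accuracy;
    }
}
```

Because the object is immutable it can be shared freely across request threads without locking.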
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597814#action_12597814 ] Shalin Shekhar Mangar commented on SOLR-572:

Grant - I was trying to implement the onlyMorePopular and extendedResults format of SCRH when I realized that such a response cannot be supported for text-file-based dictionaries in the current implementation. Currently, we use Lucene's PlainTextDictionary to load such text files, and we don't maintain any frequency information. What do you suggest?

Bojan/Otis - The terms loaded from the text files are passed on to Lucene's SpellChecker as is. As per Noble's suggestion, I've added support for an optional fieldType attribute (this type must be defined in schema.xml). This type's query analyzer is used for queries. Wouldn't it be more consistent to apply the index analyzer at index time as well?

Both of the above problems can be solved if we keep the words loaded from the text files in a Lucene index, but I'm not sure we want to go that way.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597816#action_12597816 ] Otis Gospodnetic commented on SOLR-572: ---

Shalin - I think onlyMorePopular and extendedResults should be optional, so for plain-text dictionaries this information would simply be absent if we cannot derive it. Even if we take words from plain-text files and index them into a Lucene index, their frequency will remain 1.

Does the index-time analyzer make sense? I don't have the sources handy, but doesn't the Lucene SC take the input word and chop it up into 2- and 3-grams before indexing? If so, how would an index-time analyzer come into play?

In principle, if taking plain-text files and indexing their words into a Lucene SC index solves problems, I think that's acceptable: such indices are likely to be relatively small, so they should be quick to build and not require a lot of memory.
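Treating onlyMorePopular as optional could look roughly like the Java sketch below. `PopularityFilter` is hypothetical, and the sketch assumes term frequencies arrive as a `Map` (available for dictionaries built from a Solr field) or as null (for plain-text dictionaries, where every word would have frequency 1 anyway). When frequencies are missing, candidates pass through unfiltered rather than failing the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: onlyMorePopular is honoured only when the dictionary can supply
// term frequencies; otherwise the candidates are returned unchanged.
final class PopularityFilter {
    private PopularityFilter() {}

    static List<String> apply(String original, List<String> candidates,
                              Map<String, Integer> frequencies, boolean onlyMorePopular) {
        if (!onlyMorePopular || frequencies == null) {
            return new ArrayList<>(candidates); // no frequency data: nothing to compare
        }
        int originalFreq = frequencies.getOrDefault(original, 0);
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            // Keep only candidates that occur more often than the original term.
            if (frequencies.getOrDefault(c, 0) > originalFreq) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```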
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597453#action_12597453 ] Noble Paul commented on SOLR-572: -

Adding a 'field' attribute is not intuitive. If your data needs custom analyzers, create an extra 'type' in the schema and let us add an extra attribute 'dataType', e.g.:
{code:xml}
<str name="fieldType">my_new_data_type</str>
{code}
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597455#action_12597455 ] Grant Ingersoll commented on SOLR-572: -

Patch applies cleanly. Very cool that we finally have something concrete.

Some thoughts:
1. I don't believe we use author tags (is this a Solr policy? I know it is a Lucene Java convention).
2. There need to be unit tests.
3. I think it makes sense to have the option to return extended results.
4. I don't think it should be a default search component, but will defer to others.
5. numFound should be returned when count > 1 as well, right? In other words, the response should have the same structure no matter which branch is taken in:
{code}
if (count > 1) {
  response.add("suggestions", spellChecker.getSuggestions(q, count));
} else {
  NamedList suggestions = new NamedList();
  suggestions.add("numFound", 1);
  suggestions.add(q, spellChecker.getSuggestion(q));
  response.add("suggestions", suggestions);
}
{code}
That way it can be handled uniformly on the client.
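A sketch of the "same structure either way" idea in plain Java. It uses `LinkedHashMap` as a stand-in for Solr's NamedList (both keep insertion order), and `SuggestionResponse.build` is a hypothetical method, not the patch's code. The suggestions are always emitted as a list together with numFound, so a client parses one shape whether one or many suggestions were requested.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a NamedList-based response: the key names mirror the snippet above,
// but this is a hypothetical sketch, not the SOLR-572 implementation.
final class SuggestionResponse {
    private SuggestionResponse() {}

    /** Builds the same structure whether one or many suggestions were requested. */
    static Map<String, Object> build(String query, List<String> suggestions) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("numFound", suggestions.size());
        // Always a list, even for a single suggestion, so clients handle one shape.
        body.put(query, new ArrayList<>(suggestions));
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("suggestions", body);
        return response;
    }
}
```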
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597466#action_12597466 ] Bojan Smid commented on SOLR-572: -

The field attribute for the file-based dictionary is basically the same as the field attribute in the default dictionary (in both cases it is used to obtain the query analyzer), which is why I used the same name. My question was: is it OK for the default dictionary to use the same field both to build the dictionary from the Solr index and to obtain the query analyzer for extracting tokens?
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597472#action_12597472 ] Shalin Shekhar Mangar commented on SOLR-572:

Bojan -- Thanks for adding this functionality. I'll work on making things more configurable, like SCRH, and add a few tests. I think it is OK, and it may even be needed in a few cases, though I prefer Noble's suggestion of having fieldType instead of field since it gives the user more freedom.

Grant -- Thanks for looking into the patch. My comments below:
# Right, those were generated by my IDE; I'll remove them in the next patch.
# Agree.
# Agree; both 2 and 3 are on my todo list.
# I don't understand what you mean by "defer to others", but as for making this a default component or not, I'm fine either way.
# Actually, spellChecker.getSuggestion(q, count) returns a complete NamedList, which already has the numFound element. If you don't specify the count, it gives back only a String, for which we need to create a NamedList ourselves. In other words, the response format is actually the same both ways.

Noble -- I like your suggestion of keeping a fieldType attribute in the configuration for non-Solr dictionaries. We can use the query analyzer defined for the given fieldType in Solr's schema. If this attribute is not present, we can default to WhitespaceAnalyzer or StandardAnalyzer.
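The fieldType-with-fallback rule can be sketched in Java as below. Everything here is hypothetical: `AnalyzerChooser` is not Solr code, the map of named tokenizers stands in for the field types defined in schema.xml, and plain whitespace splitting stands in for WhitespaceAnalyzer. Rejecting an unknown fieldType name is a design assumption, chosen so that typos in the configuration fail loudly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: pick the query-time tokenizer from an optional fieldType
// attribute, falling back to whitespace splitting when the attribute is absent.
final class AnalyzerChooser {
    private AnalyzerChooser() {}

    /** Default analysis: split on runs of whitespace (stand-in for WhitespaceAnalyzer). */
    static List<String> whitespaceTokens(String text) {
        String t = text.trim();
        if (t.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(t.split("\\s+"));
    }

    static Function<String, List<String>> choose(
            String fieldType, Map<String, Function<String, List<String>>> schemaTypes) {
        if (fieldType == null) {
            return AnalyzerChooser::whitespaceTokens; // attribute missing: use the default
        }
        Function<String, List<String>> analyzer = schemaTypes.get(fieldType);
        if (analyzer == null) {
            throw new IllegalArgumentException("Unknown fieldType: " + fieldType);
        }
        return analyzer;
    }
}
```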
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597481#action_12597481 ] Grant Ingersoll commented on SOLR-572: -

{quote}
I don't understand what you mean by defer to others but on making this default or not, I'm fine either way.
{quote}
Just meaning, I'm not the only one who has a say in whether or not it is a default component. My guess is not everyone will want it in the default list of components.

Very cool on the other stuff.

One other thing to think about: what if we want a different underlying spell checker? The Lucene spell checker approach isn't exactly state of the art, as far as I understand it. Obviously not your concern at the moment, but it might be good to think about the ability to swap the underlying implementation by abstracting the notion of spelling a bit while keeping the same search component interface.
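A sketch of the abstraction Grant describes, in Java. `SpellingEngine` and `EditDistanceOneEngine` are hypothetical names, not part of the patch or of Lucene. The search component would depend only on the interface; the deliberately naive alternative implementation (dictionary words within one insertion, deletion or substitution) just shows that a different algorithm can plug in behind the same call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical abstraction: the component talks only to this interface, so the
// Lucene-based checker could be swapped out without changing request handling.
interface SpellingEngine {
    /** Returns up to count suggestions for token. */
    List<String> suggest(String token, int count);
}

// A deliberately naive alternative: dictionary words within edit distance 1.
final class EditDistanceOneEngine implements SpellingEngine {
    private final Set<String> dictionary;

    EditDistanceOneEngine(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    @Override
    public List<String> suggest(String token, int count) {
        List<String> out = new ArrayList<>();
        for (String word : dictionary) {
            if (out.size() >= count) {
                break;
            }
            if (!word.equals(token) && withinOneEdit(token, word)) {
                out.add(word);
            }
        }
        return out;
    }

    /** True if a and b differ by at most one insertion, deletion or substitution. */
    static boolean withinOneEdit(String a, String b) {
        if (Math.abs(a.length() - b.length()) > 1) {
            return false;
        }
        int i = 0, j = 0, edits = 0;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) { i++; j++; continue; }
            if (++edits > 1) {
                return false;
            }
            if (a.length() > b.length()) i++;        // deletion from a
            else if (a.length() < b.length()) j++;   // insertion into a
            else { i++; j++; }                       // substitution
        }
        // Any leftover trailing character counts as one more edit.
        return edits + (a.length() - i) + (b.length() - j) <= 1;
    }
}
```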
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597494#action_12597494 ] Otis Gospodnetic commented on SOLR-572: ---

Grant - I agree it would be nice. But let's get this one in first. Perhaps you can add that idea to the list in SOLR-507.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597345#action_12597345 ] Noble Paul commented on SOLR-572: -

* The spellcheck.dictionary=default parameter must be optional in the query. The user must be able to name a dictionary 'default', and that dictionary should be used when no value is passed.
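Noble's rule for an optional dictionary parameter, as a minimal Java sketch. `DictionaryRegistry` is a hypothetical class; it assumes dictionaries are registered by name from the configuration, resolves a missing or empty spellcheck.dictionary value to the one named "default", and rejects unknown names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup: the dictionary parameter is optional; when absent, the
// dictionary configured under the name "default" is used.
final class DictionaryRegistry<D> {
    static final String DEFAULT_NAME = "default";
    private final Map<String, D> byName;

    DictionaryRegistry(Map<String, D> byName) {
        this.byName = new HashMap<>(byName); // defensive copy of the configured dictionaries
    }

    D resolve(String requestedName) {
        String name = (requestedName == null || requestedName.isEmpty()) ? DEFAULT_NAME : requestedName;
        D dict = byName.get(name);
        if (dict == null) {
            throw new IllegalArgumentException("No spell check dictionary named '" + name + "'");
        }
        return dict;
    }
}
```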
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597351#action_12597351 ] Otis Gospodnetic commented on SOLR-572: ---

I had a quick look and it all looks nice and clean. I like the config, though I think 'solr' is too specific: the source field could be in a vanilla Lucene index that lives somewhere on disk, for example. Thus, I'd change 'solr' to 'index'.

Oh, I see, you are reading field values from the index of the current core. I think that is fine, but wouldn't it also be good to be able to read field values from a vanilla Lucene index? (But you wouldn't know the field type and thus would not be able to get the Analyzer for the field.)

Also, and regardless of the above, instead of having indexDir and path, why not call them both location, and maybe even let them include the file: schema for consistency, if that works with the code that uses those locations?

Also on TODO:
* Read dictionary from plain-text files.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597354#action_12597354 ] Shalin Shekhar Mangar commented on SOLR-572:

Otis, I agree that we should call it 'index' instead of 'solr' for the type, and path can be renamed to location. But indexDir refers to the target for the spell check index, whereas path currently refers to the source of the dictionary, so IMHO we should keep indexDir as it is (it can also be a relative path).

For supporting arbitrary Lucene indices, the user must specify type=index, field=fieldName, location=path/to/lucene/index/directory, which should be enough (TODO). In that case the analyzer can be fixed to something (say WhitespaceAnalyzer or StandardAnalyzer).

I'm not sure I understand your comment on the schema. If this is for text files, then I was thinking more about having a text file with one word per line, where all those words go into the same dictionary.
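The attributes discussed here can be modelled in a small Java sketch that keeps the source of the words (type, field, location) separate from the target spell-check index (indexDir). `DictionarySource` is hypothetical, and its validation rules are assumptions drawn from this thread: type=index requires field (a missing location is assumed to mean the current core's index), and type=file requires location.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the dictionary settings: where the words come from
// (type + field/location) versus where the spell-check index is written (indexDir).
final class DictionarySource {
    enum Type { INDEX, FILE }

    final Type type;
    final String field;     // source field to read words from (INDEX)
    final String location;  // external Lucene index dir (INDEX, optional) or word-list file (FILE)
    final String indexDir;  // target directory for the generated spell-check index

    private DictionarySource(Type type, String field, String location, String indexDir) {
        this.type = type;
        this.field = field;
        this.location = location;
        this.indexDir = indexDir;
    }

    /** Parses attributes such as type=index, field=title, location=..., indexDir=... */
    static DictionarySource fromConfig(Map<String, String> cfg) {
        String rawType = cfg.getOrDefault("type", "index");
        // Unknown type names make valueOf throw IllegalArgumentException.
        Type type = Type.valueOf(rawType.toUpperCase(Locale.ROOT));
        String field = cfg.get("field");
        String location = cfg.get("location");
        if (type == Type.INDEX) {
            require(field, "field", type); // no location: assume the current core's index
        } else {
            require(location, "location", type);
        }
        return new DictionarySource(type, field, location, cfg.get("indexDir"));
    }

    private static void require(String value, String name, Type type) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(
                    "type=" + type.name().toLowerCase(Locale.ROOT) + " requires '" + name + "'");
        }
    }
}
```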
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597358#action_12597358 ] Otis Gospodnetic commented on SOLR-572: ---

I see (re the indexDir comment). It might be better to make it more obvious, then: sourceIndex (for the Lucene index that serves as the source of data) vs. targetIndex (or spellcheckerIndex) for the resulting spellchecker index.

For Lucene indices used as sources of data, type=index, field=fieldName, location=path/to/lucene/index/directory makes sense.

Ignore my comment about the schema; I'm just complicating things with that.

Yes, one word per line for plain-text file data sources: that can easily be digested with the PlainTextDictionary class (part of Lucene SC).