[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045523#comment-13045523 ]

James Dyer commented on SOLR-2571:
----------------------------------

I added thresholdTokenFrequency to the SpellCheckComponent wiki page.

IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
-----------------------------------------------------------------------------------------

Key: SOLR-2571
URL: https://issues.apache.org/jira/browse/SOLR-2571
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Assignee: Robert Muir
Priority: Minor
Labels: whereIsHossManWhenYouNeedHim
Fix For: 3.3, 4.0
Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch

When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, the value comes through as a String, so a ClassCastException is thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value.

This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious correct queries. I would have liked to use this with a project we just completed, but this bug prevented that. This issue came up recently on the User's mailing list, so I am raising an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
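The failure described in the issue can be reproduced without Solr. The following is a minimal sketch, assuming a plain `Map<String, Object>` as a stand-in for the NamedList built from the config file. The class and method names (`ThresholdParsing`, `castThreshold`, `parseThreshold`) are hypothetical and are not taken from the Solr source or from the attached patches. It contrasts the failing cast with a tolerant conversion along the lines of the `Float.parseFloat(...)` suggestion.

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdParsing {

    // Mirrors the failing pattern: the cast only succeeds if the stored value is already a Float.
    public static float castThreshold(Map<String, Object> config) {
        return (Float) config.get("thresholdTokenFrequency");
    }

    // Tolerant variant: accepts a Number (e.g. Float) or anything whose string form parses as a float.
    public static float parseThreshold(Map<String, Object> config, float defaultValue) {
        Object raw = config.get("thresholdTokenFrequency");
        if (raw == null) {
            return defaultValue;
        }
        if (raw instanceof Number) {
            return ((Number) raw).floatValue();
        }
        return Float.parseFloat(raw.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        // The value arrives as a String, as described in the issue.
        config.put("thresholdTokenFrequency", "0.07");

        try {
            castThreshold(config);
        } catch (ClassCastException e) {
            System.out.println("cast failed: value is a "
                    + config.get("thresholdTokenFrequency").getClass().getSimpleName());
        }
        System.out.println("parsed: " + parseThreshold(config, 0.0f));
    }
}
```

Accepting both `Number` and `String` keeps a configuration working whichever way the value was declared, which is one possible design and not necessarily what the committed fix does.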
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044917#comment-13044917 ]

James Dyer commented on SOLR-2571:
----------------------------------

{quote}
what makes this 'decision' of correctlySpelled? Do you know?
{quote}

I took a quick look to find out. It's more complicated than I thought! Here's the basic gist (I think!):

- If the instance of SolrSpellChecker returns frequency data and all suggestions have frequency > 0, TRUE.
- If the instance of SolrSpellChecker returns frequency data and any suggestion has frequency == 0, FALSE.
- If the instance of SolrSpellChecker returns NO frequency data but has suggestions, OMIT.
- If the instance of SolrSpellChecker returns NO suggestions, FALSE.

Possibly this isn't fully accurate, but I'm at least mostly correct here. It seems the discrepancy with DirectSolrSpellChecker is because it isn't returning frequency info?

This all happens in SpellCheckComponent.toNamedList(). I'm guessing the code there uses the presence or absence of frequency data as a kind of proxy indicator for whether it's dealing with IndexBasedSpellChecker or FileBasedSpellChecker. Possibly it would be better if each instance of SolrSpellChecker had an isCorrectlySpelled() method that toNamedList() could call? Maybe I should go open another jira issue for that?
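The decision rules listed in this comment can be written down as a small function. This is a rough sketch of the rules as the comment states them, not the actual `SpellCheckComponent.toNamedList()` code. It assumes the first rule means "all suggestions have frequency greater than 0" (the archived text shows only "frequency 0"), and the names `CorrectlySpelledRules` and `decide` are hypothetical. An empty `Optional` stands for the OMIT case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class CorrectlySpelledRules {

    /**
     * @param hasFrequencyData      whether the spell checker reported frequencies at all
     * @param suggestionFrequencies one entry per suggestion; values are ignored when
     *                              hasFrequencyData is false
     * @return Optional.of(true/false) for the flag, or Optional.empty() when the flag is omitted
     */
    public static Optional<Boolean> decide(boolean hasFrequencyData, List<Integer> suggestionFrequencies) {
        if (suggestionFrequencies.isEmpty()) {
            return Optional.of(false);          // no suggestions: FALSE
        }
        if (!hasFrequencyData) {
            return Optional.empty();            // suggestions but no frequency data: OMIT
        }
        // Frequency data present: FALSE if any suggestion has frequency 0, TRUE otherwise.
        boolean anyZero = suggestionFrequencies.stream().anyMatch(f -> f == 0);
        return Optional.of(!anyZero);
    }

    public static void main(String[] args) {
        System.out.println(decide(true, Arrays.asList(3, 5)));
        System.out.println(decide(true, Arrays.asList(3, 0)));
        System.out.println(decide(false, Arrays.asList(0, 0)));
        System.out.println(decide(true, Collections.<Integer>emptyList()));
    }
}
```

Under these rules a checker that reports no frequency data can never produce TRUE, which would be one way to arrive at the kind of discrepancy between the two spell checkers discussed in this thread.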
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045029#comment-13045029 ]

Robert Muir commented on SOLR-2571:
-----------------------------------

{quote}
This version takes all of DirectSolrSpellChecker's parameters as Integer and Float objects rather than Strings, as appropriate.
{quote}

Did you maybe upload an older patch? I took a look and it only seems to cut over the threshold param.

{quote}
I'm not sure if this would have validated any unit tests (I didn't see any tests that use DirectSolrSpellChecker).
{quote}

There is a test (DirectSolrSpellCheckerTest), but it's probably not that great :)
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045031#comment-13045031 ]

Robert Muir commented on SOLR-2571:
-----------------------------------

{quote}
Possibly this isn't fully accurate but I'm at least mostly correct here. Seems like the discrepency with DirectSolrSpellChecker is because it isn't returning Frequency info?
{quote}

This sounds like a bug. Care to open a separate issue for it? (We can resolve the int/float stuff here on this one.)

The thing certainly intends to return freq info:

{noformat}
SuggestWord[] suggestions = checker.suggestSimilar(new Term(field, token.toString()),
    options.count, options.reader, options.onlyMorePopular, accuracy);
for (SuggestWord suggestion : suggestions)
  result.add(token, suggestion.string, suggestion.freq);
{noformat}
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043431#comment-13043431 ]

Robert Muir commented on SOLR-2571:
-----------------------------------

Thanks for updating the patch!

{quote}
I found that DirectSolrSpellChecker returns results in a slightly different format than IndexBasedSpellChecker. Is this OK? Can SOLRJ handle this or do we need to tweak there?
{quote}

Not sure. I have used DirectSolrSpellChecker with solrj and I didn't have any problems, but that's not saying there isn't one.

{quote}
Also, in one case IndexBasedSpellChecker returns correctlySpelled=false while DirectSolrSpellChecker returns correctlySpelled=true. Is this discrepancy valid?
{quote}

I don't know. What makes this 'decision' of correctlySpelled? Do you know?

Remember also that DirectSolrSpellChecker is a totally different spellchecker from IndexBasedSpellChecker (it uses a fundamentally different algorithm), although I tried to keep some of the parameters consistent.

Another question: there are lots of other float/int arguments to DirectSolrSpellChecker. Maybe we should cut all of these over to int and float while we are here?
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042683#comment-13042683 ]

Robert Muir commented on SOLR-2571:
-----------------------------------

Hi James, I'm a little confused about this one. Perhaps DirectSolrSpellChecker is actually wrong?

If I configure the thing like this:

{noformat}
<float name="thresholdTokenFrequency">0.07</float>
{noformat}

then it does apply the parameter.

I guess what I'm asking is whether, in general, we should be using int/float/etc. for these types and not str (especially DirectSolrSpellChecker, which takes a lot of numeric parameters but expects them all to be str).

Just glancing through solrconfig.xml, it's not clear that there is a precedent; it appears inconsistent as far as numeric parameters go.
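The difference between declaring the parameter as `float` and as `str` can be illustrated with a simplified reader for typed config elements. This is only a sketch of the general idea that the element name determines the Java type of the stored value. It is not Solr's own config-loading code, and `TypedConfigElements` and `read` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TypedConfigElements {

    // Reads the direct children of the root element and converts each one
    // according to its tag name: <float> -> Float, <int> -> Integer, anything else -> String.
    public static Map<String, Object> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse config fragment", e);
        }
        Map<String, Object> values = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element el = (Element) node;
            String text = el.getTextContent().trim();
            Object value;
            switch (el.getTagName()) {
                case "float": value = Float.valueOf(text); break;
                case "int":   value = Integer.valueOf(text); break;
                default:      value = text; break;
            }
            values.put(el.getAttribute("name"), value);
        }
        return values;
    }

    public static void main(String[] args) {
        // Illustrative fragment; the parameter names here are made up for the example.
        String xml = "<lst name=\"spellchecker\">"
                + "<float name=\"asFloat\">0.07</float>"
                + "<str name=\"asStr\">0.07</str>"
                + "</lst>";
        for (Map.Entry<String, Object> e : read(xml).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().getClass().getSimpleName());
        }
    }
}
```

With a reader like this, code that casts the stored value to `Float` works for the `float` declaration and fails for the `str` one, while code that expects a `String` has the opposite problem. That is the inconsistency this comment raises.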
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043165#comment-13043165 ]

Mike Sokolov commented on SOLR-2571:
------------------------------------

Sounds like a good case for a config schema.
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043168#comment-13043168 ]

Robert Muir commented on SOLR-2571:
-----------------------------------

Mike, I think I agree: currently we are relying upon examples in the wiki, but in this case one did not exist and it was/is totally confusing.
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043172#comment-13043172 ]

Mike Sokolov commented on SOLR-2571:
------------------------------------

I posted a patch in SOLR-1758 that has a preliminary schema and implements schema-checking when loading config files, which could help.