[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-07 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045523#comment-13045523
 ] 

James Dyer commented on SOLR-2571:
--

I added thresholdTokenFrequency to the SpellCheckComponent wiki page.

 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Assignee: Robert Muir
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, 
 SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-06 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044917#comment-13044917
 ] 

James Dyer commented on SOLR-2571:
--

{quote}
what makes this 'decision' of correctlySpelled? Do you know?
{quote}

I took a quick look to find out.  Its more complicated than I thought!  Here's 
the basic jist (I think!) :
 - If the instance of SolrSpellChecker returns frequency data and all 
suggestions have frequency 0, TRUE.
 - If the instance of SolrSpellChecker returns frequency data and any 
suggestion have frequency == 0, FALSE.
 - If the instance of SolrSpellChecker returns NO frequency data but has 
suggestions, OMIT.
 - If the instance of SolrSpellChecker returns NO suggestions, FALSE. 

Possibly this isn't fully accurate but I'm at least mostly correct here.  Seems 
like the discrepency with DirectSolrSpellChecker is because it isn't returning 
Frequency info?

This all happens in SpellCheckComponent.toNamedList() ... I'm guessing the code 
here uses the presence or absence of frequency data as kind of a proxy 
indicator whether or not its dealing with IndexBasedSpellChecker or 
FileBasedSpellChecker.  Possibly it would be better if each instance of 
SolrSpellChecker had a isCorrectlySpelled() method that toNamedList() could 
call?  Maybe I should I go open another jira issue for that?


 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, 
 SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045029#comment-13045029
 ] 

Robert Muir commented on SOLR-2571:
---

{quote}
This version takes all of DirectSolrSpellChecker's parameters as Integer and 
Float objects rather than Strings, as appropriate. 
{quote}

Did you maybe upload an older patch? I took a look and it only seems to cutover 
the threshold param.

{quote}
I'm not sure if this would have validated any unit tests (I didn't see any 
tests that use DirectSolrSpellChecker).
{quote}

There is a test (DirectSolrSpellCheckerTest), but its probably not that great :)


 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, 
 SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045031#comment-13045031
 ] 

Robert Muir commented on SOLR-2571:
---

{quote}
Possibly this isn't fully accurate but I'm at least mostly correct here. Seems 
like the discrepency with DirectSolrSpellChecker is because it isn't returning 
Frequency info?
{quote}

This sounds like a bug, care to open a separate issue on it? (we can resolve 
the int/float stuff here on this one).

The thing certainly intends to return freq info...
{noformat}
SuggestWord[] suggestions = checker.suggestSimilar(new Term(field, 
token.toString()), 
  options.count, options.reader, options.onlyMorePopular, accuracy);
  for (SuggestWord suggestion : suggestions)
result.add(token, suggestion.string, suggestion.freq);
{noformat}

 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, 
 SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043431#comment-13043431
 ] 

Robert Muir commented on SOLR-2571:
---

Thanks for updating the patch!

{quote}
I found that DirectSolrSpellChecker returns results in a slightly different 
format than IndexBasedSpellChecker. Is this OK? Can SOLRJ handle this or do we 
need to tweak there?
{quote}

Not sure, I have used DirectSolrSpellChecker with solrj and I didn't have any 
problems... but that's not saying there isn't one.

{quote}
Also, in one case IndexBasedSpellChecker returns correctlySpelled=false while 
DirectSolrSpellChecker returns correctlySpelled=true. Is this discrepancy 
valid?
{quote}

I don't know, what makes this 'decision' of correctlySpelled? Do you know? 
Remember also the DirectSolrSpellChecker is a different spellchecker totally 
than IndexBasedSpellChecker (it uses a fundamentally different algorithm), 
although I tried to keep some of the parameters consistent.

Another question is, there are lots of other float/int arguments to 
DirectSolrSpellChecker, maybe we should cut all of these over to int and 
float while we are here?


 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042683#comment-13042683
 ] 

Robert Muir commented on SOLR-2571:
---

Hi James, I'm confused about this one a little bit. Perhaps 
DirectSolrSpellChecker is actually wrong?

If I configure the thing like this:

{noformat}
float name=thresholdTokenFrequency0.07/float
{noformat}

Then it does apply the parameter. I guess what I'm asking is, if in general we 
should be using int/float/etc in these types and not str (especially 
DirectSolrSpellChecker which takes a lot of numeric parameters but expects them 
all to be str). Just glancing through solrconfig.xml its not clear that there 
is a precedent, it appears inconsistent as far as numeric parameters.


 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-02 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043165#comment-13043165
 ] 

Mike Sokolov commented on SOLR-2571:


sounds like a good case for a config schema

 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043168#comment-13043168
 ] 

Robert Muir commented on SOLR-2571:
---

Mike, I think I agree: currently we are relying upon examples in the wiki, but 
in this case one did not exist and it was/is totally confusing.


 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-02 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043172#comment-13043172
 ] 

Mike Sokolov commented on SOLR-2571:


I posted a patch in SOLR-1758 that has a preliminary schema and implements 
schema-checking when loading config files that could help

 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derrived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org