[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-07-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610691#action_12610691
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Here are 2 more bugs:

1)
Search for:
  united states of America
Suggests:
united states oft America

It looks like the SC doesn't check stopwords, and "of" is a stopword.  Thus, it 
does not exist in the index, but "oft" does, so the SC suggests "oft" and thinks 
"of" is misspelled.  I think the SC component should check the list of
stopwords, too, no?
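
One way to read the suggestion above is that query tokens found in the stopword list are skipped before they reach the spell checker, so a stopword such as "of" is never treated as misspelled. A minimal sketch of that idea follows. The class and method names are hypothetical (they are not Solr or Lucene API), and the stopword set is assumed to be stored in lower case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class StopwordAwareCheck {
    // Returns the tokens that should still be sent to the spell checker.
    // Assumes the stopword set contains lower-case entries.
    static List<String> tokensToCheck(List<String> tokens, Set<String> stopwords) {
        List<String> result = new ArrayList<String>();
        for (String token : tokens) {
            // Compare in lower case so "Of" and "of" are treated alike.
            if (!stopwords.contains(token.toLowerCase(Locale.ROOT))) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<String>(Arrays.asList("of", "the", "a"));
        List<String> tokens = Arrays.asList("united", "states", "of", "America");
        System.out.println(tokensToCheck(tokens, stopwords)); // [united, states, America]
    }
}
```

Whether the real component should consult the field's stopword list or simply run the field's query analyzer is a separate design question that this sketch does not settle.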

2)
Search for:
united states of America
Suggests:
united states oftAmericaa

The of->oft issue is described above.  But note how the SC suggested 
America->Americaa, but it didn't do that for "america".
This looks like a case-sensitivity problem.  Shouldn't the SC be case-insensitive?
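
A case-insensitive check could be sketched as normalizing both the dictionary entries and the query word to lower case before comparing. The class below is a hypothetical stand-in for a spelling dictionary, not Solr or Lucene API, and it assumes that lower-casing is an acceptable normalization for the field in question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a spelling dictionary; not Solr or Lucene API.
final class CaseInsensitiveDictionary {
    private final Set<String> words = new HashSet<String>();

    CaseInsensitiveDictionary(Iterable<String> entries) {
        for (String entry : entries) {
            // Store every entry in lower case.
            words.add(entry.toLowerCase(Locale.ROOT));
        }
    }

    // True if the word is known, regardless of the case the user typed.
    boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CaseInsensitiveDictionary dict =
            new CaseInsensitiveDictionary(Arrays.asList("united", "states", "america"));
        System.out.println(dict.contains("America"));  // true
        System.out.println(dict.contains("Americaa")); // false
    }
}
```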

I can't produce a patch now (no src handy), so I'm hoping Grant or somebody 
else can do it based on this report.


 Spell Checker as a Search Component
 ---

 Key: SOLR-572
 URL: https://issues.apache.org/jira/browse/SOLR-572
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.3

 Attachments: solr-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch


 http://wiki.apache.org/solr/SpellCheckComponent
 Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
 following features:
 * Allow creating a spell index on a given field and make it possible to have 
 multiple spell indices -- one for each field
 * Give suggestions on a per-field basis
 * Given a multi-word query, give only one consistent suggestion
 * Process the query with the same analyzer specified for the source field and 
 process each token separately
 * Allow the user to specify minimum length for a token (optional)
 Consistency criteria for a multi-word query can consist of the following:
 * Preserve the correct words in the original query as it is
 * Never give duplicate words in a suggestion

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-07-03 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610276#action_12610276
 ] 

Grant Ingersoll commented on SOLR-572:
--

Hi Bojan,

Thanks for the patch.  I think it would be best to open a new issue for it.  

However, I'm not sure what is going on here.  When I look at the Lucene code, 
it has this:
{code}
final int freq = (ir != null && field != null) ? ir.docFreq(new Term(field, word)) : 0;
final int goalFreq = (morePopular && ir != null && field != null) ? freq : 0;
// if the word exists in the real index and we don't care for word frequency,
// return the word itself
if (!morePopular && freq > 0) {
  return new String[] { word };
}
{code}

The comment says it all, so maybe something else is going wrong.

At a minimum, your patch needs to account for the case where you want to get 
more popular suggestions even if the word exists. 
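
The early-return check in the Lucene snippet quoted above reduces to a small truth table. The stand-alone model below is only a sketch: the class and method names are made up for illustration, and the document frequency is passed in directly instead of being read from an index.

```java
// Stand-alone model of the early-return condition quoted above.
// Hypothetical names; the real check lives inside Lucene's spell checker.
final class ExistingWordGate {
    // True when the word itself would be returned without looking for suggestions.
    static boolean returnsWordItself(int docFreq, boolean morePopular) {
        return !morePopular && docFreq > 0;
    }

    public static void main(String[] args) {
        System.out.println(returnsWordItself(5, false)); // true: word exists, frequency ignored
        System.out.println(returnsWordItself(5, true));  // false: look for more popular words
        System.out.println(returnsWordItself(0, false)); // false: word is not in the index
    }
}
```

The second case is the one a patch has to keep working: with morePopular set, an existing word must still go through the suggestion lookup.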




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Geoffrey Young (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608487#action_12608487
 ] 

Geoffrey Young commented on SOLR-572:
-

I'm seeing random weirdness in the collation results.  the same query 
shift-refreshed sometimes yields (in json)


{noformat}
{
 "responseHeader":{
  "params":{
   "spellcheck":"true",
   "q":"redbull air show",
   "qf":"search-en",
   "spellcheck.collate":"true",
   "qt":"dismax",
   "wt":"json",
   "rows":"0"}},
 "response":{"numFound":0,"start":0,"docs":[]
 },
 "spellcheck":{
  "suggestions":[
   "redbull",[
    "numFound",1,
    "startOffset",0,
    "endOffset",7,
    "suggestion",["redbelly"]],
   "show",[
    "numFound",1,
    "startOffset",12,
    "endOffset",16,
    "suggestion",["shot"]],
   "collation","redbelly airshotw"]}}
{noformat}

note the collation spacing and extraneous 'w'.  a refresh toggles between 
that and what you might expect :

{noformat}
   "collation","redbelly air shot"]
{noformat}

--Geoff




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608519#action_12608519
 ] 

Grant Ingersoll commented on SOLR-572:
--

Can you open a new issue to track this?  Looks like a string replace issue on 
the offsets.  We probably should do the collation a bit differently to make 
sure the words fit right.  We'll probably have to right pad or something like 
that.
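
To illustrate how an offset problem of this kind can arise, here is a minimal sketch using the offsets from Geoffrey's report. Replacing spans in place from left to right, without adjusting for the length change of earlier replacements, reproduces the odd collation. Applying the replacements from the highest offset down keeps the remaining offsets valid. The class and its names are hypothetical, the corrections are assumed not to overlap, and nothing here is confirmed to be what the patch actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the SpellCheckComponent implementation.
final class CollationSketch {
    // Replace original[start, end) with replacement.
    static final class Correction {
        final int start;
        final int end;
        final String replacement;

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    // Applies corrections in the order given, without adjusting offsets.
    static String naive(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        for (Correction c : corrections) {
            sb.replace(c.start, c.end, c.replacement);
        }
        return sb.toString();
    }

    // Applies corrections from the highest start offset down, so the
    // offsets of the corrections still to be applied remain valid.
    static String rightToLeft(String query, List<Correction> corrections) {
        List<Correction> sorted = new ArrayList<Correction>(corrections);
        Collections.sort(sorted, new Comparator<Correction>() {
            public int compare(Correction a, Correction b) {
                return Integer.compare(b.start, a.start);
            }
        });
        return naive(query, sorted);
    }

    public static void main(String[] args) {
        List<Correction> corrections = Arrays.asList(
            new Correction(0, 7, "redbelly"),
            new Correction(12, 16, "shot"));
        System.out.println(naive("redbull air show", corrections));       // redbelly airshotw
        System.out.println(rightToLeft("redbull air show", corrections)); // redbelly air shot
    }
}
```

In this model the naive version gives the expected string when "show" happens to be replaced first, which would be consistent with the result toggling between requests if the order of suggestions varies. The thread does not confirm that this is the cause.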




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608527#action_12608527
 ] 

Sean Timm commented on SOLR-572:


For what it is worth, here is the code that I used client side before the 
collation feature was available.  I haven't looked at how it is done in this 
patch.  It has some nice features such as delimiting the spelling correction, 
e.g., with HTML bold tags, and preserving the user's initial case on each word.

{code}
StringBuilder buff = new StringBuilder();
StringBuilder rawBuff = new StringBuilder();
int last = 0;
String userStr = null;
// for each suggestion
for( Suggestion s : suggestions ) {
    // add part before the misspelling
    userStr = userQuery.substring( last, s.startOffset );
    buff.append( userStr );
    rawBuff.append( userStr );
    String suggestion = s.suggestion;
    if( _spellCheckPreserveUserCase ) {
        userStr = userQuery.substring( s.startOffset, s.endOffset );
        char[] userCh = userStr.toCharArray();
        boolean initialUpper = Character.isUpperCase( userCh[0] );
        boolean allUpper = true;
        for( char c : userCh ) {
            if( Character.isLowerCase( c ) ) {
                allUpper = false;
                break;
            }
        }
        if( allUpper ) {
            suggestion = suggestion.toUpperCase();
        }
        else if( initialUpper ) {
            userCh = suggestion.toCharArray();
            userCh[0] = Character.toUpperCase( userCh[0] );
            suggestion = new String( userCh );
        }
    }
    buff.append( _spellCheckStartHighlight ).append( suggestion )
        .append( _spellCheckEndHighlight );
    rawBuff.append( suggestion );
    last = s.endOffset;
}
// add part after all misspellings
userStr = userQuery.substring( last );
buff.append( userStr );
rawBuff.append( userStr );
if( log().isDebugEnabled() ) {
    log().debug( "Did you mean: " + buff );
    log().debug( "Did you mean link: " + rawBuff );
}
{code}
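
The case-preservation step in the code above can be pulled out into a small stand-alone helper. This is a sketch under the same rules as the original snippet (all-upper input upper-cases the suggestion, an initial capital capitalizes it, anything else leaves it alone). The class name is hypothetical, and the empty-string guard is an addition that the original does not have.

```java
import java.util.Locale;

// Hypothetical helper mirroring the case handling in the snippet above.
final class CasePreserver {
    // Re-applies the user's casing pattern to the suggested word.
    static String preserveCase(String userWord, String suggestion) {
        if (userWord.isEmpty() || suggestion.isEmpty()) {
            return suggestion; // nothing to inspect or to change
        }
        boolean allUpper = true;
        for (char c : userWord.toCharArray()) {
            if (Character.isLowerCase(c)) {
                allUpper = false;
                break;
            }
        }
        if (allUpper) {
            return suggestion.toUpperCase(Locale.ROOT);
        }
        if (Character.isUpperCase(userWord.charAt(0))) {
            return Character.toUpperCase(suggestion.charAt(0)) + suggestion.substring(1);
        }
        return suggestion;
    }

    public static void main(String[] args) {
        System.out.println(preserveCase("America", "americas")); // Americas
        System.out.println(preserveCase("USA", "use"));          // USE
        System.out.println(preserveCase("states", "stated"));    // stated
    }
}
```

As in the original, a word with no lower-case letters at all (digits, for example) counts as all upper case.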




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-20 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606719#action_12606719
 ] 

Grant Ingersoll commented on SOLR-572:
--

Because of the stupid way it gets initialized as a  
NamedListInitializerWhateverWhatever.  I'm open to alternate  
suggestions on how to do it and take advantage of the resource loader,  
etc.

Every time I go to do initialization stuff in Solr these days I pine  
for Spring, since we are basically re-inventing it, albeit not as  
nicely.

-Grant






[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-19 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606634#action_12606634
 ] 

Noble Paul commented on SOLR-572:
-

Why do we need to add the queryConverter definition outside of the spellcheck 
search component? 
Is it going to be used by any component other than this one? 




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605635#action_12605635
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


A few questions/comments:

# Why is a WhitespaceTokenizer being used for tokenizing the value of the 
spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer 
if the index is being built from a Solr field?
# The above argument also applies to queryAnalyzerFieldType, which is being used 
for QueryConverter.
# I see that we can specify our own query converter through the queryConverter 
section in solrconfig.xml. But the SpellCheckComponent uses 
SpellingQueryConverter directly instead of an interface. We should add a 
QueryConverter interface if this needs to be pluggable.
# If name is omitted from two dictionaries in solrconfig.xml, then both get 
named Default by the SolrSpellChecker#init method and they overwrite each 
other in the spellCheckers map.
# How about building the index in the inform() method? I understand that 
users can build the index using spellcheck.build=true and that they can also use 
QuerySenderListener to build the index, but this limits the user to 
FSDirectory: if we use RAMDirectory and Solr is restarted, the 
QuerySenderListener never fires and the spell checker is left with no index. It's 
not a major inconvenience to always use FSDirectory, but then RAMDirectory 
doesn't bring much to the table.
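
The name collision described in the fourth point can be sketched with a plain map: if an omitted name falls back to a shared default, a second unnamed entry silently replaces the first unless the registration checks for it. The registry below is hypothetical (its class name, the "default" key, and the use of Object for the checker are assumptions made for illustration, not Solr API), and it shows one possible remedy, which is to fail fast on duplicates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry; the names are illustrative, not Solr API.
final class CheckerRegistry {
    static final String DEFAULT_NAME = "default";

    private final Map<String, Object> checkers = new LinkedHashMap<String, Object>();

    // Registers a checker; a null name falls back to the default name.
    // Throws instead of silently overwriting an existing entry.
    void register(String name, Object checker) {
        String key = (name == null) ? DEFAULT_NAME : name;
        if (checkers.containsKey(key)) {
            throw new IllegalArgumentException("Duplicate spell checker name: " + key);
        }
        checkers.put(key, checker);
    }

    int size() {
        return checkers.size();
    }

    public static void main(String[] args) {
        CheckerRegistry registry = new CheckerRegistry();
        registry.register(null, "first dictionary");
        try {
            registry.register(null, "second dictionary");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate spell checker name: default
        }
    }
}
```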






[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605645#action_12605645
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Why is a WhitespaceTokenizer being used for tokenizing the value of the 
spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer 
if the index is being built from a Solr field?

The above argument also applies to queryAnalyzerFieldType, which is being used 
for QueryConverter.
{quote}

My understanding was that the spellcheck.q parameter was already analyzed and 
ready to be checked, thus all it needed was a conversion to tokens.  As for the 
queryAnalyzerFieldType, that assumes the implementation is the 
IndexBasedSpellChecker or some other field-based one that the 
SpellCheckComponent doesn't have access to.  My reasoning was that it needs to 
be handled separately and explicitly, which is why it isn't a part of the 
spellchecker configuration.

{quote}
I see that we can specify our own query converter through the queryConverter 
section in solrconfig.xml. But the SpellCheckComponent uses 
SpellingQueryConverter directly instead of an interface. We should add a 
QueryConverter interface if this needs to be pluggable.
{quote}

I thought about making it an abstract base class, but in my mind it is really 
easy to override the SpellingQueryConverter and the component should know how 
to deal with it.

{quote}
If name is omitted from two dictionaries in solrconfig.xml, then both get named 
Default by the SolrSpellChecker#init method and they overwrite each other 
in the spellCheckers map.
{quote}

Hmm, not good.  I will fix.

{quote}
How about building the index in the inform() method? I understand that 
users can build the index using spellcheck.build=true and that they can also use 
QuerySenderListener to build the index, but this limits the user to 
FSDirectory: if we use RAMDirectory and Solr is restarted, the 
QuerySenderListener never fires and the spell checker is left with no index. It's 
not a major inconvenience to always use FSDirectory, but then RAMDirectory 
doesn't bring much to the table.
{quote}

I think this gets back to our early discussions about it not working in inform 
b/c we don't have the reader at that point, or something like that.  I really 
don't know the right answer, but do feel free to try it out.  I do think it 
belongs in inform, but not sure if Solr is ready at that point.  As for the 
QuerySenderListener, seems like it should fire if it is restarted, but I admit 
I don't know a whole lot about that functionality.  





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605293#action_12605293
 ] 

Grant Ingersoll commented on SOLR-572:
--

OK, I'd like to commit this tomorrow or Wednesday.  I am going to open another 
issue to bring LUCENE-1297 into the configuration.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605367#action_12605367
 ] 

Yonik Seeley commented on SOLR-572:
---

For those who are just casually following this issue, is there a good summary 
of current input options and example output?




Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-16 Thread Shalin Shekhar Mangar
Grant created a wiki page at
http://wiki.apache.org/solr/SpellCheckComponent which has some
documentation on the configuration. I'll try to add more
documentation when I try this out tomorrow.

On Tue, Jun 17, 2008 at 12:03 AM, Yonik Seeley (JIRA) [EMAIL PROTECTED]
wrote:


[
 https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605367#action_12605367 ]

 Yonik Seeley commented on SOLR-572:
 ---

 For those who are just casually following this issue, is there a good
 summary of current  input options and example output?





-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-13 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604818#action_12604818
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
the spell checker component handling build/reload seems highly awkward to me. 
suggestion component really should just do that... and wrap the other 
operations as a /spellchecker/rebuild kinda thing and not even necessarily 
componentize those operations since they don't really necessarily need to be 
hooked together with other operations as a single request.
{quote}

I've thought about it a bit, too, as it bothers me as well, but I think the 
initialization, etc. gets a bit tricky, like all Solr initialization.  Not sure 
what to do.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-13 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604830#action_12604830
 ] 

Grant Ingersoll commented on SOLR-572:
--

Sean,

I see the issue and am working on it.  Good catch.  I'll have a patch shortly.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-12 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604617#action_12604617
 ] 

Sean Timm commented on SOLR-572:


It doesn't appear that you can get both extendedResults and count > 1.  With 
the URL below, I get 1 suggestion for each misspelled term regardless of the 
value of spellcheck.count.  If I set spellcheck.extendedResults=false, then I 
get the requested three suggestions for each term.

{noformat}
/solr/spellCheckCompRH/?q=waz+designatd+two+bee+Arvil+25+bye+Pres.+it+waz&version=2.2&start=0&rows=2&indent=on&spellcheck=true&fl=title,url,id,categories,score&hl=on&hl.fl=body&qt=dismax&spellcheck.extendedResults=true&spellcheck.count=3
{noformat}
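
To compare the two cases side by side, the request can be assembled from its 
parameters instead of edited by hand. This is only a sketch: the handler path 
and parameter values are copied from the report, while the helper name 
build_spellcheck_url and its overrides argument are made up for illustration.

```python
from urllib.parse import urlencode

# Parameters as they appear in the reported request.
BASE_PARAMS = {
    "q": "waz designatd two bee Arvil 25 bye Pres. it waz",
    "version": "2.2",
    "start": 0,
    "rows": 2,
    "indent": "on",
    "spellcheck": "true",
    "fl": "title,url,id,categories,score",
    "hl": "on",
    "hl.fl": "body",
    "qt": "dismax",
    "spellcheck.extendedResults": "true",
    "spellcheck.count": 3,
}


def build_spellcheck_url(base="/solr/spellCheckCompRH/", overrides=None):
    """Return the request URL, with optional per-call parameter overrides."""
    merged = dict(BASE_PARAMS)
    merged.update(overrides or {})
    # urlencode joins the pairs with '&' and encodes spaces as '+'.
    return base + "?" + urlencode(merged)


with_extended = build_spellcheck_url()
without_extended = build_spellcheck_url(
    overrides={"spellcheck.extendedResults": "false"}
)
```

Requesting both URLs and comparing the number of suggestions per term would 
reproduce the difference described above.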




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-12 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604639#action_12604639
 ] 

Erik Hatcher commented on SOLR-572:
---

the spell checker component handling build/reload seems highly awkward to me.   
suggestion component really should just do that... and wrap the other 
operations as a /spellchecker/rebuild kinda thing and not even necessarily 
componentize those operations since they don't really necessarily need to be 
hooked together with other operations as a single request.

anyway, just the overloading of a component to do managerial operations seems 
awkward.  food for thought.  not a -1 kinda thing though.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-10 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604052#action_12604052
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
I don't think it should give me that suggestion. If a word is in the dictionary 
it should not give any suggestions. Am I right?
{quote}

Possibly.  I think it should give a better suggestion if one exists (i.e. more 
frequent) but otherwise, yes, it shouldn't give any suggestion.  For your 
example, I would agree that it should not return a suggestion (assuming "golf" 
is in the dictionary).  On the other hand, the index could contain the words 
"gilf" and "golf", with "gilf" having a freq. of 1 and "golf" having a freq. 
of 10.  If the user enters "gilf", I think it is reasonable to assume that the 
suggestion should be "golf", even though "gilf" exists.

Not saying this is supported yet, or anything, but just laying out the case.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-10 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604056#action_12604056
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I think that lower frequency suggestions should be optional. Some users might 
only want to offer suggestions for misspelled words (words not in the 
dictionary). Would it be hard to check if the query term exists in the 
dictionary before returning a suggestion?
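
A minimal sketch of that check, assuming the dictionary is available as a set 
of terms. The function name, the only_if_misspelled flag and the toy suggester 
are hypothetical and are not part of any patch on this issue.

```python
def suggestions_for(term, dictionary, suggester, only_if_misspelled=True):
    """Return suggestions, optionally skipping terms found in the dictionary."""
    if only_if_misspelled and term in dictionary:
        # The term is spelled correctly as far as the dictionary knows.
        return []
    return suggester(term)


# Toy data standing in for a real spelling dictionary and suggester.
toy_dictionary = {"golf", "roof", "pizza"}


def toy_suggester(term):
    return ["roof"] if term == "golf" else ["pizza"]


strict = suggestions_for("golf", toy_dictionary, toy_suggester)
lenient = suggestions_for(
    "golf", toy_dictionary, toy_suggester, only_if_misspelled=False
)
```

Making the check a flag keeps the lower-frequency suggestions available to 
users who still want them.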




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604062#action_12604062
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I think the frequency awareness may be interesting.  What happens if "gilf" has 
a frequency of 95K and "golf" a freq of 100K?  Do we need this to become a SCRH 
config setting expressed as a percentage? (e.g. "Show alternative word 
suggestions even if the input word exists in the index iff freq(input 
word)/freq(suggested word)*100 < N%"?)
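
The percentage rule can be sketched as follows, assuming the intended 
comparison is "less than" and that frequencies are plain document counts. The 
function name and the 50% threshold are illustrative, not an existing setting.

```python
def should_suggest(input_freq, suggestion_freq, threshold_pct):
    """Decide whether to offer a suggestion for a term.

    A term missing from the index is always a candidate for correction.
    Otherwise the suggestion is offered only if the input term is rare
    relative to the suggested one.
    """
    if input_freq == 0:
        return True
    if suggestion_freq == 0:
        return False
    return input_freq / suggestion_freq * 100 < threshold_pct


# gilf=1 vs golf=10: the input is 10% as frequent, so suggest golf.
rare_input = should_suggest(1, 10, threshold_pct=50)
# gilf=95K vs golf=100K: the input is 95% as frequent, so stay quiet.
common_input = should_suggest(95_000, 100_000, threshold_pct=50)
```

With such a threshold, the 1-versus-10 case still produces a suggestion while 
the 95K-versus-100K case does not.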





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-06 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603070#action_12603070
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Do these latest patches require Lucene 2.4? Would it be better to stay with 
2.3.1?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603075#action_12603075
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Do these latest patches require Lucene 2.4? Would it be better to stay with 
2.3.1?
{quote}

They require what is checked into Solr's lib directory, which is Lucene's trunk 
as of yesterday.  There are actually a few changes in Lucene's spell checker 
that I think are worth having in 2.4.  Additionally, I think we will want 
LUCENE-1297 before we are through, which is probably another configuration 
item.  However, that can be added later, unless Otis commits it fairly soon.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-06 Thread Swarag Segu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603220#action_12603220
 ] 

Swarag Segu commented on SOLR-572:
--

Hey guys. Installed the latest patch. Old problem is still there. For example, 
if I do q=pizzza I get:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pizzza">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">6</int>
      <arr name="suggestion">
        <str>pizza</str>
      </arr>
    </lst>
  </lst>
</lst>

Which is good. Then I do q=golf (golf is in the dictionary)

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="golf">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>roof</str>
      </arr>
    </lst>
  </lst>
</lst>

I don't think it should give me that suggestion. If a word is in the dictionary 
it should not give any suggestions. Am I right?
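
For anyone checking this from a client, here is a small sketch that pulls the 
suggestions out of a spellcheck response fragment. The XML is the "golf" 
response from this comment with its markup restored; the function name 
parse_suggestions is made up, and a full Solr response would wrap this 
fragment in further elements.

```python
import xml.etree.ElementTree as ET

RESPONSE = """
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="golf">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>roof</str>
      </arr>
    </lst>
  </lst>
</lst>
"""


def parse_suggestions(xml_text):
    """Map each flagged query term to its list of suggested words."""
    root = ET.fromstring(xml_text.strip())
    container = root.find("lst[@name='suggestions']")
    result = {}
    if container is None:
        return result
    for entry in container.findall("lst"):
        words = [s.text for s in entry.findall("arr[@name='suggestion']/str")]
        result[entry.get("name")] = words
    return result


parsed = parse_suggestions(RESPONSE)
```

Here parsed maps "golf" to the single suggestion "roof", which is the behavior 
being questioned.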




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602645#action_12602645
 ] 

Grant Ingersoll commented on SOLR-572:
--

One thing I haven't quite settled in my mind is the use of the file-based spell 
checker.  It seems to me that the use case for this is as an override where 
one feels the index-based spelling is not correct.  Is that right?  Or am I 
missing something?

If that is the case, shouldn't we at least allow the option of it truly acting 
as an override?  Currently, the only way to get at it is by passing the 
dictionary name as the param.  The only way I can see this being useful is if 
you are making several round trips to the server, which means you might as well 
be using a request handler and not a search component.

Thoughts?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602651#action_12602651
 ] 

Bojan Smid commented on SOLR-572:
-

File based spell checker would probably be used in cases when the Solr index is 
too small or too young. So a user would compile a dictionary file (for 
instance, the UNIX words file) and use it as a dictionary.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602654#action_12602654
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
File based spell checker would probably be used in cases when the Solr index is 
too small or too young. So a user would compile a dictionary file (for 
instance, the UNIX words file) and use it as a dictionary.
{quote}

But how is it useful to return results that aren't in the index?  It's not like 
querying on them results in anything useful.  Seems to me that, in this case, 
you just need to rebuild your dictionary on a regular basis.  Or is it that 
people are using Solr as a spelling server?

Now, I can see it as an override situation, i.e. one wishes to override 
certain results from the index-based one with ones that are known to be in 
the dictionary, but are lower down.
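
One possible reading of that override, sketched under the assumption that the 
index-based checker returns a ranked list and the file-based dictionary is a 
plain set of words. The function name is hypothetical; nothing like this 
exists in the patch.

```python
def rerank_with_dictionary(index_suggestions, dictionary_words):
    """Move suggestions that appear in the dictionary ahead of the rest.

    The relative order inside each group is preserved, so the index-based
    ranking still breaks ties.
    """
    preferred = [w for w in index_suggestions if w in dictionary_words]
    others = [w for w in index_suggestions if w not in dictionary_words]
    return preferred + others


# "golf" is ranked last by the index but is a known dictionary word.
reranked = rerank_with_dictionary(["gilf", "glof", "golf"], {"golf"})
```

In this toy case "golf" moves to the front while "gilf" and "glof" keep their 
original order behind it.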




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602687#action_12602687
 ] 

Grant Ingersoll commented on SOLR-572:
--

Oleg,

Can you try specifying a field value anyway for your bug up above?  I think 
this is actually a bug in the Lucene spell checker.  Namely, the docs say that 
the field value can be null, but it is trying to construct a Term, which 
requires a non-null field name.

Just give it the name "word", perhaps.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602706#action_12602706
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Grant -- The exception is happening because the SpellCheckComponent always 
passes Solr's own IndexReader when calling the 
AbstractLuceneSpellChecker#getSuggestions method even when the underlying spell 
checker is a FileBasedSpellChecker. In that case, since a non-null IndexReader 
is passed on to Lucene, it tries to create a term on the null field name. That 
is when the NullPointerException comes up.

Another problem will occur when using IndexBasedSpellChecker with an arbitrary 
Lucene index, because then, too, Solr's IndexReader would be passed to the 
Lucene SpellChecker instead of the actual index's reader.

I think a possible solution can be to add another abstract method with the same 
signature as Lucene's SpellChecker to the AbstractLuceneSpellChecker and let 
each sub-class get suggestions on its own. That way FileBasedSpellChecker will 
pass the correct IndexReader or a null IndexReader into Lucene appropriately. 
The AbstractLuceneSpellChecker#getSuggestions will just call the underlying 
suggest method, get the String[] back and process as it does right now.
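
The shape of that proposal, sketched in Python as a language-neutral analogue 
and not as the Solr or Lucene API. All class and method names below are 
hypothetical, the "reader" is a toy mapping of field name to terms, and 
difflib stands in for the real similarity search.

```python
from abc import ABC, abstractmethod
from difflib import get_close_matches


class AbstractChecker(ABC):
    """Hypothetical stand-in for the shared base class."""

    def get_suggestions(self, word, count):
        # Shared post-processing lives here; the lookup itself is delegated.
        raw = self.suggest(word, count)
        return {"word": word, "numFound": len(raw), "suggestion": raw}

    @abstractmethod
    def suggest(self, word, count):
        """Each subclass decides which source of terms to consult."""


class FileBasedChecker(AbstractChecker):
    def __init__(self, words):
        self.words = list(words)

    def suggest(self, word, count):
        # No index reader and no field name are involved here.
        return get_close_matches(word, self.words, n=count)


class IndexBasedChecker(AbstractChecker):
    def __init__(self, reader, field):
        # `reader` stands in for the reader of whichever index this
        # checker was configured with, not necessarily the main one.
        self.reader = reader
        self.field = field

    def suggest(self, word, count):
        return get_close_matches(word, self.reader[self.field], n=count)


file_checker = FileBasedChecker(["pizza", "golf"])
index_checker = IndexBasedChecker({"title": ["golf", "roof"]}, "title")
```

The caller only ever uses get_suggestions, so it no longer has to know which 
reader, if any, a given checker needs.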




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602712#action_12602712
 ] 

Otis Gospodnetic commented on SOLR-572:
---

By collate you mean that the SCRH would not only return 
suggestions/corrections for individual tokens, but would also try to glue 
together an already-corrected query string based on its suggestions?

Example:
Query: cogito ega sum

SCRH returns this correction:
ega -> ergo

But also tries to give you the whole thing corrected:
cogito ergo sum

That?  Sounds useful - less work for the client app, should the app developers 
decide that SCRH's collated suggestions are what they would have to build 
themselves anyway.
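
For what it's worth, the gluing step could be as small as the sketch below. This is only my guess at the idea, not what the patch does: Correction and CollateSketch are hypothetical names, and I am assuming each suggestion carries the start/end character offsets of the token it replaces.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for one per-token correction.
// Offsets are character positions in the original query, end exclusive.
final class Correction {
    final int startOffset;
    final int endOffset;
    final String replacement;

    Correction(int startOffset, int endOffset, String replacement) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.replacement = replacement;
    }
}

final class CollateSketch {
    // Rebuild the whole query with every correction applied.
    static String collate(String query, List<Correction> corrections) {
        List<Correction> sorted = new ArrayList<>(corrections);
        // Apply from the end of the query so earlier offsets stay valid.
        sorted.sort(Comparator.comparingInt((Correction c) -> c.startOffset).reversed());
        StringBuilder sb = new StringBuilder(query);
        for (Correction c : sorted) {
            sb.replace(c.startOffset, c.endOffset, c.replacement);
        }
        return sb.toString();
    }
}
```

With the example above, a single correction covering offsets 7 to 10 ("ega") with replacement "ergo" turns "cogito ega sum" into "cogito ergo sum".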





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602743#action_12602743
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys. Installed the latest patch. The old problem is still there. For example, 
if I do q=pizzza I get:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pizzza">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">6</int>
      <arr name="suggestion">
        <str>pizza</str>
      </arr>
    </lst>
  </lst>
</lst>

Which is good. Then I do q=pizza (pizza is in the dictionary):

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pizza">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>plaza</str>
      </arr>
    </lst>
  </lst>
</lst>

I don't think it should give me that suggestion. If a word is in the dictionary, 
it should not produce any suggestions.
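
Roughly what I would expect, as a sketch only: KnownWordGuardSketch is a made-up name, and the Set and List arguments are stand-ins for the spell index lookup and for whatever raw suggestions the checker produced.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class KnownWordGuardSketch {
    // 'dictionary' stands in for the spell index; 'candidates' for the raw suggestions.
    static List<String> suggestionsFor(String word, Set<String> dictionary, List<String> candidates) {
        if (dictionary.contains(word)) {
            // The word is spelled correctly as far as the dictionary knows: suggest nothing.
            return Collections.emptyList();
        }
        return candidates;
    }
}
```

So q=pizza would come back with no suggestions, while a misspelling such as pizzza would still get pizza.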




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602745#action_12602745
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Grant - The exception is happening because the SpellCheckComponent always 
passes Solr's own IndexReader when calling the 
AbstractLuceneSpellChecker#getSuggestions method even when the underlying spell 
checker is a FileBasedSpellChecker. In that case, since a non-null IndexReader 
is passed onto Lucene, it tries to create a term on the null field name. That 
is when the NullPointerException comes up.
{quote}

Yep, I think I fixed this piece.  See also LUCENE-1299.

{quote}
I think a possible solution can be to add another abstract method with the same 
signature as Lucene's SpellChecker to the AbstractLuceneSpellChecker and let 
each sub-class get suggestions on its own. That way FileBasedSpellChecker will 
pass the correct IndexReader or a null IndexReader into Lucene appropriately. 
The AbstractLuceneSpellChecker#getSuggestions will just call the underlying 
suggest method, get the String[] back and process as it does right now.
{quote}

Not sure I follow the solution (I understand the problem).  Which signature are 
you suggesting?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602828#action_12602828
 ] 

Mike Klaas commented on SOLR-572:
-

[quote]Another use case is where Solr is used with indices that are not indices 
for a narrow domain or that don't have nice, clean, short fields that can be 
used for populating the SC index. For example, if the index consists of a pile 
of web pages, I don't think I'd want to use their data (not even their titles) 
to populate the SC index. I'd really want just a plain dictionary-powered 
SCRH.[/quote]

It works great, actually.  That way you get all the abbreviations, jargon, 
proper names, etc.   Thresholding helps prevent most of the cruft from appearing 
in the index.
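
A sketch of what I mean by thresholding, under my own assumptions: the cut-off here is a minimum fraction of documents a term must appear in, and ThresholdSketch with its docFreq map is a hypothetical stand-in for walking the real index terms.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ThresholdSketch {
    // Keep only terms that occur in at least minFraction of the documents.
    static Set<String> dictionaryTerms(Map<String, Integer> docFreq, int numDocs, double minFraction) {
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() >= minFraction * numDocs) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

Rare typos in the crawled pages fall below the cut-off and never reach the spelling index, while jargon and proper names that recur across documents get through.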




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Swarag Segu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602856#action_12602856
 ] 

Swarag Segu commented on SOLR-572:
--

Hey Guys,
I installed the latest patch and it gives me compile errors :

compile:
[mkdir] Created dir: C:\Documents and Settings\Swarag Segu\workspace\solrSrc\build\core
[javac] Compiling 324 source files to C:\Documents and Settings\Swarag Segu\workspace\solrSrc\build\core
[javac] C:\Documents and Settings\Swarag Segu\workspace\solrSrc\src\java\org\apache\solr\spelling\FileBasedSpellChecker.java:97: cannot find symbol
[javac] symbol  : variable MaxFieldLength
[javac] location: class org.apache.lucene.index.IndexWriter
[javac] true, IndexWriter.MaxFieldLength.UNLIMITED);
[javac]  ^
[javac] C:\Documents and Settings\Swarag Segu\workspace\solrSrc\src\java\org\apache\solr\spelling\FileBasedSpellChecker.java:96: internal error; cannot instantiate org.apache.lucene.index.IndexWriter.<init> at org.apache.lucene.index.IndexWriter to ()
[javac] IndexWriter writer = new IndexWriter(ramDir, fieldType.getAnalyzer(),
[javac]  ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors


Am I missing something?
Thanks,
Swarag.




Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-04 Thread Grant Ingersoll
There are working patches available on the issue without the advanced  
features and everyone is free to fix the current one.  It's not like  
it is that far off from being able to have proper spellchecking,  
pluggability, and context information about where the mistakes are.  I  
frankly don't get what all the fuss is about.


Is it that you disagree with the approach?  That hasn't come across in  
the discussions, but if it is, say so.  I thought we were working on  
it quite well together and made some good progress and are pretty darn  
close.  I don't see that I've taken away any functionality that the  
original patch offers, but I did change it so that it fits a broader  
audience, namely those who are interested in other spell checkers and  
those who want info about where in the query the problem occurs.   
Which is what the comments suggest people are interested in, and also  
what I am interested in for 1.3.


And, I'm sorry, but I said I'd have to let it lie for a few days and  
then I would be back to it.  Cut me some slack.  I don't get paid to  
work on Solr full time.   Is it truly that important that someone  
can't wait a few days for a patch on the trunk version for something  
they never had before?  It ain't like we're talking some core bug here  
that has everyone broken.  Besides, others are perfectly welcome to  
work on it in the meantime.


Sorry for the rant, but I am not going to be pressured into committing  
a patch that I don't think is ready and one that I said I am going to  
be working on to see it through so that we all are happy.


-Grant

On Jun 4, 2008, at 1:14 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:
I will be back on it tomorrow and will see this through before 1.3 with the
abstractions.  In other words, -1 on cutting this off prematurely.  :-)
Since I don't think this is the only thing holding up 1.3, let's just play
it out and get it right so all of us are happy.

This feature may not be holding back the 1.3 release. The potential users
of this issue are very much interested in a basic working version.
They may be able to live without these advanced features. Maybe we
can have another JIRA issue for enhancements which may or may not go into
1.3 (depending on when it happens).

-Grant


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-04 Thread Shalin Shekhar Mangar
Hi Grant,

I did not intend to offend you or put pressure on you in any way. Please
accept my apologies if I came off as rude. In fact, I've been having a lot
of fun working with you and Bojan on this issue. We've definitely covered a
lot of ground very fast.

I am completely in favor of the goals for this piece. I was merely suggesting
that, with the 1.3 release being a priority, we should go one step at a time:
commit per the initial scope for this issue as written in the issue's
description, and then handle the enhancements in another issue. But I'm all
for it if you want to add extra functionality within the same issue.

Once again, I'm deeply sorry if you found my comment offensive in any way.

Regards,
Shalin


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-04 Thread Otis Gospodnetic
Yeah, as an observer I sensed no bad intentions here.

Anyhow, 1.3 is not scheduled yet; my guess is we are still at least a few weeks 
away from 1.3 (and if I had to bet, I'd bet on 1.3 being released close to the 
end of summer).  Grant is very eager about this and will get it all in.  Case 
closed, I think.  Nothing to see here, move along.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Shalin Shekhar Mangar
The current patch has been broken for some days now, and implementing
correct query parsing logic may take time to get right. Let's not aim for
everything to get into the 1.3 release.

I would like to cut down the scope of this issue to an implementation that
indexes files and Lucene indices (both Solr and arbitrary) and gives
suggestions while using the correct analyzer for multi-word queries. Let's
get a spell checker working and commit it. We can deal with further
enhancements, such as abstractions for custom spellcheckers and query parsing,
in another issue which can be dealt with separately (in 1.3 or after).
Thoughts? If there is a general consensus, I can provide a new patch that
is good enough to go in.

On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) [EMAIL PROTECTED]
wrote:


[
 https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12601256#action_12601256]

 Oleg Gnatovskiy commented on SOLR-572:
 --

 I installed the latest patch. Still getting a NPE. Here is my config:

 <searchComponent name="spellcheck"
     class="org.apache.solr.handler.component.SpellCheckComponent">
   <lst name="defaults">
     <!-- omp = Only More Popular -->
     <str name="spellcheck.onlyMorePopular">false</str>
     <!-- exr = Extended Results -->
     <str name="spellcheck.extendedResults">false</str>
     <!-- The number of suggestions to return -->
     <str name="spellcheck.count">1</str>
   </lst>

   <lst name="spellchecker">
     <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
     <str name="name">external</str>
     <str name="sourceLocation">spellings.txt</str>
     <str name="characterEncoding">UTF-8</str>
     <str name="fieldType">text_ws</str>
     <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
   </lst>
 </searchComponent>


 Here is the URL I am hitting:
 http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

 Here is the error:

 HTTP Status 500 - null java.lang.NullPointerException
   at org.apache.lucene.index.Term.<init>(Term.java:39)
   at org.apache.lucene.index.Term.<init>(Term.java:36)
   at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
   at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
   at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:619)

 spelling.txt is in my solr/home/conf.


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Otis Gospodnetic
I'm +1 on getting the basic stuff done and committed for 1.3.
If Grant is hot on getting the abstractions in for 1.3, he will do so, but I 
think it's OK to get this done in 2 parts:
1) core working and committed for 1.3
2) abstractions working and committed after 1.3 if Grant doesn't finish them 
before 1.3

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Shalin Shekhar Mangar [EMAIL PROTECTED]
 To: solr-dev@lucene.apache.org
 Sent: Tuesday, June 3, 2008 3:53:10 PM
 Subject: Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
 
 The current patch has been broken for some days now and implementing a
 correct query parsing logic may take time to get right. Let's not aim for
 everything to get into the 1.3 release.
 
 I would like to cut down the scope of this issue to an implementation that
 indexes files and Lucene indices (both Solr and arbitrary) and gives
 suggestions while using the correct analyzer for multi-word queries. Let's
 get a spell checker working and commit it. We can deal with more
 enhancements like abstractions for custom spellcheckers and query parsing
 etc. in another issue which can be dealt with separately (in 1.3 or after).
 Thoughts? If there is a general consensus, I can give a new patch which can
 be good enough to go in.
 
 On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) 
 wrote:
 
 
 [
  
 https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]
 
  Oleg Gnatovskiy commented on SOLR-572:
  --
 
  I installed the latest patch. Still getting a NPE. Here is my config:
 
  
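  <searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
    <lst name="defaults">
      <!-- omp = Only More Popular -->
      <str name="spellcheck.onlyMorePopular">false</str>
      <!-- exr = Extended Results -->
      <str name="spellcheck.extendedResults">false</str>
      <!-- The number of suggestions to return -->
      <str name="spellcheck.count">1</str>
    </lst>

    <lst name="spellchecker">
      <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
      <str name="name">external</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="fieldType">text_ws</str>
      <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
    </lst>
  </searchComponent>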
 
 
  Here is the URL I am hitting:
  
 http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true
 
  Here is the error:
 
  HTTP Status 500 - null java.lang.NullPointerException at
  org.apache.lucene.index.Term.<init>(Term.java:39) at
  org.apache.lucene.index.Term.<init>(Term.java:36) at
  
 org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
  at
  
 org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
  at
  
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
  at
  
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
  at
  
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
  
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
  at
  
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at
  
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at
  
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at
  
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at
  org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at
  org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at
  
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at
  org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at
  org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at
  
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)
 
  spelling.txt is in my solr/home/conf.
 

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Grant Ingersoll
I will be back on it tomorrow and will see this through before 1.3  
with the abstractions.  In other words, -1 on cutting this off  
prematurely.  :-)  Since I don't think this is the only thing holding  
up 1.3, let's just play it out and get it right so all of us are happy.


-Grant

On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:


The current patch has been broken for some days now and implementing a
correct query parsing logic may take time to get right. Let's not aim for
everything to get into the 1.3 release.

I would like to cut down the scope of this issue to an implementation that
indexes files and Lucene indices (both Solr and arbitrary) and gives
suggestions while using the correct analyzer for multi-word queries. Let's
get a spell checker working and commit it. We can deal with more
enhancements like abstractions for custom spellcheckers and query parsing
etc. in another issue which can be dealt with separately (in 1.3 or after).
Thoughts? If there is a general consensus, I can give a new patch which can
be good enough to go in.

On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) [EMAIL PROTECTED] wrote:

  [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256 ]


Oleg Gnatovskiy commented on SOLR-572:
--

I installed the latest patch. Still getting a NPE. Here is my config:

<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>

  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
  </lst>
</searchComponent>


Here is the URL I am hitting:
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

Here is the error:

HTTP Status 500 - null java.lang.NullPointerException
  at org.apache.lucene.index.Term.<init>(Term.java:39)
  at org.apache.lucene.index.Term.<init>(Term.java:36)
  at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
  at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)

spelling.txt is in my solr/home/conf.



Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:
 I will be back on it tomorrow and will see this through before 1.3 with the
 abstractions.  In other words, -1 on cutting this off prematurely.  :-)
  Since I don't think this is the only thing holding up 1.3, let's just play
 it out and get it right so all of us are happy.

This feature may not be what is holding back the 1.3 release, but its
potential users are very much interested in a basic working version.
They may be able to live without these advanced features. Maybe we
can open another JIRA issue for the enhancements, which may or may not
go into 1.3 (depending on when they are ready).




 -Grant

 On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:

 The current patch has been broken for some days now and implementing a
 correct query parsing logic may take time to get right. Let's not aim for
 everything to get into the 1.3 release.

 I would like to cut down the scope of this issue to an implementation that
 indexes files and Lucene indices (both Solr and arbitrary) and gives
 suggestions while using the correct analyzer for multi-word queries. Let's
 get a spell checker working and commit it. We can deal with more
 enhancements like abstractions for custom spellcheckers and query parsing
 etc. in another issue which can be dealt with separately (in 1.3 or
 after).
 Thoughts? If there is a general consensus, I can give a new patch which
 can
 be good enough to go in.

 On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) [EMAIL PROTECTED]
 wrote:


  [

 https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]

 Oleg Gnatovskiy commented on SOLR-572:
 --

 I installed the latest patch. Still getting a NPE. Here is my config:

 <searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
   <lst name="defaults">
     <!-- omp = Only More Popular -->
     <str name="spellcheck.onlyMorePopular">false</str>
     <!-- exr = Extended Results -->
     <str name="spellcheck.extendedResults">false</str>
     <!-- The number of suggestions to return -->
     <str name="spellcheck.count">1</str>
   </lst>

   <lst name="spellchecker">
     <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
     <str name="name">external</str>
     <str name="sourceLocation">spellings.txt</str>
     <str name="characterEncoding">UTF-8</str>
     <str name="fieldType">text_ws</str>
     <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
   </lst>
 </searchComponent>


 Here is the URL I am hitting:

 http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

 Here is the error:

 HTTP Status 500 - null java.lang.NullPointerException at
 org.apache.lucene.index.Term.<init>(Term.java:39) at
 org.apache.lucene.index.Term.<init>(Term.java:36) at

 org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
 at

 org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
 at

 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
 at

 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
 at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
 at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at

 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at

 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
 at

 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)

 spelling.txt is in my solr/home/conf.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-30 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601031#action_12601031
 ] 

Noble Paul commented on SOLR-572:
-

We should consider committing a basic version of the spellchecker without the 
intelligent query parsing etc. Most users' needs will be met. Adding 
enhancements later is not a bad idea (as long as we are not breaking backward 
compatibility).



 Spell Checker as a Search Component
 ---

 Key: SOLR-572
 URL: https://issues.apache.org/jira/browse/SOLR-572
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch


 Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
 following features:
 * Allow creating a spell index on a given field and make it possible to have 
 multiple spell indices -- one for each field
 * Give suggestions on a per-field basis
 * Given a multi-word query, give only one consistent suggestion
 * Process the query with the same analyzer specified for the source field and 
 process each token separately
 * Allow the user to specify minimum length for a token (optional)
 Consistency criteria for a multi-word query can consist of the following:
 * Preserve the correct words in the original query as it is
 * Never give duplicate words in a suggestion
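
The two consistency criteria above can be sketched in a few lines of Java. This is only an illustration, not the code in the attached patches: the class name `ConsistentSuggestion`, the method `collate`, and the shape of the `candidates` map (misspelled token mapped to its ranked alternatives) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr or of the SOLR-572 patches.
public class ConsistentSuggestion {

    /**
     * Builds a single suggestion for a multi-word query.
     * Tokens missing from `candidates` are assumed to be spelled correctly
     * and are copied through unchanged; misspelled tokens take their first
     * alternative that is not already present in the suggestion.
     */
    static String collate(List<String> tokens, Map<String, List<String>> candidates) {
        // Words that are already taken: first the correct originals, later each pick.
        Set<String> used = new LinkedHashSet<>();
        for (String token : tokens) {
            if (!candidates.containsKey(token)) {
                used.add(token);
            }
        }
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            List<String> alternatives = candidates.get(token);
            if (alternatives == null) {
                out.add(token); // criterion 1: preserve correct words as they are
                continue;
            }
            String pick = token; // fall back to the original if every alternative is taken
            for (String alternative : alternatives) {
                if (!used.contains(alternative)) { // criterion 2: no duplicate words
                    pick = alternative;
                    break;
                }
            }
            used.add(pick);
            out.add(pick);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("pizza", "hutt", "piza");
        Map<String, List<String>> candidates = Map.of(
                "hutt", List.of("hut"),
                "piza", List.of("pizza", "plaza"));
        System.out.println(collate(tokens, candidates)); // prints "pizza hut plaza"
    }
}
```

In the sample run, "piza" does not become "pizza" because that word is already in the query, so the next alternative is used instead.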

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-30 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I installed the latest patch. Still getting a NPE. Here is my config:

<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>

  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
  </lst>
</searchComponent>


Here is the URL I am hitting: 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

Here is the error:

HTTP Status 500 - null java.lang.NullPointerException
  at org.apache.lucene.index.Term.<init>(Term.java:39)
  at org.apache.lucene.index.Term.<init>(Term.java:36)
  at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
  at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)

spelling.txt is in my solr/home/conf.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600520#action_12600520
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Grant, unless I'm mistaken, the reason to add the spellcheck.q parameter was to 
avoid the tedious query parsing logic that may be needed to extract 
spellcheckable terms from the q parameter. Do we really need to do this? All 
the extra things in the q parameter are usually added by the frontend itself, 
aren't they?
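
To make the idea concrete, here is a rough sketch of a frontend that decorates `q` itself and passes the raw user text in `spellcheck.q`, so no query parsing is needed on the server side. The class name `SpellcheckRequest`, the `buildUrl` helper, the base URL and the `city` field are made up for the example; only the `q`, `spellcheck` and `spellcheck.q` parameter names come from this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical frontend helper; requires Java 10+ for URLEncoder.encode(String, Charset).
public class SpellcheckRequest {

    /**
     * @param base           select handler URL, without a query string
     * @param userText       exactly what the user typed, sent as spellcheck.q
     * @param decoratedQuery the query the frontend built around the user text, sent as q
     */
    static String buildUrl(String base, String userText, String decoratedQuery) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", decoratedQuery);
        params.put("spellcheck", "true");
        params.put("spellcheck.q", userText);

        StringJoiner queryString = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            queryString.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return base + "?" + queryString;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The frontend knows which part is user input and which part it added.
        String userText = "pizza hutt";
        String decorated = userText + " AND city:boston";
        System.out.println(buildUrl("http://localhost:8983/solr/select", userText, decorated));
    }
}
```

The point of the sketch is only that the frontend already has the undecorated text in hand at the moment it builds the request.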




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600524#action_12600524
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Grant, unless I'm mistaken, the reason to add the spellcheck.q parameter was to 
avoid the tedious query parsing logic that may be needed to extract 
spellcheckable terms from the q parameter. Do we really need to do this? All 
the extra things in the q parameter are usually added by the frontend itself, 
aren't they?
{quote}

Is that practical?  How would an application even know how to generate 
spellcheck.q without parsing, etc.?   I think the component should just work on 
the input query.  I guess I hadn't really thought about the need for 
spellcheck.q before, but now that you put it in that light, I am not sure I see 
the need for it.

I don't think all the extra things are necessarily added by the application.  
Users can input range queries, etc.  The point is, it all depends on the 
application.

At any rate, it is trivial to override the SpellingQueryConverter to skip the 
original regex and just apply the analyzer to produce the tokens.  I suppose 
we could offer two converters, one with the regex and one without, or it could 
just have a flag.
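
For illustration, the "flag" variant might look roughly like the sketch below. It does not use or reproduce the real SpellingQueryConverter: the class `QueryTermExtractor`, its regex, and the whitespace split that stands in for "apply the analyzer" are all assumptions chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a query converter with a regex on/off flag.
public class QueryTermExtractor {

    // Assumed pattern: an optional "field:" prefix followed by a run of word characters.
    private static final Pattern TERM = Pattern.compile("(?:\\w+:)?(\\w+)");
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    private final boolean useRegex;

    QueryTermExtractor(boolean useRegex) {
        this.useRegex = useRegex;
    }

    /** Returns the terms that would be handed to the spell checker. */
    List<String> extract(String query) {
        List<String> terms = new ArrayList<>();
        if (useRegex) {
            // Strip field prefixes and boolean operators before spell checking.
            Matcher m = TERM.matcher(query);
            while (m.find()) {
                String term = m.group(1);
                if (!OPERATORS.contains(term)) {
                    terms.add(term);
                }
            }
        } else {
            // No regex: tokenize the input as-is (whitespace split stands in for the analyzer).
            for (String term : query.trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String query = "title:pizza AND hutt";
        System.out.println(new QueryTermExtractor(true).extract(query));  // [pizza, hutt]
        System.out.println(new QueryTermExtractor(false).extract(query)); // [title:pizza, AND, hutt]
    }
}
```

With the flag off, the caller is expected to pass text that is already free of query syntax, which is the spellcheck.q case discussed above.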




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600563#action_12600563
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I still have some issues. Here is my config:
<lst name="spellchecker">
  <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
  <str name="name">external</str>
  <str name="sourceLocation">/usr/local/apache/lucene/solr1home/conf/spellings.txt</str>
  <str name="field">word</str>
  <str name="characterEncoding">UTF-8</str>
  <!--<str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>-->
</lst>
But why do I need a field for a file-based dictionary? Also, is this URL the 
correct way to call it?
http://wil1devsch1.cs.tmcs:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.builld=true




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600582#action_12600582
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Oleg -- You shouldn't need field for a file-based dictionary. fieldType is 
optional for a file-based dictionary. field is necessary only when you're using 
an IndexBasedSpellChecker. If you're running into a problem, it's a bug. Except 
for the double L in spellcheck.build in your URL, everything else looks OK.
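
A typo like the double L is easy to miss. Assuming a misspelled parameter is simply ignored rather than rejected (an assumption, not something confirmed in this thread), a client-side check along these lines could catch it before the request is sent. The class `SpellcheckParamCheck` is hypothetical, and its list of known names contains only the parameters mentioned in this thread, so it is not an authoritative list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical client-side sanity check, not part of Solr.
public class SpellcheckParamCheck {

    // Parameter names that appear in this thread; the real component may accept others.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "spellcheck",
            "spellcheck.q",
            "spellcheck.build",
            "spellcheck.dictionary",
            "spellcheck.count",
            "spellcheck.onlyMorePopular",
            "spellcheck.extendedResults"));

    /** Returns every spellcheck* parameter name in the query string that is not in KNOWN. */
    static List<String> unknownParams(String queryString) {
        List<String> unknown = new ArrayList<>();
        for (String pair : queryString.split("&")) {
            String name = pair.split("=", 2)[0]; // text before the first '='
            if (name.startsWith("spellcheck") && !KNOWN.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        String qs = "q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.builld=true";
        System.out.println(unknownParams(qs)); // [spellcheck.builld]
    }
}
```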




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600590#action_12600590
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Here is what I am getting:

HTTP Status 500 - null java.lang.NullPointerException
  at org.apache.lucene.index.Term.<init>(Term.java:39)
  at org.apache.lucene.index.Term.<init>(Term.java:36)
  at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:67)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:160)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600217#action_12600217
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Did you guys change the structure of the required URL parameters? I am hitting the 
following URL: 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=default
and I am getting a NullPointerException. The config is the one from the 
sample, and I am using the latest patch.
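
For reference, a minimal sketch of how a client might assemble such a request so that each parameter is separated and encoded correctly. The host, path, and parameter names (q, spellcheck, spellcheck.dictionary, spellcheck.build) are the ones that appear in this thread; the helper name spellcheck_url is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Base URL as used in the requests quoted in this thread.
BASE_URL = "http://localhost:8983/solr/select/"


def spellcheck_url(query, dictionary="default", build=False):
    """Return a select URL with the spellcheck parameters encoded.

    Hypothetical helper for illustration only; it is not part of Solr.
    """
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.dictionary", dictionary),
    ]
    if build:
        # spellcheck.build=true is the parameter used in this thread
        # when asking for the dictionary to be built.
        params.append(("spellcheck.build", "true"))
    # urlencode joins the pairs with "&" and escapes unsafe characters.
    return BASE_URL + "?" + urlencode(params)


print(spellcheck_url("pizza"))
```

Building the query string from a list of pairs, rather than by string concatenation, avoids dropping a separator between parameters.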




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600269#action_12600269
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I haven't applied/tried the latest patch yet, but maybe it's
quicker/better to ask here.  I'm wondering/worried about the case
where the input is a multi-term query string and a subset (e.g. 2 of 5
terms) of the query terms is misspelled.

For example, what happens when the query is:

london brigge is fallinge down
(my 2 year old's current hit)

In this case the suggestions should be:
# brigge = bridge
# fallinge = falling (or fall, more likely)

Is there something in the response that will allow the client to
figure out the positioning of the spelling suggestions and piece
together the ideal alternative query, in this case "london bridge is
falling/fall down"?

Ideally, the client could piece together the new query string, so that it can, for 
example, italicize the misspelled words (see Google's DYM).  If the current 
SCRH returns the final corrected string, e.g. "london bridge is falling down", 
the client has no easy/accurate way of figuring out what was changed, I think.  
If the SCRH returned some mark-up that told the client which word(s) changed, 
then the client could do something with those changed words, e.g. "london 
bridge{was:brigge}".

Or, if that has problems, maybe each word should be returned separately and 
sequentially:

<word="london"/> <!-- unchanged -->
<word="brigge">bridge</word>

or maybe with offset info:

<word="london" offset="0"/> <!-- unchanged -->
<word="brigge" offset="6">bridge</word>

Thoughts?
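
To make the offset idea concrete, here is a rough client-side sketch of how per-word suggestions with offsets could be stitched back into a marked-up query. Everything in it is an assumption for illustration: the (original, offset, correction) triple format, the function name apply_suggestions, and the use of 0-based character offsets (the thread does not define the offset unit). It does not describe what the component actually returns.

```python
def apply_suggestions(query, suggestions, mark=lambda w: "*" + w + "*"):
    """Rebuild a query from (original, offset, correction) triples.

    `suggestions` is a hypothetical response shape; offsets are assumed
    to be 0-based character positions into `query`.
    """
    out = []
    pos = 0
    for original, offset, correction in sorted(suggestions, key=lambda s: s[1]):
        # Guard against a suggestion that does not line up with the query.
        if query[offset:offset + len(original)] != original:
            raise ValueError(f"offset {offset} does not point at {original!r}")
        out.append(query[pos:offset])   # unchanged text before the word
        out.append(mark(correction))    # corrected word, visibly marked
        pos = offset + len(original)
    out.append(query[pos:])             # unchanged tail
    return "".join(out)


query = "london brigge is fallinge down"
# With 0-based character offsets, "brigge" starts at 7 and "fallinge" at 17.
suggestions = [("brigge", 7, "bridge"), ("fallinge", 17, "falling")]

print(apply_suggestions(query, suggestions))
# prints: london *bridge* is *falling* down
```

Passing mark=lambda w: w yields the plain corrected string, so the same data can serve both a client that highlights changes and one that does not.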





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600272#action_12600272
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hello. I am hitting 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=default&spellcheck.build=true
when trying to build the dictionary. My config looks like this: 
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="fieldType">text_ws</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>
  </lst>
</searchComponent>


And the NPE is:

SEVERE: java.lang.NullPointerException
    at org.apache.solr.util.HighFrequencyDictionary.<init>(HighFrequencyDictionary.java:48)
    at org.apache.solr.spelling.IndexBasedSpellChecker.loadLuceneDictionary(IndexBasedSpellChecker.java:103)
    at org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:84)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:133)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:132)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600275#action_12600275
 ] 

Grant Ingersoll commented on SOLR-572:
--

I'm working on it.  Will have a new patch soon.




--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600277#action_12600277
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Is it an actual error, or was I missing something?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600279#action_12600279
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

In response to Otis, I don't think each word should be returned individually. 
In fact, it should probably return the entire phrase, with the suggestions 
inserted. I believe that is what Google does.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600283#action_12600283
 ] 

Grant Ingersoll commented on SOLR-572:
--




All you see from Googs is their frontend, so who knows what their  
spell checker does.  I think we should return the words individually;  
the application is responsible for sewing together the new string, IMO.








[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600284#action_12600284
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Should we return suggestions only for the misspelled words, or should we echo 
the correctly spelled ones as well?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600294#action_12600294
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Right, Google only shows you the final output, not what they do in the backend.
But the fact that they italicize misspelled words tells us they have a 
mechanism that allows the front end to identify them.
So I think our task here is to figure out the best/easiest way for the client 
to identify misspelled words and offer the alternative query to the end user.

I think what I outlined above will do that for us:
* output all words sequentially
* mark the words that are misspelled - it may be best to return the original 
word plus corrected word:

<word="london"/> <!-- unchanged -->
<word="brigge">bridge</word>

or maybe with offset info:

<word="london" offset="0"/> <!-- unchanged -->
<word="brigge" offset="6">bridge</word>

It's also fine to (*also*) return the final corrected string that doesn't mark 
the corrected words in any way, and let the lazy clients just use that.

Grant or Shalin, will either of you be adding this?
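
As a rough illustration of the two outputs proposed above (every word in sequence with misspellings flagged, plus a plain corrected string for lazy clients), the sketch below derives both from a query and a map of corrections. The whitespace tokenization, the dict-based entry format, and the name describe_words are all assumptions made for the example; the real component would use the field's analyzer and its own response format.

```python
import re


def describe_words(query, corrections):
    """Return (entries, corrected_query) for a query string.

    Each entry holds the original word, its 0-based character offset,
    and the corrected word (None when the word is unchanged).
    Hypothetical shape, for illustration only.
    """
    entries = []
    parts = []
    pos = 0
    # Whitespace tokenization is a simplifying assumption.
    for match in re.finditer(r"\S+", query):
        word = match.group()
        fixed = corrections.get(word)
        entries.append({"word": word, "offset": match.start(), "correction": fixed})
        parts.append(query[pos:match.start()])          # keep original spacing
        parts.append(fixed if fixed is not None else word)
        pos = match.end()
    parts.append(query[pos:])
    return entries, "".join(parts)


entries, corrected = describe_words(
    "london brigge is fallinge down",
    {"brigge": "bridge", "fallinge": "falling"},
)
for entry in entries:
    print(entry)
print(corrected)
```

Returning both forms from one pass keeps the marked-up list and the plain string consistent with each other.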





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600320#action_12600320
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Grant or Shalin, will either of you be adding this?
{quote}
Yes, I am working on it.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600323#action_12600323
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I am still confused about my NPE. Was that a config issue on my part, or was it 
a bug? The way Grant said he was working on it, I assumed that it was a bug :-)




Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Grant Ingersoll


On May 27, 2008, at 8:25 PM, Oleg Gnatovskiy (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600323#action_12600323 ]
>
> Oleg Gnatovskiy commented on SOLR-572:
> --
>
> I am still confused about my NPE. Was that a config issue on my
> part, or was it a bug? The way Grant said he was working on it, I
> assumed that it was a bug :-)

Sorry, I meant I was working on the token alignment issue.  I will
look at this, too, though.












[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600326#action_12600326
 ] 

Grant Ingersoll commented on SOLR-572:
--

Your field is null for your Lucene configuration.  You need to  
specify:

<str name="field">fieldName</str>

You have fieldType instead.

-Grant
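
A small sanity check along these lines could catch the problem before deployment. The sketch below is a hypothetical stand-alone script, not part of Solr: it parses a searchComponent fragment and lists index-based spell checkers that have no "field" entry. It assumes, following the diagnosis above, that org.apache.solr.spelling.IndexBasedSpellChecker needs a field; the field name "spell" used in the follow-up is made up.

```python
import xml.etree.ElementTree as ET

INDEX_BASED = "org.apache.solr.spelling.IndexBasedSpellChecker"


def missing_field_checkers(config_xml):
    """Return names of index-based spell checkers lacking a `field` entry.

    Hypothetical helper; assumes the index-based checker requires `field`.
    """
    root = ET.fromstring(config_xml)
    problems = []
    for lst in root.iter("lst"):
        if lst.get("name") != "spellchecker":
            continue
        # Collect the <str name="...">value</str> children into a dict.
        settings = {s.get("name"): (s.text or "").strip() for s in lst.findall("str")}
        if settings.get("classname") == INDEX_BASED and not settings.get("field"):
            problems.append(settings.get("name", "(unnamed)"))
    return problems


CONFIG = """
<searchComponent name="spellcheck"
    class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="fieldType">text_ws</str>
  </lst>
</searchComponent>
"""

print(missing_field_checkers(CONFIG))
# prints: ['default']
```

Replacing the fieldType line with a field entry (for example a field named "spell") makes the check return an empty list.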








[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599403#action_12599403
 ] 

Grant Ingersoll commented on SOLR-572:
--

Is the prepare method thread-safe for dictionary creation?  There seems to be a 
race condition in the construction of the dictionaries.  I suppose we need a 
synchronized block in there.
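
The general shape of such a fix, sketched here in Python as an analogue of a Java synchronized block, is to make the "check whether the dictionary exists, then build it" step atomic. The class and method names are hypothetical and the builder callable stands in for whatever constructs a dictionary; this is not the code in the patch.

```python
import threading


class DictionaryRegistry:
    """Builds each named dictionary at most once, even under concurrency."""

    def __init__(self, builder):
        self._builder = builder      # hypothetical callable: name -> dictionary
        self._dictionaries = {}
        self._lock = threading.Lock()

    def get(self, name):
        # Holding the lock makes check-then-build atomic, so two requests
        # arriving together cannot both construct the same dictionary.
        with self._lock:
            if name not in self._dictionaries:
                self._dictionaries[name] = self._builder(name)
            return self._dictionaries[name]


calls = []
registry = DictionaryRegistry(lambda name: calls.append(name) or {"name": name})

threads = [threading.Thread(target=registry.get, args=("default",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(calls)
# prints: ['default']
```

A single coarse lock is the simplest option; it serializes all lookups, which may or may not be acceptable depending on how often dictionaries are requested.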




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599426#action_12599426
 ] 

Grant Ingersoll commented on SOLR-572:
--

Otis,

What's the use case behind:
{quote}
Oh, I see, you are reading field values from the index of the current core. I 
think that is fine, but wouldn't it also be good to be able to read field 
values from a vanilla Lucene index?
{quote}

Seems kind of strange based on what I know of index-based spelling, but I don't 
know everything about it.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599478#action_12599478
 ] 

Grant Ingersoll commented on SOLR-572:
--

Also included in that last patch is a (proposed) sample configuration:

{code}
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <lst name="dictionary">
      <str name="name">default</str>
      <str name="field">word</str>
      <str name="indexDir">c:/temp/spellindex</str>
    </lst>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <lst name="dictionary">
      <str name="name">external</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </lst>
</searchComponent>
{code}




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599494#action_12599494
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I'm still confused by some of the names in that config.
indexDir looks like the path to the spellchecker index.  But there is also 
spellcheckIndexDir.  Is there a functional difference?

Regarding "wouldn't it also be good to be able to read field values from a 
vanilla Lucene index?" - the use case is that not all source indices should 
have to be Solr indices.  What if I have a vanilla Lucene index on the machine 
and I want the SCRH to build a SC index from that index's title field?  That 
is, I want the functionality of SCRH, but I don't have my Lucene index under 
Solr.  Is that doable?





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599498#action_12599498
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I think the choice of appropriate suggestions should be left to the user of 
this service.  If it's easily doable, let's make it possible and put 
information about frequencies in an appropriate place.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599558#action_12599558
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Shalin/Grant:

I think Bojan brings up some good questions:
https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12598752#action_12598752

It looks like the call to SpellChecker.exist(...) really got lost:
$ curl --silent https://issues.apache.org/jira/secure/attachment/12382691/SOLR-572.patch | grep 'exist('





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599166#action_12599166
 ] 

Grant Ingersoll commented on SOLR-572:
--

OK, I'm working on this.  

Some thoughts:
1. Why is the initialization done in prepare?  Just to be a little more lazy 
than in init?

2. In FieldSpellChecker, the getSuggestion method goes through and creates the 
suggested map, but then the loop over the entry set at the end only uses the 
value.  I think our response should return each correction together with its 
original token.

3. I'm working on the abstraction notion.  The basic idea is to create 
something like an AbstractSpellChecker, with a LuceneSpellChecker 
implementation that handles the loading, etc. currently done in the 
SpellCheckComponent and implements getSuggestion() like you have.  The goal is 
to have a common response, no matter the spell checker, so that we can plug and 
play spell checkers.  I hope to have a patch soon.
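
Points 2 and 3 can be read together as: every spell checker implementation returns the same response shape, and that shape keeps each correction next to the token it corrects. A rough sketch of that idea follows. All names here (AbstractSpellCheckerSketch, FixedTableSpellChecker, getSuggestions) are made up for illustration and are not the classes from the patch; the fixed-table implementation merely stands in for a Lucene-backed one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical base class: every implementation produces the same response shape. */
abstract class AbstractSpellCheckerSketch {
    /** Suggests up to count corrections for a single token. */
    protected abstract List<String> suggest(String token, int count);

    /** Maps each original token to its suggestions; tokens without suggestions are omitted. */
    Map<String, List<String>> getSuggestions(Collection<String> tokens, int count) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String token : tokens) {
            List<String> suggestions = suggest(token, count);
            if (!suggestions.isEmpty()) {
                result.put(token, suggestions);
            }
        }
        return result;
    }
}

/** Toy implementation backed by a fixed correction table. */
final class FixedTableSpellChecker extends AbstractSpellCheckerSketch {
    private final Map<String, List<String>> table;

    FixedTableSpellChecker(Map<String, List<String>> table) {
        this.table = table;
    }

    @Override
    protected List<String> suggest(String token, int count) {
        List<String> all = table.getOrDefault(token, new ArrayList<>());
        // Copy the first count entries so callers cannot modify the table.
        return new ArrayList<>(all.subList(0, Math.min(count, all.size())));
    }
}
```

Because the caller only sees the token-to-suggestions map, swapping in a different implementation would not change how the response is rendered.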




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599168#action_12599168
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Grant, please hold on a bit. I'm working on the patch too and it has some 
refactorings which may make merging two patches difficult. I'll post my patch 
in a few minutes and then you can take over.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599170#action_12599170
 ] 

Grant Ingersoll commented on SOLR-572:
--

OK.  Kind of too late, but no worries, I will manage the merge.






[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599202#action_12599202
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Otis -- Sorry, I missed your post earlier. I can't think of a use-case for 
adding frequency information to plain text files. Spell checker's utility comes 
from the fact that it can suggest keywords for which Solr can return documents. 
That is possible only when the tokens (or synonyms) are present in the Solr 
index. Plain text dictionaries will be used to add additional common keywords 
which may not be in the Solr fields used for suggestions but may be present in 
huge fields which you don't want to add to spell checker. For example, I may 
build my index only on vehicle brands but I may like to include terms such as 
"cars", "manufacturer" and "make" from plain text files, which may be present in 
my huge default search field. Since the intent would be just to match some 
document with the given suggestion, frequency may not play a significant role 
here, IMHO. What do you think?

Bojan -- I think we should include an "exists" flag in the response. As for 
your point about queries with non-simple tokens, we can introduce another param 
like spellcheck.q, which the application can set to the simple query. End 
users almost never know that Solr is running behind the scenes and the Solr 
queries are constructed by the application itself which can send the simple 
query in this way.
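
To make the spellcheck.q proposal concrete, here is a small sketch of how an application might send both the full query and the simple query. It assumes the parameter is added under the name suggested above (at this point in the thread it is only a proposal); SpellcheckRequestSketch and buildQueryString are hypothetical helper names, and only the JDK's URLEncoder is used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds the query string for a spell-checked search. */
final class SpellcheckRequestSketch {
    private SpellcheckRequestSketch() {
    }

    /**
     * fullQuery is what Solr should search with (possibly using query syntax);
     * simpleQuery is the plain text the spell checker should look at.
     */
    static String buildQueryString(String fullQuery, String simpleQuery) {
        return "q=" + encode(fullQuery)
                + "&spellcheck=true"
                + "&spellcheck.q=" + encode(simpleQuery);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, buildQueryString("title:(pizza AND pasta)", "pizza pasta") keeps the field and boolean syntax in q while the spell checker only sees the two plain words.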




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599222#action_12599222
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Shalin -- I think you are right.  I looked at SpellChecker again and see that 
the frequency in the main/searchable index is checked at suggest time, 
regardless of the source of the dictionary words (index or file), so frequency 
will be accounted for even when words are loaded from plain-text dictionary 
files.

Unless I'm still missing something, that means that onlyMorePopular *can* (or 
*should*!) be used even when words are loaded from plain-text dictionary files. 
 No?
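
The point can be illustrated with a small filter that looks up frequencies in the main index at suggest time, whatever the source of the candidate words. This is a sketch under assumptions: the rule used here (keep a candidate only if it occurs in the index and is at least as frequent as the original word) is one reading of onlyMorePopular and should be checked against the Lucene SpellChecker source. PopularityFilterSketch and the docFreq map are hypothetical stand-ins for the real index lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical suggest-time filter driven by frequencies from the main index. */
final class PopularityFilterSketch {
    private PopularityFilterSketch() {
    }

    /**
     * Keeps candidates that occur in the index and are at least as frequent
     * as the original word. docFreq stands in for a lookup against the main
     * index, so it works the same for index-built and file-built dictionaries.
     */
    static List<String> onlyMorePopular(String word, List<String> candidates,
                                        Map<String, Integer> docFreq) {
        int wordFreq = docFreq.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            int freq = docFreq.getOrDefault(candidate, 0);
            if (freq >= 1 && freq >= wordFreq) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

A candidate that came from a plain-text dictionary but never occurs in the index has frequency 0 here and is dropped, which matches the idea that a suggestion is only useful if Solr can return documents for it.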





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598716#action_12598716
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys, I am having trouble creating a file-based dictionary.

The file looks like this: 

american
mexican
clothes
shoes

and it is in my solr.home/conf directory.

The solrConfig has the following:

<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="dictionary">
    <str name="name">external</str>
    <str name="type">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">/home/csweb/index</str>
  </lst>
</searchComponent>

I hit it with the following URL: 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external

and I get the following stacktrace:

SEVERE: java.lang.NullPointerException
    at org.apache.lucene.search.spell.SpellChecker.indexDictionary(SpellChecker.java:321)
    at org.apache.solr.handler.component.SpellCheckComponent$FieldSpellChecker.init(SpellCheckComponent.java:391)
    at org.apache.solr.handler.component.SpellCheckComponent.loadExternalFileDictionary(SpellCheckComponent.java:204)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:131)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:133)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:966)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)


Any idea what I am doing wrong? Thanks!




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598728#action_12598728
 ] 

Bojan Smid commented on SOLR-572:
-

I already found the same problem, made a fix and sent it to Shalin; he will 
incorporate it into the next patch when it's ready. If you specify a field 
type for that dictionary (and that field type can be found in the Solr schema), 
you'll avoid the problem for now.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598727#action_12598727
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Haven't looked at the code, but the first thing I'd try is using a 
full/absolute path to your dictionary file.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598733#action_12598733
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Just got an idea.  File-based dictionaries don't have word frequency 
information, and because of that we lose certain value (e.g. onlyMorePopular 
cannot be used).  What if we (also) accepted plain-text dictionaries that 
included word frequency information?
e.g.
ball,100
boil,44
bowl,77
...
I'm not looking at sources now, but could we not feed this word frequency 
information into Lucene SC, so it makes use of that when figuring out top-N 
best words to suggest?

And how would we figure out the frequency of each word to begin with?  I 
imagine we can have a tool/class that, given a path to a dictionary file with 
words and a path to a Lucene/Solr index, looks up each dictionary word's 
frequency in the given index and outputs "word,freq" for each word.  This 
class could live in Lucene SC, but could be used by SCRH when rebuilding the SC 
index for example.

Does this sound useful and implementable?
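
As a rough illustration of the idea, the sketch below reads the proposed "word,freq" lines and, in the other direction, produces them from a plain word list plus a frequency lookup. Everything in it is an assumption made for illustration: FrequencyDictionarySketch, parse and annotate are hypothetical names, a line without a comma is treated as frequency 0, and the lookup function stands in for a real query against a Lucene/Solr index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical reader and writer for "word,freq" dictionary lines. */
final class FrequencyDictionarySketch {
    private FrequencyDictionarySketch() {
    }

    /** Parses lines such as "ball,100"; a line without a comma gets frequency 0. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.lastIndexOf(',');
            if (comma < 0) {
                freqs.put(line, 0);
                continue;
            }
            String word = line.substring(0, comma).trim();
            int freq = Integer.parseInt(line.substring(comma + 1).trim());
            freqs.put(word, freq);
        }
        return freqs;
    }

    /** Produces "word,freq" lines, asking docFreq for each word's frequency. */
    static List<String> annotate(List<String> words, ToIntFunction<String> docFreq) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            out.add(word + "," + docFreq.applyAsInt(word));
        }
        return out;
    }
}
```

Splitting on the last comma keeps the format tolerant of words that themselves contain a comma, at the cost of requiring the frequency to be the final field.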





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598735#action_12598735
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Do you mean adding something like <str name="field">word</str> to the 
definition for the file-based dictionary?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598738#action_12598738
 ] 

Bojan Smid commented on SOLR-572:
-

Oleg, that field is now called fieldType, so something like 
<str name="fieldType">word</str> should work for you as long as you have a 
fieldType named word defined in your schema.xml. Let me know if this works.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598752#action_12598752
 ] 

Bojan Smid commented on SOLR-572:
-

I noticed that when asking for a suggestion for a word that exists in the 
dictionary, SC returns some similar word instead of returning that same word. 
The old SCRH had a field, exist, which was true if the word exists in the 
dictionary (so the client can treat it as a correct word that doesn't need a 
suggestion). 

We can't have exactly the same functionality here (since multi-word queries 
should be supported), but we can make SC return a field, spellingCorrect, when 
all words from the query exist in the dictionary. Otherwise, there is no way to 
know whether the spelling was correct or we should display a suggestion.

There is a method in Lucene's SC to check whether a word exists in the index, 
so it's easy to check if a word is correct. However, I'm also thinking of 
situations where the query is not just simple words. For instance, for 
toyata AND miles:[1 to 1] we want to check just toyata in the index and return 
the suggestion toyota AND miles:[1 to 1]. Other query types which might pose 
a problem are:
- fuzzy query
- wildcard query
- prefix query
...
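
The spellingCorrect idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary is a plain in-memory Set rather than Lucene's spell index, the query is assumed to be already split into simple word tokens, and the class and method names are hypothetical and not part of any SOLR-572 patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: names here are hypothetical, not part of the SOLR-572 patch. */
final class SpellingCorrectSketch {

    /** Returns true only when every query token is present in the dictionary. */
    static boolean spellingCorrect(List<String> tokens, Set<String> dictionary) {
        for (String token : tokens) {
            if (!dictionary.contains(token)) {
                return false;
            }
        }
        return true;
    }

    /** Lists the tokens that are missing from the dictionary and so need a suggestion. */
    static List<String> unknownTokens(List<String> tokens, Set<String> dictionary) {
        List<String> unknown = new ArrayList<>();
        for (String token : tokens) {
            if (!dictionary.contains(token)) {
                unknown.add(token);
            }
        }
        return unknown;
    }
}
```

A client could show a suggestion only when spellingCorrect is false. Range, fuzzy, wildcard and prefix clauses would have to be filtered out before this step, which the sketch does not attempt.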




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598801#action_12598801
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Yes, I've actually run into that problem too. Do you think this is something 
that you will be able to solve?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598835#action_12598835
 ] 

Bojan Smid commented on SOLR-572:
-

Sure, a quick fix can be done easily, but it probably wouldn't cover all 
possibilities, hence my post...




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598512#action_12598512
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys I created a dictionary index from the following XML file:
<add>
  <doc>
    <field name="id">10</field>
    <field name="word">pizza</field>
  </doc>
  <doc>
    <field name="id">11</field>
    <field name="word">club</field>
  </doc>
  <doc>
    <field name="id">12</field>
    <field name="word">bar</field>
  </doc>
</add>
My config is the following:
<searchComponent name="spellcheck" 
    class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="dictionary">
    <str name="name">default</str>
    <str name="type">index</str>
    <str name="field">word</str>
    <!--<str name="indexDir">c:/temp/spellindex</str>-->
  </lst>
</searchComponent>
and word is defined in schema.xml as:
<field name="word" type="string" index="true" stored="true" 
    required="false"/>

When I run a query with the following URL:
http://localhost:8983/solr/select/?q=barr&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10
I get the following response:
<lst name="spellcheck">
  <lst name="suggestions">
    <int name="numFound">1</int>
    <arr name="barr">
      <str>bar</str>
    </arr>
  </lst>
</lst>
which is what I expect.
However with this URL:
http://wil1devsch1.cs.tmcs:8983/solr/select/?q=bar&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10
where bar is correctly spelled, I get the following:
<lst name="spellcheck">
  <lst name="suggestions">
    <int name="numFound">1</int>
    <arr name="bar">
      <str>barr</str>
    </arr>
  </lst>
</lst>
Could you please tell me where the word barr is coming from, and why it is 
being suggested? 

Thanks!




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598533#action_12598533
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys, please disregard my last comment; I had a configuration issue that 
caused the problem. I was just wondering if there is a way to get the 
suggestions not to echo the query if there are no suggestions. For example, a 
query where q=food probably should not return a suggestion of food.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598549#action_12598549
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Oleg -- Thanks for trying out the patch. No, currently it does not signal if 
suggestions are not found, it just returns the query terms themselves. I'll add 
that feature.
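
One way a client (or the component itself) could avoid echoing the query term is sketched below. It assumes a simplified view of the response in which each term maps to a single suggestion; the class and method names are made up for illustration and do not reflect how the patch implements this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a simplified term -> suggestion view of the response. */
final class SuggestionFilterSketch {

    /** Drops entries whose suggestion merely repeats the original term. */
    static Map<String, String> dropEchoes(Map<String, String> suggestionsByTerm) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : suggestionsByTerm.entrySet()) {
            if (!e.getKey().equals(e.getValue())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }
}
```

With this filter, q=food yields no entry for food, so an empty result can be read as "no suggestion available".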




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-19 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597878#action_12597878
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Ok, onlyMorePopular and extendedResults will only be supported for dictionaries 
built from Solr fields.

Yes, the Lucene SpellChecker does create n-grams but think about lowercasing, 
stemming etc. All this analysis can potentially change the word which 
eventually gets n-grammed by Lucene.
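
To illustrate the point about analysis changing the word before it is n-grammed, here is a small sketch. It is not Lucene's implementation: the gram size, the lowercasing-only "analyzer" and all names are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: not Lucene's n-gram code. */
final class NGramSketch {

    /** Splits a word into overlapping character n-grams of the given size. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Stand-in for an analysis chain: here it only lowercases the word. */
    static String analyze(String word) {
        return word.toLowerCase(Locale.ROOT);
    }
}
```

Calling ngrams on the raw word and on analyze(word) gives different grams whenever analysis alters the word, so the dictionary and the query have to go through the same analysis for the grams to line up.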




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-19 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597913#action_12597913
 ] 

Bojan Smid commented on SOLR-572:
-

I would like to add support for different character encodings in file-based 
dictionaries (current implementation will take system's default settings). I'm 
not sure how we'll synchronize your work with my fix? Can you let me know 
when/how can I start my work?
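
For reference, reading a word-per-line dictionary file with an explicit character encoding (rather than the platform default) might look like the sketch below. It uses only standard Java I/O; the class name and the idea of passing the charset in from configuration are assumptions, not the patch's actual design.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical loader for a file-based dictionary. */
final class DictionaryFileSketch {

    /** Reads one word per line, decoding the file with the given charset. */
    static List<String> readWords(Path file, Charset charset) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word); // skip blank lines
                }
            }
        }
        return words;
    }
}
```

The charset would typically come from a configuration value and be resolved with Charset.forName before calling readWords.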




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-19 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597917#action_12597917
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Bojan -- I don't want to hold you up so I've uploaded the current state of my 
work. Please go ahead with your changes. I can continue after you're done.

Another issue I noticed with the SCRH is that it accepts the accuracy as a 
request parameter and calls Lucene SpellChecker.setAccuracy before getting the 
suggestion. However, this is neither thread-safe nor can we guarantee that the 
accuracy is actually enforced for the suggestion. Therefore, I think we should 
only have accuracy configurable in the solrconfig.xml and not as a request 
parameter.
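
The thread-safety concern can be illustrated with a sketch in which the accuracy is fixed when the checker is constructed (for example from solrconfig.xml) instead of being set on a shared object per request. The class below is hypothetical and takes pre-scored candidates as input; it is not Lucene's SpellChecker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical wrapper: accuracy is fixed at construction, so concurrent requests cannot change it. */
final class FixedAccuracySketch {
    private final float minAccuracy;

    FixedAccuracySketch(float minAccuracy) {
        this.minAccuracy = minAccuracy;
    }

    /** Keeps only candidates whose similarity score meets the configured accuracy. */
    List<String> accept(Map<String, Float> scoredCandidates) {
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, Float> e : scoredCandidates.entrySet()) {
            if (e.getValue() >= minAccuracy) {
                accepted.add(e.getKey());
            }
        }
        return accepted;
    }
}
```

Because minAccuracy is final and never mutated, two requests running at the same time always see the same threshold, which a per-request setAccuracy call on a shared instance cannot guarantee.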




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-18 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597814#action_12597814
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Grant - I was trying to implement the onlyMorePopular and extendedResults 
format of SCRH when I realized that supporting such a response is not possible 
for text file based dictionaries in the current implementation. Currently, we 
use Lucene's PlainTextDictionary to load such text files and we don't maintain 
any frequency information. What do you suggest?

Bojan/Otis - The terms loaded from the text files are passed on to Lucene's 
SpellChecker as is. As per Noble's suggestion, I've added support for an 
optional fieldType attribute (this type must be defined in schema.xml). This 
type's query analyzer is used for queries. Wouldn't it be more consistent to 
apply the index analyzer at index time as well?

Both of the above problems can be solved if we keep the words loaded from the 
text files in a Lucene index, but I'm not sure if we want to go that way.
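
To show why frequency information matters here, the following sketch filters candidates the way an onlyMorePopular option might. The semantics (keep only candidates more frequent than the original term) and all names are assumptions for illustration; the frequency map stands in for whatever the index would provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: assumed semantics for an onlyMorePopular-style filter. */
final class MorePopularSketch {

    /** Keeps candidates that occur more often than the original term. */
    static List<String> onlyMorePopular(String term, List<String> candidates,
                                        Map<String, Integer> docFreq) {
        int termFreq = docFreq.getOrDefault(term, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (docFreq.getOrDefault(candidate, 0) > termFreq) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

With an empty frequency map, as with a plain text dictionary that carries no counts, every frequency is 0 and nothing passes the filter, which is the limitation described above.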




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-18 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597816#action_12597816
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Shalin,
I think onlyMorePopular and extendedResults should be optional, so in the case 
of plain text dictionaries this information would just not be present if we 
cannot derive it.  Even if we take words from plain text files and index them 
into a Lucene index, their frequency will remain 1.

Does the index-time analyzer make sense?  I don't have the sources handy, but 
doesn't Lucene SC take the input word and chop it up into 2- and 3-grams before 
indexing?  If so, how would the index-time analyzer come into play?

In principle, if taking plain text files and indexing the words in them into a 
Lucene SC index solves problems, I think that's acceptable - such indices are 
likely to be relatively small, so they should be quick to build and not require 
a lot of memory.






[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597453#action_12597453
 ] 

Noble Paul commented on SOLR-572:
-

Adding a 'field' attribute is not intuitive. If your data needs custom 
analyzers, create an extra 'type' in the schema and let us add an extra 
attribute 'dataType', e.g.:

{code:xml}
<str name="fieldType">my_new_data_type</str>
{code}





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597455#action_12597455
 ] 

Grant Ingersoll commented on SOLR-572:
--

Patch applies cleanly.  Very cool that we have something concrete finally.

Some thoughts:
1. I don't believe we use author tags (is this a Solr policy?  I know it is a 
Lucene Java convention)
2. There need to be unit tests
3. I think it makes sense to have the option to return extended results
4. I don't think it should be a default search component, but will defer to 
others.
5. numFound should be returned when count > 1 as well, right?  In other words, 
the structure of the response should be the same no matter which branch is 
taken in:
{code}
if (count > 1) {
  response.add("suggestions", spellChecker.getSuggestions(q, count));
} else {
  NamedList suggestions = new NamedList();
  suggestions.add("numFound", 1);
  suggestions.add(q, spellChecker.getSuggestion(q));
  response.add("suggestions", suggestions);
}
{code}
That way it can be handled uniformly on the client.
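
As a rough illustration of the "same structure either way" idea, the sketch below builds one response shape regardless of how many suggestions were found. A LinkedHashMap stands in for Solr's NamedList, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a Map stands in for Solr's NamedList. */
final class UniformResponseSketch {

    /** Builds the same structure whether one or many suggestions were found. */
    static Map<String, Object> suggestions(String query, List<String> found) {
        Map<String, Object> suggestions = new LinkedHashMap<>();
        suggestions.put("numFound", found.size());
        suggestions.put(query, found); // always a list, even for a single suggestion
        return suggestions;
    }
}
```

A client can then always read numFound and iterate over the list under the query key, without branching on the requested count.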




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-16 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597466#action_12597466
 ] 

Bojan Smid commented on SOLR-572:
-

The field attribute for the file-based dictionary is basically the same field 
attribute as in the default dictionary (in both cases it is used to obtain the 
query analyzer), which is why I used the same name. My question was: is it OK 
for the default dictionary to use the same field both to build the dictionary 
from the Solr index and to obtain the query analyzer for extracting tokens?




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597472#action_12597472
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Bojan -- Thanks for adding this functionality. I'll work on making things more 
configurable like SCRH and add a few tests. I think it is OK and may even be 
needed for a few cases, though I prefer Noble's suggestion of having fieldType 
instead of field since it gives more freedom to the user.

Grant -- Thanks for looking into the patch. My comments below:
# Right, those were generated by my IDE; I'll remove them in the next patch
# Agree
# Agree, both 2 and 3 are on my todo list
# I don't understand what you mean by "defer to others", but on making this 
default or not, I'm fine either way.
# Actually, spellChecker.getSuggestions(q, count) returns a complete named 
list, which already has the numFound element. If you don't specify the count, 
then it gives back only a String, for which we need to create a NamedList 
ourselves. In other words, the response format is actually the same both ways.

Noble -- I like your suggestion on keeping a fieldType attribute in the 
configuration for non-Solr dictionaries. We can use the QueryAnalyzer defined 
for the given fieldType in Solr's schema. If this attribute is not present, we 
can default to WhitespaceAnalyzer or StandardAnalyzer.
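
The fallback described in the last paragraph could be sketched as below. Plain Java functions stand in for Lucene analyzers, the registry stands in for the types defined in schema.xml, and the choice to reject an unknown fieldType (rather than silently fall back) is an assumption of this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: functions stand in for analyzers, a Map for the schema. */
final class AnalyzerFallbackSketch {

    /** Default tokenizer: split on whitespace, loosely mirroring a whitespace analyzer. */
    static final Function<String, List<String>> WHITESPACE =
            text -> Arrays.asList(text.trim().split("\\s+"));

    /** Picks the tokenizer registered for fieldType, or the default when none is configured. */
    static Function<String, List<String>> forFieldType(
            String fieldType, Map<String, Function<String, List<String>>> registry) {
        if (fieldType == null) {
            return WHITESPACE; // attribute not present in the configuration
        }
        Function<String, List<String>> analyzer = registry.get(fieldType);
        if (analyzer == null) {
            throw new IllegalArgumentException("Unknown fieldType: " + fieldType);
        }
        return analyzer;
    }
}
```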




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597481#action_12597481
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
I don't understand what you mean by defer to others but on making this 
default or not, I'm fine either way.
{quote}

Just meaning, I'm not the only one who has a say in whether or not it is a default 
component.  My guess is not everyone will want it in the default list of 
components.

Very cool on the other stuff.

One other thing to think about:  What if we want a different underlying spell 
checker?  The Lucene spell checker approach isn't exactly state of the art as 
far as I understand it.  Obviously not your concern at the moment, but might be 
good to think about the ability to interchange the underlying implementation by 
abstracting the notion of spelling a bit while still maintaining the same 
search component interface. 
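
One way to picture that abstraction, as a rough sketch in plain Java. SpellingBackend, OneSubstitutionBackend and SpellCheckFacade are hypothetical names, not anything in the patch; the toy backend stands in for the Lucene spell checker (or any replacement) and only matches same-length words that differ in one character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical abstraction: the search component would depend only on this interface.
interface SpellingBackend {
    List<String> suggest(String word, int count);
}

// Toy backend used only for illustration: suggests dictionary words of the
// same length that differ from the input in exactly one character position.
final class OneSubstitutionBackend implements SpellingBackend {
    private final Set<String> dictionary;

    OneSubstitutionBackend(Set<String> words) {
        this.dictionary = new TreeSet<>(words); // sorted for deterministic output
    }

    @Override
    public List<String> suggest(String word, int count) {
        List<String> suggestions = new ArrayList<>();
        for (String candidate : dictionary) {
            if (suggestions.size() >= count) {
                break;
            }
            if (differsByOneChar(word, candidate)) {
                suggestions.add(candidate);
            }
        }
        return suggestions;
    }

    private static boolean differsByOneChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return differences == 1;
    }
}

// The component-facing side stays the same whichever backend is plugged in.
final class SpellCheckFacade {
    private final SpellingBackend backend;

    SpellCheckFacade(SpellingBackend backend) {
        this.backend = backend;
    }

    List<String> suggestionsFor(String word, int count) {
        return backend.suggest(word, count);
    }
}
```

Swapping in a different spelling implementation would then mean writing another SpellingBackend, with no change on the component side.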




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597494#action_12597494
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Grant - I agree it would be nice.  But let's get this one in first.  Perhaps 
you can add that idea to the list in SOLR-507.





[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597345#action_12597345
 ] 

Noble Paul commented on SOLR-572:
-

 * The spellcheck.dictionary=default parameter must be optional in the query. 
The user must be able to name a dictionary 'default', and that dictionary is 
used if no value is passed.
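
A small sketch of that lookup in plain Java, assuming the request parameters arrive as a simple map. DictionaryResolver is a hypothetical name; only the spellcheck.dictionary parameter and the 'default' dictionary name come from this thread.

```java
import java.util.Map;

// Hypothetical helper: resolves which dictionary a request should use.
final class DictionaryResolver {
    static final String PARAM = "spellcheck.dictionary";
    static final String DEFAULT_NAME = "default";

    // Returns the requested dictionary name, or "default" when the
    // parameter is missing or blank.
    static String resolveName(Map<String, String> requestParams) {
        String requested = requestParams.get(PARAM);
        if (requested == null || requested.trim().isEmpty()) {
            return DEFAULT_NAME;
        }
        return requested.trim();
    }
}
```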
 






[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597351#action_12597351
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I had a quick look and it all looks nice and clean.
I like the config, though I think 'solr' is too specific - the source field 
could be in a vanilla Lucene index that lives somewhere on disk, for example.  
Thus, I'd change 'solr' to 'index'.  Oh, I see, you are reading field values 
from the index of the current core.  I think that is fine, but wouldn't it also 
be good to be able to read field values from a vanilla Lucene index? (but you 
wouldn't know the field type and thus would not be able to get the Analyzer for 
the field)

Also, and regardless of the above, instead of having indexDir and path, why 
not call them both location and maybe even let them include the file: schema 
for consistency, if it works with the code that uses those locations?

Also on TODO:
* Read dictionary from plain-text files.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-15 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597354#action_12597354
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Otis, I agree that we should call it 'index' instead of 'solr' for the type and 
path can be renamed to location. But indexDir refers to the target for the 
spell check index whereas path currently refers to the source of the 
dictionary, so IMHO we should keep indexDir as it is (It can also be a 
relative path).

For supporting arbitrary Lucene indices, the user must specify type=index, 
field=fieldName, location=path/to/lucene/index/directory which should be 
enough (TODO). In that case the analyzer can be fixed as something (say 
WhitespaceAnalyzer or StandardAnalyzer).
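
As a rough illustration of what reading those three attributes could look like, here is a plain-Java sketch. IndexSourceConfig and its validation rules are hypothetical; the attribute names (type, field, location) are the ones proposed above, and the fixed analyzer is recorded only by name.

```java
import java.util.Map;

// Hypothetical holder for the attributes of an arbitrary Lucene index source.
final class IndexSourceConfig {
    final String field;
    final String location;
    final String analyzer; // fixed, since the field type is unknown for a vanilla index

    private IndexSourceConfig(String field, String location, String analyzer) {
        this.field = field;
        this.location = location;
        this.analyzer = analyzer;
    }

    // Builds the config from attribute name/value pairs and checks that
    // type=index, field and location are all present.
    static IndexSourceConfig fromAttributes(Map<String, String> attrs) {
        if (!"index".equals(attrs.get("type"))) {
            throw new IllegalArgumentException("type must be 'index'");
        }
        String field = required(attrs, "field");
        String location = required(attrs, "location");
        // Assumption: WhitespaceAnalyzer is the fixed choice; StandardAnalyzer was the other option mentioned.
        return new IndexSourceConfig(field, location, "WhitespaceAnalyzer");
    }

    private static String required(Map<String, String> attrs, String key) {
        String value = attrs.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing attribute: " + key);
        }
        return value;
    }
}
```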

I'm not sure I understand your comment on the schema. If this is for text files 
then I was thinking more about having a text file which would have one word per 
line and all those words would go into the same dictionary.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12597358#action_12597358
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I see (indexDir comment).  Might be better to make it more obvious then - 
sourceIndex (for the Lucene index that serves as the source of data) vs. 
targetIndex (or spellcheckerIndex) for the resulting spellchecker index.

For Lucene indices to be used as sources of data, type=index, 
field=fieldName, location=path/to/lucene/index/directory makes sense.

Ignore my comment about the schema, I'm just complicating things with that.  
Yes, one word per line for plain-text file data sources - that can easily be 
digested with PlainTextDictionary class (part of Lucene SC).
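
For reference, the one-word-per-line format is simple enough to sketch with the JDK alone. This is not the PlainTextDictionary class itself, just a stand-in showing the format; WordListReader is a hypothetical name, and skipping blank lines and duplicates is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a plain-text dictionary source: one word per line.
final class WordListReader {
    // Reads all words, trimming whitespace and dropping blank lines and
    // duplicates, while keeping the order in which words first appear.
    static Set<String> readWords(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```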

