[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602645#action_12602645 ]

Grant Ingersoll commented on SOLR-572:
--

One thing I haven't quite settled in my mind is the use of the File based spell 
checker.  It seems to me, that the use case for this is as an override where 
one feels the index based spelling is not correct.  Is that right?  Or am I 
missing something?

If it is the case, shouldn't we allow the option, at least, of it truly acting 
as an override?  Currently, the only way to get at it is by passing the 
dictionary name as the param.  The only way I can see this as useful is if you 
are making several round trips to the server, which means you might as well be 
using a request handler and not a search component.

Thoughts?

 Spell Checker as a Search Component
 ---

 Key: SOLR-572
 URL: https://issues.apache.org/jira/browse/SOLR-572
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch


 Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
 following features:
 * Allow creating a spell index on a given field and make it possible to have 
 multiple spell indices -- one for each field
 * Give suggestions on a per-field basis
 * Given a multi-word query, give only one consistent suggestion
 * Process the query with the same analyzer specified for the source field and 
 process each token separately
 * Allow the user to specify minimum length for a token (optional)
 Consistency criteria for a multi-word query can consist of the following:
 * Preserve the correct words in the original query as it is
 * Never give duplicate words in a suggestion

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Bojan Smid (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602651#action_12602651 ]

Bojan Smid commented on SOLR-572:
-

File based spell checker would probably be used in cases when Solr index is too 
small or too young. So a user would compile a dictionary file (for instance, 
UNIX words file) and use it as a dictionary.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602654#action_12602654 ]

Grant Ingersoll commented on SOLR-572:
--

{quote}
File based spell checker would probably be used in cases when Solr index is too 
small or too young. So a user would compile a dictionary file (for instance, 
UNIX words file) and use it as a dictionary.
{quote}

But how is it useful to return results that aren't in the index?  It's not like 
querying on them results in anything useful.  Seems to me, that in this case, 
you just need to rebuild your dictionary on a regular basis.  Or is it that 
people are using Solr as a spelling server?

Now, I can see it as an override situation, i.e. one wishes to override 
certain results from the index based one with ones that are known to be in 
the dictionary, but are lower down.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602687#action_12602687 ]

Grant Ingersoll commented on SOLR-572:
--

Oleg,

Can you try specifying a field value anyway for your bug up above?  I think 
this is actually a bug in the Lucene Spell checker.  Namely, the docs say that 
the field value can be null, but, it is trying to construct a Term, which 
requires a non-null field name.

Just give it the name "word", perhaps.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Shalin Shekhar Mangar (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602706#action_12602706 ]

Shalin Shekhar Mangar commented on SOLR-572:


Grant -- The exception is happening because the SpellCheckComponent always 
passes Solr's own IndexReader when calling the 
AbstractLuceneSpellChecker#getSuggestions method even when the underlying spell 
checker is a FileBasedSpellChecker. In that case, since a non-null IndexReader 
is passed onto Lucene, it tries to create a term on the null field name. That 
is when the NullPointerException comes up.

Another problem will occur when using IndexBasedSpellChecker with an arbitrary 
Lucene index, because then too, Solr's IndexReader would be passed to the 
Lucene SpellChecker instead of the actual index's reader.

I think a possible solution can be to add another abstract method with the same 
signature as Lucene's SpellChecker to the AbstractLuceneSpellChecker and let 
each sub-class get suggestions on its own. That way FileBasedSpellChecker will 
pass the correct IndexReader or a null IndexReader into Lucene appropriately. 
The AbstractLuceneSpellChecker#getSuggestions will just call the underlying 
suggest method, get the String[] back and process as it does right now.
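
The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions: ReaderStub, LuceneCheckerStub and the *Sketch classes are hypothetical stand-ins written for this example, not the classes in the patch, and the stub merely mimics the failure described (a non-null reader combined with a null field name).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for an index reader; not Lucene's IndexReader. */
final class ReaderStub {
    final String label;
    ReaderStub(String label) { this.label = label; }
}

/** Hypothetical stand-in for the underlying spell checker call. */
final class LuceneCheckerStub {
    static String[] suggestSimilar(String word, int count, ReaderStub reader, String field) {
        if (reader != null) {
            // Mimics the reported failure: a non-null reader leads to a
            // term lookup, which needs a non-null field name.
            Objects.requireNonNull(field, "field");
        }
        return new String[] { "suggestion-for-" + word };
    }
}

abstract class AbstractCheckerSketch {
    /** Each subclass chooses the reader and field that suit its dictionary source. */
    protected abstract String[] suggest(String word, int count, ReaderStub solrReader);

    /** Shared entry point: delegates to the subclass, then wraps the raw String[]. */
    final List<String> getSuggestions(String word, int count, ReaderStub solrReader) {
        return Arrays.asList(suggest(word, count, solrReader));
    }
}

final class FileBasedSketch extends AbstractCheckerSketch {
    @Override
    protected String[] suggest(String word, int count, ReaderStub solrReader) {
        // A file-backed dictionary has no source index: pass neither reader nor field.
        return LuceneCheckerStub.suggestSimilar(word, count, null, null);
    }
}

final class IndexBasedSketch extends AbstractCheckerSketch {
    private final ReaderStub ownReader; // reader of an arbitrary index, or null to use Solr's
    private final String field;

    IndexBasedSketch(ReaderStub ownReader, String field) {
        this.ownReader = ownReader;
        this.field = field;
    }

    @Override
    protected String[] suggest(String word, int count, ReaderStub solrReader) {
        ReaderStub reader = (ownReader != null) ? ownReader : solrReader;
        return LuceneCheckerStub.suggestSimilar(word, count, reader, field);
    }
}
```

With this shape the component can keep handing Solr's reader to every checker, and the file-based variant simply ignores it.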


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Otis Gospodnetic (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602712#action_12602712 ]

Otis Gospodnetic commented on SOLR-572:
---

By collate you mean that the SCRH would not only return 
suggestions/corrections for individual tokens, but would also try to glue 
together an already corrected query string based on its suggestions?

Example:
Query: cogito ega sum

SCRH returns this correction:
ega -> ergo

But also tries to give you the whole thing corrected:
cogito ergo sum

That?  Sounds useful - less work for the client app, should the app developers 
decide that SCRH's collated suggestions are what they would have to do 
themselves anyway.
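
As a minimal sketch of the collation idea described above, assuming each correction carries start and end character offsets into the original query: the class and method names here are hypothetical, chosen for this example rather than taken from the patch.

```java
import java.util.List;

/** Hypothetical sketch of query collation; names are illustrative only. */
final class CollationSketch {

    /** One corrected token: offsets into the original query plus its replacement. */
    static final class Correction {
        final int startOffset;    // inclusive
        final int endOffset;      // exclusive
        final String replacement;

        Correction(int startOffset, int endOffset, String replacement) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.replacement = replacement;
        }
    }

    /**
     * Rebuilds the query with every corrected span swapped for its replacement.
     * Assumes corrections are sorted by startOffset and do not overlap.
     */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Correction c : corrections) {
            out.append(query, cursor, c.startOffset); // copy untouched text
            out.append(c.replacement);                // substitute the correction
            cursor = c.endOffset;
        }
        out.append(query, cursor, query.length());    // copy the tail
        return out.toString();
    }
}
```

For the query "cogito ega sum", a single correction covering offsets 7 to 10 with the replacement "ergo" would collate to "cogito ergo sum"; words that need no correction are carried over unchanged.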



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Oleg Gnatovskiy (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602743#action_12602743 ]

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys. Installed the latest patch. Old problem is still there. For example 
if I do q=piza I get:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pizzza">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">6</int>
      <arr name="suggestion">
        <str>pizza</str>
      </arr>
    </lst>
  </lst>
</lst>

Which is good. Then I do q=pizza (pizza is in the dictionary)

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pizza">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>plaza</str>
      </arr>
    </lst>
  </lst>
</lst>

I don't think it should give me that suggestion. If a word is in the dictionary 
it should not give any suggestions.
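
The behaviour requested here could look something like the guard below. It is a sketch only: the dictionary is modelled as a plain Set<String>, and KnownWordGuardSketch and suggestUnlessKnown are hypothetical names, not part of the patch.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical guard: suppress suggestions for tokens the dictionary already contains. */
final class KnownWordGuardSketch {

    /**
     * Returns an empty list when the token is a known word; otherwise passes
     * through whatever suggestions the spell checker produced.
     */
    static List<String> suggestUnlessKnown(String token,
                                           Set<String> dictionary,
                                           List<String> rawSuggestions) {
        if (dictionary.contains(token)) {
            return Collections.emptyList(); // correctly spelled: nothing to suggest
        }
        return rawSuggestions;
    }
}
```

Under this rule q=pizza would yield no suggestions when pizza is in the dictionary, while a misspelled token would still get its suggestions.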


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Noble Paul നോബിള്‍ नोब्ळ्


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602745#action_12602745 ]

Grant Ingersoll commented on SOLR-572:
--

{quote}
Grant - The exception is happening because the SpellCheckComponent always 
passes Solr's own IndexReader when calling the 
AbstractLuceneSpellChecker#getSuggestions method even when the underlying spell 
checker is a FileBasedSpellChecker. In that case, since a non-null IndexReader 
is passed onto Lucene, it tries to create a term on the null field name. That 
is when the NullPointerException comes up.
{quote}

Yep, I think I fixed this piece.  See also LUCENE-1299

{quote}
I think a possible solution can be to add another abstract method with the same 
signature as Lucene's SpellChecker to the AbstractLuceneSpellChecker and let 
each sub-class get suggestions on its own. That way FileBasedSpellChecker will 
pass the correct IndexReader or a null IndexReader into Lucene appropriately. 
The AbstractLuceneSpellChecker#getSuggestions will just call the underlying 
suggest method, get the String[] back and process as it does right now.
{quote}

Not sure I follow the solution (I understand the problem).  Which signature are 
you suggesting?


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Mike Klaas (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602828#action_12602828 ]

Mike Klaas commented on SOLR-572:
-

[quote]Another use case is where Solr is used with indices that are not indices 
for a narrow domain or that don't have nice, clean, short fields that can be 
used for populating the SC index. For example, if the index consists of a pile 
of web pages, I don't think I'd want to use their data (not even their titles) 
to populate the SC index. I'd really want just a plain dictionary-powered 
SCRH.[/quote]

It works great, actually.  That way you get all the abbreviations, jargon, 
proper names, etc.   Thresholding helps prevent most of the cruft from appearing 
in the index.
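
The comment does not say how the thresholding is done. One plausible reading, offered purely as an assumption, is a document-frequency cutoff: a term enters the spelling dictionary only if it occurs in at least some fraction of the documents. The names ThresholdSketch and dictionaryTerms below are hypothetical.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical document-frequency threshold for building a spelling dictionary. */
final class ThresholdSketch {

    /**
     * Keeps only terms whose document frequency, as a fraction of totalDocs,
     * is at least minFraction. Rare terms (likely typos or noise) are dropped.
     */
    static SortedSet<String> dictionaryTerms(Map<String, Integer> docFreq,
                                             int totalDocs,
                                             double minFraction) {
        SortedSet<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if ((double) e.getValue() / totalDocs >= minFraction) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

With 100 documents and a cutoff of 0.05, a term seen in 50 or 10 documents is kept, while one seen in a single document is left out of the dictionary.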


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Swarag Segu (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602856#action_12602856 ]

Swarag Segu commented on SOLR-572:
--

Hey Guys,
I installed the latest patch and it gives me compile errors :

compile:
[mkdir] Created dir: C:\Documents and Settings\Swarag Segu\workspace\solrSrc\build\core
[javac] Compiling 324 source files to C:\Documents and Settings\Swarag Segu\workspace\solrSrc\build\core
[javac] C:\Documents and Settings\Swarag Segu\workspace\solrSrc\src\java\org\apache\solr\spelling\FileBasedSpellChecker.java:97: cannot find symbol
[javac] symbol  : variable MaxFieldLength
[javac] location: class org.apache.lucene.index.IndexWriter
[javac] true, IndexWriter.MaxFieldLength.UNLIMITED);
[javac]  ^
[javac] C:\Documents and Settings\Swarag Segu\workspace\solrSrc\src\java\org\apache\solr\spelling\FileBasedSpellChecker.java:96: internal error; cannot instantiate org.apache.lucene.index.IndexWriter.<init> at org.apache.lucene.index.IndexWriter to ()
[javac] IndexWriter writer = new IndexWriter(ramDir, fieldType.getAnalyzer(),
[javac]  ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors


Am I missing something?
Thanks,
Swarag.


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-04 Thread Grant Ingersoll

There are working patches available on the issue without the advanced
features and everyone is free to fix the current one. It's not like
it is that far off from being able to have proper spellchecking,
pluggability, and context information about where the mistakes are. I
frankly don't get what all the fuss is about.

Is it that you disagree with the approach? That hasn't come across in
the discussions, but if it is, say so. I thought we were working on
it quite well together and made some good progress and are pretty darn
close. I don't see that I've taken away any functionality that the
original patch offers, but I did change it so that it fits a broader
audience, namely those who are interested in other spell checkers and
those who want info about where in the query the problem occurs.
Which, is what the comments suggest people are interested in and also
what I am interested in for 1.3.

And, I'm sorry, but I said I'd have to let it lie for a few days and
then I would be back to it. Cut me some slack. I don't get paid to
work on Solr full time. Is it truly that important that someone
can't wait a few days for a patch on the trunk version for something
they never had before? It ain't like we're talking some core bug here
that has everyone broken. Besides, others are perfectly welcome to
work on it in the meantime.

Sorry for the rant, but I am not going to be pressured into committing
a patch that I don't think is ready and one that I said I am going to
be working on to see it through so that we all are happy.

-Grant

On Jun 4, 2008, at 1:14 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:

I will be back on it tomorrow and will see this through before 1.3 with
the abstractions. In other words, -1 on cutting this off prematurely. :-)
Since I don't think this is the only thing holding up 1.3, let's just play
it out and get it right so all of us are happy.

This feature may not be holding back the 1.3 release. The potential users
of this issue are very much interested in a basic working version.
They may be able to live without these advanced features. Maybe we
can have another JIRA issue for enhancements which may or may not go into
1.3 (depending on when it happens).

-Grant

On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:

The current patch has been broken for some days now, and implementing
correct query parsing logic may take time to get right. Let's not aim for
everything to get into the 1.3 release.

I would like to cut down the scope of this issue to an implementation that
indexes files and Lucene indices (both Solr and arbitrary) and gives
suggestions while using the correct analyzer for multi-word queries. Let's
get a spell checker working and commit it. We can deal with more
enhancements, like abstractions for custom spellcheckers and query parsing,
in another issue which can be dealt with separately (in 1.3 or after).

Thoughts? If there is a general consensus, I can give a new patch which can
be good enough to go in.

On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) [EMAIL PROTECTED] wrote:

[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256 ]

Oleg Gnatovskiy commented on SOLR-572:
--

I installed the latest patch. Still getting a NPE. Here is my config:

<searchComponent name="spellcheck"
    class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>

  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
  </lst>
</searchComponent>

Here is the URL I am hitting:

http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

Here is the error:

HTTP Status 500 - null java.lang.NullPointerException
 at org.apache.lucene.index.Term.<init>(Term.java:39)
 at org.apache.lucene.index.Term.<init>(Term.java:36)
 at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
 at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)

org

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-04 Thread Shalin Shekhar Mangar

Hi Grant,

I did not intend to offend you or put pressure on you in any way. Please
accept my apologies if I came off as rude. In fact, I've been having a lot
of fun working with you and Bojan on this issue. We've definitely covered a
lot of ground very fast.

I am completely in favor of the goals for this piece. I was merely suggesting
that with the 1.3 release being a priority, we should go one step at a time
and commit per the initial scope for this issue as written in the issue's
description and then handle the enhancements in another issue. But I'm all
for it if you want to add extra functionality within the same issue.

Once again, I'm deeply sorry if you found my comment offensive in any way.

Regards,
Shalin


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-04 Thread Otis Gospodnetic

Yeah, as an observer I sensed no bad intentions here.

Anyhow, 1.3 is not scheduled yet, my guess is we are still at least a few weeks
away from 1.3 (and if I had to bet I'd bet at 1.3 being released close to the
end of summer). Grant is very eager about this and will get it all in. Case
closed, I think. Nothing to see here, move along.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: Shalin Shekhar Mangar [EMAIL PROTECTED]
To: solr-dev@lucene.apache.org
Sent: Wednesday, June 4, 2008 1:32:48 PM
Subject: Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

Hi Grant,

Once again, I'm deeply sorry if you found my comment offensive in any way.

Regards,
Shalin

On Wed, Jun 4, 2008 at 4:33 PM, Grant Ingersoll wrote:

There are working patches available on the issue without the advanced
features and everyone is free to fix the current one. It's not like it is
that far off from being able to have proper spellchecking, pluggability, and
context information about where the mistakes are. I frankly don't get what
all the fuss is about.

Is it that you disagree with the approach? That hasn't come across in the
discussions, but if it is, say so. I thought we were working on it quite
well together and made some good progress and are pretty darn close. I
don't see that I've taken away any functionality that the original patch
offers, but I did change it so that it fits a broader audience, namely those
who are interested in other spell checkers and those who want info about
where in the query the problem occurs, which is what the comments suggest
people are interested in and also what I am interested in for 1.3.

And, I'm sorry, but I said I'd have to let it lie for a few days and then I
would be back to it. Cut me some slack. I don't get paid to work on Solr
full time. Is it truly that important that someone can't wait a few days
for a patch on the trunk version for something they never had before? It
ain't like we're talking some core bug here that has everyone broken.
Besides, others are perfectly welcome to work on it in the meantime.

Sorry for the rant, but I am not going to be pressured into committing a
patch that I don't think is ready and one that I said I am going to be
working on to see it through so that we all are happy.

-Grant

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Shalin Shekhar Mangar

The current patch has been broken for some days now, and implementing
correct query parsing logic may take time to get right. Let's not aim for
everything to get into the 1.3 release.

I would like to cut down the scope of this issue to an implementation that
indexes files and Lucene indices (both Solr and arbitrary) and gives
suggestions while using the correct analyzer for multi-word queries. Let's
get a spell checker working and commit it. We can deal with further
enhancements, like abstractions for custom spellcheckers and query parsing,
in another issue which can be dealt with separately (in 1.3 or after).
Thoughts? If there is a general consensus, I can give a new patch that is
good enough to go in.

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Otis Gospodnetic

I'm +1 on getting the basic stuff done and committed for 1.3.
If Grant is hot on getting the abstractions in for 1.3, he will do so, but I 
think it's OK to get this done in 2 parts:
1) core working and committed for 1.3
2) abstractions working and committed after 1.3 if Grant doesn't finish them 
before 1.3

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-03 Thread Grant Ingersoll

I will be back on it tomorrow and will see this through before 1.3  
with the abstractions.  In other words, -1 on cutting this off  
prematurely.  :-)  Since I don't think this is the only thing holding  
up 1.3, let's just play it out and get it right so all of us are happy.


-Grant

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:
 I will be back on it tomorrow and will see this through before 1.3 with the
 abstractions.  In other words, -1 on cutting this off prematurely.  :-)
  Since I don't think this is the only thing holding up 1.3, let's just play
 it out and get it right so all of us are happy.

This feature may not be holding back the 1.3 release. The potential users
of this feature are very much interested in a basic working version.
They may be able to live without these advanced features. Maybe we
can have another JIRA issue for enhancements, which may or may not go into
1.3 (depending on when it happens).




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-30 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12601031#action_12601031
 ] 

Noble Paul commented on SOLR-572:
-

We must consider committing a basic version of the spellchecker without the
intelligent query parsing etc. Most users' needs will be met. Adding
enhancements later is not a bad idea (as long as we are not breaking backward
compatibility).



 Spell Checker as a Search Component
 ---

 Key: SOLR-572
 URL: https://issues.apache.org/jira/browse/SOLR-572
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch


 Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
 following features:
 * Allow creating a spell index on a given field and make it possible to have 
 multiple spell indices -- one for each field
 * Give suggestions on a per-field basis
 * Given a multi-word query, give only one consistent suggestion
 * Process the query with the same analyzer specified for the source field and 
 process each token separately
 * Allow the user to specify minimum length for a token (optional)
 Consistency criteria for a multi-word query can consist of the following:
 * Preserve the correct words in the original query as it is
 * Never give duplicate words in a suggestion
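
The two consistency criteria above can be illustrated with a small collation step. This is only a sketch under assumptions: the function name `collate`, the in-memory `dictionary` set, and the `candidates` mapping are hypothetical stand-ins and do not reflect how the patch implements collation.

```python
def collate(tokens, dictionary, candidates):
    """Build one suggestion for a multi-word query.

    tokens: analyzed query tokens, in order.
    dictionary: set of words considered correctly spelled (hypothetical).
    candidates: maps a misspelled token to its ranked suggestions (hypothetical).
    """
    # Words that are already correct count as "used" so they are not suggested again.
    used = set(t for t in tokens if t in dictionary)
    result = []
    for token in tokens:
        if token in dictionary:
            result.append(token)  # preserve correct words as they are
            continue
        # Pick the best-ranked candidate that is not already in the suggestion;
        # fall back to the original token when nothing suitable is left.
        choice = next((c for c in candidates.get(token, []) if c not in used), token)
        used.add(choice)
        result.append(choice)
    return " ".join(result)


suggestion = collate(
    ["pizza", "piza", "delivry"],
    {"pizza", "delivery"},
    {"piza": ["pizza", "plaza"], "delivry": ["delivery"]},
)
print(suggestion)
```

Here "pizza" is kept untouched, and "piza" is not corrected to "pizza" because that word already appears in the query, so the next candidate is used instead.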

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-30 Thread Oleg Gnatovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12601256#action_12601256
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I installed the latest patch. Still getting an NPE. Here is my config:

<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>

  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
  </lst>
</searchComponent>


Here is the URL I am hitting:
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true

Here is the error:

HTTP Status 500 - null java.lang.NullPointerException
  at org.apache.lucene.index.Term.<init>(Term.java:39)
  at org.apache.lucene.index.Term.<init>(Term.java:36)
  at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
  at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)

spelling.txt is in my solr/home/conf.
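
For anyone reproducing this, the request above can be assembled programmatically so that each parameter is separated and encoded explicitly. A minimal sketch, assuming the same host, handler path, and parameter values as in the comment; the helper name `build_spellcheck_url` is hypothetical and not part of Solr:

```python
from urllib.parse import urlencode


def build_spellcheck_url(base, query, dictionary, build=False):
    """Return a select URL carrying the spellcheck parameters used in the comment.

    `base` is the handler URL (assumed here: http://localhost:8983/solr/select/).
    """
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.dictionary", dictionary),
    ]
    if build:
        # Ask the component to build the named dictionary on this request.
        params.append(("spellcheck.build", "true"))
    return base + "?" + urlencode(params)


url = build_spellcheck_url(
    "http://localhost:8983/solr/select/", "pizza", "external", build=True
)
print(url)
```

Building the query string from a list of pairs keeps the parameter boundaries unambiguous, which makes it easier to rule out a malformed request when chasing an error like the one above.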


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12600520#action_12600520
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Grant, unless I'm mistaken, the reason for adding the spellcheck.q parameter
was to avoid the tedious query parsing logic that may be needed to extract
spellcheckable terms from the q parameter. Do we really need to do this? All
the extra things in the q parameter are usually added by the frontend itself,
aren't they?
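
To make the idea concrete, here is a rough sketch of a frontend that decorates q with its own clauses but passes only the raw user text as spellcheck.q. The function name `build_params`, the `city` field, and the sample values are hypothetical; only the q, spellcheck, and spellcheck.q parameter names come from this discussion.

```python
from urllib.parse import urlencode


def build_params(user_text, frontend_filters):
    """Combine user text with frontend-added clauses for q, while keeping
    spellcheck.q limited to what the user actually typed."""
    clauses = [user_text] + ["+%s:%s" % (field, value) for field, value in frontend_filters]
    return [
        ("q", " ".join(clauses)),
        ("spellcheck", "true"),
        ("spellcheck.q", user_text),
    ]


params = build_params("piza delivery", [("city", "boston")])
query_string = urlencode(params)
print(query_string)
```

With this split, the spell checker never sees the `+city:boston` clause, so no query parsing is needed on the server side to find the spellcheckable terms.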


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Grant Ingersoll (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12600524#action_12600524
]

Grant Ingersoll commented on SOLR-572:
--

{quote}
Grant, unless I'm mistaken, the reason for adding the spellcheck.q parameter was to
avoid the tedious query parsing logic that may be needed to extract
spellcheckable terms from the q parameter. Do we really need to do this? All
the extra things in the q parameter are usually added by the frontend itself,
aren't they?
{quote}

Is that practical? How would an application even know how to generate
spellcheck.q without parsing, etc.? I think the component should just work on
the input query. I guess I hadn't really thought about the need for
spellcheck.q before, but now that you put it in that light, I am not sure I see
the need for it.

I don't think all the extra things are necessarily added by the application.
Users can input range queries, etc. The point is, it all depends on the
application.

At any rate, it is trivial to override the SpellingQueryConverter to skip the
original regex and just apply the analyzer to produce the tokens. I suppose
we could offer two converters, one with the regex and one without, or a single
converter with a flag.
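
For illustration only, the two converter styles discussed here (regex-based extraction versus handing the raw query to a tokenizer, selected by a flag) might look like the Python sketch below. Everything in it is an assumption: the pattern is not the regex from the patch, the operator list is made up, and a whitespace split merely stands in for the field's analyzer.

```python
import re

# Hypothetical pattern, NOT the one in the patch: match a run of word
# characters unless it is immediately followed by ':' (i.e. a field name).
TERM_PATTERN = re.compile(r"\b\w+\b(?!:)")

# Query operators that should not be spell checked (an assumed list).
OPERATORS = {"AND", "OR", "NOT", "TO"}


def terms_with_regex(query):
    """Pull out candidate terms, skipping field names and operators."""
    return [t for t in TERM_PATTERN.findall(query) if t not in OPERATORS]


def terms_without_regex(query):
    """Hand the raw query to a tokenizer; a whitespace split stands in
    for the analyzer of the source field."""
    return query.split()


def extract_terms(query, use_regex=True):
    """One converter with a flag, as an alternative to two classes."""
    return terms_with_regex(query) if use_regex else terms_without_regex(query)


print(extract_terms("title:pizza AND body:hutt"))
print(extract_terms("title:pizza AND body:hutt", use_regex=False))
```

With the flag on, field names and operators are dropped before spell checking; with it off, whatever the tokenizer produces is checked as-is, which only works well when the application sends a clean query.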


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Oleg Gnatovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600563#action_12600563
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I still have some issues. Here is my config:
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">/usr/local/apache/lucene/solr1home/conf/spellings.txt</str>
    <str name="field">word</str>
    <str name="characterEncoding">UTF-8</str>
    <!--<str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>-->
  </lst>
But why do I need a field for a file-based dictionary? Also, is this the correct 
URL to call: 
http://wil1devsch1.cs.tmcs:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.builld=true
 ?


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600582#action_12600582
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Oleg -- You shouldn't need a field for a file-based dictionary. fieldType is 
optional for a file-based dictionary. field is necessary only when you're using 
an IndexBasedSpellChecker. If you're running into a problem, it's a bug. Except 
for the double L in spellcheck.build in your URL, everything else looks OK.
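
As a small aid, a client can build the request URL programmatically instead of typing it by hand, which avoids misspelled parameter names and missing separators. The sketch below uses only the parameter names that appear in this thread (q, spellcheck, spellcheck.dictionary, spellcheck.build); the helper name and the localhost base URL are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base URL as used elsewhere in this thread; adjust host and port as needed.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def spellcheck_url(query, dictionary, build=False):
    """Build a select URL that asks the spellcheck component for suggestions."""
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.dictionary", dictionary),
    ]
    if build:
        # Single "l": spellcheck.build, not spellcheck.builld.
        params.append(("spellcheck.build", "true"))
    return SOLR_SELECT + "?" + urlencode(params)


print(spellcheck_url("pizza", "external", build=True))
```

urlencode also takes care of escaping, so a multi-word query such as "london brigge" is sent as london+brigge.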


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-28 Thread Oleg Gnatovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600590#action_12600590
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Here is what I am getting:

HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.lucene.index.Term.<init>(Term.java:39)
    at org.apache.lucene.index.Term.<init>(Term.java:36)
    at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:67)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600217#action_12600217
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Did you guys change the structure of the required URL parameters? I am hitting the 
following URL: 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=default
 and I am getting a NullPointerException. The config is the one from the 
sample, and I am using the latest patch.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600269#action_12600269
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I haven't applied/tried the latest patch yet, but maybe it's
quicker/better to ask here.  I'm wondering/worried about the case
where the input is a multi-term query string and a subset (e.g. 2 of 5
terms) of the query terms is misspelled.

For example, what happens when the query is:

"london brigge is fallinge down"
(my 2 year old's current hit)

In this case the suggestions should be:
# brigge => bridge
# fallinge => falling (or fall, more likely)

Is there something in the response that will allow the client to
figure out the positioning of the spelling suggestions and piece
together the ideal alternative query, in this case "london bridge is
falling/fall down"?

Ideally, the client could piece together the new query string itself, so that it 
can, for example, italicize the misspelled words (see Google's DYM).  If the 
current SCRH returns only the final corrected string, e.g. "london bridge is 
falling down", the client has no easy/accurate way of figuring out what was 
changed, I think.  If the SCRH returned some mark-up that told the client which 
word(s) changed, then the client could do something with those changed words, 
e.g. "london bridge{was:brigge}".

Or, if that has problems, maybe each word should be returned separately and 
sequentially:

<word="london"/> <!-- unchanged -->
<word="brigge">bridge</word>

or maybe with offset info:

<word="london" offset="0"/> <!-- unchanged -->
<word="brigge" offset="6">bridge</word>

Thoughts?
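
To make the offset idea concrete, here is a minimal client-side sketch in Python. It assumes a hypothetical response shape, namely a list of (offset, original, suggestion) tuples with zero-based character offsets into the original query; neither that shape nor the function name comes from the patch. Counting from zero, brigge starts at 7 and fallinge at 17 in the example query.

```python
def apply_corrections(query, corrections, mark=lambda old, new: new):
    """Rebuild the query, replacing each misspelled word in place.

    corrections: list of (offset, original, suggestion) tuples, where offset
    is the zero-based character position of the original word in the query.
    mark: how to render a changed word; defaults to the bare suggestion.
    """
    pieces = []
    cursor = 0
    for offset, original, suggestion in sorted(corrections):
        # Guard against offsets that do not line up with the query text.
        if query[offset:offset + len(original)] != original:
            raise ValueError(f"offset {offset} does not point at {original!r}")
        pieces.append(query[cursor:offset])        # untouched text before the word
        pieces.append(mark(original, suggestion))  # the replacement, possibly decorated
        cursor = offset + len(original)
    pieces.append(query[cursor:])                  # untouched tail
    return "".join(pieces)


query = "london brigge is fallinge down"
corrections = [(7, "brigge", "bridge"), (17, "fallinge", "falling")]

print(apply_corrections(query, corrections))
print(apply_corrections(query, corrections, mark=lambda old, new: f"<i>{new}</i>"))
```

Because the client knows exactly which spans changed, it can italicize or otherwise highlight them, which is not possible when only the final corrected string comes back.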



[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600272#action_12600272
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hello. I am hitting 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=default&spellcheck.build=true
 when trying to build the dictionary. My config looks like this: 
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!--  The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="fieldType">text_ws</str>
    <str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="fieldType">text_ws</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="indexDir">/usr/local/apache/lucene/solr1home/solr/data/spellchecker</str>
  </lst>
</searchComponent>


And the NPE is:

SEVERE: java.lang.NullPointerException
    at org.apache.solr.util.HighFrequencyDictionary.<init>(HighFrequencyDictionary.java:48)
    at org.apache.solr.spelling.IndexBasedSpellChecker.loadLuceneDictionary(IndexBasedSpellChecker.java:103)
    at org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:84)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:133)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:132)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
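
Elsewhere in this thread Shalin notes that field is necessary when using an IndexBasedSpellChecker. Whether a missing field is what triggers the trace above is not established here, but it is cheap to check. The Python sketch below scans a searchComponent definition for index-based spell checkers that declare no field; the helper name is hypothetical and the sample XML is a trimmed-down illustration, not a recommended config.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample for illustration; the "default" checker has no field.
SAMPLE = """
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="fieldType">text_ws</str>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <str name="name">external</str>
    <str name="sourceLocation">spellings.txt</str>
  </lst>
</searchComponent>
"""


def index_checkers_missing_field(xml_text):
    """Return names of IndexBasedSpellChecker entries that declare no field."""
    root = ET.fromstring(xml_text.strip())
    missing = []
    for lst in root.iter("lst"):
        if lst.get("name") != "spellchecker":
            continue  # e.g. the "defaults" list
        settings = {s.get("name"): (s.text or "").strip() for s in lst.findall("str")}
        is_index_based = settings.get("classname", "").endswith("IndexBasedSpellChecker")
        if is_index_based and not settings.get("field"):
            missing.append(settings.get("name", "(unnamed)"))
    return missing


print(index_checkers_missing_field(SAMPLE))
```

File-based checkers are deliberately skipped, since a field is not needed for them.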



[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600275#action_12600275
 ] 

Grant Ingersoll commented on SOLR-572:
--

I'm working on it.  Will have a new patch soon.





[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600277#action_12600277
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Is it an actual error, or was I missing something?


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600279#action_12600279
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

In response to Otis, I don't think each word should be returned individually. 
In fact, it should probably return the entire phrase with the suggestions 
inserted. I believe that is what Google does.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600283#action_12600283
 ] 

Grant Ingersoll commented on SOLR-572:
--




All you see from Googs is their frontend, so who knows what their
spell checker does.  I think we should return the words individually;
the application is responsible for sewing together the
new string, IMO.






[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600284#action_12600284
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Should we return suggestions only for the misspelled words, or should we echo 
the correctly spelled ones as well?


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600294#action_12600294
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Right, Google only shows you the final output, not what they do in the backend.
But the fact that they italicize misspelled words tells us they have a 
mechanism that allows the front end to identify them.
So I think our task here is to figure out the best/easiest way for the client 
to identify misspelled words and offer the alternative query to the end user.

I think what I outlined above will do that for us:
* output all words sequentially
* mark the words that are misspelled - it may be best to return the original 
word plus corrected word:

<word="london"/> <!-- unchanged -->
<word="brigge">bridge</word>

or maybe with offset info:

<word="london" offset="0"/> <!-- unchanged -->
<word="brigge" offset="6">bridge</word>

It's also fine to (*also*) return the final corrected string that doesn't mark 
the corrected words in any way, and let the lazy clients just use that.

Grant or Shalin, will either of you be adding this?
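
A rough sketch of the producing side of this proposal, in Python for brevity: emit every word in order, attach the suggestion where there is one, and also return the plain corrected string for clients that do not care about markup. The function name, the dict-based suggestion map, and the entry layout are all assumptions made for illustration; a single-space split stands in for real analysis.

```python
def word_entries(query, suggestions):
    """List every word of the query in order, flagging the changed ones.

    suggestions: dict mapping a misspelled word to its replacement.
    Returns (entries, corrected), where each entry holds the original word,
    its zero-based character offset, and the suggestion (None if unchanged).
    """
    entries = []
    corrected = []
    position = 0
    for word in query.split(" "):
        suggestion = suggestions.get(word)
        entries.append({"word": word, "offset": position, "suggestion": suggestion})
        corrected.append(suggestion or word)
        position += len(word) + 1  # +1 for the single space separator
    return entries, " ".join(corrected)


entries, corrected = word_entries(
    "london brigge is fallinge down",
    {"brigge": "bridge", "fallinge": "falling"},
)
for entry in entries:
    print(entry)
print(corrected)
```

Returning both forms keeps simple clients simple while still giving richer clients enough information to highlight the corrected words.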



[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600320#action_12600320
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Grant or Shalin, will either of you be adding this?
{quote}
Yes, I am working on it.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600323#action_12600323
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

I am still confused about my NPE. Was that a config issue on my part, or was it 
a bug? The way Grant said he was working on it, I assumed that it was a bug :-)


Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-27 Thread Grant Ingersoll



On May 27, 2008, at 8:25 PM, Oleg Gnatovskiy (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12600323 
#action_12600323 ]


Oleg Gnatovskiy commented on SOLR-572:
--

I am still confused about my NPE. Was that a config issue on my  
part, or was it a bug? The way Grant said he was working on it, I  
assumed that it was a bug :-)


Sorry, I meant I was working on the token alignment issue.   I will  
look at this, too, though.







[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12600326#action_12600326
 ] 

Grant Ingersoll commented on SOLR-572:
--

Your field is null for your Lucene configuration.  You need to  
specify:

<str name="field">fieldName</str>

You have fieldType instead.

-Grant






[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599403#action_12599403
 ] 

Grant Ingersoll commented on SOLR-572:
--

Is prepare() thread-safe with respect to dictionary creation?  It looks like 
there is a race condition in the construction of the dictionaries.  I suppose 
we need some synchronization in there.
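
A minimal sketch of the kind of guard being asked about, assuming 
dictionaries are built lazily on first request. DictionaryRegistry and its 
builder function are hypothetical names used only for illustration; they are 
not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical holder for lazily built dictionaries (not a class from the patch).
class DictionaryRegistry<D> {
    private final Map<String, D> dictionaries = new HashMap<>();
    private final Function<String, D> builder;

    DictionaryRegistry(Function<String, D> builder) {
        this.builder = builder;
    }

    // synchronized: only one thread at a time can check for and build a
    // dictionary, so concurrent requests cannot construct the same one twice.
    synchronized D getOrCreate(String name) {
        D dictionary = dictionaries.get(name);
        if (dictionary == null) {
            dictionary = builder.apply(name);
            dictionaries.put(name, dictionary);
        }
        return dictionary;
    }
}
```

Locking the whole method is the simplest option; it serializes lookups too, 
which may or may not matter depending on how expensive the build step is.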


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599426#action_12599426
 ] 

Grant Ingersoll commented on SOLR-572:
--

Otis,

What's the use case behind:
{quote}
Oh, I see, you are reading field values from the index of the current core. I 
think that is fine, but wouldn't it also be good to be able to read field 
values from a vanilla Lucene index?
{quote}

Seems kind of strange based on what I know of index-based spelling, but I don't 
know everything about it.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599478#action_12599478
 ] 

Grant Ingersoll commented on SOLR-572:
--

Also included in that last patch is a (proposed) sample configuration:

{code}
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.IndexBasedSpellChecker</str>
    <lst name="dictionary">
      <str name="name">default</str>
      <str name="field">word</str>
      <str name="indexDir">c:/temp/spellindex</str>
    </lst>
  </lst>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
    <lst name="dictionary">
      <str name="name">external</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </lst>
</searchComponent>
{code}


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599494#action_12599494
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I'm still confused by some of the names in that config.
"indexDir" looks like the path to the spellchecker index.  But there is also 
"spellcheckIndexDir".  Is there a functional difference?

Regarding "wouldn't it also be good to be able to read field values from a 
vanilla Lucene index?": the use case is that not all source indices should 
have to be Solr indices.  What if I have a vanilla Lucene index on the machine 
and I want the SCRH to build a SC index from that index's title field?  That 
is, I want the functionality of SCRH, but I don't have my Lucene index under 
Solr.  Is that doable?



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599498#action_12599498
 ] 

Otis Gospodnetic commented on SOLR-572:
---

I think the choice of appropriate suggestions should be left to the user of 
this service.  If it's easily doable, let's make it possible and put 
information about frequencies in an appropriate place.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-23 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599558#action_12599558
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Shalin/Grant:

I think Bojan brings up some good questions:
https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12598752#action_12598752

It looks like the call to SpellChecker.exist(...) really got lost:
$ curl --silent https://issues.apache.org/jira/secure/attachment/12382691/SOLR-572.patch | grep 'exist('



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Grant Ingersoll (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599166#action_12599166
]

Grant Ingersoll commented on SOLR-572:
--

OK, I'm working on this.

Some thoughts:
1. Why is the initialization done in prepare? Just to be a little more lazy
than in init?

2. In FieldSpellChecker, the getSuggestion method goes through and creates the
suggested map, but then the loop over the entry set at the end only uses the
value. I think our response should return the associated correction with the
original token.

3. I'm working on the abstraction notion. The basic idea is to create something
like an AbstractSpellChecker, with a LuceneSpellChecker implementation that
handles the loading, etc. that is currently in the SpellCheckComponent and
implements getSuggestion() like you have. The goal is to have a common
response, no matter the spell checker, so that we can plug and play spell
checkers. I hope to have a patch soon.
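
One way to picture points 2 and 3 together is a small abstract base class 
whose result maps each original token to its correction. This is only a 
sketch under assumptions: AbstractSpellCheckerSketch, suggest() and 
FixedMapSpellChecker are made-up names for illustration and are not the 
classes or signatures from the patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction; names and signatures are illustrative only.
abstract class AbstractSpellCheckerSketch {
    // Returns the best correction for one token, or null if there is none.
    protected abstract String suggest(String token);

    // Common response shape: original token -> correction, in query order.
    // Tokens with no correction (or an identical one) are left out.
    Map<String, String> getSuggestions(List<String> tokens) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : tokens) {
            String correction = suggest(token);
            if (correction != null && !correction.equals(token)) {
                result.put(token, correction);
            }
        }
        return result;
    }
}

// Toy implementation backed by a fixed lookup table, standing in for a
// Lucene-backed or file-backed checker.
class FixedMapSpellChecker extends AbstractSpellCheckerSketch {
    private final Map<String, String> corrections;

    FixedMapSpellChecker(Map<String, String> corrections) {
        this.corrections = corrections;
    }

    @Override
    protected String suggest(String token) {
        return corrections.get(token);
    }
}
```

Because callers only see the token-to-correction map, a different checker can 
be swapped in without changing how the response is built.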


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599168#action_12599168
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Grant, please hold on a bit. I'm working on the patch too and it has some 
refactorings which may make merging two patches difficult. I'll post my patch 
in a few minutes and then you can take over.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599170#action_12599170
 ] 

Grant Ingersoll commented on SOLR-572:
--

OK.  Kind of too late, but no worries, I will manage the merge.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599202#action_12599202
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Otis -- Sorry, I missed your post earlier. I can't think of a use-case for 
adding frequency information to plain text files. Spell checker's utility comes 
from the fact that it can suggest keywords for which Solr can return documents. 
That is possible only when the tokens (or synonyms) are present in the Solr 
index. Plain text dictionaries will be used to add additional common keywords 
which may not be in the Solr fields used for suggestions but may be present in 
huge fields which you don't want to add to the spell checker. For example, I 
may build my index only on vehicle brands, but I may like to include terms such 
as "cars", "manufacturer", and "make" from plain text files, which may be 
present in my huge default search field. Since the intent would be just to 
match some document with the given suggestion, frequency may not play a 
significant role here, IMHO. What do you think?

Bojan -- I think we should include an "exists" flag in the response. As for 
your point about queries with non-simple tokens, we can introduce another param 
like "spellcheck.q" to which the application can set the simple query. End 
users almost never know that Solr is running behind the scenes, and the Solr 
queries are constructed by the application itself, which can send the simple 
query in this way.
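
As a rough illustration of the spellcheck.q idea, an application could send 
its full query in q and the plain user input in spellcheck.q. The helper 
below is a hypothetical sketch: SpellcheckUrlSketch and build() are invented 
names, the base URL is the one used elsewhere in this thread, and 
spellcheck.q is the parameter proposed above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper showing how an application might pass both queries.
class SpellcheckUrlSketch {
    // q is the full Solr query; simpleQuery is the raw user input that the
    // spell checker should look at instead of parsing q.
    static String build(String baseUrl, String q, String simpleQuery) {
        return baseUrl
                + "?q=" + encode(q)
                + "&spellcheck=true"
                + "&spellcheck.q=" + encode(simpleQuery);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, build("http://localhost:8983/solr/select", "title:piza AND 
brand:acme", "piza") keeps the field syntax out of the text handed to the 
spell checker.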


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-22 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12599222#action_12599222
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Shalin -- I think you are right.  I looked at SpellChecker again and see that 
the frequency in the main/searchable index is checked at suggest time, 
regardless of the source of the dictionary words (index or file), so frequency 
will be accounted for even when words are loaded from plain-text dictionary 
files.

Unless I'm still missing something, that means that onlyMorePopular *can* (or 
*should*!) be used even when words are loaded from plain-text dictionary files. 
 No?



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Oleg Gnatovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598716#action_12598716
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys, I am having trouble creating a file-based dictionary.

The file looks like this: 

american
mexican
clothes
shoes

and it is in my solr.home/conf directory.

The solrConfig has the following:

<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="dictionary">
    <str name="name">external</str>
    <str name="type">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">/home/csweb/index</str>
  </lst>
</searchComponent>

I hit it with the following URL: 
http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external

and I get the following stacktrace:

SEVERE: java.lang.NullPointerException
    at org.apache.lucene.search.spell.SpellChecker.indexDictionary(SpellChecker.java:321)
    at org.apache.solr.handler.component.SpellCheckComponent$FieldSpellChecker.init(SpellCheckComponent.java:391)
    at org.apache.solr.handler.component.SpellCheckComponent.loadExternalFileDictionary(SpellCheckComponent.java:204)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:131)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:133)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:966)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)


Any idea what I am doing wrong? Thanks!


[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598728#action_12598728
 ] 

Bojan Smid commented on SOLR-572:
-

I already found the same problem, made a fix, and sent it to Shalin; he will 
incorporate it into the next patch when it's ready. If you specify a field 
type for that dictionary (and that field type can be found in the Solr schema), 
you'll avoid the problem for now.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598727#action_12598727
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Haven't looked at the code, but the first thing I'd try is using a 
full/absolute path to your dictionary file.


[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598733#action_12598733
 ] 

Otis Gospodnetic commented on SOLR-572:
---

Just got an idea.  File-based dictionaries don't have word frequency 
information, and without that we lose certain functionality (e.g. 
onlyMorePopular cannot be used).  What if we (also) accepted plain-text file 
dictionaries that included word frequency information?
e.g.
ball,100
boil,44
bowl,77
...
I'm not looking at sources now, but could we not feed this word frequency 
information into Lucene SC, so it makes use of that when figuring out top-N 
best words to suggest?
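
To make the word,freq idea concrete, here is a minimal Python sketch. It is 
not the Lucene SC API: the function names are hypothetical, and it assumes 
that "onlyMorePopular" means "suggest only words more frequent than the query 
word". Lines without a count are treated as frequency 0.

```python
def parse_frequency_dictionary(lines):
    """Parse 'word,freq' lines into a dict; a plain word gets frequency 0."""
    freqs = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        word, sep, count = line.rpartition(",")
        if sep and count.strip().isdigit():
            freqs[word.strip()] = int(count)
        else:
            # No usable frequency column: keep the whole line as the word.
            freqs[line] = 0
    return freqs


def only_more_popular(query_word, candidates, freqs):
    """Keep candidates more frequent than the query word, most frequent first."""
    base = freqs.get(query_word, 0)
    better = [w for w in candidates if freqs.get(w, 0) > base]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(better, key=lambda w: (-freqs.get(w, 0), w))


freqs = parse_frequency_dictionary(["ball,100", "boil,44", "bowl,77"])
ranked = only_more_popular("boil", ["ball", "bowl", "boil"], freqs)
```

With the three example lines above, "boil" as the query word leaves "ball" 
and "bowl" as the more popular candidates, in that order.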

And how would we figure out the frequency of each word to begin with?  I 
imagine we can have a tool/class that, given a path to a dictionary file with 
words and a path to a Lucene/Solr index, looks up each dictionary word's 
frequency in the given index and outputs word,freq for each word.  This 
class could live in Lucene SC, but could be used by SCRH when rebuilding the SC 
index for example.
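
A rough Python sketch of such a tool follows. The index lookup is the part I 
am assuming: doc_freq is a hypothetical callable standing in for whatever 
asks the Lucene/Solr index for a term's frequency, and here it is backed by 
a plain dict so the example runs on its own.

```python
def build_frequency_lines(dictionary_words, doc_freq):
    """Return 'word,freq' lines for each non-empty dictionary word.

    doc_freq is a hypothetical callable (word -> int) that stands in for a
    frequency lookup against the real index.
    """
    lines = []
    for raw in dictionary_words:
        word = raw.strip()
        if not word:
            continue
        lines.append(f"{word},{doc_freq(word)}")
    return lines


# Stand-in for the index: term -> frequency.
index_counts = {"ball": 100, "boil": 44, "bowl": 77}
lines = build_frequency_lines(
    ["ball", "boil", "", "bowl", "zzz"],
    lambda word: index_counts.get(word, 0),
)
```

Words missing from the index come out with frequency 0 rather than being 
dropped, so the output dictionary keeps every word from the input file.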

Does this sound useful and implementable?

[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Oleg Gnatovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598735#action_12598735
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Do you mean adding something like <str name="field">word</str> to the 
definition for the file-based dictionary?

[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598738#action_12598738
 ] 

Bojan Smid commented on SOLR-572:
-

Oleg, that field is now called fieldType, so something like 
<str name="fieldType">word</str> should work for you as long as you have a 
fieldType with name "word" defined in your schema.xml. Let me know if this works.

[jira] Commented: (SOLR-572) Spell Checker as a Search Component


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598752#action_12598752
 ] 

Bojan Smid commented on SOLR-572:
-

I noticed that when searching for a suggestion for a word which exists in the 
dictionary, SC returns some similar word instead of returning that same word. 
The old SCRH had a field "exist" which returned true if the word exists in the 
dictionary (so the client can treat it as a correct word that doesn't need a 
suggestion). 

We can't have exactly the same functionality here (since multi-word queries 
should be supported), but we can make SC return a field "spellingCorrect" in 
case all words from the query exist in the dictionary. Otherwise, there is no 
way to know whether the spelling was correct or we should display a suggestion.
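
As an illustration only, the check could look like the Python sketch below. 
The response shape and the helper names are assumptions, not what the patch 
does; the dictionary is modelled as a plain set of lowercase words.

```python
def spelling_correct(query_words, dictionary):
    """True when every query word is present in the dictionary set."""
    return all(word.lower() in dictionary for word in query_words)


def build_response(query_words, dictionary, suggestions):
    """Hypothetical response: flag correct queries, else attach suggestions."""
    if spelling_correct(query_words, dictionary):
        return {"spellingCorrect": True}
    return {"spellingCorrect": False, "suggestions": suggestions}


dictionary = {"toyota", "camry"}
ok = build_response(["Toyota", "camry"], dictionary, [])
bad = build_response(["toyata", "camry"], dictionary, ["toyota camry"])
```

Note that an empty query counts as correct here, because all() over an empty 
sequence is true; a real implementation may want to treat that case 
separately.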

There is a method in Lucene's SC to check if a word exists in the index, so 
it's easy to check whether a word is correct. However, I'm also thinking of 
the situation when we don't have just simple words in the query. For instance, 
for toyata AND miles:[1 to 1] we want to check just toyata in the index and 
return the suggestion toyota AND miles:[1 to 1]. Other query types which might 
pose a problem are:
- fuzzy query
- wildcard query
- prefix query
...
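
One possible way to handle this, sketched in Python under simplifying 
assumptions: the query is split with a regular expression instead of the 
real query parser, and only purely alphabetic tokens that are not boolean 
operators are spell checked. Range, fuzzy, wildcard, prefix and fielded 
clauses are passed through untouched. The corrections map is a stand-in for 
whatever the spell checker would suggest.

```python
import re

# A bracketed clause such as miles:[1 to 1] is kept as one token even though
# it contains spaces; everything else is split on whitespace.
TOKEN = re.compile(r"\S*[\[{][^\]}]*[\]}]|\S+")
OPERATORS = {"AND", "OR", "NOT"}


def is_checkable(token):
    """Only plain alphabetic words are checked; operators and any token with
    query syntax (: * ? ~ [ ] { }) fail isalpha() or the operator test."""
    return token not in OPERATORS and token.isalpha()


def suggest_query(query, corrections):
    """Rebuild the query, replacing only checkable tokens that have a
    correction. Whitespace between tokens is normalized to single spaces."""
    out = []
    for token in TOKEN.findall(query):
        if is_checkable(token):
            out.append(corrections.get(token.lower(), token))
        else:
            out.append(token)
    return " ".join(out)


corrections = {"toyata": "toyota", "camri": "camry"}
suggested = suggest_query("toyata AND miles:[1 to 1]", corrections)
```

For the example query this yields toyota AND miles:[1 to 1]. A token such as 
toyata* or camri~ is left as typed, which is the conservative choice until 
we decide how those query types should be treated.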

[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-21 Thread Oleg Gnatovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598801#action_12598801
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Yes, I've actually run into that problem too. Do you think this is something 
that you will be able to solve?

[jira] Commented: (SOLR-572) Spell Checker as a Search Component