Re: Spellchecker in production environment

2007-05-21 Thread Adam Hiatt
I happen to be using a backported version in my live code: http://www.cnet.com/4244-5_1-0.html?query=ipidtag=srchtarget=nw


This is SOLR 1.1 w/ patches. There are no deps on 1.2ish stuff.

BTW, the UI is somewhat confusing; we're changing that soon.

-- Adam




On May 21, 2007, at 2:20 PM, fstauffer wrote:



Hello,

I am currently working on a project that uses the Solr 1.1 release for
its search engine (through a PHP interface). We are about to deploy this
project in a production environment and we would like to include
spellchecker support as well. I managed to get the spellchecker request
handler to work with a nightly build, which raised a few questions:
 - would it be better in a production environment to backport the
   spellchecker functionality to the 1.1 release (I am not sure how deeply
   the spellchecker touches other parts of Solr)?
 - or should I wait for the 1.2 release (no idea about the timeframe)?
 - or should I keep on using nightly builds?

Thanks a lot for your answers or opinions,

Franck


--
View this message in context: http://www.nabble.com/Spellchecker-in-production-environment-tf3792904.html#a10727310

Sent from the Solr - Dev mailing list archive at Nabble.com.




Re: Spellchecker in production environment

2007-05-21 Thread Adam Hiatt

Or just listen to yonik...

-- Adam




On May 21, 2007, at 2:25 PM, Yonik Seeley wrote:


On 5/21/07, fstauffer [EMAIL PROTECTED] wrote:
 - or should I wait for the 1.2 release (no idea about the timeframe)?


This month!


 - or should I keep on using nightly builds?


Yes, until 1.2 is out.

-Yonik




Re: Spellchecker in production environment

2007-05-21 Thread Adam Hiatt
Actually, one correction to this: I made a separate jar with these
changes and dropped it into the Solr lib/ dir. The base jar is unmodified.


-- Adam




On May 21, 2007, at 2:48 PM, Adam Hiatt wrote:

I happen to be using a backported version in my live code: http://www.cnet.com/4244-5_1-0.html?query=ipidtag=srchtarget=nw


This is SOLR 1.1 w/ patches. There are no deps on 1.2ish stuff.

BTW, the UI is somewhat confusing; we're changing that soon.

-- Adam








Re: [jira] Commented: (SOLR-199) N-gram

2007-05-01 Thread Adam Hiatt

Good point. That looks flat out broken.
-- Adam




On May 1, 2007, at 2:16 PM, Chris Hostetter wrote:



:  NGramTokenizerFactory is referring to constants from, and constructing
: an instance of, EdgeNGramTokenizer
:
: Are you saying that this worries you b/c it is referenced in the example
: schema and will thus break without the lucene-analyzers package? I do
: agree that this example should probably be taken out for the time being
: (at the least).

No, I'm saying that the class NGramTokenizerFactory does not produce
instances of NGramTokenizer ... it produces instances of
EdgeNGramTokenizer (which is, coincidentally, what EdgeNGramTokenizerFactory
does as well).


-Hoss
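A minimal, self-contained Java sketch of the difference Hoss is pointing at: a plain n-gram tokenizer emits grams from every offset, while a (front) edge n-gram tokenizer emits only grams anchored at the start of the token. The class and method names are hypothetical and this does not use the Lucene tokenizer API; the gram ordering is an assumption and may not match Lucene's. A factory named NGramTokenizerFactory that constructs an EdgeNGramTokenizer would yield only the second kind of output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Lucene's NGramTokenizer or
// EdgeNGramTokenizer, just the kind of grams each one is meant to produce.
class GramSketch {

    // Every gram of size minGram..maxGram, taken from every offset in the text.
    static List<String> nGrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    // Only the grams anchored at the front edge of the text.
    static List<String> frontEdgeNGrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            out.add(text.substring(0, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("ipod", 2, 3));          // [ip, po, od, ipo, pod]
        System.out.println(frontEdgeNGrams("ipod", 2, 3)); // [ip, ipo]
    }
}
```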




[jira] Updated: (SOLR-199) N-gram

2007-05-01 Thread Adam Hiatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Hiatt updated SOLR-199:


Attachment: SOLR-199-n-gram.patch

This is the new patch, not just cut out of SOLR-81...

I removed references to the Base class and fixed the edge n-gram bug.

 N-gram
 --

 Key: SOLR-199
 URL: https://issues.apache.org/jira/browse/SOLR-199
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Adam Hiatt
Priority: Trivial
 Attachments: SOLR-199-n-gram.patch, SOLR-81-ngram.patch


 This tracks the creation of a patch that adds the n-gram/edge n-gram
 tokenizing functionality that was initially part of SOLR-81 (spell checking).
 This was taken out b/c the Lucene SpellChecker class removed this dependency.
 Nonetheless, I think this is useful functionality and the addition is
 trivial. How does everyone feel about such an addition?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-199) N-gram

2007-04-26 Thread Adam Hiatt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492106
 ] 

Adam Hiatt commented on SOLR-199:
-

Quoted:
Adam, I think Yonik is just saying that the n-gram stuff I added to Lucene's 
contrib/analyzers was added after 2.1 was released, so we'd need a version of 
that jar from the trunk at this time. I see mentions of Solr 1.2, so perhaps we 
can grab the 2.2-dev version of that jar and add it to Solr starting with 
release 1.2?

Understood. I talked with Yonik and he mentioned possibly upgrading to Lucene
2.2-dev in the future, though I'm not sure he intended that to happen in time
for Solr 1.2. If it came to it, we could probably use the 2.2-dev analyzers jar
with the 2.1 core. I'm guessing the API has been stable, but I'm not sure we
want to complicate things that much.

Quoted:
Question: How is the spellchecker you are writing (or considering writing)
going to be different from, or better than, the one in contrib/spellchecker?

The initial use case was actually to support autocomplete functionality, i.e.
using the start (front edge) n-gramming functionality to build tokens that we
can match term fragments against.
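As a rough illustration of the autocomplete use case, the sketch below indexes each term under its front ("start") n-grams and then matches a typed fragment as an exact token. It is an in-memory stand-in with hypothetical names (EdgeGramAutocomplete, minGram); in Solr the grams would presumably come from an index-time tokenizer rather than a Java map.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory sketch of autocomplete via front edge n-grams.
class EdgeGramAutocomplete {
    private final int minGram;
    private final Map<String, Set<String>> gramToTerms = new TreeMap<>();

    EdgeGramAutocomplete(int minGram) {
        this.minGram = minGram;
    }

    // Index every front edge n-gram of the term, from minGram up to its full length.
    void add(String term) {
        for (int n = minGram; n <= term.length(); n++) {
            gramToTerms.computeIfAbsent(term.substring(0, n), k -> new TreeSet<>()).add(term);
        }
    }

    // A typed fragment is matched as an exact token against the stored grams.
    Set<String> complete(String fragment) {
        return gramToTerms.getOrDefault(fragment, new TreeSet<>());
    }

    public static void main(String[] args) {
        EdgeGramAutocomplete ac = new EdgeGramAutocomplete(2);
        for (String t : new String[] {"ipod", "ipaq", "iphone", "zune"}) {
            ac.add(t);
        }
        System.out.println(ac.complete("ip"));  // [ipaq, iphone, ipod]
        System.out.println(ac.complete("ipo")); // [ipod]
    }
}
```

Matching the fragment as a whole token, instead of running a prefix or wildcard query, is the point of indexing the grams up front.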

However, I do still plan to write a native Solr spell checker based on this
same patch at some point. A native system would bring several improvements.
First, it allows truly native use of a Solr-configurable Lucene index. Second,
we will be able to take advantage of native Solr caching. Third, we will be
able to boost on arbitrary aspects. For example, take the misspelling 'ipad'
and the indexed terms 'ipod' and 'ipaq'. Both indexed terms are the same edit
distance (one substitution) away from the misspelling. They also have the same
number of 2-grams (though not 3-grams). If we find that 'ipod' is the more
valuable term, we can boost it slightly based on its popularity so that it
pulls out ahead.
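A hypothetical sketch of the 'ipad' example: both candidates are one edit away, so a popularity signal decides the order. The popularity counts are made up, and this version uses popularity strictly as a tie-breaker after edit distance, which is one simple reading of "boost slightly"; an additive boost on a relevance score would be another.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: rank candidates by edit distance, then by (hypothetical) popularity.
class PopularityBoost {

    // Classic Levenshtein distance: insert, delete, and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Closer candidates always win; popularity only breaks ties in distance.
    static List<String> rank(String misspelling, List<String> candidates, Map<String, Integer> popularity) {
        Comparator<String> byDistance =
                Comparator.comparingInt((String c) -> editDistance(misspelling, c));
        Comparator<String> byPopularityDesc =
                Comparator.comparingInt((String c) -> popularity.getOrDefault(c, 0)).reversed();
        return candidates.stream()
                .sorted(byDistance.thenComparing(byPopularityDesc))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> popularity = new HashMap<>();
        popularity.put("ipod", 5000); // made-up counts for illustration
        popularity.put("ipaq", 40);
        System.out.println(editDistance("ipad", "ipod")); // 1
        System.out.println(editDistance("ipad", "ipaq")); // 1
        System.out.println(rank("ipad", Arrays.asList("ipaq", "ipod"), popularity)); // [ipod, ipaq]
    }
}
```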
The final big win is the ability to spell-check individual input tokens. For
example, assume that we have the term 'ipod' indexed in our spell checker, but
not the term 'apple ipod', and the misspelling 'apple ipdo' is entered. The
overlap between 'ipod' and 'apple ipdo' is slight enough not to warrant a
suggestion. However, if we tokenize on whitespace and spell-correct each token,
we would be able to catch the 'ipdo' misspelling. I'm sure there are other use
cases, but those are the ones I've identified.
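A sketch of per-token correction, assuming a tiny hypothetical dictionary and a maximum edit distance of 2 (a transposition such as 'ipdo' to 'ipod' costs 2 under plain Levenshtein distance). The whole-query lookup finds nothing close enough, while correcting each whitespace-separated token recovers 'ipod'. A real spell checker would presumably query a spelling index instead of scanning a set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of correcting each whitespace-separated token independently.
class PerTokenCorrector {
    private final Set<String> dictionary; // sorted, so distance ties resolve alphabetically
    private final int maxDistance;

    PerTokenCorrector(Collection<String> words, int maxDistance) {
        this.dictionary = new TreeSet<>(words);
        this.maxDistance = maxDistance;
    }

    // Classic Levenshtein distance: insert, delete, and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Known token: keep it. Otherwise return the closest word within maxDistance,
    // or the token unchanged when nothing is close enough (no suggestion).
    String suggest(String token) {
        if (dictionary.contains(token)) {
            return token;
        }
        String best = token;
        int bestDistance = maxDistance + 1;
        for (String word : dictionary) {
            int d = editDistance(token, word);
            if (d < bestDistance) {
                best = word;
                bestDistance = d;
            }
        }
        return best;
    }

    // Split on runs of whitespace and correct each token on its own.
    String correct(String query) {
        List<String> corrected = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            corrected.add(suggest(token));
        }
        return String.join(" ", corrected);
    }

    public static void main(String[] args) {
        PerTokenCorrector c = new PerTokenCorrector(Arrays.asList("apple", "ipod"), 2);
        System.out.println(c.suggest("apple ipdo")); // apple ipdo (whole query: no suggestion)
        System.out.println(c.correct("apple ipdo")); // apple ipod
    }
}
```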



 N-gram
 --

 Key: SOLR-199
 URL: https://issues.apache.org/jira/browse/SOLR-199
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Adam Hiatt
Priority: Trivial
 Attachments: SOLR-81-ngram.patch





[jira] Created: (SOLR-199) N-gram

2007-03-28 Thread Adam Hiatt (JIRA)
N-gram
--

 Key: SOLR-199
 URL: https://issues.apache.org/jira/browse/SOLR-199
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Adam Hiatt
Priority: Trivial


This tracks the creation of a patch that adds the n-gram/edge n-gram tokenizing
functionality that was initially part of SOLR-81 (spell checking). This was
taken out b/c the Lucene SpellChecker class removed this dependency.
Nonetheless, I think this is useful functionality and the addition is
trivial. How does everyone feel about such an addition?




[jira] Updated: (SOLR-199) N-gram

2007-03-28 Thread Adam Hiatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Hiatt updated SOLR-199:


Attachment: SOLR-81-ngram.patch

Here is the patch.

 N-gram
 --

 Key: SOLR-199
 URL: https://issues.apache.org/jira/browse/SOLR-199
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Adam Hiatt
Priority: Trivial
 Attachments: SOLR-81-ngram.patch





[jira] Commented: (SOLR-81) Add Query Spellchecker functionality

2007-03-06 Thread Adam Hiatt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478302
 ] 

Adam Hiatt commented on SOLR-81:


BTW, I've attached an updated patch.

 Add Query Spellchecker functionality
 

 Key: SOLR-81
 URL: https://issues.apache.org/jira/browse/SOLR-81
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-81-edgengram-ngram.patch, 
 SOLR-81-ngram-schema.patch, SOLR-81-ngram.patch, SOLR-81-ngram.patch, 
 SOLR-81-ngram.patch, SOLR-81-ngram.patch, SOLR-81-spellchecker.patch, 
 SOLR-81-spellchecker.patch


 Use the simple approach of n-gramming outside of Solr and indexing n-gram
 documents. For example:
 <doc>
   <field name="word">lettuce</field>
   <field name="start3">let</field>
   <field name="gram3">let ett ttu tuc uce</field>
   <field name="end3">uce</field>
   <field name="start4">lett</field>
   <field name="gram4">lett ettu ttuc tuce</field>
   <field name="end4">tuce</field>
 </doc>
 See:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg01254.html
 Java clients: SOLR-20 (add/delete/commit/optimize), SOLR-30 (search)
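The n-gram document above can be generated outside of Solr with a few lines of code. This sketch (hypothetical class name NGramDocBuilder) builds the word, startN, gramN and endN fields for a range of gram sizes and renders them in the same shape as the example. It does no XML escaping, so it assumes the words contain no markup characters, and it does not show how the document would be posted to Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build the n-gram fields from the example document.
class NGramDocBuilder {

    // All contiguous grams of exactly n characters, left to right.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Field name -> value, in the same order as the example document.
    static Map<String, String> buildFields(String word, int minN, int maxN) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("word", word);
        for (int n = minN; n <= maxN; n++) {
            if (word.length() < n) {
                break; // no grams of this size (or larger) exist
            }
            List<String> g = grams(word, n);
            fields.put("start" + n, g.get(0));
            fields.put("gram" + n, String.join(" ", g));
            fields.put("end" + n, g.get(g.size() - 1));
        }
        return fields;
    }

    // Render as <doc>/<field> markup; values are NOT escaped (assumption: plain words).
    static String toXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<doc>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  <field name=\"").append(e.getKey()).append("\">")
              .append(e.getValue()).append("</field>\n");
        }
        return sb.append("</doc>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(buildFields("lettuce", 3, 4)));
    }
}
```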




[jira] Updated: (SOLR-81) Add Query Spellchecker functionality

2007-03-02 Thread Adam Hiatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Hiatt updated SOLR-81:
---

Attachment: SOLR-81-spellchecker.patch

This patch builds on Otis's previous patch. It fixes a numSuggestion bug, adds
an accuracy argument for the spellchecker, and adds a commit handler for
updating the spell-correction index. It also removes n-gram generation from
the spell-correction index build, because that isn't needed to build the index
(SpellChecker generates the n-grams for you).
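To illustrate what the accuracy and numSuggestion arguments control, here is a sketch that drops candidates whose similarity falls below accuracy and returns at most numSuggestions of the rest, best first. The similarity formula (1 minus edit distance divided by the longer word's length) is an assumption for illustration, not necessarily the one Lucene's SpellChecker uses, and the class name AccuracyFilter is hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of an accuracy threshold plus a suggestion-count limit.
class AccuracyFilter {

    // Classic Levenshtein distance: insert, delete, and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity in [0, 1]; 1.0 means the strings are identical.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    // Keep candidates at or above `accuracy`, best first (ties alphabetical), at most numSuggestions.
    static List<String> suggest(String word, Collection<String> dictionary, float accuracy, int numSuggestions) {
        Comparator<String> bestFirst =
                Comparator.comparingDouble((String c) -> similarity(word, c)).reversed()
                        .thenComparing(Comparator.<String>naturalOrder());
        return dictionary.stream()
                .filter(c -> similarity(word, c) >= accuracy)
                .sorted(bestFirst)
                .limit(numSuggestions)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("ipod", "ipaq", "zune");
        System.out.println(suggest("ipad", dictionary, 0.7f, 5)); // [ipaq, ipod]
        System.out.println(suggest("ipad", dictionary, 0.7f, 1)); // [ipaq]
    }
}
```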

 Add Query Spellchecker functionality
 

 Key: SOLR-81
 URL: https://issues.apache.org/jira/browse/SOLR-81
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-81-edgengram-ngram.patch, 
 SOLR-81-ngram-schema.patch, SOLR-81-ngram.patch, SOLR-81-ngram.patch, 
 SOLR-81-ngram.patch, SOLR-81-ngram.patch, SOLR-81-spellchecker.patch





[jira] Commented: (SOLR-81) Add Query Spellchecker functionality

2007-02-16 Thread Adam Hiatt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473848
 ] 

Adam Hiatt commented on SOLR-81:


What was the bug? I couldn't tell from the Lucene issue description.





 Add Query Spellchecker functionality
 

 Key: SOLR-81
 URL: https://issues.apache.org/jira/browse/SOLR-81
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-81-edgengram-ngram.patch, 
 SOLR-81-ngram-schema.patch, SOLR-81-ngram.patch

