RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-11 Thread Thomas Michael Engelke
Hi James, hi list,

I can confirm the existence of data that's within 1 Levenshtein step
of ichtscheiben:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "fl": "name,spell",
      "indent": "true",
      "q": "name:Sichtscheiben",
      "_": "1410423419758",
      "wt": "json",
      "rows": "50"
    }
  },
  "response": {
    "numFound": 6,
    "start": 0,
    "docs": [
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" }
    ]
  }
}

Multiple records exist that should match.

The note about alternativeTermCount is appreciated.

I've tried another term: Transport. I get suggestions when I use
Transpor and Transpo, even Transpotr, but ransport doesn't yield any
suggestions. Maybe the problem lies with the beginning of the word and
has nothing to do with stemming.

RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-11 Thread Dyer, James
Thomas,

Yes, you are right that the problem is with the beginning of the word 
needing correction.  If you are using DirectSolrSpellChecker, you need to set 
the minPrefix parameter to 0.  Otherwise the default (1) requires the first 
character to match before it will try to correct the term.

See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
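
For illustration, a minimal sketch of how this might look in the "default"
spellchecker from the posted solrconfig.xml. Only the minPrefix line is new;
everything else is copied from the original post, and the snippet assumes it
sits inside the existing spellcheck searchComponent. Treat it as a starting
point under those assumptions rather than a tested configuration.

```xml
<!-- Sketch only: the "default" spellchecker as posted, plus minPrefix. -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="distanceMeasure">internal</str>
  <!-- 0 lets the checker correct a wrong or missing first character;
       the default of 1 requires the first character to match exactly. -->
  <int name="minPrefix">0</int>
</lst>
```

With minPrefix at 0, a query such as ransport should become eligible for
correction to Transport. Allowing corrections at the first character widens
the set of candidate terms, so it may cost some performance on large indexes.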

James Dyer
Ingram Content Group
(615) 213-4311


RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-10 Thread Dyer, James
Thomas,

It looks like you've set things up correctly in that while the user is 
searching against a stemmed field (name), spellcheck is checking against a 
lightly-analyzed copy of it (spell).  This is the right way to do it as 
spellcheck against stemmed forms is usually undesirable.

But as you've experienced, you will sometimes get results (due to stemming) and 
also suggestions (because the spellchecker is looking at unstemmed forms).  If 
you do not want spellcheck to return anything when you get results, you can set 
spellcheck.maxResultsForSuggest=0.

Now, keeping in mind we're comparing unstemmed forms, can you verify that you 
indeed have something in your index that is within 2 edits of ichtscheiben?  My 
guess is you probably don't, which would be why you do not get spelling results 
in that case.

Also, even if you do have something within 2 edits, if ichtscheiben occurs in 
your index, by default it won't try to correct it at all (even if the query 
returns nothing, maybe because of filters or other required terms on the 
query).  In this case you need to set spellcheck.alternativeTermCount to a 
non-zero value (try maybe 5).

See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and following sections.
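
As a rough sketch, the two request parameters mentioned above could be set as
defaults on the /select handler like this. It assumes spellchecking is turned
on with spellcheck=true and that the dictionary is the one named "default" in
the posted config. The settings elided as "..." in the original handler are
not reconstructed here, and the value 5 is only the suggested starting point.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- other settings from the original handler omitted -->
  <lst name="defaults">
    <!-- assumption: spellcheck is enabled per request or here -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- also try to correct terms that do occur in the index -->
    <str name="spellcheck.alternativeTermCount">5</str>
    <!-- optional: only suggest when the query returns no hits at all -->
    <str name="spellcheck.maxResultsForSuggest">0</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same parameters can also be passed on the query string of a single
request, which is handy for experimenting before changing solrconfig.xml.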

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Wednesday, September 10, 2014 5:00 AM
To: Solr user
Subject: Solr Spellcheck suggestions only return from /select handler when 
returning search results

Hi,

I'm experimenting with the Spellcheck component and have therefore
used the example configuration for spell checking to try things out. My
solrconfig.xml looks like this:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">spell</str>
   <!-- Multiple Spell Checkers can be declared and used by this component -->
   <!-- a spellchecker built from a field of the main index -->
   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">spell</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
     <str name="distanceMeasure">internal</str>
     <!-- uncomment this to require suggestions to occur in 1% of the documents
     <float name="thresholdTokenFrequency">.01</float>
     -->
   </lst>
   <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
   <lst name="spellchecker">
     <str name="name">wordbreak</str>
     <str name="classname">solr.WordBreakSolrSpellChecker</str>
     <str name="field">spell</str>
     <str name="combineWords">true</str>
     <str name="breakWords">true</str>
     <int name="maxChanges">10</int>
   </lst>
 </searchComponent>

And I've added the spellcheck component to my
/select request handler:

 <requestHandler name="/select" class="solr.SearchHandler">
   ...
   <arr name="last-components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>

I have built up the spellchecker source in the schema.xml from the name
field:

 <field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
 <copyField source="name" dest="spell" maxChars="3" />
 ...
 <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
   </analyzer>
 </fieldType>

As I'm querying the /select request handler, I should get spellcheck
suggestions with my results. However, I rarely get a suggestion.
Examples:

query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
query: ichtscheiben, no spellcheck suggestions

As far as I can tell, I only get suggestions when I get real search
results. I get results for the first 2 examples because the German
StemFilterFactory reduces Sichtscheibe and Sichtscheiben to
Sichtscheib, so there are matches found. However, the third query
should also result in a suggestion, as its Levenshtein distance to
Sichtscheiben is smaller than in the second example.

Suggestions, improvements, corrections?