What happens if you set spellcheck.maxCollations to more than 1?
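
For anyone following along, here is a minimal sketch of where that parameter could go in the /suggest handler defaults. The handler name, dictionary name, and numeric values are assumptions, not taken from this thread. `spellcheck.maxCollationTries` is included because, when it is greater than 0, Solr tests candidate collations against the index and returns only those that produce hits.

```xml
<!-- Sketch only: handler name, dictionary name, and values are assumptions -->
<requestHandler name="/suggest"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <!-- return up to 5 collations instead of the default of 1 -->
    <str name="spellcheck.maxCollations">5</str>
    <!-- test up to 10 candidate collations against the index;
         the default of 0 skips this check -->
    <str name="spellcheck.maxCollationTries">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Whether the tested collations are evaluated as AND queries likely depends on the default operator in effect for the handler, so `q.op` may need to be set as well.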

--------------------------------------------------
From: "Alexei Martchenko" <ale...@superdownloads.com.br>
Sent: Wednesday, August 17, 2011 11:01 PM
To: <solr-user@lucene.apache.org>
Subject: Re: suggester issues

I've been indexing and reindexing stuff here with shingles. I don't believe
it's the best approach. The results are interesting, but I don't think this
is what the suggester is meant for.

I tried

<fieldType name="textSuggestion" class="solr.TextField"
           positionIncrementGap="10" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

but I got compound words in the suggestion itself.

If I query it with http://localhost:8983/solr/{mycore}/suggest/?q=dri I
get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="dri">
        <int name="numFound">6</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
          <str>drive</str>
          <str>driver</str>
        </arr>
      </lst>
      <str name="collation">drivers</str>
    </lst>
  </lst>
</response>

but when I enter the second word,
http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n
it scrambles everything

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="drivers">
        <int name="numFound">4</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
        </arr>
      </lst>
      <lst name="n">
        <int name="numFound">10</int>
        <int name="startOffset">8</int>
        <int name="endOffset">9</int>
        <arr name="suggestion">
          <str>nvidia</str>
          <str>net</str>
          <str>nvidia geforce</str>
          <str>network</str>
          <str>new</str>
          <str>n</str>
          <str>ninja</str>
        </arr>
      </lst>
      <str name="collation">drivers nvidia</str>
    </lst>
  </lst>
</response>

Although the collation seems fine for this, it's not exactly what the
suggester is supposed to do.

Any thoughts?

2011/8/17 Alexei Martchenko <ale...@superdownloads.com.br>

I have the very same problem; I could copy and paste your message as mine.
So far I've found that bigger dictionaries work better for me, and that
controlling the threshold works much better than leaving one or two fields
out of the index. Of course I'm still polishing this.
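
For context, a minimal sketch of where the threshold lives in a Solr 3.x Suggester definition. The field name `suggest_text` and the threshold value are hypothetical; `threshold` is the minimum fraction of documents (between 0 and 1) in which a term must appear to be added to the lookup dictionary.

```xml
<!-- Sketch only: the field name and threshold value are assumptions -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- hypothetical field that uses the suggestion field type -->
    <str name="field">suggest_text</str>
    <!-- terms appearing in fewer than 0.5% of documents are left out -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Raising the threshold trims rare terms from the dictionary without changing which fields are indexed.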

At the moment I'm looking into shingles. Are you using them?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

How are your fields defined?

2011/8/17 Kuba Krzemień <krzemien.k...@gmail.com>

Hello, I am working on creating an auto-complete feature for my platform,
which indexes large amounts of text (title + contents) - there is too much
data for a dictionary. I am using the latest version of Solr (3.3) and I am
trying to take advantage of the Suggester functionality. Unfortunately, so
far the outcome isn't that great.

The Suggester works only for single words or whole phrases (depending on
the tokenizer). When using the first option, I am unable to suggest any
combined queries. For example, the suggestion for 'ne' will be 'new'. The
suggestion for 'new y' will be two separate lists, one for 'new' and one
for 'y'. What's worse, querying 'new AND y' gives the same results (also
when using collate), which means that the returned suggestion may give no
results - what makes sense separately often doesn't work combined. I need a
way to find only those suggestions that will return results when doing an
AND query (for example 'new AND york' or 'new AND year', as long as they
give results upon querying - 'new AND yeti' shouldn't be returned as a
suggestion).

When I use the second tokenizer and the suggestions return phrases, for
'ne' I will get 'new york' and 'new year', but for 'new y' I will get
nothing. Also, for 'y' I will get nothing, so the issue remains.

If someone has some experience working with the Suggester, or if someone
has created a well working auto-suggester based on Solr, please help me.
I've been trying to find a solution for this for quite some time.

Yours sincerely,
Jackob K




--

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533



