On Tue, 24 Jun 2008 19:17:58 -0700
Ryan McKinley <[EMAIL PROTECTED]> wrote:

> also, check the LukeRequestHandler
> 
> if there is a document you think *should* match, you can see what  
> tokens it has actually indexed...
> 

Hi Ryan,
I can't see the generated tokens with LukeRequestHandler.

I can get to the document I want:
http://localhost:8983/solr/_test_/admin/luke/?id=Jay%20Rock

and for the field I am interested in, I get only:
[...]
<lst name="artist_ngram">
<str name="type">ngram</str>
<str name="schema">ITS----------</str>
<str name="flags">ITS----------</str>
<str name="value">Jay Rock</str>
<str name="internal">Jay Rock</str>
<float name="boost">1.0</float>
<int name="docFreq">0</int>
</lst>
[...]

(All the other fields look pretty much identical; none of them show the 
generated tokens.)

Using the Luke tool itself (lukeall.jar, source #0.8.1, linked against the 
Lucene 2.4 libs bundled with the nightly build), I see the following tokens 
for this document and field:

"ja", "ay", "y ", " r", "ro", "oc", "ck",
"jay", "ay ", "y r", " ro", "roc", "ock",
"jay ", "ay r", "y ro", " roc", "rock",
"jay r", "ay ro", "y roc", " rock",
"jay ro", "ay roc", "y rock",
"jay roc", "ay rock",
"jay rock"

This is precisely what I expect, given that my 'ngram' type is defined as:

        <!-- n-gram tokenization -->
        <fieldType name="ngram" class="solr.TextField"
                   positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory"
                           minGramSize="2" maxGramSize="15" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory"
                           minGramSize="2" maxGramSize="15" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>
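A rough Java sketch of the gram list above, to make the ordering explicit. It is not Solr or Lucene code: the class name NGramSketch is hypothetical, and it assumes (as the Luke listing suggests) that the tokenizer emits every 2-gram first, then every 3-gram, and so on, over the whole field value including the space, with lowercasing applied afterwards. It leaves out RemoveDuplicatesTokenFilterFactory; none of the 28 grams for "Jay Rock" repeat, so that makes no difference here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class, not part of Solr or Lucene.
class NGramSketch {

    // Emits every gram of size minGram, then every gram of size minGram + 1,
    // and so on up to maxGram (or the input length, whichever is smaller).
    // Spaces are kept, so "Jay Rock" yields grams like "y " and " r".
    static List<String> grams(String input, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        int longest = Math.min(maxGram, input.length());
        for (int size = minGram; size <= longest; size++) {
            for (int start = 0; start + size <= input.length(); start++) {
                // Lowercasing mimics LowerCaseFilterFactory after the tokenizer.
                out.add(input.substring(start, start + size).toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = grams("Jay Rock", 2, 15);
        StringBuilder line = new StringBuilder();
        for (String t : tokens) {
            // Quote each gram so leading and trailing spaces stay visible.
            line.append('"').append(t).append("\" ");
        }
        System.out.println(line.toString().trim());
        System.out.println(tokens.size() + " grams");
        // Same analysis applied to a query string.
        System.out.println(grams("roc", 2, 15));
    }
}
```

Run on "roc", the same sketch gives ro, oc, roc, the three grams that show up in the PhraseQuery of the debug output.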


My question now is: was I supposed to get any more information from 
LukeRequestHandler?


Furthermore, if I run this query on the same core, with exactly this data:
http://localhost:8983/solr/_test_/select?q=artist_ngram:ro

I get this document returned (and many others).

But if I search for 'roc' instead of 'ro':
http://localhost:8983/solr/_test_/select?q=artist_ngram:roc&debugQuery=true

nothing matches:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">48</int>
    <lst name="params">
      <str name="q">artist_ngram:roc</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">artist_ngram:roc</str>
    <str name="querystring">artist_ngram:roc</str>
    <str name="parsedquery">PhraseQuery(artist_ngram:"ro oc roc")</str>
    <str name="parsedquery_toString">artist_ngram:"ro oc roc"</str>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
[...]
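A minimal Java sketch of why that PhraseQuery might find nothing, under two assumptions I have not verified against the Lucene source: each indexed gram sits at its own position, one after another in the order Luke lists them, and an exact PhraseQuery (slop 0) needs its terms at consecutive positions. The class name PhraseCheckSketch and its methods are hypothetical, not Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class, not part of Solr or Lucene.
class PhraseCheckSketch {

    // Same size-major, lowercased gram order as the Luke listing.
    static List<String> grams(String input, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        int longest = Math.min(maxGram, input.length());
        for (int size = minGram; size <= longest; size++) {
            for (int start = 0; start + size <= input.length(); start++) {
                out.add(input.substring(start, start + size).toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Assumes gram i of the indexed list is at position i. A slop-0 phrase
    // then matches only if the query grams form a contiguous sub-list.
    // A one-gram query reduces to a plain "is this term present" check.
    static boolean phraseMatches(List<String> indexed, List<String> queryGrams) {
        return Collections.indexOfSubList(indexed, queryGrams) >= 0;
    }

    public static void main(String[] args) {
        List<String> indexed = grams("Jay Rock", 2, 15);
        for (String q : new String[] {"ro", "roc", "rock"}) {
            List<String> queryGrams = grams(q, 2, 15);
            System.out.println(q + " -> " + queryGrams
                    + " match=" + phraseMatches(indexed, queryGrams));
        }
    }
}
```

Under those assumptions "ro" is a single gram and matches, while "ro oc roc" does not: "ro" and "oc" are adjacent, but the next position holds "ck", and "roc" sits much further along with the other 3-grams.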

Is searching on nGram-tokenized fields limited to the minGramSize?

Thanks for any pointers you can provide,
B
_________________________
{Beto|Norberto|Numard} Meijome

"I didn't attend the funeral, but I sent a nice letter saying I approved of it."
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.
