Re: How to debug ?

Norberto Meijome Wed, 25 Jun 2008 08:50:01 -0700

On Tue, 24 Jun 2008 19:17:58 -0700
Ryan McKinley <[EMAIL PROTECTED]> wrote:


> also, check the LukeRequestHandler
> 
> if there is a document you think *should* match, you can see what  
> tokens it has actually indexed...
> 

Hi Ryan,
I can't see the tokens generated using LukeRequestHandler.

I can get to the document I want:
http://localhost:8983/solr/_test_/admin/luke/?id=Jay%20Rock

and for the field I am interested in, I get only:
[...]
<lst name="artist_ngram">
<str name="type">ngram</str>
<str name="schema">ITS----------</str>
<str name="flags">ITS----------</str>
<str name="value">Jay Rock</str>
<str name="internal">Jay Rock</str>
<float name="boost">1.0</float>
<int name="docFreq">0</int>
</lst>
[...]

(All the other fields look pretty much identical; none of them show the
tokens generated.)

Using the Luke tool itself (lukeall.jar, source # 0.8.1, linked against the
Lucene 2.4 libs bundled with the nightly build), I see the following tokens
for this document and field:

ja, ay, y ,  r, ro, 
oc, ck, jay, ay , y r, 
 ro, roc, ock, jay , ay r, 
y ro,  roc, rock, jay r, ay ro, 
y roc,  rock, jay ro, ay roc, y rock, 
jay roc, ay rock, jay rock

Which is precisely what I expect, given that my 'ngram' type is defined as:

    <!-- n-gram tokenization -->
    <fieldType name="ngram" class="solr.TextField"
               positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory"
                       minGramSize="2" maxGramSize="15" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory"
                       minGramSize="2" maxGramSize="15" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
    </fieldType>
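
To make that expectation concrete, here is a minimal Python sketch of what this analyzer chain is expected to emit for "Jay Rock". It is not Solr code: the function name char_ngrams is made up for illustration, lowercasing is applied before gram extraction (which gives the same strings as lowercasing afterwards), and duplicate removal is simplified to "keep the first occurrence".

```python
def char_ngrams(text, min_gram=2, max_gram=15):
    """Return lowercased character n-grams, shortest grams first.

    Hypothetical helper that mimics the schema above:
    n-gram tokenizer -> lowercase filter -> duplicate removal.
    """
    text = text.lower()
    grams = []
    # Emit every gram of size min_gram, then min_gram + 1, and so on.
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    # Simplified stand-in for duplicate removal: keep first occurrence only.
    seen = set()
    unique = []
    for gram in grams:
        if gram not in seen:
            seen.add(gram)
            unique.append(gram)
    return unique


tokens = char_ngrams("Jay Rock")
print(len(tokens))
print(tokens)
```

For an 8-character value with sizes 2 through 8 this yields 7+6+5+4+3+2+1 = 28 grams, the same count as the token list Luke shows above.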


My question now is: was I supposed to get any more information from
LukeRequestHandler?


Furthermore, if I run this query on the same core, with exactly this data:
http://localhost:8983/solr/_test_/select?q=artist_ngram:ro

I get this document returned (and many others).

But if I search for 'roc' instead of 'ro':
http://localhost:8983/solr/_test_/select?q=artist_ngram:roc

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">48</int>
    <lst name="params">
      <str name="q">artist_ngram:roc</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">artist_ngram:roc</str>
    <str name="querystring">artist_ngram:roc</str>
    <str name="parsedquery">PhraseQuery(artist_ngram:"ro oc roc")</str>
    <str name="parsedquery_toString">artist_ngram:"ro oc roc"</str>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
    [...]

Is searching on n-gram tokenized fields limited to the minGramSize?
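
One possible reading of the debug output, sketched in Python: the query analyzer splits 'roc' into "ro", "oc" and "roc", and the parser turns those into a phrase query, which only matches if the terms sit at consecutive positions in the index. The sketch below ASSUMES the tokenizer gives each emitted gram the next position (an assumption about the tokenizer, not something the output above confirms); the helper names are made up for illustration.

```python
def ngrams_with_positions(text, min_gram=2, max_gram=15):
    """Return (position, gram) pairs, shortest grams first.

    ASSUMPTION: every emitted gram advances the position by one.
    """
    text = text.lower()
    out = []
    pos = 0
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            out.append((pos, text[start:start + size]))
            pos += 1
    return out


def phrase_matches(indexed, phrase_terms):
    """True if phrase_terms occur at consecutive positions in indexed."""
    positions = {}
    for pos, term in indexed:
        positions.setdefault(term, set()).add(pos)
    first = positions.get(phrase_terms[0], set())
    return any(
        all(p + i in positions.get(t, set()) for i, t in enumerate(phrase_terms))
        for p in first
    )


indexed = ngrams_with_positions("Jay Rock")
query_ro = [t for _, t in ngrams_with_positions("ro")]    # single term
query_roc = [t for _, t in ngrams_with_positions("roc")]  # three terms

print(query_ro, phrase_matches(indexed, query_ro))
print(query_roc, phrase_matches(indexed, query_roc))
```

Under that assumption, 'ro' is a single-term query and matches, while the three-term phrase for 'roc' does not, because "ro", "oc" and "roc" are not adjacent in the indexed token stream. That would be consistent with the behaviour above, but it is only a sketch, not a confirmed diagnosis.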

Thanks for any pointers you can provide,
B
_________________________
{Beto|Norberto|Numard} Meijome

"I didn't attend the funeral, but I sent a nice letter saying  I approved of 
it."
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.
