Re: small facets not working

Grant Ingersoll Tue, 19 May 2009 04:04:55 -0700


On May 19, 2009, at 5:50 AM, Justin wrote:

I have a Solr index which contains research data from the human genome project.
Each document contains about 60 facets, including one general composite field that contains all the facet data. The general facet is anywhere from 100KB to 7MB.
One facet is called Gene.Symbol and, you guessed it, it contains only the gene symbol. There is only one Symbol per gene (for smarty pantses out there, the aliases are contained in another facet).
When I do a search for anything in the big general facet, I find what I'm looking for. But if I do a search in the Gene.Symbol facet, it does not find anything.

I realize it's probably finding the string repeated elsewhere in the
document, but how do I get it to find it in the Gene.Symbol facet?

I'd look at the analysis tool in the Solr admin and compare the output for various gene names. It seems a bit odd that you are applying Porter stemming to gene names.

You are likely getting matches due to the WordDelimiterFilter and the other manipulations in the BFDText field. In the Symbol field you aren't doing nearly as much to the tokens, so I doubt there is an "abc" gene in there.

You could try doing a prefix query. You could also try creating n-grams during indexing, or other mechanisms for allowing matches within a string.
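The prefix-query route needs no schema change: a query such as q=Gene.Symbol:abc* should match symbols that begin with "abc" (ABCC2, ABCB11, ...) but not ones that merely contain it (CABC1). Depending on the Solr version, prefix terms may not pass through the field's analyzer, so typing them in lowercase is the safest choice against this lowercased index. For the n-gram route, below is a minimal sketch, assuming a Solr version that ships solr.KeywordTokenizerFactory and solr.NGramFilterFactory. The field type symbol_ngram and the field Gene.Symbol.ngram are hypothetical names, not part of the schema in this thread, and the gram sizes are illustrative guesses.

```xml
<!-- Hypothetical field type: the whole symbol becomes one token, which is
     lowercased and then split into character n-grams at index time only. -->
<fieldType name="symbol_ngram" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <!-- Keep "ABCC2" as a single token instead of splitting it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "abcc2" yields grams such as "ab", "abc", "bcc", "cc2", ..., "abcc2" -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming here: the query "abc" stays one token and is
         looked up as a single gram -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field, filled from Gene.Symbol at index time -->
<field name="Gene.Symbol.ngram" type="symbol_ngram" indexed="true" stored="false" multiValued="false"/>
<copyField source="Gene.Symbol" dest="Gene.Symbol.ngram"/>
```

After reindexing, a request like http://localhost:8983/solr/core0/select?q=Gene.Symbol.ngram:abc would be expected to match CABC1 as well as the symbols starting with "abc", while leaving the exact-match Gene.Symbol field untouched. The trade-off is index size, since every symbol expands into many grams, and query strings shorter than minGramSize or longer than maxGramSize will not match.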



So a search for

http://localhost:8983/solr/core0/select?indent=on&version=2.2&q=Gene.Symbol:abc

returns nothing, but a search for

http://localhost:8983/solr/core0/select?indent=on&version=2.2&q=abc

returns
ABCC2
ABCC8
ABCD1
ABCG1
ABCA1
...
CABC1
...
ABCD3
ABCC5
ABCC9
ABCG2
ABCB11
ABCC3
ABCF1
ABCC1
ABCF2
ABCB9



Schema.xml:

<fieldType name="symbol" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
...
<!-- yes, taken directly from the example -->

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

...
<field name="Gene.Symbol" type="symbol" indexed="true"
       stored="true" required="true" multiValued="false" omitNorms="false"/>

<field name="BFDText" type="text" indexed="true"
       stored="false" multiValued="true" omitNorms="true"/>
...
<defaultSearchField>BFDText</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="*" dest="BFDText"/>


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
