I have a Solr index that contains research data from the Human Genome
Project.

Each document contains about 60 facets, including one general composite
field that contains all the facet data. The general facet is anywhere from
100 KB to 7 MB.

One facet is called Gene.Symbol and, you guessed it, it contains only the
gene symbol. There is only one symbol per gene (for the smarty-pants out
there, the aliases are contained in another facet).

When I search for anything in the big general facet, I find what I'm
looking for. But if I search in the Gene.Symbol facet, it does not
find anything.

I realize it's probably finding the string repeated elsewhere in the
document, but how do I get it to find it in the Gene.Symbol facet?

So a search for

http://localhost:8983/solr/core0/select?indent=on&version=2.2&q=Gene.Symbol:abc

returns nothing, but a search for

http://localhost:8983/solr/core0/select?indent=on&version=2.2&q=abc

returns
ABCC2
ABCC8
ABCD1
ABCG1
ABCA1
...
CABC1
...
ABCD3
ABCC5
ABCC9
ABCG2
ABCB11
ABCC3
ABCF1
ABCC1
ABCF2
ABCB9
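
In case anyone wants to reproduce the two requests, here is a minimal Python
sketch that builds them. The host, core and parameters are copied from the
URLs above; the helper name select_url is made up for illustration, and note
that urlencode percent-encodes the colon in the fielded query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base select handler, copied from the URLs in this post.
SOLR_SELECT = "http://localhost:8983/solr/core0/select"

def select_url(query):
    """Build a select URL for the given query string (hypothetical helper)."""
    params = [("indent", "on"), ("version", "2.2"), ("q", query)]
    return SOLR_SELECT + "?" + urlencode(params)

fielded = select_url("Gene.Symbol:abc")  # the search that returns nothing
default = select_url("abc")              # the search against the default field

print(fielded)
print(default)
```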



Schema.xml:

<fieldType name="symbol" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
...
<!-- yes, taken directly from the example -->

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

...
<field name="Gene.Symbol" type="symbol" indexed="true" stored="true"
       required="true" multiValued="false" omitNorms="false"/>
<field name="BFDText" type="text" indexed="true" stored="false"
       multiValued="true" omitNorms="true"/>
...
<defaultSearchField>BFDText</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="*" dest="BFDText"/>
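
My rough understanding of why the fielded search might miss, as a Python
sketch. It is only an approximation of the symbol analyzer: a simple regex
stands in for StandardTokenizerFactory, the function names are made up, and
it assumes a plain term query only hits when the analyzed query term equals
an indexed term exactly.

```python
import re

def analyze_symbol(text):
    """Crude stand-in for the 'symbol' field type: split on non-alphanumerics,
    then lowercase. The real StandardTokenizerFactory has more rules than this."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def term_matches(query, value):
    """Assumed term-query behaviour: every query term must equal an indexed term."""
    indexed = set(analyze_symbol(value))
    return all(q in indexed for q in analyze_symbol(query))

def prefix_matches(prefix, value):
    """What a prefix search would need: an indexed term starting with the prefix."""
    return any(tok.startswith(prefix.lower()) for tok in analyze_symbol(value))

symbols = ["ABCC2", "ABCD1", "CABC1"]  # a few symbols from the result list above
exact_hits = [s for s in symbols if term_matches("abc", s)]
prefix_hits = [s for s in symbols if prefix_matches("abc", s)]

print(exact_hits)
print(prefix_hits)
```

If that picture is right, Gene.Symbol:abc could only ever match a gene whose
symbol is exactly ABC, and a prefix query such as Gene.Symbol:abc* may be
closer to what I am after.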
