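1) When faceting, use a field of type string. String fields are not
tokenized, so that will get rid of your tokenization problems.
Alternatively, use a tokenizer that does not split the value
(KeywordTokenizerFactory keeps the whole value as a single token).
Also turn docValues on for the field. It should improve faceting performance.
2) If, however, you do need to facet on a tokenized field, make sure the
values are short in terms of number of tokens. Every distinct token becomes
a facet term, and with long values the memory cost of faceting can bring
your app down quickly.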
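
To illustrate point 1, here is a minimal schema sketch, not a drop-in fix.
The field name "keyphrase_facet" is hypothetical, and it assumes the
indexing client supplies one whole phrase per value (a copyField passes the
raw source text to the destination, not the analyzed tokens):

```xml
<!-- "string" is solr.StrField, as in the stock example schema.
     StrField values are indexed verbatim; no analyzer runs. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical facet field: one phrase per value, e.g. "not necessarily".
     docValues="true" stores a column-oriented copy used for faceting;
     the data must be reindexed after changing it. -->
<field name="keyphrase_facet" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```

With a field like this you would facet with
facet=true&facet.field=keyphrase_facet and filter with
fq=keyphrase_facet:"not necessarily", which should match only documents
carrying that exact value.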

On Mon, 28 Dec 2015, 22:24 Kevin Lopez <kevin.lopez...@gmail.com> wrote:

> I am not sure I am following correctly. The field I upload the document to
> is "content"; the analyzed field is "ColonCancerField". The "content"
> field contains the entire text of the document, in my case a PubMed
> abstract. This was a tokenized field. I made it untokenized and
> still got the same results: selecting "not necessarily" returns the
> results for "not". In my current example, 2 docs contain "not" and 1 doc
> contains "not necessarily" (the doc that contains "not necessarily" of
> course also contains "not"):
>
> http://imgur.com/a/1bfXT
>
> I also tried this:
>
> http://localhost:8983/solr/Cytokine/select?&q=ColonCancerField:"not+necessarily"
>
> I still receive the two documents, which is the same as doing
> ColonCancerField:"not"
>
> Just to clarify, the structure looks like this: *content* (untokenized,
> unanalyzed) [copied to]==> *ColonCancerField* (tokenized, analyzed). Then I
> browse the ColonCancerField, and the facets state that there is 1 document
> for "not necessarily", but when I select it, Solr returns 2 results.
>
> -Kevin
>
> On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > Can you do the opposite? Index into an unanalyzed field and copy into
> > the analyzed?
> >
> > If I remember correctly facets are based off of indexed values so if you
> > tokenize the field then the facets will be as you are seeing now.
> > On Dec 28, 2015 9:45 AM, "Kevin Lopez" <kevin.lopez...@gmail.com> wrote:
> >
> > > *What I am trying to accomplish:*
> > > Generate a facet based on the documents uploaded and a text file
> > > containing terms from a domain/ontology, such that a facet is shown if
> > > a term is in the text file and in a document (key phrase extraction).
> > >
> > > *The problem:*
> > > When I select the facet for the term "*not necessarily*" (note the
> > > space), I get the results for the term "*not*". The field is tokenized
> > > and multivalued. This leads me to believe that I cannot use a tokenized
> > > field as a facet field. I tried to copy the values of the field to a
> > > text field with a KeywordTokenizer. When I check the schema browser, I
> > > am told: "Sorry, no Term Info available :(" This is after I delete the
> > > old index and upload the documents again. The facet is coming from a
> > > field that is already copied from another field, so I cannot copy this
> > > field to a text field with a KeywordTokenizer or a StrField. What can I
> > > do to fix this? Is there an alternate way to accomplish this?
> > >
> > > *Here is my configuration:*
> > >
> > > <copyField source="ColonCancerField" dest="cytokineField"/>
> > >
> > > <field name="cytokineField" indexed="true" stored="true"
> > >        multiValued="true" type="Cytokine_Pass"/>
> > > <fieldType name="Cytokine_Pass" class="solr.TextField">
> > >   <analyzer>
> > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > <field name="ColonCancerField" type="ColonCancer" indexed="true"
> > >        stored="true" multiValued="true"
> > >        termPositions="true"
> > >        termVectors="true"
> > >        termOffsets="true"/>
> > > <fieldType name="ColonCancer" class="solr.TextField"
> > >            sortMissingLast="true" omitNorms="true">
> > >   <analyzer>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.ShingleFilterFactory"
> > >             minShingleSize="2" maxShingleSize="5"
> > >             outputUnigramsIfNoShingles="true"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SynonymFilterFactory"
> > >             synonyms="synonyms_ColonCancer.txt" ignoreCase="true"
> > >             expand="true"
> > >             tokenizerFactory="solr.KeywordTokenizerFactory"/>
> > >     <filter class="solr.KeepWordFilterFactory"
> > >             words="prefLabels_ColonCancer.txt" ignoreCase="true"/>
> > >   </analyzer>
> > > </fieldType>
> > > <copyField source="content" dest="ColonCancerField"/>
> > >
> > > Regards,
> > >
> > > Kevin
> > >
> >
>
-- 
Regards,
Binoy Dalal
