Thanks guys, solr.ShingleFilterFactory did get me multiple terms per
facet, but now I am seeing some redundancy in the facet counts.
See below...

Highway (62)
Highway System (59)
National (59)
National Highway (59)
National Highway System (59)
System (59)

See what's going on here? How can I make my multi-token facets smarter
so that the tokens aren't duplicated?

Thanks in advance,
Adam

On Tue, Oct 26, 2010 at 10:32 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> Facets are generated from indexed terms.
>
> Depending on your need/use-case:
>
> You can use an additional, separate String field (which is not tokenized)
> for facets and populate it via copyField. Search on the tokenized field,
> facet on the non-tokenized field.
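As a rough sketch of this first option (not taken from the thread's actual schema), the schema.xml entries might look like the following. The field name title_facet is made up for illustration, and the snippet assumes the stock "string" type (solr.StrField) from the example schema is still defined.

```xml
<!-- Hypothetical facet-only copy of the title field; "title_facet" is an assumed name -->
<field name="title_facet" type="string" indexed="true" stored="false"/>

<!-- Copy the raw title value into the untokenized field at index time -->
<copyField source="title" dest="title_facet"/>
```

Searching would still go against the tokenized title field, while faceting would use facet.field=title_facet. Because a string field is not analyzed, each facet value is the entire title as indexed, so this probably fits best when titles are short, consistent phrases.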
>
> Or
>
> You can add solr.ShingleFilterFactory to your index analyzer to form multiple 
> word terms.
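A minimal sketch of the shingle option, assuming a separate field type (the name text_shingle is hypothetical) so the existing "text" type stays untouched:

```xml
<!-- Hypothetical field type for faceting on word n-grams ("shingles") -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- maxShingleSize="3" emits two- and three-word shingles;
         outputUnigrams="true" also keeps every single word -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

With these settings, the text "National Highway System" is indexed as National, National Highway, National Highway System, Highway, Highway System, and System, so every sub-phrase becomes its own facet value with nearly the same count. Setting outputUnigrams="false" would drop the single words, but the overlapping shingles would still remain.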
>
> --- On Wed, 10/27/10, Adam Estrada <estrada.a...@gmail.com> wrote:
>
>> From: Adam Estrada <estrada.a...@gmail.com>
>> Subject: Multiple Word Facets
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, October 27, 2010, 4:43 AM
>> All,
>> I am new to Solr faceting and stuck on how to get multiple-word
>> facets returned from a standard Solr query. See below for what is
>> currently being returned.
>>
>> <lst name="facet_counts">
>> <lst name="facet_queries"/>
>> <lst name="facet_fields">
>> <lst name="title">
>> <int name="Federal">89</int>
>> <int name="EFLHD">87</int>
>> <int name="Eastern">87</int>
>> <int name="Lands">87</int>
>> <int name="Highways">84</int>
>> <int name="FHWA">60</int>
>> <int name="Transportation">32</int>
>> <int name="GIS">22</int>
>> <int name="Planning">19</int>
>> <int name="Asset">15</int>
>> <int name="Environment">15</int>
>> <int name="Management">14</int>
>> <int name="Realty">12</int>
>> <int name="Highway">11</int>
>> <int name="HEP">10</int>
>> <int name="Program">9</int>
>> <int name="HEPGIS">7</int>
>> <int name="Resources">7</int>
>> <int name="Roads">7</int>
>> <int name="EEI">6</int>
>> <int name="Environmental">6</int>
>> <int name="Right">6</int>
>> <int name="Way">6</int>
>> ...etc...
>>
>> There are many terms in there that are 2 or 3 word phrases. For
>> example, Eastern Federal Lands Highway Division gets broken down
>> into the individual words that make up the phrase. I've seen quite
>> a few websites that do what I am trying to do here, so any
>> suggestions at this point would be great. See my schema below
>> (copied from the example schema).
>>
>>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>       <analyzer type="index">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>                 ignoreCase="true" expand="false"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"/>
>>         <filter class="solr.WordDelimiterFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1"
>>                 catenateWords="0" catenateNumbers="0"
>>                 catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>       </analyzer>
>>
>> Similar for type="query". Please advise on how to group or cluster
>> document terms so that they can be used as facets.
>>
>> Many thanks in advance,
>> Adam Estrada
