Ahhh... I see! I am testing by crawling a couple of websites with
Nutch, and I am faceting on the title field, which is type=text.
Are you saying that I will need to generate the content for my facet
field manually? I can see the reason for doing it that way, but I
really need the faceting to happen dynamically based on the content
of the field, which in this case is the title of a URL.

Thanks again for all the tips on getting this working for me.

Adam
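
Note: the copyField approach quoted below does not require generating
the facet content by hand. Solr copies the raw title value into the
facet field at index time, before any analysis runs. A minimal sketch,
assuming a hypothetical facet field named title_facet and the "text"
and "string" types from the example schema (the attributes shown on
the title field are also assumptions, not taken from the actual Nutch
schema):

```xml
<!-- schema.xml fragment (sketch). "title_facet" is a hypothetical name. -->

<!-- Tokenized field: used for searching. -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- Untokenized copy: used only for faceting, so it need not be stored. -->
<field name="title_facet" type="string" indexed="true" stored="false"/>

<!-- Populate the facet field automatically from the title at index time. -->
<copyField source="title" dest="title_facet"/>
```

Faceting would then be requested with facet=true&facet.field=title_facet
while searching still runs against title. Because a string field is not
tokenized, each complete title becomes a single facet value, so this
fits best when the whole field value is a meaningful label.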

On Wed, Oct 27, 2010 at 9:19 AM, Jayendra Patil
<jayendra.patil....@gmail.com> wrote:
> The Shingle Filter breaks the words in a sentence into combinations of 2 or 3
> adjacent words.
>
> For a faceting field you should use:
> <field name="facet_field" type="string" indexed="true" stored="true"
> multiValued="true"/>
>
> The type of the field should be string so that it is not tokenised at all.
>
> On Wed, Oct 27, 2010 at 9:12 AM, Adam Estrada <estrada.a...@gmail.com> wrote:
>
>> Thanks guys, the solr.ShingleFilterFactory did work to get me multiple
>> terms per facet, but now I am seeing some redundancy in the facet
>> counts. See below...
>>
>> Highway (62)
>> Highway System (59)
>> National (59)
>> National Highway (59)
>> National Highway System (59)
>> System (59)
>>
>> See what's going on here? How can I make my multi-token facets smarter
>> so that the tokens aren't duplicated?
>>
>> Thanks in advance,
>> Adam
>>
>> On Tue, Oct 26, 2010 at 10:32 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> > Facets are generated from indexed terms.
>> >
>> > Depending on your need/use-case:
>> >
>> > You can use an additional, separate string field (which is not tokenized)
>> > for facets and populate it via copyField. Search on the tokenized field;
>> > facet on the non-tokenized field.
>> >
>> > Or
>> >
>> > You can add solr.ShingleFilterFactory to your index analyzer to form
>> > multiple-word terms.
>> >
>> > --- On Wed, 10/27/10, Adam Estrada <estrada.a...@gmail.com> wrote:
>> >
>> >> From: Adam Estrada <estrada.a...@gmail.com>
>> >> Subject: Multiple Word Facets
>> >> To: solr-user@lucene.apache.org
>> >> Date: Wednesday, October 27, 2010, 4:43 AM
>> >> All,
>> >> I am new to Solr faceting and stuck on how to get multiple-word
>> >> facets returned from a standard Solr query. See below for what is
>> >> currently being returned.
>> >>
>> >> <lst name="facet_counts">
>> >> <lst name="facet_queries"/>
>> >> <lst name="facet_fields">
>> >> <lst name="title">
>> >> <int name="Federal">89</int>
>> >> <int name="EFLHD">87</int>
>> >> <int name="Eastern">87</int>
>> >> <int name="Lands">87</int>
>> >> <int name="Highways">84</int>
>> >> <int name="FHWA">60</int>
>> >> <int name="Transportation">32</int>
>> >> <int name="GIS">22</int>
>> >> <int name="Planning">19</int>
>> >> <int name="Asset">15</int>
>> >> <int name="Environment">15</int>
>> >> <int name="Management">14</int>
>> >> <int name="Realty">12</int>
>> >> <int name="Highway">11</int>
>> >> <int name="HEP">10</int>
>> >> <int name="Program">9</int>
>> >> <int name="HEPGIS">7</int>
>> >> <int name="Resources">7</int>
>> >> <int name="Roads">7</int>
>> >> <int name="EEI">6</int>
>> >> <int name="Environmental">6</int>
>> >> <int name="Right">6</int>
>> >> <int name="Way">6</int>
>> >> ...etc...
>> >>
>> >> There are many terms in there that are 2- or 3-word phrases. For
>> >> example, Eastern Federal Lands Highway Division gets broken down
>> >> into its individual words. I've seen quite a few websites that do
>> >> what I am trying to do here, so any suggestions at this point would
>> >> be great. See my schema below (copied from the example schema).
>> >>
>> >>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >>       <analyzer type="index">
>> >>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>                 ignoreCase="true" expand="false"/>
>> >>         <filter class="solr.StopFilterFactory"
>> >>                 ignoreCase="true"
>> >>                 words="stopwords.txt"
>> >>                 enablePositionIncrements="true"/>
>> >>         <filter class="solr.WordDelimiterFilterFactory"
>> >>                 generateWordParts="1" generateNumberParts="1"
>> >>                 catenateWords="0" catenateNumbers="0"
>> >>                 catenateAll="0" splitOnCaseChange="1"/>
>> >>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >>       </analyzer>
>> >>
>> >> Similar for type="query". Please advise on how to group or cluster
>> >> document terms so that they can be used as facets.
>> >>
>> >> Many thanks in advance,
>> >> Adam Estrada
>> >>
>> >
>> >
>> >
>> >
>>
>
