Ahhh... I see! I am testing by crawling a couple of websites with Nutch, and in doing so I am assigning my facets to the title field, which is type="text". Are you saying that I will need to manually generate the content for my facet field? I can see the reason for doing it that way, but I really need my faceting to happen dynamically, based on the content of the field, which in this case is the page title for each URL.
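For reference, the copyField approach Ahmet describes further down does not require generating facet values by hand. Solr copies the incoming title into the facet field at index time, whether the document arrives from Nutch or anywhere else. Below is a minimal schema.xml sketch. It assumes a hypothetical field name `title_facet` and the stock `string` type from the example schema:

```xml
<!-- Hypothetical facet-only copy of the title. "string" is not tokenized,
     so each facet value is the complete title exactly as it was indexed.
     stored="false" because faceting only reads indexed terms. -->
<field name="title_facet" type="string" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy the raw title into title_facet at index time; no manual step. -->
<copyField source="title" dest="title_facet"/>
```

Searching would stay on `title`, and faceting would use `facet=true&facet.field=title_facet`. The trade-off is that every facet value is an entire title rather than a 2- or 3-word phrase inside it. Phrase-level facets still need an analyzed field, such as the shingle approach discussed in the thread.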
Thanks again for all the tips on getting this working for me.
Adam

On Wed, Oct 27, 2010 at 9:19 AM, Jayendra Patil <jayendra.patil....@gmail.com> wrote:
> The Shingle Filter breaks the words in a sentence into combinations of 2 or 3
> words.
>
> For the faceting field you should use:
> <field name="facet_field" type="string" indexed="true" stored="true"
>        multiValued="true"/>
>
> The type of the field should be *string* so that it is not tokenised at all.
>
> On Wed, Oct 27, 2010 at 9:12 AM, Adam Estrada <estrada.a...@gmail.com> wrote:
>
>> Thanks guys, the solr.ShingleFilterFactory did work to get me multiple
>> terms per facet, but now I am seeing some redundancy in the facet
>> counts. See below...
>>
>> Highway (62)
>> Highway System (59)
>> National (59)
>> National Highway (59)
>> National Highway System (59)
>> System (59)
>>
>> See what's going on here? How can I make my multi-token facets smarter
>> so that the tokens aren't duplicated?
>>
>> Thanks in advance,
>> Adam
>>
>> On Tue, Oct 26, 2010 at 10:32 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> > Facets are generated from indexed terms.
>> >
>> > Depending on your need/use-case:
>> >
>> > You can use an additional, separate String field (which is not tokenized)
>> > for facets, populated via copyField. Search on the tokenized field, facet
>> > on the non-tokenized field.
>> >
>> > Or
>> >
>> > You can add solr.ShingleFilterFactory to your index analyzer to form
>> > multiple-word terms.
>> >
>> > --- On Wed, 10/27/10, Adam Estrada <estrada.a...@gmail.com> wrote:
>> >
>> >> From: Adam Estrada <estrada.a...@gmail.com>
>> >> Subject: Multiple Word Facets
>> >> To: solr-user@lucene.apache.org
>> >> Date: Wednesday, October 27, 2010, 4:43 AM
>> >> All,
>> >> I am new to Solr faceting and stuck on how to get multiple-word
>> >> facets returned from a standard Solr query. See below for what is
>> >> currently being returned.
>> >>
>> >> <lst name="facet_counts">
>> >>   <lst name="facet_queries"/>
>> >>   <lst name="facet_fields">
>> >>     <lst name="title">
>> >>       <int name="Federal">89</int>
>> >>       <int name="EFLHD">87</int>
>> >>       <int name="Eastern">87</int>
>> >>       <int name="Lands">87</int>
>> >>       <int name="Highways">84</int>
>> >>       <int name="FHWA">60</int>
>> >>       <int name="Transportation">32</int>
>> >>       <int name="GIS">22</int>
>> >>       <int name="Planning">19</int>
>> >>       <int name="Asset">15</int>
>> >>       <int name="Environment">15</int>
>> >>       <int name="Management">14</int>
>> >>       <int name="Realty">12</int>
>> >>       <int name="Highway">11</int>
>> >>       <int name="HEP">10</int>
>> >>       <int name="Program">9</int>
>> >>       <int name="HEPGIS">7</int>
>> >>       <int name="Resources">7</int>
>> >>       <int name="Roads">7</int>
>> >>       <int name="EEI">6</int>
>> >>       <int name="Environmental">6</int>
>> >>       <int name="Right">6</int>
>> >>       <int name="Way">6</int>
>> >>       ...etc...
>> >>
>> >> Many of these terms belong to 2- or 3-word phrases. For example,
>> >> "Eastern Federal Lands Highway Division" gets broken down into the
>> >> individual words that make it up. I've seen quite a few websites that
>> >> do what I am trying to do here, so any suggestions at this point would
>> >> be great. See my schema below (copied from the example schema).
>> >>
>> >> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >>   <analyzer type="index">
>> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>             ignoreCase="true" expand="false"/>
>> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >>             words="stopwords.txt" enablePositionIncrements="true"/>
>> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> >>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>> >>             catenateAll="0" splitOnCaseChange="1"/>
>> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >>   </analyzer>
>> >>
>> >> Similar for type="query". Please advise on how to group or cluster
>> >> document terms so that they can be used as facets.
>> >>
>> >> Many thanks in advance,
>> >> Adam Estrada
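Ahmet's second suggestion, a ShingleFilterFactory in the index analyzer, can also be applied to a separate copy of the title instead of the searchable `title` field. That yields phrase-level facets without changing how `title` is searched. Below is a minimal schema.xml sketch. The names `text_shingle` and `title_shingles` are hypothetical. `maxShingleSize` and `outputUnigrams` are ShingleFilterFactory attributes:

```xml
<!-- Hypothetical analyzed facet type: one indexed term per 2- or 3-word phrase. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- maxShingleSize="3" emits 2- and 3-word shingles; outputUnigrams="false"
         suppresses single words, so "Highway" or "System" alone are not
         indexed as facet values in this field. -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical facet field, filled automatically from title at index time. -->
<field name="title_shingles" type="text_shingle" indexed="true" stored="false"/>
<copyField source="title" dest="title_shingles"/>
```

Faceting would then use `facet.field=title_shingles`. This also explains the redundancy in Adam's second message. Every shingle in a title is indexed as its own term, so a title containing "National Highway System" adds one document to the count of "National Highway", "Highway System" and "National Highway System". When unigrams are output, it also adds to "National", "Highway" and "System". Turning off unigrams removes the single-word duplicates, but overlapping phrases will still share counts, because shingling has no notion of which span is the "real" phrase.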