Hello all,

I am getting some ghost facets in Solr 1.4. Can anybody kindly help me understand why I get them and how to eliminate them? My schema.xml snippet is given at the end.

I am indexing named entities extracted via OpenNLP into Solr. My understanding of KeywordTokenizerFactory is that it treats the entire field value as a single token. Am I right? For example, "New York" will be indexed as 'New York' and will not be split, right?

However, I see the values split up in the facets, as shown below, when running the query "http://localhost:8080/solr-admin/topicscore/select/?facet=true&facet.limit=-1". But when I search with the standard handler (qt=standard&q=keyword:"New"), I don't find any document that has just "New".

After digging in a bit, I found that if several keywords share a common starting word, that word is pulled out as another facet, as in the following result. Any help is greatly appreciated.
Result
------------
<int name="New">47</int> --------> Ghost
<int name="New Hampshire">7</int>
<int name="New Jersey">16</int>
<int name="New Orleans">10</int>
<int name="New York">147</int>
<int name="New York City">23</int>
<int name="New York Giants">8</int>
<int name="New York Islanders">5</int>
<int name="New York Mercantile Exchange">6</int>
<int name="New York Mets">8</int>
<int name="New York Stock Exchange">10</int>
<int name="New York Times">8</int>
<int name="New York University">5</int>
<int name="New Zealand">7</int>
<int name="Energy">7</int> --------------> Ghost
<int name="Energy Department">5</int>
<int name="Energy Information Administration">5</int>
<int name="Federal">7</int> --------------> Ghost
<int name="Federal Deposit Insurance Corp.">6</int>
<int name="Federal Reserve">26</int>
<int name="Federal Reserve Chairman">6</int>
<int name="North">27</int>
<int name="North Carolina">8</int>
<int name="North Dakota">7</int>
<int name="North Korea">12</int>

Schema.xml
-----------------
<fieldType name="keywordText" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<field name="person" type="keywordText" indexed="true" stored="true" multiValued="true" termVectors="false" termPositions="false" termOffsets="false"/>
<field name="organization" type="keywordText" indexed="true" stored="true" multiValued="true" termVectors="false" termPositions="false" termOffsets="false"/>
<field name="location" type="keywordText" indexed="true" stored="true" multiValued="true" termVectors="false" termPositions="false" termOffsets="false"/>
<field name="keyword" type="keywordText" indexed="true" stored="true" multiValued="true" termVectors="false" termPositions="false" termOffsets="false"/>
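
To list every facet value that fits the pattern described above (a value that is the leading word or words of another value), a rough check along these lines could be run over the facet counts. This is only a sketch: the function name find_prefix_facets is made up for illustration, the sample counts are copied from the result above, and nothing here calls Solr itself.

```python
def find_prefix_facets(facet_counts):
    """Return facet values that are the leading word(s) of another facet value.

    facet_counts maps a facet value to its count. The counts are not used for
    the check; they are kept so the facet list can be passed in as it is.
    """
    values = list(facet_counts)
    suspects = []
    for value in values:
        # Require a trailing space so that "New" matches "New York"
        # but would not match something like "Newark".
        prefix = value + " "
        if any(other.startswith(prefix) for other in values if other != value):
            suspects.append(value)
    return suspects


# Sample data copied from the facet result in this post.
facets = {
    "New": 47,
    "New Hampshire": 7,
    "New York": 147,
    "New York City": 23,
    "Energy": 7,
    "Energy Department": 5,
    "North Korea": 12,
}

print(find_prefix_facets(facets))
```

Note that a legitimate value such as "New York" is flagged as well, because it is also the start of "New York City". The output is therefore a list of candidates to verify with a query like q=keyword:"New", not a definitive list of ghosts.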