I tried tweaking "WordDelimiterFactory", but it still won't accept the # or @ symbols; they are ignored entirely. I need a solution, please suggest one.
On 4 August 2011 21:08, Jonathan Rochkind <rochk...@jhu.edu> wrote:

> It's the WordDelimiterFactory in your filter chain that's removing the
> punctuation entirely from your index, I think.
>
> Read up on what the WordDelimiter filter does, and what its settings are;
> decide how you want things to be tokenized in your index to get the behavior
> you want; either get WordDelimiter to do it that way by passing it
> different arguments, or stop using WordDelimiter; come back with any
> questions after trying that!
>
> On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
>
>> I have indexed around 1 million tweets (using the "text" dataType).
>> When I search the tweets with "#" or "@" I don't get the exact result.
>> E.g. when I search for "#ipad" OR "@ipad" I get results where ipad is
>> mentioned, skipping the "#" and "@".
>> Please suggest how to tune this, or which filter factories to use to get
>> the desired result.
>> I am indexing the tweets as "text"; below is the "text" definition from my
>> schema.xml.
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
>>             minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1"
>>             catenateWords="1" catenateNumbers="1"
>>             catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory"
>>             protected="protwords.txt" language="English"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
>>             minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1"
>>             catenateWords="1" catenateNumbers="1"
>>             catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory"
>>             protected="protwords.txt" language="English"/>
>>   </analyzer>
>> </fieldType>
>>

--
Thanks and Regards
Mohammad Shariq
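
For reference, here is one possible way to act on the "different arguments" suggestion quoted above. It is a sketch under assumptions and has not been tested against this schema. As far as I know, from Solr 3.1 onward WordDelimiterFilterFactory accepts a "types" attribute naming a character-type mapping file. Mapping "#" and "@" to ALPHA in that file should make the filter treat them as part of the word instead of as delimiters. The file name wdfftypes.txt is only a placeholder, and the other attributes are copied unchanged from the schema above.

```xml
<!-- Sketch only: assumes Solr 3.1 or later, where WordDelimiterFilterFactory
     supports the "types" attribute. The file name wdfftypes.txt is a
     placeholder; the file would sit in the conf directory next to schema.xml.

     Assumed contents of wdfftypes.txt (the backslash escapes "#", which
     would otherwise start a comment line in that file):

       \# => ALPHA
       @ => ALPHA
-->
<filter class="solr.WordDelimiterFilterFactory"
        types="wdfftypes.txt"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
```

If this route is taken, the same filter definition would presumably need to replace the existing WordDelimiterFilterFactory line in both the index and the query analyzer, and the tweets would have to be reindexed before "#ipad" and "@ipad" can match as distinct terms. The Analysis page in the Solr admin UI is a quick way to check what tokens the chain actually produces.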