Hi Lee,

Thanks for your reply!

Yes, we actually need those filters. This dynamic field parses the metadata of each video, and that metadata can differ in content from one video to another. If I understand where you're going with your comment, you mean that I should probably plan this better and create more specific field types for the different kinds of content, correct?

But that still does not explain why I have indexed this specific value, "EHT2011-2012", and the very same value does not match anything when I search for it.

On Mon, Feb 13, 2012 at 11:28 AM, Lee Carroll <lee.a.carr...@googlemail.com> wrote:
> Hi,
> You have a lot of language processing for a field which contains, at least
> in your example, non-words.
>
> Do you need the synonyms, two lots of stemming, etc.?
>
> What is the field for?
>
> >> "I don't believe that this last point is what actually causes
> >> my unsatisfactory results"
>
> It probably is.
>
> On 13 February 2012 10:02, Dirceu Vieira <dirceu...@gmail.com> wrote:
> > Hi,
> >
> > Does anybody have any thoughts about this?
> > I'm really struggling with these problems; any hints would be very welcome!
> >
> > Regards,
> >
> > Dirceu
> >
> > On Fri, Feb 10, 2012 at 4:45 PM, Dirceu Vieira <dirceu...@gmail.com> wrote:
> >
> >> Hi Guys,
> >>
> >> Would someone have time to help me understand what's happening here?
> >>
> >> I have a dynamic field called prMeta_service, and the value "EHT2011-2012"
> >> is indexed for various documents.
> >>
> >> When I search for the same exact value ("EHT2011-2012"), it ends up NOT
> >> matching the content.
> >> I have spent quite a lot of time lately trying to understand what happens,
> >> reading every piece of documentation I could find about the token filters
> >> used in this field, but I can't seem to find the answer.
> >>
> >> It seems to me that, for some reason, the parser is getting lost because
> >> the value contains both letters and numbers. I mention that because I have
> >> tried querying only for "2011-2012" and "20112012", and then I get the
> >> expected results.
> >>
> >> I am using Solr 1.4, and I haven't tried it in any other version.
> >>
> >> Another interesting factor is that for some reason the
> >> SnowballPorterFilterFactory is removing a character from "2011", so
> >> "201" is the value that is actually indexed.
> >> I don't believe that this last point is what actually causes
> >> my unsatisfactory results, but I just wanted to know if anybody has had
> >> any issues with the Finnish language stemming.
> >>
> >> I would very much appreciate it if someone could spare some time to help
> >> me with this issue.
> >>
> >> My configuration looks like this:
> >>
> >> - Dynamic field:
> >>
> >> <dynamicField name="prMeta_*" type="text" indexed="true" stored="true" multiValued="true"/>
> >>
> >> - Field type:
> >>
> >> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >>             catenateAll="0"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
> >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >>             catenateAll="0"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> - The field analysis (index side) gives me this as a response:
> >>
> >> WhitespaceTokenizer:         EHT2011-2012
> >> StopFilter:                  EHT2011-2012
> >> WordDelimiterFilter:         EHT 2011 2012 20112012
> >> LowerCaseFilter:             eht 2011 2012 20112012
> >> EnglishPorterFilter:         eht 2011 2012 20112012
> >> RemoveDuplicatesTokenFilter: eht 2011 2012 20112012
> >> SnowballPorterFilter:        eht 201 2012 20112012
> >> EdgeNGramFilter:             e eh eht 2 20 201 2 20 201 2012 2 20 201 2011
> >>                              20112 201120 2011201 20112012
> >>
> >> - When I run the query in the admin in debug mode (&debugQuery=true),
> >> this is the result:
> >>
> >> <str name="rawquerystring">prMeta_service:EHT2011-2012</str>
> >> <str name="querystring">prMeta_service:EHT2011-2012</str>
> >> <str name="parsedquery">PhraseQuery(prMeta_service:"eht 201 2012")</str>
> >> <str name="parsedquery_toString">prMeta_service:"eht 201 2012"</str>
> >>
> >> Thank you very much in advance!
> >>
> >> Best regards,
> >>
> >> --
> >> Dirceu Vieira Júnior
> >> -------------------------------------------------------------------
> >> +47 9753 2473
> >> dirceuvjr.blogspot.com
> >> twitter.com/dirceuvjr