Hi Lee,

Thanks for your reply!

Yes, we actually need those filters. This dynamic field holds the parsed
metadata for each video, and its contents may differ from one video to the next.
If I understand where you're going with your comment, you mean that I
should probably plan this better and create field types that are more
specific to the different field contents, correct?

But that still does not explain why a value I have indexed,
"EHT2011-2012", does not match anything when I search for that very
same value.


On Mon, Feb 13, 2012 at 11:28 AM, Lee Carroll
<lee.a.carr...@googlemail.com> wrote:

> Hi, you have a lot of language processing for a field which contains,
> at least in your example, non-words.
>
> Do you need the synonyms, two lots of stemming, etc.?
>
> What is the field for?
>
> >>" I don't believe that this last point is what actually causes
> >> my unsatisfactory results"
>
> It probably is.
>
> On 13 February 2012 10:02, Dirceu Vieira <dirceu...@gmail.com> wrote:
> > Hi,
> >
> > Does anybody have any thoughts about this?
> > I'm really struggling with these problems; any hints would be very
> > welcome!
> >
> > Regards,
> >
> > Dirceu
> >
> > On Fri, Feb 10, 2012 at 4:45 PM, Dirceu Vieira <dirceu...@gmail.com>
> > wrote:
> >
> >> Hi Guys,
> >>
> >> Would someone have time to help me understand what's happening here:
> >>
> >> I have a dynamic field called *prMeta_service*, and the value
> >> *"EHT2011-2012"* is indexed for various documents.
> >>
> >> When I search for that exact value (*"EHT2011-2012"*), it does NOT
> >> match the indexed content.
> >> I have spent quite a lot of time lately trying to understand what
> >> happens, reading all the documentation I could find about the token
> >> filters used in this field, but I can't seem to find the answer.
> >>
> >> It seems to me that the parser is somehow getting lost because the
> >> value contains both letters and numbers. I say that because when I
> >> query only for *"2011-2012"* or *"20112012"*, I get the expected
> >> results.
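A rough way to see what happens to a mixed letters-and-digits value is to approximate the WordDelimiterFilterFactory settings from the configuration (catenateNumbers="1" on the index side, catenateNumbers="0" on the query side). This is a simplified, hypothetical Python sketch: the `word_delimiter` helper is not part of Solr or Lucene, and the real filter also handles case changes, possessives and token positions. It only reproduces the split reported by the field analysis.

```python
import re


def word_delimiter(token: str, catenate_numbers: bool = True) -> list[str]:
    """Hypothetical approximation of WordDelimiterFilterFactory with
    generateWordParts=1 and generateNumberParts=1 (not Lucene's code)."""
    # Split into runs of letters or runs of digits; delimiters like '-' are dropped.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = list(parts)
    if catenate_numbers:
        # catenateNumbers=1: also emit adjacent number parts joined together.
        run: list[str] = []
        for p in parts + [""]:  # empty sentinel flushes the final run
            if p.isdigit():
                run.append(p)
            else:
                if len(run) > 1:
                    out.append("".join(run))
                run = []
    return out


print(word_delimiter("EHT2011-2012"))         # ['EHT', '2011', '2012', '20112012']
print(word_delimiter("EHT2011-2012", False))  # ['EHT', '2011', '2012']
```

Under this approximation the query side never produces the catenated token 20112012, which is consistent with the three-term parsed query reported in debug mode.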
> >>
> >> I am using Solr 1.4, and I haven't tried it in any other version.
> >>
> >> Another interesting factor is that, for some reason, the
> >> SnowballPorterFilterFactory removes a character from *"2011"*, so
> >> *"201"* is the value that is actually indexed.
> >> I don't believe that this last point is what actually causes
> >> my unsatisfactory results, but I wanted to know whether anybody has had
> >> any issues with the Finnish language stemming.
> >>
> >>
> >> I would very much appreciate it if someone could spare some time to
> >> help me with this issue.
> >>
> >>
> >> My configuration looks like this:
> >>
> >>
> >> *- Dynamic field:*
> >>
> >> <dynamicField name="prMeta_*" type="text" indexed="true" stored="true"
> >>     multiValued="true"/>
> >>
> >> *- Field type:*
> >>
> >> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>         words="stopwords.txt"/>
> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >>         catenateAll="0"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
> >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>         ignoreCase="true" expand="true"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>         words="stopwords.txt"/>
> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >>         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >>         catenateAll="0"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
> >>   </analyzer>
> >> </fieldType>
> >>
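> >> *- The field analysis gives me this as a response:*
> >>
> >>   WhitespaceTokenizerFactory          EHT2011-2012
> >>   StopFilterFactory                   EHT2011-2012
> >>   WordDelimiterFilterFactory          EHT 2011 2012 20112012
> >>   LowerCaseFilterFactory              eht 2011 2012 20112012
> >>   EnglishPorterFilterFactory          eht 2011 2012 20112012
> >>   RemoveDuplicatesTokenFilterFactory  eht 2011 2012 20112012
> >>   SnowballPorterFilterFactory         eht 201 2012 20112012
> >>   EdgeNGramFilterFactory              e eh eht 2 20 201 2 20 201 2012
> >>                                       2 20 201 2011 20112 201120 2011201 20112012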
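The last stage of that analysis is the EdgeNGramFilterFactory (minGramSize="1", maxGramSize="25"), which expands every token into its leading prefixes. The minimal Python sketch below assumes front-edge grams and takes the tokens left after the Finnish Snowball stage as input; the `edge_ngrams` helper is hypothetical, not Solr code, but it reproduces the n-gram output of the analysis.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 25) -> list[str]:
    """Hypothetical front-edge n-grams of one token (not Lucene's implementation)."""
    # Prefixes of length min_gram .. min(max_gram, len(token)).
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


# Tokens after SnowballPorterFilterFactory (Finnish), as reported by the analysis.
stemmed = ["eht", "201", "2012", "20112012"]

# Expand every token into its edge n-grams, keeping token order.
grams = [g for t in stemmed for g in edge_ngrams(t)]
print(" ".join(grams))
# e eh eht 2 20 201 2 20 201 2012 2 20 201 2011 20112 201120 2011201 20112012
```

Note that the indexed vocabulary for this value therefore consists of these prefixes, including eht, 201 and 2012.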
> >>
> >> *- When I run the query in the admin interface in debug mode
> >> (&debugQuery=true), this is the result:*
> >>
> >> <str name="rawquerystring">
> >> prMeta_service:EHT2011-2012
> >> </str>
> >> <str name="querystring">
> >> prMeta_service:EHT2011-2012
> >> </str>
> >> <str name="parsedquery">
> >> PhraseQuery(prMeta_service:"eht 201 2012")
> >> </str>
> >> <str name="parsedquery_toString">
> >> prMeta_service:"eht 201 2012"
> >> </str>
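As a quick diagnostic, one can check whether every term of that parsed phrase query appears among the index-time tokens. The hypothetical Python snippet below copies the token list from the field analysis output and the terms from the parsed query. If nothing is missing, vocabulary alone would not explain the non-match; a PhraseQuery without slop also requires the terms at the same relative positions as in the query, so the positions produced at index time would be worth checking.

```python
# Index-time tokens, copied from the last row of the field analysis output.
indexed_tokens = set(
    "e eh eht 2 20 201 2 20 201 2012 2 20 201 2011 20112 "
    "201120 2011201 20112012".split()
)

# Terms of the parsed query: PhraseQuery(prMeta_service:"eht 201 2012")
query_terms = ["eht", "201", "2012"]

# Query terms that do not occur anywhere in the indexed tokens for this value.
missing = [t for t in query_terms if t not in indexed_tokens]
print(missing)  # []
```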
> >>
> >>
> >> Thank you very much in advance!
> >>
> >> Best regards,
> >>
> >> --
> >> Dirceu Vieira Júnior
> >> -------------------------------------------------------------------
> >> +47 9753 2473
> >> dirceuvjr.blogspot.com
> >> twitter.com/dirceuvjr
> >>
> >>
> >
>



