Hmmm. It would help if you posted a couple of other pieces of information... BTW, if this is new code, are you considering donating it back? If so, please open a JIRA so we can track it; see: http://wiki.apache.org/solr/HowToContribute
But to your question, here are the first couple of things I'd do:

1> See what the admin/analysis page tells you happens.
2> Attach &debug=query to your test case and see what the parsed query looks like.
3> Use the admin/schema browser link for the field in question to see what actually makes it into the index (or use Luke, or even the TermsComponent).

My bet is that 2 or 3 will show something unexpected, which may give you some clues.

Best,
Erick

On Tue, Jun 24, 2014 at 5:00 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:

> I'm trying to create a Norwegian lemmatizer based on a dictionary, but for
> some odd reason I don't get any search results, even though the analyzer in
> Solr Admin shows that it does the right thing. It works at query time if I
> have reindexed everything based on another stemmer, e.g.
> NorwegianMinimalStemmer.
>
> Here's a screenshot of how it lemmatizes the Norwegian word "studenter"
> (masculine indefinite noun, plural; English: "students"). The stem is
> "student". So far so good:
> http://folk.uio.no/erlendfg/solr/lemmatizer.png
>
> But I get no/few results if I search for "studenter" compared to "student".
> If I switch to solr.NorwegianMinimalStemFilterFactory in schema.xml at index
> time and reindex everything, it works as it should:
> <analyzer type="index">
> <filter class="solr.NorwegianMinimalStemFilterFactory" variant="no"/>
>
> What is wrong with my TokenFilter, and/or how can I debug this further? I
> have tried a lot of different things without any luck, for example decoding
> everything explicitly to UTF-8 (the wordlist is in ISO-8859-1, but I'm
> reading it properly by setting the correct character set) and trimming all
> the words, but nothing helped. The byte sequence also seems to be correct for
> the stemmed word: my lemmatizer shows [73 74 75 64 65 6e 74], exactly the
> same as when I have configured NorwegianMinimalStemFilterFactory in schema.xml.
>
> Here's the source code of my lemmatizer.
> Please note that it is not finished:
> http://folk.uio.no/erlendfg/solr/
>
> Here's the line in my wordlist which contains the word "studenter":
> 66235 student studenter subst mask appell fl ub normert 700 3
>
> The following line returns the stem (input is "studenter"):
> final String[] values = stemmer.stem(termAtt.buffer());
>
> The rest of the code is in NorwegianLemmatizerFilter. If several stems are
> returned, they are all added.
>
> Erlend
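One thing that may be worth ruling out in the quoted stemmer.stem(termAtt.buffer()) line: CharTermAttribute.buffer() returns the attribute's internal array, which is usually longer than the current term. Only the first termAtt.length() chars are valid; the rest can be zero padding or leftover letters from an earlier, longer token. If the stemmer converts the whole array and trims it, a single word typed into the admin/analysis page might still look right (trim() strips trailing '\0'), while real indexing, where the buffer is reused across tokens, could miss the dictionary lookup. That is a guess, not a diagnosis. The sketch below is stdlib-only Java; WordlistLemmaLookup, addLine and stem(char[], int) are hypothetical names, and the column layout (id, lemma, inflected form, tags) is inferred from the single sample line above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the code from the original post.
public class WordlistLemmaLookup {
    // Inflected form -> lemmas (one form can belong to several lemmas).
    private final Map<String, List<String>> lemmasByForm = new HashMap<>();

    // Reads the wordlist as ISO-8859-1, the encoding mentioned in the post.
    public void load(Path wordlist) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(wordlist, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                addLine(line);
            }
        }
    }

    // Assumed column layout, inferred from the sample line:
    // <id> <lemma> <inflected form> <tags...>
    public void addLine(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 3) {
            return; // skip blank or malformed lines
        }
        List<String> lemmas =
            lemmasByForm.computeIfAbsent(cols[2], k -> new ArrayList<>());
        if (!lemmas.contains(cols[1])) {
            lemmas.add(cols[1]);
        }
    }

    // Looks up only buffer[0, length), mirroring how CharTermAttribute's
    // buffer() and length() should be used together. Anything past
    // `length` may be stale data from an earlier token.
    public List<String> stem(char[] buffer, int length) {
        String term = new String(buffer, 0, length);
        return lemmasByForm.getOrDefault(term, Collections.emptyList());
    }
}
```

Inside the filter, the equivalent change would be something like stemmer.stem(termAtt.buffer(), termAtt.length()) (assuming the stemmer is changed to accept a length) or passing termAtt.toString(). If several lemmas come back, the extra tokens are normally emitted with a position increment of 0, as Lucene's synonym filters do, so that positions still line up for phrase queries.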