Re: stemming irregular plurals?

2014-07-29 Thread Rob Nikander
Ah, yes, that does it. Thank you both. Rob On Jul 29, 2014, at 10:30 AM, Alexandre Patry wrote: > > On 29/07/2014 10:28, Rob Nikander wrote: >> Mmm. I don’t see a way to construct one, except passing an FST, which isn’t >> exactly a map. I look at the FST javadoc; it’s a rabbit hole. > You

Re: stemming irregular plurals?

2014-07-29 Thread Alexandre Patry
On 29/07/2014 10:28, Rob Nikander wrote: Mmm. I don’t see a way to construct one, except passing an FST, which isn’t exactly a map. I look at the FST javadoc; it’s a rabbit hole. You probably want to look at http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscel

Re: stemming irregular plurals?

2014-07-29 Thread Rob Nikander
Mmm. I don’t see a way to construct one, except passing an FST, which isn’t exactly a map. I look at the FST javadoc; it’s a rabbit hole. Rob On Jul 29, 2014, at 10:14 AM, Robert Muir wrote: > You can put this thing before your stemmer, with a custom map of exceptions: > > http://lucene.apach

Re: stemming irregular plurals?

2014-07-29 Thread Robert Muir
You can put this thing before your stemmer, with a custom map of exceptions: http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html On Tue, Jul 29, 2014 at 10:03 AM, Robert Nikander wrote: > Hi, > > I created an Analyzer with a Po
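
A minimal, Lucene-free sketch of the "exceptions map before the stemmer" idea: irregular forms are looked up first, and only words not found there reach the stemmer. The override entries and the deliberately naive suffix stripper are hypothetical stand-ins. In Lucene the override filter marks overridden tokens as keywords so that a keyword-aware stemmer skips them; the javadoc linked in this thread describes the real API.

```java
import java.util.Map;

// Sketch: consult an exceptions map before falling back to a (toy) stemmer.
final class StemOverrideSketch {
    // Hypothetical override entries: irregular plural -> desired stem.
    static final Map<String, String> OVERRIDES =
            Map.of("mice", "mouse", "geese", "goose", "children", "child");

    // Toy stemmer for illustration only (not Porter/Snowball): drops a final "s".
    static String naiveStem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Overridden words bypass the stemmer entirely; everything else is stemmed.
    static String stem(String word) {
        String override = OVERRIDES.get(word);
        return override != null ? override : naiveStem(word);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"cats", "mice", "geese", "children"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

The naive stemmer alone would leave "mice" untouched; the map catches it before stemming happens.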

Re: RE: Stemming and Wildcard - or fire and water

2013-01-04 Thread Trejkaz
On Sat, Jan 5, 2013 at 4:06 AM, Klaus Nesbigall wrote: > The actual behavior doesn't work either. > The English word families will not be found in case the user types the query > familie* > So why solve the problem by postulating one opinion as right and another as > wrong? > A simple flag which

Re: RE: Stemming and Wildcard - or fire and water

2013-01-04 Thread Klaus Nesbigall
I've encountered the same problem and tried to use your workaround. But overriding the parser hasn't done the job. I do not understand why the stemming is done anyway. Uwe wrote > This is a well-known problem: Wildcards cannot be analyzed by the query > parser, because the analysis would destr

RE: Stemming and Wildcard - or fire and water

2012-12-11 Thread Lars-Erik Aabech
A possible workaround could be to modify search terms containing wildcards by stemming them manually and building a new search string. A search for hersen* would be rewritten to hers* and return what you expect. The downside, of course, is that you search for more than you specified. Lars-Erik > -Origin
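
Lars-Erik's workaround can be sketched as a query-string rewrite: for each term ending in a single trailing `*`, stem the literal prefix and put the `*` back. The stemmer is passed in as a function; the "strip a final en" lambda in `main` is a hypothetical stand-in for a real Dutch stemmer, used only so that `hersen*` becomes `hers*` as in the example. Other wildcard shapes are left untouched.

```java
import java.util.function.UnaryOperator;

// Sketch: stem the literal prefix of trailing-wildcard terms before parsing.
final class WildcardPrefixRewrite {
    static String rewrite(String query, UnaryOperator<String> stemmer) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Only handle a single trailing '*' with no '?' (e.g. "hersen*").
            boolean trailingWildcardOnly = term.length() > 1
                    && term.indexOf('*') == term.length() - 1
                    && term.indexOf('?') < 0;
            if (trailingWildcardOnly) {
                String prefix = term.substring(0, term.length() - 1);
                out.append(stemmer.apply(prefix)).append('*');
            } else {
                out.append(term);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real Dutch stemmer: strips a final "en".
        UnaryOperator<String> stripEn =
                w -> w.endsWith("en") ? w.substring(0, w.length() - 2) : w;
        System.out.println(rewrite("hersen* cortex", stripEn)); // hers* cortex
    }
}
```

As the message notes, the rewritten prefix is shorter, so the query matches more than the user typed.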

RE: Stemming and Wildcard - or fire and water

2012-12-11 Thread Uwe Schindler
This is a well-known problem: Wildcards cannot be analyzed by the query parser, because the analysis would destroy the wildcard characters; also stemming of parts of terms will never work. For Solr there is a workaround (MultiTermAware component), but it is also very limited and only works when

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
Krupansky -----Original Message----- From: Paul Hill Sent: Tuesday, June 12, 2012 7:43 PM To: java-user@lucene.apache.org Subject: RE: Stemming - limited index expansion Thanks for the reply. -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, June 1

RE: Stemming - limited index expansion

2012-06-12 Thread Paul Hill
Thanks for the reply. > -----Original Message----- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Tuesday, June 12, 2012 1:14 PM > To: java-user@lucene.apache.org > Subject: Re: Stemming - limited index expansion > > I don't completely follow precisel

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
I don't completely follow precisely what you want to do, but the WordDelimiterFilter is an example of a token filter that outputs an extra token at the same position, such as with its CATENATE_ALL/WORDS/NUMBERS options. https://builds.apache.org/job/Lucene-trunk/javadoc/analyzers-common/org/ap
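
A simplified, Lucene-free illustration of the mechanism Jack points to: an extra token emitted at the same position as an existing one, i.e. with a position increment of 0. The hyphen-splitting rule and the `Tok` class are hypothetical and do not reproduce WordDelimiterFilter's exact output or options.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split hyphenated words and stack the joined form on the first part.
final class CatenateSketch {
    // A term plus its position increment (0 = same position as the previous token).
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    static List<Tok> split(String word) {
        List<String> parts = new ArrayList<>();
        for (String p : word.split("-")) {
            if (!p.isEmpty()) parts.add(p);
        }
        List<Tok> out = new ArrayList<>();
        if (parts.isEmpty()) return out;
        out.add(new Tok(parts.get(0), 1));
        if (parts.size() > 1) {
            // Catenated form shares the first part's position (increment 0).
            out.add(new Tok(String.join("", parts), 0));
        }
        for (int i = 1; i < parts.size(); i++) {
            out.add(new Tok(parts.get(i), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("wi-fi")); // [wi(+1), wifi(+0), fi(+1)]
    }
}
```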

Re: Stemming and Wildcard Queries

2010-05-21 Thread Erick Erickson
Another approach to stemming at index time but still providing exact matches when requested is to index the stemmed version AND the original version at the same position (think synonyms). But here's the trick, index the original token with a special character. For instance, indexing "running" would
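
Erick's example for "running" is cut off in this archive, so the marker character below (`$`) is an assumption, as is the toy stemmer. The sketch shows the idea: each word yields its stem at a new position plus the marked original at the same position (increment 0), and a query picks the marked form for exact matches or the stem otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: index the stem AND a marked original at the same position.
final class ExactPlusStemmed {
    // Hypothetical marker; any character the tokenizer never emits would do.
    static final char MARKER = '$';

    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Toy stemmer (illustration only): strips "ing" (undoubling a final
    // consonant) or a final "s".
    static String toyStem(String w) {
        if (w.length() > 5 && w.endsWith("ing")) {
            String s = w.substring(0, w.length() - 3);
            int n = s.length();
            return s.charAt(n - 1) == s.charAt(n - 2) ? s.substring(0, n - 1) : s;
        }
        if (w.length() > 3 && w.endsWith("s")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    static List<Tok> indexTokens(String text) {
        List<Tok> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (w.isEmpty()) continue;
            out.add(new Tok(toyStem(w), 1)); // stemmed form at a new position
            out.add(new Tok(w + MARKER, 0)); // exact form stacked on the same position
        }
        return out;
    }

    // Exact queries target the marked original; normal queries use the stem.
    static String queryTerm(String word, boolean exact) {
        String w = word.toLowerCase(Locale.ROOT);
        return exact ? w + MARKER : toyStem(w);
    }

    public static void main(String[] args) {
        System.out.println(indexTokens("Running dogs")); // [run(+1), running$(+0), dog(+1), dogs$(+0)]
        System.out.println(queryTerm("running", true));  // running$
        System.out.println(queryTerm("runs", false));    // run
    }
}
```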

Re: Stemming and Wildcard Queries

2010-05-21 Thread Ivan Provalov
Thanks, everyone! --- On Thu, 5/20/10, Herbert Roitblat wrote: > From: Herbert Roitblat > Subject: Re: Stemming and Wildcard Queries > To: java-user@lucene.apache.org > Date: Thursday, May 20, 2010, 4:48 PM > At a general level, we have found > that stemming during indexin

Re: Stemming and Wildcard Queries

2010-05-20 Thread Herbert Roitblat
At a general level, we have found that stemming during indexing is not advisable. Sometimes users want the exact form and if you have removed the exact form during indexing, obviously, you cannot provide that. Rather, we have found that stemming during search is more useful, or maybe it should

Re: Stemming and Wildcard Queries

2010-05-20 Thread Ahmet Arslan
> Is there a good way to combine the > wildcard queries and stemming? > > As is, the field which is stemmed at index time, won't work > with some wildcard queries. org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser may help? ---

Re: Stemming Problem

2010-05-19 Thread Larry Hendrix
Thanks for the advice. I want to keep the capitalization because in our application we are mining specific contact and company names from news articles. About 99% of the time if we match a contact or company and it's capitalized we avoid false matches. --Larry On May 18, 2010, at 7:46 PM, Eric

Re: Stemming Problem

2010-05-18 Thread Erick Erickson
You can construct your own analyzer by creating it from a pre-existing Tokenizer (e.g. WhitespaceTokenizer) and any number of TokenFilters (e.g. TokenFilter). You can string any number of TokenFilters together to get many different effects. But I have to ask, why do you want to keep capitalization?
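
The "tokenizer plus a chain of filters" structure Erick describes can be modelled with plain lists: a whitespace tokenizer produces tokens, and each filter maps one token list to the next, in order. This is a Lucene-free sketch; `TOY_STEM` and `LOWERCASE` are hypothetical stand-ins for real filters. Leaving the lowercasing step out of the chain is what keeps capitalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Sketch: a tokenizer followed by any number of chained filters.
final class AnalyzerChainSketch {
    static List<String> whitespaceTokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Toy case-preserving "stemmer": drops a final "s" from longer tokens.
    static final UnaryOperator<List<String>> TOY_STEM = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    };

    static final UnaryOperator<List<String>> LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    };

    // Filters run in the order given, each consuming the previous one's output.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<List<String>>... filters) {
        List<String> tokens = whitespaceTokenize(text);
        for (UnaryOperator<List<String>> f : filters) tokens = f.apply(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Acme Widgets buys Parts", TOY_STEM));            // [Acme, Widget, buy, Part]
        System.out.println(analyze("Acme Widgets buys Parts", LOWERCASE, TOY_STEM)); // [acme, widget, buy, part]
    }
}
```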

RE: Stemming Problem

2010-05-18 Thread Christopher Condit
Hi Larry- > Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having > problems with stemming. Does anyone have a recommendation for other > text analyzers that handle stemming and also keep capitalization, stop words, > and punctuation? Have you tried the SnowballFilter? You co

Re: Stemming

2009-05-11 Thread Hannu Väisänen
On Fri, May 08, 2009 at 08:57:59AM -0400, Matthew Hall wrote: > process your > words into a more base form before they go into the stemmed Malaga (http://home.arcor.de/bjoern-beutel/malaga/) can be used to make a program that converts words to a base form. --

Re: Stemming

2009-05-08 Thread Matthew Hall
Ganesh wrote: > My opinion is Stemming process is to get the base word. Here it is not doing so. Unfortunately, this is where your problem lies: stemming doesn't do this. It breaks words that are almost lexically equivalent down into a similar root word, thus cat = cats. From the wiki: "*Stemm

Re: Stemming behavior

2008-12-19 Thread Grant Ingersoll
This is likely one of the many subtleties of the Porter stemmer. Dr. Porter has chosen a particular way of doing things, but it isn't necessarily right for everyone. You really have to measure the net benefit across all your searches, not specifically just one. If you can't live with thi

Re: stemming in Lucene

2008-04-15 Thread Hannu Väisänen
Wojtek H wrote: >Snowball stemmers are part of Lucene, but for few languages only >But maybe there is a better way or there are people working on >something like that? I use Malaga (http://home.arcor.de/bjoern-beutel/malaga/) for lemmatization and index the result. http://joyds1.joensuu.fi/progra

Re: stemming in Lucene

2008-04-02 Thread Mathieu Lecarme
Wojtek H wrote: Hi all, Snowball stemmers are part of Lucene, but for few languages only. We have documents in various languages and so need stemmers for many languages (in particular Polish). One of the ideas is to use ispell dictionaries. There are ispell dicts for many languages and so thi
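
One way a dictionary can back stemming, in the spirit of the ispell idea raised here: strip candidate suffixes and accept a result only if the dictionary knows it, otherwise leave the word unchanged. The rule table and dictionary below are hypothetical and far simpler than real ispell affix flags.

```java
import java.util.Set;

// Sketch: dictionary-validated suffix stripping (not ispell's real format).
final class DictionaryStemmer {
    // Hypothetical suffix rules {suffix, replacement}, tried in order.
    static final String[][] RULES = {
        {"ies", "y"}, {"es", ""}, {"s", ""}, {"ing", ""}, {"ed", ""}
    };

    static String stem(String word, Set<String> dictionary) {
        if (dictionary.contains(word)) return word;
        for (String[] rule : RULES) {
            String suffix = rule[0];
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String candidate = word.substring(0, word.length() - suffix.length()) + rule[1];
                if (dictionary.contains(candidate)) return candidate;
            }
        }
        return word; // no rule produced a known word: leave it unchanged
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("city", "box", "horse", "walk");
        for (String w : new String[] {"cities", "boxes", "horses", "walked", "bus"}) {
            System.out.println(w + " -> " + stem(w, dict));
        }
    }
}
```

The dictionary check is what stops "bus" from becoming "bu" and lets "horses" fall through the "es" rule to "horse".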

Re: stemming in Lucene

2008-04-01 Thread Karl Wettin
Wojtek H wrote: > Snowball stemmers are part of Lucene, but for few languages only. org.apache.lucene.analysis contains a few more stemmers. > We have documents in various languages and so need stemmers for many languages (in particular Polish). Have you seen Stempel? http://www.getopt.org/ste

Re: Stemming and highlighting

2008-01-04 Thread markharw00d
> Let's say for the query algorithm, the word algorith is also a match. How does the highlighter know that it should also highlight occurrences of the word algorith? (I am not sure it does this anyway.) The highlighter knows to highlight stemmed words because both the query terms and the docume
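
A minimal sketch of why stem-aware highlighting works when query and document go through the same analysis: each document word is stemmed and compared with the query stems, but the original surface text at that position is what gets wrapped. The toy stemmer and the `<b>` markup are assumptions; this is not Lucene's Highlighter.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: match on stems, highlight the original words.
final class StemHighlightSketch {
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Toy stemmer applied to BOTH query and document words.
    static String stem(String w) {
        String s = w.toLowerCase(Locale.ROOT);
        return s.length() > 3 && s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
    }

    static String highlight(String doc, String query) {
        Set<String> queryStems = new HashSet<>();
        for (String q : query.split("\\s+")) {
            if (!q.isEmpty()) queryStems.add(stem(q));
        }
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(doc);
        int last = 0;
        while (m.find()) {
            out.append(doc, last, m.start()); // text between words, unchanged
            String original = m.group();
            if (queryStems.contains(stem(original))) {
                out.append("<b>").append(original).append("</b>");
            } else {
                out.append(original);
            }
            last = m.end();
        }
        out.append(doc, last, doc.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Sorting algorithms: one algorithm per chapter.", "algorithm"));
        // Sorting <b>algorithms</b>: one <b>algorithm</b> per chapter.
    }
}
```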

Re: Stemming and highlighting

2008-01-04 Thread Daniel Naber
On Friday, 4 January 2008, Marjan Celikik wrote: > I am a new Lucene user and I would like to know the following. How does > Lucene bring together fuzzy queries and highlighting? You need to call rewrite() on the fuzzy query. This will expand the fuzzy query to all similar terms (e.g. belies~ -
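
Conceptually, rewriting a fuzzy query turns one fuzzy term into the concrete similar terms found in the index, which a highlighter can then match. The sketch below does that with plain Levenshtein distance over a hypothetical term list and edit limit; it is not how Lucene implements FuzzyQuery internally.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand a fuzzy term to all dictionary terms within maxEdits.
final class FuzzyExpandSketch {
    // Classic two-row Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    static List<String> expand(String term, List<String> termDictionary, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String t : termDictionary) {
            if (levenshtein(term, t) <= maxEdits) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("belief", "believes", "bellies", "zebra");
        System.out.println(expand("belies", terms, 2)); // [belief, believes, bellies]
    }
}
```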

Re: Stemming terms in SpanQuery

2006-05-02 Thread Jason Calabrese
I think the best way to tokenize/stem is to use the analyzer directly. For example: TokenStream ts = analyzer.tokenStream(field, new StringReader(text)); Token token = null; while ((token = ts.next()) != null) { Term newTerm = new Term(field, token.termTe

Re: Stemming german words

2006-01-31 Thread Markus Fischer
Jonathan, what should I say, I'm feeling like an idiot now. Of course you're right. This actually solves the issue ;) thanks and sorry for wasting time, - Markus Jonathan O'Connor wrote: Markus, As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, y

Re: Stemming german words

2006-01-31 Thread Stefan Gusenbauer
Jonathan O'Connor wrote: Markus, As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, you may be able to fix this one problem, but there will be hundreds of other problems too. Stemmers are never perfect. You just have to live with it. Most users wo

Re: Stemming german words

2006-01-31 Thread Jonathan O'Connor
Markus, As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, you may be able to fix this one problem, but there will be hundreds of other problems too. Stemmers are never perfect. You just have to live with it. Most users won't have a problem with tha

Re: Stemming and exact phrases

2005-10-10 Thread Erik Hatcher
On Oct 10, 2005, at 1:44 AM, Anand Kishore wrote: Does stemming result in failure of exact phrase matches??? It shouldn't. Please provide a simple scenario where you're seeing such a failure. Stemming will allow you to find more than the exact phrase, but it should always match an exact
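
Erik's point can be checked with a small sketch: if documents and phrase queries go through the same analysis, an exact phrase always produces the same consecutive terms and therefore still matches, while stemmed variants match as well. The toy stemmer is an assumption; real phrase matching in Lucene uses term positions in the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: phrase matching after applying the same stemming to both sides.
final class StemmedPhraseSketch {
    // Toy stemmer (illustration only): strips "ed" or a final "s".
    static String stem(String w) {
        if (w.length() > 4 && w.endsWith("ed")) return w.substring(0, w.length() - 2);
        if (w.length() > 3 && w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(stem(t));
        }
        return out;
    }

    // Matches when the analyzed phrase terms occur consecutively in the document.
    static boolean phraseMatches(String doc, String phrase) {
        List<String> p = analyze(phrase);
        return !p.isEmpty() && Collections.indexOfSubList(analyze(doc), p) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("The dogs barked loudly", "dogs barked")); // true: exact phrase
        System.out.println(phraseMatches("The dog barks", "dogs barked"));          // true: stemmed variant
        System.out.println(phraseMatches("The barked dogs", "dogs barked"));        // false: wrong order
    }
}
```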

Re: Stemming and exact phrases

2005-10-09 Thread Chris Lu
In some cases, it does. But you can choose an Analyzer that only splits words on whitespace. Chris Full-Text Search on Any Databases http://www.dbsight.net On 10/9/05, Anand Kishore <[EMAIL PROTECTED]> wrote: > Hi, > > Does stemming result in failu

Re: Stemming at Query time

2005-06-01 Thread Shey Rab Pawo
If your stemmer worked on indexing, then won't the "breath" entry automatically pick up all of these? So, isn't the project unnecessary and otiose? On 5/31/05, Daniel Naber <[EMAIL PROTECTED]> wrote: > On Monday 30 May 2005 18:54, Andrew Boyd wrote: > > > Now that the QueryParser knows about pos

Re: Stemming at Query time

2005-05-31 Thread Daniel Naber
On Monday 30 May 2005 18:54, Andrew Boyd wrote: > Now that the QueryParser knows about position increments has anyone > used this to do stemming at query time and not at indexing time? I > suppose one would need a reverse stemmer. Given the query breath it > would need to inject breathe, breat
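
A general-purpose reverse stemmer is hard to build, but one practical approximation (an assumption here, not something this thread confirms) is to collect the terms already present in the index and group them by stem; the variants of a query word are then every indexed word sharing its stem. The toy stemmer and vocabulary are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: "reverse stemming" by grouping an existing vocabulary by stem.
final class ReverseStemmerSketch {
    // Toy stemmer (illustration only): strips "ing", "es", "s" or a final "e".
    static String stem(String w) {
        if (w.length() > 5 && w.endsWith("ing")) return w.substring(0, w.length() - 3);
        if (w.length() > 4 && w.endsWith("es")) return w.substring(0, w.length() - 2);
        if (w.length() > 3 && w.endsWith("s")) return w.substring(0, w.length() - 1);
        if (w.length() > 3 && w.endsWith("e")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Maps each stem to the sorted set of surface forms seen in the vocabulary.
    static Map<String, SortedSet<String>> buildReverseIndex(Collection<String> vocabulary) {
        Map<String, SortedSet<String>> byStem = new HashMap<>();
        for (String w : vocabulary) {
            byStem.computeIfAbsent(stem(w), k -> new TreeSet<>()).add(w);
        }
        return byStem;
    }

    static SortedSet<String> variants(String word, Map<String, SortedSet<String>> byStem) {
        return byStem.getOrDefault(stem(word), new TreeSet<>());
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> idx = buildReverseIndex(
                List.of("breath", "breathe", "breathes", "breathing", "bread"));
        System.out.println(variants("breath", idx)); // [breath, breathe, breathes, breathing]
    }
}
```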

Re: Stemming at Query time

2005-05-31 Thread Paul Libbrecht
You'd only need position increments if using a phrase query... otherwise... positions are largely ignored and you can expand the query with an OR. E.g., I'd expand the query for breath to: Term(breath)^2 or (Term(breathes) or Term(breathe) or Term(breathing)) I am not sure you can make a phra
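
Paul's expansion can be written out as a classic Lucene query-parser string: the original term boosted, OR-ed with a group of its variants. The variant list is supplied by the caller (where it comes from is left open here), and the boost of 2 follows his example; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build "original^boost OR (v1 OR v2 ...)" in classic query syntax.
final class QueryExpansionSketch {
    static String expand(String original, List<String> variants, int boost) {
        List<String> others = new ArrayList<>();
        for (String v : variants) {
            if (!v.equals(original)) others.add(v); // the original is already boosted
        }
        String head = original + "^" + boost;
        return others.isEmpty() ? head : head + " OR (" + String.join(" OR ", others) + ")";
    }

    public static void main(String[] args) {
        System.out.println(expand("breath", List.of("breathes", "breathe", "breathing"), 2));
        // breath^2 OR (breathes OR breathe OR breathing)
    }
}
```

Because this is a plain OR of term queries, positions play no role, which matches the point that position increments only matter for phrase queries.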