Ah, yes, that does it. Thank you both.
Rob
On Jul 29, 2014, at 10:30 AM, Alexandre Patry
wrote:
On 29/07/2014 10:28, Rob Nikander wrote:
Mmm. I don’t see a way to construct one, except passing an FST, which isn’t
exactly a map. I looked at the FST javadoc; it’s a rabbit hole.
You probably want to look at
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscel
Mmm. I don’t see a way to construct one, except passing an FST, which isn’t
exactly a map. I looked at the FST javadoc; it’s a rabbit hole.
Rob
On Jul 29, 2014, at 10:14 AM, Robert Muir wrote:
You can put this thing before your stemmer, with a custom map of exceptions:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html
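A minimal sketch of that chain against the 4.9 API (the override pairs below are made-up examples, not from this thread): StemmerOverrideFilter.Builder assembles the StemmerOverrideMap so you never touch the FST directly, and the filter sits in front of the stemmer.

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.StemmerOverrideMap;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public final class OverrideStemAnalyzer extends Analyzer {

  private final StemmerOverrideMap overrides;

  public OverrideStemAnalyzer() throws IOException {
    // The Builder hides the FST; you just add input -> output pairs.
    StemmerOverrideFilter.Builder builder = new StemmerOverrideFilter.Builder(true); // ignoreCase
    builder.add("families", "family");   // example exception: force this stem
    builder.add("news", "news");         // example exception: leave the term as-is
    overrides = builder.build();
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_4_9, reader);
    TokenStream stream = new LowerCaseFilter(Version.LUCENE_4_9, source);
    // The override filter must come before the stemmer so its entries win.
    stream = new StemmerOverrideFilter(stream, overrides);
    stream = new PorterStemFilter(stream);
    return new TokenStreamComponents(source, stream);
  }
}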
On Tue, Jul 29, 2014 at 10:03 AM, Robert Nikander
wrote:
> Hi,
>
> I created an Analyzer with a Po
On Sat, Jan 5, 2013 at 4:06 AM, Klaus Nesbigall wrote:
> The current behavior doesn't work either.
> The English word "families" will not be found if the user types the query
> familie*
> So why solve the problem by declaring one opinion as right and another as
> wrong?
> A simple flag which
I've encountered the same problem and tried to use your workaround, but
overriding the parser hasn't done the job.
I do not understand why the stemming is applied anyway.
Uwe wrote:
> This is a well-known problem: Wildcards cannot be analyzed by the query
> parser, because the analysis would destr
A possible workaround could be to modify search terms that contain wildcard tokens by
stemming them manually and creating a new search string.
A search for hersen* would be modified to hers* and return what you expect.
The downside, of course, is that you search for more than you specified.
Lars-Erik
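A sketch of that idea (my own helper, not Lars-Erik's code; it uses the 3.1+/4.x token-stream API): run the prefix of each trailing-wildcard term through the same analyzer used at index time and put the '*' back.

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class WildcardStemmer {
  // Stems the prefix of a trailing-wildcard term with the index-time analyzer,
  // e.g. "hersen*" -> "hers*" if that is what the analyzer produces.
  public static String stemWildcardTerm(Analyzer analyzer, String field, String term)
      throws IOException {
    if (!term.endsWith("*")) {
      return term;
    }
    String prefix = term.substring(0, term.length() - 1);
    TokenStream ts = analyzer.tokenStream(field, new StringReader(prefix));
    try {
      CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      String stemmed = ts.incrementToken() ? termAtt.toString() : prefix;
      ts.end();
      return stemmed + "*";
    } finally {
      ts.close();
    }
  }
}

You would call this on each wildcard token before handing the rebuilt query string to the parser.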
This is a well-known problem: Wildcards cannot be analyzed by the query parser,
because the analysis would destroy the wildcard characters; also stemming of
parts of terms will never work. For Solr there is a workaround (MultiTermAware
component), but it is also very limited and only works when
Krupansky
-Original Message-
From: Paul Hill
Sent: Tuesday, June 12, 2012 7:43 PM
To: java-user@lucene.apache.org
Subject: RE: Stemming - limited index expansion
Thanks for the reply.
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Tuesday, June 1
Thanks for the reply.
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Tuesday, June 12, 2012 1:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: Stemming - limited index expansion
>
> I don't completely follow precisel
I don't completely follow precisely what you want to do, but the
WordDelimiterFilter is an example of a token filter that outputs an extra
token at the same position, such as with its CATENATE_ALL/WORDS/NUMBERS
options.
https://builds.apache.org/job/Lucene-trunk/javadoc/analyzers-common/org/ap
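Roughly how that looks wired into a chain (a sketch only; this is the analyzers-common constructor as I recall it for 3.x/4.x, so check your version's javadoc):

// GENERATE_WORD_PARTS splits "wi-fi" into "wi" and "fi"; CATENATE_WORDS also
// emits the joined form "wifi" as an extra token overlapping the parts.
static TokenStream withWordDelimiter(TokenStream in) {
  int flags = WordDelimiterFilter.GENERATE_WORD_PARTS
            | WordDelimiterFilter.CATENATE_WORDS;
  return new WordDelimiterFilter(in, flags, null); // null = no protected-words set
}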
Another approach to stemming at index time but still providing exact matches
when requested is to index the stemmed version AND the original version at
the same position (think synonyms). But here's the trick: index the original
token with a special character. For instance, indexing "running" would
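A sketch of one way to do that (my code, not the poster's): pass each token through for stemming, and also emit a '%'-prefixed copy of the original at the same position, flagged as keyword so a keyword-aware stemmer such as PorterStemFilter leaves the copy untouched. The marker character is arbitrary.

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

// Chain: tokenizer -> MarkedOriginalFilter -> PorterStemFilter
public final class MarkedOriginalFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
  private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);
  private AttributeSource.State pending;
  private String original;

  public MarkedOriginalFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      // Emit the marked copy of the previous token at the same position.
      restoreState(pending);
      pending = null;
      termAtt.setEmpty().append('%').append(original);
      posIncAtt.setPositionIncrement(0);
      keywordAtt.setKeyword(true);   // a keyword-aware stemmer will skip it
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    original = termAtt.toString();   // remember the surface form
    pending = captureState();
    return true;                     // this token goes on to be stemmed
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = null;
  }
}

At query time an exact search then targets the '%'-prefixed term, while ordinary queries hit the stemmed forms.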
Thanks, everyone!
--- On Thu, 5/20/10, Herbert Roitblat wrote:
> From: Herbert Roitblat
> Subject: Re: Stemming and Wildcard Queries
> To: java-user@lucene.apache.org
> Date: Thursday, May 20, 2010, 4:48 PM
> At a general level, we have found
> that stemming during indexin
At a general level, we have found that stemming during indexing is not
advisable. Sometimes users want the exact form and if you have removed the
exact form during indexing, obviously, you cannot provide that. Rather, we
have found that stemming during search is more useful, or maybe it should
> Is there a good way to combine the
> wildcard queries and stemming?
>
> As is, the field which is stemmed at index time, won't work
> with some wildcard queries.
org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser may help?
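Usage is a drop-in for QueryParser. A sketch (constructor shown in the 3.x Version-taking style, so check the javadoc of your release; the field name and analyzer are placeholders):

// analyzer = the same analyzer used at index time
AnalyzingQueryParser parser = new AnalyzingQueryParser(Version.LUCENE_30, "body", analyzer);
Query query = parser.parse("familie*");  // the wildcard prefix is analyzed like indexed terms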
---
Thanks for the advice. I want to keep the capitalization because in our
application we are mining specific contact and company names from news
articles. About 99% of the time, if we match a contact or company and it's
capitalized, we avoid false matches.
--Larry
On May 18, 2010, at 7:46 PM, Eric
You can construct your own analyzer by creating
it from a pre-existing Tokenizer
(e.g. WhitespaceTokenizer) and any number
of TokenFilters. You can
string any number of TokenFilters together
to get many different effects.
But I have to ask, why do you want to keep capitalization?
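In case it helps, a sketch of such a chain against the 3.x API (my example, not Eric's): whitespace tokenizing plus Snowball stemming, with no LowerCaseFilter, so case, stop words and punctuation survive.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;

public class KeepCaseStemmingAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream ts = new WhitespaceTokenizer(reader);  // keeps case and punctuation
    // Note: the Snowball algorithms expect lower-cased input, so capitalized
    // tokens may not stem cleanly.
    ts = new SnowballFilter(ts, "English");
    return ts;
  }
}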
Hi Larry-
> Right now I'm using Lucene with a basic WhitespaceAnalyzer, but I'm having
> problems with stemming. Does anyone have a recommendation for other
> text analyzers that handle stemming and also keep capitalization, stop words,
> and punctuation?
Have you tried the SnowballFilter? You co
On Fri, May 08, 2009 at 08:57:59AM -0400, Matthew Hall wrote:
> process your
> words into a more base form before they go into the stemmed
Malaga (http://home.arcor.de/bjoern-beutel/malaga/) can be used to
make a program that converts words to a base form.
--
Ganesh wrote:
My opinion is that the stemming process is meant to get the base word. Here it is not
doing so.
Unfortunately, this is where your problem lies: stemming doesn't do this.
It breaks words that are almost lexically equivalent down into a similar
root word, thus cat = cats.
From the wiki: "*Stemm
This is likely one of the many subtleties of the Porter stemmer. Dr.
Porter has chosen a particular way of doing things, but it isn't
necessarily right for everyone. You really have to measure the net
benefit across all your searches, not specifically just one. If you
can't live with thi
Wojtek H wrote:
>Snowball stemmers are part of Lucene, but for few languages only
>But maybe there is a better way or there are people working on
>something like that?
I use Malaga (http://home.arcor.de/bjoern-beutel/malaga/)
for lemmatization and index the result.
http://joyds1.joensuu.fi/progra
Wojtek H wrote:
Hi all,
Snowball stemmers are part of Lucene, but for few languages only. We
have documents in various languages and so need stemmers for many
languages (in particular Polish). One of the ideas is to use ispell
dictionaries. There are ispell dicts for many languages and so thi
Wojtek H wrote:
> Snowball stemmers are part of Lucene, but for few languages only. We
org.apache.lucene.analysis contains a few more stemmers.
> have documents in various languages and so need stemmers for many
> languages (in particular Polish).
Have you seen Stempel?
http://www.getopt.org/ste
Let's say for the query algorithm, the word algorith is also a match;
how does the highlighter know that it should also highlight
occurrences of the word algorith? (I am not sure it does this anyway)
The highlighter knows to highlight stemmed words because both the query
terms and the docume
On Friday, January 4, 2008, Marjan Celikik wrote:
> I am a new Lucene user and I would like to know the following. How does
> Lucene bring together fuzzy queries and highlighting?
You need to call rewrite() on the fuzzy query. This will expand the fuzzy
query to all similar terms (e.g. belies~ -
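In code that is roughly the following (a sketch against the old contrib-highlighter API; the class and method names are mine):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public final class FuzzyHighlight {
  // Rewriting expands the fuzzy query into the concrete terms it matches in
  // the index, so the highlighter sees real terms instead of belies~ itself.
  static String bestFragment(Query fuzzyQuery, IndexReader reader, Analyzer analyzer,
                             String field, String text) throws Exception {
    Query expanded = fuzzyQuery.rewrite(reader);
    Highlighter highlighter = new Highlighter(new QueryScorer(expanded));
    return highlighter.getBestFragment(analyzer, field, text);
  }
}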
I think the best way to tokenize/stem is to use the analyzer directly, for
example:
TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
Token token = null;
while ((token = ts.next()) != null) {
  Term newTerm = new Term(field, token.termText());
  // ... use newTerm here, e.g. add it to a query
}
Jonathan,
what should I say, I'm feeling like an idiot now. Of course you're
right. This actually solves the issue ;)
thanks and sorry for wasting time,
- Markus
Jonathan O'Connor wrote:
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g.
"er sucht etwas". Sadly, y
Jonathan O'Connor wrote:
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g.
"er sucht etwas". Sadly, you may be able to fix this one problem, but
there will be hundreds of other problems too. Stemmers are never
perfect. You just have to live with it.
Most users wo
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, you may be able to fix this one problem, but there will be hundreds of other problems too. Stemmers are never perfect. You just have to live with it.
Most users won't have a problem with tha
On Oct 10, 2005, at 1:44 AM, Anand Kishore wrote:
Does stemming result in failure of exact phrase matches???
It shouldn't. Please provide a simple scenario where you're seeing
such a failure. Stemming will allow you to find more than the exact
phrase, but it should always match an exact
In some cases, it does.
But you can choose to use an Analyzer that only splits words on
whitespace.
Chris
Full-Text Search on Any Databases
http://www.dbsight.net
On 10/9/05, Anand Kishore <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Does stemming result in failu
If your stemmer worked on indexing, then won't the "breath" entry
automatically pick up all of these? So, isn't the project unnecessary
and otiose?
On 5/31/05, Daniel Naber <[EMAIL PROTECTED]> wrote:
> On Monday 30 May 2005 18:54, Andrew Boyd wrote:
>
> > Now that the QueryParser knows about pos
On Monday 30 May 2005 18:54, Andrew Boyd wrote:
> Now that the QueryParser knows about position increments has anyone
> used this to do stemming at query time and not at indexing time? I
> suppose one would need a reverse stemmer. Given the query breath it
> would need to inject breathe, breat
You'd only need the position-increment trick if using a phrase query...
otherwise positions are pretty much ignored and you can expand the
query with an OR.
E.g., I'd expand the query for breath to:
Term(breath)^2 OR (Term(breathes) OR Term(breathe) OR Term(breathing))
I am not sure you can make a phra
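In code that expansion is roughly as follows (a sketch, written with the later BooleanClause.Occur API; the field name is a placeholder):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public final class BreathExpansion {
  // Term(breath)^2 OR (Term(breathes) OR Term(breathe) OR Term(breathing))
  static Query expand(String field) {
    BooleanQuery expanded = new BooleanQuery();
    TermQuery exact = new TermQuery(new Term(field, "breath"));
    exact.setBoost(2.0f);  // prefer the form the user actually typed
    expanded.add(exact, BooleanClause.Occur.SHOULD);
    for (String variant : new String[] {"breathes", "breathe", "breathing"}) {
      expanded.add(new TermQuery(new Term(field, variant)), BooleanClause.Occur.SHOULD);
    }
    return expanded;
  }
}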