On Oct 11, 2005, at 10:04 AM, Hugo Lafayette wrote:
First of all, and maybe I am making a false assumption here, but if you
strip leading "j'", "t'" and so on, that means that if you run a search
like:
+text:"il m'aime"
you will get documents with the sentence "il m'aime" (French for "he
lov
Marvin Humphrey wrote:
> I'm curious: are there any cases in French where a string with an
> apostrophe in it ought to be split into two searchable tokens? I
> know of no such cases in English: you never want to search for the ll
> in you'll, or the O in O'Reilly, etc.
First of all, and ma
On Oct 11, 2005, at 7:52 AM, Hugo Lafayette wrote:
Why not include that in the FrenchStemFilter "next()" method
itself? Would that be a bad design?
I agree with your assessment. Conceptually, this is a stemming
problem. By extension, it's not a tokenizing problem, and the
behavior o
On Oct 11, 2005, at 10:52 AM, Hugo Lafayette wrote:
Erik Hatcher wrote:
Rather than changing StandardAnalyzer, you could create a custom
Analyzer that is something along the lines of StandardTokenizer ->
custom apostrophe splitting filter -> ISOLatinFilter.
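As a rough illustration of the chain described above (tokenize, then handle the apostrophe, then fold accents), here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: the class `ElisionSketch`, its methods, and the `ELIDED` prefix list are all hypothetical names made up for this example, whitespace splitting stands in for StandardTokenizer, and `java.text.Normalizer` stands in for the accent filter. It also strips the elided prefix rather than emitting it as a separate token, which is only one of the behaviors discussed in this thread.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ElisionSketch {
    // Hypothetical list of elided French articles and pronouns (assumption, not from Lucene).
    private static final Set<String> ELIDED = new HashSet<>(Arrays.asList(
        "c", "d", "j", "l", "m", "n", "s", "t", "qu"));

    // Drop a leading elided form such as "m'" or "j'"; leave other tokens untouched.
    static String stripElision(String token) {
        int apos = token.indexOf('\'');
        if (apos > 0 && apos < token.length() - 1
                && ELIDED.contains(token.substring(0, apos).toLowerCase(Locale.ROOT))) {
            return token.substring(apos + 1);
        }
        return token;
    }

    // Remove accents by decomposing characters and deleting the combining marks.
    static String foldAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Whitespace tokenizer -> elision stripping -> accent folding -> lowercasing.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            out.add(foldAccents(stripElision(raw)).toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("il m'aime"));
    }
}
```

With this prefix list, a name like O'Reilly is left intact because "o" is not treated as an elided form, while "m'aime" is reduced to "aime". Whether the prefix should be dropped, kept as its own token, or handled in the stemmer instead is exactly the design question being debated here.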
Why not include that in the
Erik Hatcher wrote:
> Rather than changing StandardAnalyzer, you could create a custom
> Analyzer that is something along the lines of StandardTokenizer ->
> custom apostrophe splitting filter -> ISOLatinFilter.
Why not include that in the FrenchStemFilter "next()" method itself?
It wil
On Oct 11, 2005, at 9:22 AM, Hugo Lafayette wrote:
- accented characters: The French analyzer keeps accents, which could
be useful, but may also become annoying. I just have to add
ISOLatinFilter.java to correct that, but maybe an option to keep them
or not would be useful.
- ap