-----Original Message-----
From: Morus Walter [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 2 November 2004 09:21
To: Lucene Users List
Subject: Re: jaspq: dashed numerical values tokenized differently
Daniel Taurat writes:
Hi,
I have yet another stupid parser question:
There seems
On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote:
Query parser was changed to treat '-' within words as part of the
word.
Before that change a query 'dash-test' was parsed as 'dash AND NOT
test'.
Now QP reads one word 'dash-test' which is analyzed. If the analyzer
splits that to more than one
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 3 November 2004 13:39
To: Lucene Users List
Subject: Re: jaspq: dashed numerical values tokenized differently
On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote:
Query parser was changed to treat
On Nov 3, 2004, at 8:51 AM, Daniel Taurat wrote:
Now my only question is why tokenizing works differently for
strings with numerical components, and whether there is a way to make
the StandardAnalyzer treat those dashed mixed-character strings like
plain letter strings.
Give me an example of a string and how you'd like it to be tokenized.
But first, give the AnalyzerUtils (from my java.net article) a try and
get a feel for what different analyzers do.
Keep in mind that it can be tricky (see the AnalysisParalysis page on
the wiki and my java.net article
On Nov 3, 2004, at 10:21 AM, Daniel Taurat wrote:
Checked with Luke on the string
dash\-123\-01
and got
dash
123
01
with GermanAnalyzer and StandardAnalyzer
and
dash
with all the others, except for WhitespaceAnalyzer, of course.
This makes me think that an escaped dash is never a minus, somehow.
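The Luke results above are consistent with three different splitting rules. The following is only a minimal plain-Java sketch of those rules, assuming nothing about Lucene's real tokenizer code: the class DashTokenSketch and its methods are hypothetical stand-ins and do not call any Lucene API. It shows how a letters-only rule yields just "dash", a letters-or-digits rule yields "dash", "123", "01", and a whitespace rule keeps the whole string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are NOT Lucene's analyzers.
public class DashTokenSketch {

    // Splits on whitespace only, so dashes stay inside the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Keeps maximal runs of letters; digits, dashes and everything else separate tokens.
    static List<String> letterTokens(String text) {
        return runs(text, false);
    }

    // Keeps maximal runs of letters or digits; every other character separates tokens.
    static List<String> alphanumericTokens(String text) {
        return runs(text, true);
    }

    private static List<String> runs(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean keep = Character.isLetter(c) || (keepDigits && Character.isDigit(c));
            if (keep) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "dash-123-01";
        System.out.println(whitespaceTokens(input));   // [dash-123-01]
        System.out.println(letterTokens(input));       // [dash]
        System.out.println(alphanumericTokens(input)); // [dash, 123, 01]
    }
}
```

Under these assumed rules, the escaped form dash\-123\-01 splits the same way for the letter and alphanumeric variants, because the backslash is just another separator character to them.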
No
Sent: Wednesday, 3 November 2004 16:49
To: Lucene Users List
Subject: Re: jaspq: dashed numerical values tokenized differently
On Nov 3, 2004, at 10:21 AM, Daniel Taurat wrote:
Checked with Luke on the string
dash\-123\-01
and got
dash
123
01
with GermanAnalyzer
Daniel Taurat writes:
Hi,
I have yet another stupid parser question:
The dash sign seems to be handled specially, differently from
Lucene 1.2, at least in the Lucene 1.4 RC3
StandardAnalyzer.
Examples (1.4RC3):
A document containing the string dash-test is matched by the following
Hi,
I have yet another stupid parser question:
The dash sign seems to be handled specially, differently from
Lucene 1.2, at least in the Lucene 1.4 RC3
StandardAnalyzer.
Examples (1.4RC3):
A document containing the string dash-test is matched by the following
search expressions:
dash
test
Daniel Taurat wrote:
Hi,
I have yet another stupid parser question:
The dash sign seems to be handled specially, differently from
Lucene 1.2, at least in the Lucene 1.4 RC3
StandardAnalyzer.
From the behaviour you describe I think that the dash sign is removed
from the text by the
10 matches