RE: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Daniel Taurat
-----Original Message----- From: Morus Walter [mailto:[EMAIL PROTECTED]] Sent: Tuesday, 2 November 2004 09:21 To: Lucene Users List Subject: Re: jaspq: dashed numerical values tokenized differently Daniel Taurat writes: Hi, I have just another stupid parser question: There seems

Re: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Erik Hatcher
On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote: Query parser was changed to treat '-' within words as part of the word. Before that change a query 'dash-test' was parsed as 'dash AND NOT test'. Now QP reads one word 'dash-test' which is analyzed. If the analyzer splits that to more than one
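The change Erik describes can be modelled roughly in plain Java. This is a hypothetical sketch, not Lucene's QueryParser. `oldParse` mimics the pre-change reading of 'dash-test' as 'dash AND NOT test'. `parseWord` keeps the dash inside the word and runs a stand-in analyzer, which is assumed here to split on anything that is not a letter or digit and to lowercase. Like Lucene's QueryParser, which builds a phrase query when one word analyzes to several tokens, it turns multiple tokens into a quoted phrase. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the query-parser change discussed in the thread (not Lucene code).
public class DashQuerySketch {

    // Pre-change reading: every '-' after the first segment acted as "AND NOT".
    static String oldParse(String word) {
        String[] parts = word.split("-");
        if (parts.length == 0) {
            return ""; // e.g. the input was just "-"
        }
        StringBuilder sb = new StringBuilder(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            sb.append(" AND NOT ").append(parts[i]);
        }
        return sb.toString();
    }

    // Stand-in analyzer (assumption): split on non-letter/non-digit characters, lowercase.
    static List<String> analyze(String word) {
        List<String> tokens = new ArrayList<>();
        for (String part : word.split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Post-change reading: the whole word is analyzed; one token gives a plain
    // term, several tokens give a quoted phrase.
    static String parseWord(String word) {
        List<String> tokens = analyze(word);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return tokens.get(0);
        }
        return "\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(oldParse("dash-test"));
        System.out.println(parseWord("dash-test"));
    }
}
```

With this stand-in analyzer, `dash-test` becomes the phrase "dash test" rather than a query that excludes documents containing "test".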

RE: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Daniel Taurat
-----Original Message----- From: Erik Hatcher [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 3 November 2004 13:39 To: Lucene Users List Subject: Re: jaspq: dashed numerical values tokenized differently On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote: Query parser was changed to treat

Re: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Erik Hatcher
On Nov 3, 2004, at 8:51 AM, Daniel Taurat wrote: Now my only question is, why the tokenizing works differently for strings with numerical components, or if there is a way to make the standardAnalyzer treat those dashed mixed-characters strings similar to plain letter-strings. Give me an example of
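Daniel's question is about how StandardAnalyzer treats dashes that sit next to digits. The sketch below is a simplified approximation, not the real tokenizer grammar. It assumes that a dash-joined run is kept as a single token when every other segment contains a digit. Any other run is split at the dashes and lowercased. Only '-' is treated as a joiner, and stop words are ignored. `DashTokenSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified approximation (NOT Lucene's StandardTokenizer) of how dash-joined
// words with digits are handled: only '-' joins segments, stop words are ignored.
public class DashTokenSketch {

    // True if the string contains at least one decimal digit.
    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // True if every segment at index start, start + 2, start + 4, ... has a digit.
    static boolean alternateHaveDigits(String[] segs, int start) {
        for (int i = start; i < segs.length; i += 2) {
            if (!hasDigit(segs[i])) {
                return false;
            }
        }
        return true;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Runs of letters, digits and dashes; every other character separates runs.
        for (String run : text.split("[^\\p{L}\\p{Nd}-]+")) {
            String[] segs = run.split("-", -1);
            boolean wellFormed = segs.length >= 2;
            for (String s : segs) {
                if (s.isEmpty()) {
                    wellFormed = false; // leading, trailing or doubled dash
                }
            }
            if (wellFormed && (alternateHaveDigits(segs, 0) || alternateHaveDigits(segs, 1))) {
                // Number-like run: keep it whole, dashes included.
                tokens.add(run.toLowerCase(Locale.ROOT));
            } else {
                // Plain words: split at the dashes.
                for (String s : segs) {
                    if (!s.isEmpty()) {
                        tokens.add(s.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dash-test"));
        System.out.println(tokenize("dash-123-01"));
    }
}
```

Under these assumptions, `dash-test` yields `dash` and `test`, while `dash-123-01` survives as one token. That is the kind of difference named in the subject line. A backslash is not a joiner in this model, so the escaped form `dash\-123\-01` breaks up into `dash`, `123` and `01`.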

RE: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Daniel Taurat
Give me an example of a string and how you'd like it to be tokenized. But first, give the AnalyzerUtils (from my java.net article) a try and get a feel for what different analyzers do. Keep in mind that it can be tricky (see the AnalysisParalysis page on the wiki and my java.net article

Re: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Erik Hatcher
On Nov 3, 2004, at 10:21 AM, Daniel Taurat wrote: Checked with Luke on the string dash\-123\-01 and got dash 123 01 with GermanAnalyzer and StandardAnalyzer and dash with all the others, except for WhitespaceAnalyzer, of course. This makes me think that an escaped dash is never a minus, somehow. No
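Erik suggests looking at the token stream that each analyzer produces. The following is a minimal plain-Java harness in that spirit. It is not AnalyzerUtils, and both analyzers are hypothetical stand-ins. `whitespace` splits on whitespace only, roughly like WhitespaceAnalyzer. `lettersOnly` keeps maximal runs of letters and lowercases them, roughly like SimpleAnalyzer, which is built from a letter tokenizer and a lowercase filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical side-by-side comparison of two analyzer stand-ins (not Lucene code).
public class AnalyzerCompareSketch {

    // Roughly WhitespaceAnalyzer: split on whitespace, no lowercasing.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Roughly SimpleAnalyzer: maximal runs of letters, lowercased; digits and punctuation are dropped.
    static List<String> lettersOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Runs the same text through every stand-in, keeping insertion order for display.
    static Map<String, List<String>> compare(String text) {
        Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
        analyzers.put("whitespace", AnalyzerCompareSketch::whitespace);
        analyzers.put("letters-only", AnalyzerCompareSketch::lettersOnly);
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, List<String>>> e : analyzers.entrySet()) {
            result.put(e.getKey(), e.getValue().apply(text));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = args.length > 0 ? args[0] : "dash\\-123\\-01";
        compare(text).forEach((name, tokens) -> System.out.println(name + ": " + tokens));
    }
}
```

For Daniel's test string, the letters-only stand-in keeps just `dash` because digits are not letters. The whitespace stand-in returns the string unchanged. Both results are in line with the Luke output he reports for those two kinds of analyzer.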

RE: jaspq: dashed numerical values tokenized differently

2004-11-03 Thread Daniel Taurat
Sent: Wednesday, 3 November 2004 16:49 To: Lucene Users List Subject: Re: jaspq: dashed numerical values tokenized differently On Nov 3, 2004, at 10:21 AM, Daniel Taurat wrote: Checked with Luke on the string dash\-123\-01 and got dash 123 01 with GermanAnalyzer

Re: jaspq: dashed numerical values tokenized differently

2004-11-02 Thread Morus Walter
Daniel Taurat writes: Hi, I have just another stupid parser question: There seems to be a special handling of the dash sign - different from Lucene 1.2 at least in Lucene 1.4.RC3 StandardAnalyzer. Examples (1.4RC3): A document containing the string dash-test is matched by the following

jaspq: dashed numerical values tokenized differently

2004-11-01 Thread Daniel Taurat
Hi, I have just another stupid parser question: There seems to be a special handling of the dash sign - different from Lucene 1.2 at least in Lucene 1.4.RC3 StandardAnalyzer. Examples (1.4RC3): A document containing the string dash-test is matched by the following search expressions: dash test

Re: jaspq: dashed numerical values tokenized differently

2004-11-01 Thread sergiu gordea
Daniel Taurat wrote: Hi, I have just another stupid parser question: There seems to be a special handling of the dash sign - different from Lucene 1.2 at least in Lucene 1.4.RC3 StandardAnalyzer. From the behaviour you describe I think that the dash sign is removed from the text by the