Re: StandardAnalyzer unit tests?

2005-01-18 Thread Erik Hatcher
On Jan 17, 2005, at 4:51 AM, Chris Lamprecht wrote:
> I submitted a testcase --
> http://issues.apache.org/bugzilla/show_bug.cgi?id=33134

I reviewed and applied your contributed unit test.  Thanks!
Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: StandardAnalyzer unit tests?

2005-01-17 Thread Daan Hoogland
€ 0.02: When indexing code, "++" is a stop term, and it might appear in 
English text as well. 'C' is a not very descriptive but perfectly valid 
variable name. '#' is used in some old Morse transcripts, I think. I am 
not going to die or get fired over this, but I'd suggest not including 
those tokens in a standard anything.

Erik Hatcher wrote:

> I personally don't have a problem with that change, however I don't 
> like changing such things as they can lead to unexpected and confusing 
> issues later. Suppose someone upgrades their version of Lucene without 
> re-indexing and now queries that used to work no longer work? (sure, I 
> agree it is wise to re-index if you upgrade Lucene).
>
> Perhaps others could chime in on whether this change would adversely 
> affect them or if this is a desirable change?
>
> Erik
>
>
>
> On Jan 17, 2005, at 4:51 AM, Chris Lamprecht wrote:
>
>> Erik, Paul, Daniel,
>>
>> I submitted a testcase --
>> http://issues.apache.org/bugzilla/show_bug.cgi?id=33134
>>
>> On a related note, what do you all think about updating the
>> StandardAnalyzer grammar to treat "C#" and "C++" as tokens? It's a
>> small modification to the grammar -- NutchAnalysis.jj has it.
>>
>> -Chris
>>
>> On Mon, 17 Jan 2005 03:23:41 -0500, Erik Hatcher
>> <[EMAIL PROTECTED]> wrote:
>>
>>> I don't see any tests of StandardAnalyzer either. Your contribution
>>> would be most welcome. There are tests that use StandardAnalyzer, but
>>> not to test it directly.
>>>





Re: StandardAnalyzer unit tests?

2005-01-17 Thread Erik Hatcher
I personally don't have a problem with that change, however I don't 
like changing such things as they can lead to unexpected and confusing 
issues later.  Suppose someone upgrades their version of Lucene without 
re-indexing and now queries that used to work no longer work?  (sure, I 
agree it is wise to re-index if you upgrade Lucene).

Perhaps others could chime in on whether this change would adversely 
affect them or if this is a desirable change?

Erik
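
To make the upgrade concern concrete, here is a minimal sketch of how a grammar change can break queries against an index that was not rebuilt. It does not use Lucene at all: the two regex "grammars", the in-memory index, and the names ReindexSketch, OLD, NEW, buildIndex and search are assumptions made up for illustration, not the StandardAnalyzer grammar or any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of this is Lucene code.
class ReindexSketch {
    // Assumed "old" grammar: letters and digits only, so "C++" becomes "c".
    static final Pattern OLD = Pattern.compile("[a-z0-9]+");
    // Assumed "new" grammar: additionally keeps "c++" and "c#" whole.
    static final Pattern NEW = Pattern.compile("c(?:\\+\\+|#)|[a-z0-9]+");

    static List<String> tokenize(Pattern grammar, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = grammar.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Builds a term -> document ids map using the given grammar.
    static Map<String, Set<Integer>> buildIndex(Pattern grammar, List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(grammar, docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns the ids of documents containing every term of the analyzed query.
    static Set<Integer> search(Map<String, Set<Integer>> index, Pattern grammar, String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(grammar, query)) {
            Set<Integer> postings = index.getOrDefault(term, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        return hits == null ? Collections.emptySet() : hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("C++ templates", "Plain C pointers");
        Map<String, Set<Integer>> oldIndex = buildIndex(OLD, docs);

        // Old index, old query analysis: "C++" is searched as "c".
        System.out.println(search(oldIndex, OLD, "C++"));               // [0, 1]
        // Old index, new query analysis: "c++" was never indexed.
        System.out.println(search(oldIndex, NEW, "C++"));               // []
        // Re-indexed with the new grammar: the query works again.
        System.out.println(search(buildIndex(NEW, docs), NEW, "C++"));  // [0]
    }
}
```

The middle case is the one to worry about: the query is analyzed with the new rules while the stored terms still come from the old ones. The first case also shows the old behaviour that would go away, where a search for "C++" matches any document containing a plain "C".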

On Jan 17, 2005, at 4:51 AM, Chris Lamprecht wrote:
> Erik, Paul, Daniel,
>
> I submitted a testcase --
> http://issues.apache.org/bugzilla/show_bug.cgi?id=33134
>
> On a related note, what do you all think about updating the
> StandardAnalyzer grammar to treat "C#" and "C++" as tokens?  It's a
> small modification to the grammar -- NutchAnalysis.jj has it.
>
> -Chris
>
> On Mon, 17 Jan 2005 03:23:41 -0500, Erik Hatcher
> <[EMAIL PROTECTED]> wrote:
>> I don't see any tests of StandardAnalyzer either.  Your contribution
>> would be most welcome.  There are tests that use StandardAnalyzer, but
>> not to test it directly.


Re: StandardAnalyzer unit tests?

2005-01-17 Thread Chris Lamprecht
Erik, Paul, Daniel,

I submitted a testcase --
http://issues.apache.org/bugzilla/show_bug.cgi?id=33134

On a related note, what do you all think about updating the
StandardAnalyzer grammar to treat "C#" and "C++" as tokens?  It's a
small modification to the grammar -- NutchAnalysis.jj has it.

-Chris

On Mon, 17 Jan 2005 03:23:41 -0500, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
> I don't see any tests of StandardAnalyzer either.  Your contribution
> would be most welcome.  There are tests that use StandardAnalyzer, but
> not to test it directly.
>




Re: StandardAnalyzer unit tests?

2005-01-17 Thread Erik Hatcher
I don't see any tests of StandardAnalyzer either.  Your contribution 
would be most welcome.  There are tests that use StandardAnalyzer, but 
not to test it directly.

Erik

On Jan 16, 2005, at 11:48 PM, Chris Lamprecht wrote:
> Does anyone have a unit test for StandardAnalyzer?  I've modified the
> StandardAnalyzer javacc grammar to tokenize "c#" and "c++" without
> removing the "#" and "++" parts, using pieces of the grammar from
> Nutch.  Now I'd like to make sure I didn't change the way it parses
> any other tokens.  thanks,
> -Chris


Re: StandardAnalyzer unit tests?

2005-01-17 Thread Paul Elschot
Chris,

On Monday 17 January 2005 05:49, Chris Lamprecht wrote:
> PS-I didn't find any in lucene CVS head, and I'd be glad to contribute
> some unit tests.

Under Unix this will give you the cvs head:

cvs -d :pserver:[EMAIL PROTECTED]:/home/cvspublic checkout jakarta-lucene

The tests are in the jakarta-lucene/src/test directory.
There are some tests that might be interesting in the queryParser and analysis 
directories below the obligatory org/apache/lucene.

In case these tests are not covering what you need, or you need help to run 
the tests, could you continue on lucene-dev?

Regards,
Paul Elschot





Re: StandardAnalyzer unit tests?

2005-01-16 Thread Chris Lamprecht
PS-I didn't find any in lucene CVS head, and I'd be glad to contribute
some unit tests.


> Does anyone have a unit test for StandardAnalyzer?  I've modified the




StandardAnalyzer unit tests?

2005-01-16 Thread Chris Lamprecht
Does anyone have a unit test for StandardAnalyzer?  I've modified the
StandardAnalyzer javacc grammar to tokenize "c#" and "c++" without
removing the "#" and "++" parts, using pieces of the grammar from
Nutch.  Now I'd like to make sure I didn't change the way it parses
any other tokens.  thanks,

-Chris
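
The kind of check being asked for might look roughly like the sketch below: assert that "c++" and "c#" survive as tokens, and pin down a few ordinary inputs so that an unintended change shows up. It is deliberately self-contained and does not call Lucene. The regex tokenizer is only a stand-in for the modified javacc grammar, and the names TokenizerRegressionSketch, tokenize and check are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in; this is NOT the StandardAnalyzer or Nutch grammar.
class TokenizerRegressionSketch {
    // Tries "c++" / "c#" first, then falls back to runs of letters and digits.
    private static final Pattern TOKEN = Pattern.compile("c(?:\\+\\+|#)|[a-z0-9]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Fails loudly when the tokens differ from what the test expects.
    static void check(String input, String... expected) {
        List<String> actual = tokenize(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new IllegalStateException(
                "input <" + input + "> gave " + actual + ", expected " + Arrays.asList(expected));
        }
    }

    public static void main(String[] args) {
        // The new behaviour: the "#" and "++" parts are kept.
        check("I know C++ and C#", "i", "know", "c++", "and", "c#");

        // Regression cases: other input should tokenize as before.
        check("hello, world 42", "hello", "world", "42");
        check("plain C code", "plain", "c", "code");
        check("abc++", "abc");  // "++" after a longer word is still dropped

        System.out.println("all tokenizer checks passed");
    }
}
```

A real test would feed the same inputs through the analyzer itself and compare the resulting term texts. The useful part of the sketch is the shape: one case for the new tokens, and several cases that freeze the existing behaviour.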
