Re: StandardAnalyzer unit tests?
On Jan 17, 2005, at 4:51 AM, Chris Lamprecht wrote:
> I submitted a testcase --
> http://issues.apache.org/bugzilla/show_bug.cgi?id=33134

I reviewed and applied your contributed unit test. Thanks!

Erik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: StandardAnalyzer unit tests?
€ 0.02: When indexing code, "++" is a stop term, and it might appear in English text as well. 'C' is not a very descriptive variable name, but it is a perfectly valid one. '#' is used in some old Morse transcripts, I think. I am not going to die or get fired over this, but I'd suggest not including those tokens in a standard anything.

Erik Hatcher wrote:
> I personally don't have a problem with that change, however I don't
> like changing such things as they can lead to unexpected and confusing
> issues later. Suppose someone upgrades their version of Lucene without
> re-indexing and now queries that used to work no longer work? (sure, I
> agree it is wise to re-index if you upgrade Lucene).
>
> Perhaps others could chime in on whether this change would adversely
> affect them or if this is a desirable change?
>
> Erik
Re: StandardAnalyzer unit tests?
I personally don't have a problem with that change; however, I don't like changing such things, as they can lead to unexpected and confusing issues later. Suppose someone upgrades their version of Lucene without re-indexing, and queries that used to work no longer work? (Sure, I agree it is wise to re-index if you upgrade Lucene.)

Perhaps others could chime in on whether this change would adversely affect them or if this is a desirable change?

Erik

On Jan 17, 2005, at 4:51 AM, Chris Lamprecht wrote:
> Erik, Paul, Daniel,
>
> I submitted a testcase --
> http://issues.apache.org/bugzilla/show_bug.cgi?id=33134
>
> On a related note, what do you all think about updating the
> StandardAnalyzer grammar to treat "C#" and "C++" as tokens? It's a
> small modification to the grammar -- NutchAnalysis.jj has it.
>
> -Chris
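The upgrade scenario in this message (an index written under one grammar, queried under another) can be sketched in plain Java. This is only an illustration under loud assumptions: the two regular expressions are hypothetical stand-ins for the grammar before and after the proposed change, the in-memory map is a toy inverted index, and none of it uses or reproduces Lucene's actual StandardAnalyzer, which also does things such as stop-word removal that are ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReindexMismatchSketch {
    // ASSUMPTION: stand-in for the grammar before the change (letters and digits only).
    static final Pattern OLD_GRAMMAR = Pattern.compile("[A-Za-z0-9]+");
    // ASSUMPTION: stand-in for the grammar after the change ("c++" and "c#" kept whole).
    static final Pattern NEW_GRAMMAR =
            Pattern.compile("(?i:c\\+\\+|c#)(?![A-Za-z0-9])|[A-Za-z0-9]+");

    // Split text into lowercased tokens according to the given grammar.
    static List<String> analyze(Pattern grammar, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = grammar.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Toy inverted index: term -> ids of the documents that contain it.
    static Map<String, Set<Integer>> buildIndex(Pattern grammar, List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(grammar, docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Analyze the query with queryGrammar and require every query term to match.
    static Set<Integer> search(Map<String, Set<Integer>> index, Pattern queryGrammar, String query) {
        Set<Integer> hits = null;
        for (String term : analyze(queryGrammar, query)) {
            Set<Integer> postings = index.getOrDefault(term, Collections.<Integer>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        return hits == null ? new TreeSet<>() : hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Experience with C++ and Java", "Lucene in Action");
        Map<String, Set<Integer>> oldIndex = buildIndex(OLD_GRAMMAR, docs);
        Map<String, Set<Integer>> newIndex = buildIndex(NEW_GRAMMAR, docs);

        System.out.println(search(oldIndex, OLD_GRAMMAR, "C++")); // prints [0]
        System.out.println(search(oldIndex, NEW_GRAMMAR, "C++")); // prints []
        System.out.println(search(newIndex, NEW_GRAMMAR, "C++")); // prints [0]
    }
}
```

The middle case is the one to worry about: the old index stores the term "c", the upgraded query side looks for "c++", and a query that used to match returns nothing until the documents are re-indexed.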
Re: StandardAnalyzer unit tests?
Erik, Paul, Daniel,

I submitted a testcase --
http://issues.apache.org/bugzilla/show_bug.cgi?id=33134

On a related note, what do you all think about updating the StandardAnalyzer grammar to treat "C#" and "C++" as tokens? It's a small modification to the grammar -- NutchAnalysis.jj has it.

-Chris

On Mon, 17 Jan 2005 03:23:41 -0500, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> I don't see any tests of StandardAnalyzer either. Your contribution
> would be most welcome. There are tests that use StandardAnalyzer, but
> not to test it directly.
Re: StandardAnalyzer unit tests?
I don't see any tests of StandardAnalyzer either. Your contribution would be most welcome. There are tests that use StandardAnalyzer, but not to test it directly.

Erik

On Jan 16, 2005, at 11:48 PM, Chris Lamprecht wrote:
> Does anyone have a unit test for StandardAnalyzer? I've modified the
> StandardAnalyzer javacc grammar to tokenize "c#" and "c++" without
> removing the "#" and "++" parts, using pieces of the grammar from
> Nutch. Now I'd like to make sure I didn't change the way it parses
> any other tokens.
>
> thanks,
> -Chris
Re: StandardAnalyzer unit tests?
Chris,

On Monday 17 January 2005 05:49, Chris Lamprecht wrote:
> PS-I didn't find any in lucene CVS head, and I'd be glad to contribute
> some unit tests.

Under Unix this will give you the CVS head:

cvs -d :pserver:[EMAIL PROTECTED]:/home/cvspublic checkout jakarta-lucene

The tests are in the jakarta-lucene/src/test directory. Some tests that might be interesting are in the queryParser and analysis directories below the obligatory org/apache/lucene.

If these tests do not cover what you need, or you need help running them, could you continue on lucene-dev?

Regards,
Paul Elschot
Re: StandardAnalyzer unit tests?
PS-I didn't find any in lucene CVS head, and I'd be glad to contribute some unit tests.

> Does anyone have a unit test for StandardAnalyzer? I've modified the
StandardAnalyzer unit tests?
Does anyone have a unit test for StandardAnalyzer? I've modified the StandardAnalyzer javacc grammar to tokenize "c#" and "c++" without removing the "#" and "++" parts, using pieces of the grammar from Nutch. Now I'd like to make sure I didn't change the way it parses any other tokens.

thanks,
-Chris
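The check asked for here (the modified grammar keeps "c#" and "c++" whole but leaves every other token alone) could look roughly like the sketch below. It is a self-contained illustration, not the test case later submitted to Bugzilla and not Lucene code: the two regular expressions are hypothetical stand-ins for the original and modified javacc grammars, and the sample inputs are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerRegressionSketch {
    // ASSUMPTION: stand-in for the unmodified grammar, which drops "#" and "++".
    static final Pattern OLD_GRAMMAR = Pattern.compile("[A-Za-z0-9]+");
    // ASSUMPTION: stand-in for the modified grammar; "c++" and "c#" are tried
    // first and only accepted when no letter or digit follows them.
    static final Pattern NEW_GRAMMAR =
            Pattern.compile("(?i:c\\+\\+|c#)(?![A-Za-z0-9])|[A-Za-z0-9]+");

    // Split text into lowercased tokens according to the given grammar.
    static List<String> tokenize(Pattern grammar, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = grammar.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Regression half: inputs without a standalone "c++" or "c#" must
        // tokenize identically under both grammars.
        String[] unaffected = {
            "The quick brown fox",
            "Lucene 1.4 indexes text",
            "abc++ and xc#",
            "C3PO and c+"
        };
        for (String text : unaffected) {
            List<String> before = tokenize(OLD_GRAMMAR, text);
            List<String> after = tokenize(NEW_GRAMMAR, text);
            if (!before.equals(after)) {
                throw new AssertionError(text + ": " + before + " vs " + after);
            }
        }

        // Feature half: the new tokens survive intact.
        System.out.println(tokenize(OLD_GRAMMAR, "I know C++ and C#.")); // prints [i, know, c, and, c]
        System.out.println(tokenize(NEW_GRAMMAR, "I know C++ and C#.")); // prints [i, know, c++, and, c#]
    }
}
```

A test against the real analyzer would follow the same two-part shape: a list of inputs whose token streams must not change, plus a few inputs that exercise the new tokens.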