Does Escaping Really Work?
I'm confused about how to use escape characters in Lucene. My Lucene configuration is 1.3-dev1 and I use the StandardAnalyzer and QueryParser. My documents have a field called 'path' with a value like 1102/a55407-2002nov2.xml. This field is indexed but not tokenized.

Here are the various queries I've tried and their results:

1) When a dash is included in the query, Lucene interprets this as a space. (path:1102/a55402-2002nov2.xml is interpreted as path:1102/a55402 -body:2002nov2.xml)

2) When a backslash is inserted before the dash (and the query does *not* contain a wildcard), Lucene interprets this by inserting a space in lieu of the next character. ('path:1102/a55402\-2002nov2.xml' interpreted as 'path:1102/a55402 2002nov2.xml [note the space where the dash was]')

3) When a backslash is inserted before the dash (and the query *does* contain a wildcard), Lucene interprets this literally, without any conversion. (path:1102/55407\-2002nov* is interpreted literally).

4) When a backslash is inserted before the dash and immediately followed by a wildcard, Lucene reports an error. ('path:1102/a55407-*' causes lexical error: Encountered EOF after :)

My overall observation is that it appears it is not possible to escape a dash - is this true? A previous post (yesterday) suggests that it is also not possible to escape a backslash. If that's also true, what characters can be escaped?

Regards, Terry
Bug in current CVS source with DateField
I found that the current code in CVS prevents an org.apache.lucene.search.DateFilter from functioning properly. This fragment is taken from org.apache.lucene.document.DateField:

    // Pad with leading zeros
    if (s.length() < DATE_LEN) {
      StringBuffer sb = new StringBuffer(s);
      while (sb.length() < DATE_LEN)
        sb.insert(0, ' ');
      s = sb.toString();
    }

The code pads with ' ' (space) instead of zeros. Line 5 should be:

    sb.insert(0, '0');

Making this change and recompiling gave the expected results. Looking back, the lucene-1.2 source uses the following fragment:

    while (s.length() < DATE_LEN)
      s = 0 + s; // pad with leading zeros

-- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
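For readers following the DateField report, here is a minimal standalone sketch of why the fill character matters for string comparison. It does not use Lucene at all: the class name PadDemo, the method pad, and the width of 9 are assumptions made up for illustration (the real DATE_LEN is defined inside DateField), and the scenario of mixing space-padded with zero-padded strings is only one plausible way the mismatch could show up.

```java
public class PadDemo {
    // Hypothetical fixed width, chosen only for this illustration.
    static final int WIDTH = 9;

    // Left-pads s with the given fill character until it is WIDTH characters long.
    static String pad(String s, char fill) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < WIDTH) {
            sb.insert(0, fill);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Base-36 strings, similar in spirit to a compact date encoding.
        String one = Long.toString(1L, Character.MAX_RADIX);          // "1"
        String thirtyFive = Long.toString(35L, Character.MAX_RADIX);  // "z"
        String thirtySix = Long.toString(36L, Character.MAX_RADIX);   // "10"

        // Zero-padded strings compare in the same order as the numbers.
        System.out.println(pad(thirtyFive, '0').compareTo(pad(thirtySix, '0')) < 0);

        // A space-padded 35 sorts before a zero-padded 1, because ' ' < '0'.
        System.out.println(pad(thirtyFive, ' ').compareTo(pad(one, '0')) < 0);
    }
}
```

Both comparisons print true. The first is the desired behaviour; the second shows a larger value sorting before a smaller one once the two padding styles meet in the same comparison.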
Re: Does Escaping Really Work?
Didn't I just answer this last night? WhitespaceAnalyzer? Otis
optimize()
How does it affect overall performance, when I do not call optimize()? THX -g-
Re: optimize()
This was just mentioned a few days ago. Check the archives. It is not needed for indexing, but it is good to do once you are done indexing, as the index reader then needs to open and search through fewer files. Otis
Re: Does Escaping Really Work?
Well, pardon me for breathing, Otis. I didn't make the connection (partly 'cause you changed the subject line). But anyway, I don't understand your rather oblique answer - does escaping work or not? Are you saying that, in order for it to work (the way the docs say it does), I need to insert this module in the chain? Or what? Terry
RE: Does Escaping Really Work?
My understanding is that escaping may not work (as Terry and I believe); however, a workaround for most 'reasonable' cases is to use WhitespaceAnalyzer when parsing a query.
Re: optimize()
Did you try any tests in this area? (figures, charts...) AFAIK the reader reads an identical number of (giga)bytes. BTW, it could read segments in many threads. I do not see why it would be slower (unless you do many delete()s). If the reader opens 1 or 50 files, it is still nothing. -g-
Re: Does Escaping Really Work?
Dave, I would say you seem to be right. But this is getting very frustrating. Here is what the Lucene docs say: docs quote Lucene supports escaping special characters that are part of the query syntax. The current list special characters are + - || ! ( ) { } [ ] ^ ~ * ? : \ To escape these character use the \ before the character. For example to search for (1+1):2 use the query: \(1\+1\)\:2 /docs quote Is the Lucene documentation in error? Does it work but only using something other than the standard configuration? If so, precisely what non-standard configuration is necessary? Why can't these questions be answered simply and clearly? Terry
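The escaping rule in the documentation excerpt quoted in this thread can be sketched as a small helper. This is a standalone illustration, not Lucene code: the class EscapeDemo and the method escapeQueryChars are hypothetical names, and the character set is taken only from the list as it appears in the quoted excerpt (with `||` handled by escaping each `|`). It shows what a "properly escaped" query string looks like; whether the escaped dash then survives analysis is exactly the open question of this thread.

```java
public class EscapeDemo {
    // Special characters as listed in the documentation excerpt quoted in the thread.
    private static final String SPECIAL = "+-|!(){}[]^~*?:\\";

    // Returns a copy of s with a backslash inserted before every special character.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example from the documentation excerpt.
        System.out.println(escapeQueryChars("(1+1):2"));
        // The kind of path value discussed in the thread.
        System.out.println(escapeQueryChars("1102/a55407-2002nov2.xml"));
    }
}
```

For the documentation's own example the helper yields \(1\+1\)\:2, and for the path value it yields 1102/a55407\-2002nov2.xml, which is the form of query 2 in the original post.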
Re: Does Escaping Really Work?
Documentation is not detailed enough. Analyzers analyze their input (at indexing and searching time). They are just Java classes that do not know about QueryParser.jj, which is the only place where '\' is defined as an escape character (plus the .java files generated by running QueryParser.jj through JavaCC). Hence, I believe that if your Analyzer is not explicitly instructed to leave '\' alone you will think that escaping doesn't work. WhitespaceAnalyzer, I believe, works because it doesn't throw out characters like '\', as I think it only splits tokens on spaces. HTH. Otis
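The difference Otis describes between an analyzer that discards punctuation and one that only splits on whitespace can be illustrated with a toy tokenizer. This is a sketch under stated assumptions, not the behaviour of Lucene's StandardTokenizer or WhitespaceTokenizer: TokenizeDemo and tokenize are hypothetical names, and the two predicates merely stand in for "keep only letters and digits" versus "keep everything except whitespace".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TokenizeDemo {
    // Splits text into runs of characters accepted by isTokenChar.
    // Every rejected character acts as a separator and is dropped.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "1102/a55402\\-2002nov2.xml";

        // Letters and digits only: '/', '\', '-' and '.' all vanish as separators.
        System.out.println(tokenize(query, Character::isLetterOrDigit));

        // Whitespace splitting only: the whole value, backslash included, stays one token.
        System.out.println(tokenize(query, c -> !Character.isWhitespace(c)));
    }
}
```

With the first predicate the value falls apart into four tokens and the backslash and dash are gone; with the second it comes through as a single token. That matches the intuition in the message above that the analyzer, not the escape syntax, decides whether the dash survives.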
RE: Does Escaping Really Work?
I suspect to dig deeper we'll have to look at QueryParser.jj.
RE: optimize()
I don't know if this answers your question, but I had a lot of problems with Lucene bombing out with out-of-memory errors. I was not using optimize(); I tried it and, hey presto, no more problems.
Re: Does Escaping Really Work?
I think all you have to do is write your own Analyzer. You can copy one of the supplied ones, and remove the piece that calls isLetter(char) or some similar function. That may be in StandardTokenizer; I can't look at the code now to confirm. If you want to treat certain fields differently (e.g. exception to the rule) you can see an example of such an Analyzer in jGuru's Lucene FAQ. Good luck, Otis

--- Terry Steichen [EMAIL PROTECTED] wrote:

Yes, Otis - that does help. But a little more advice would help even more. For example, I'm currently using the standard Lucene code without any customization. That means I am using StandardAnalyzer. Internally, what StandardAnalyzer does is (1) create a StandardTokenizer, (2) StandardFilter, (3) LowerCaseFilter, and (4) StopFilter. StandardTokenizer is generated from StandardTokenizer.jj, but when generated, it extends Tokenizer. Now WhitespaceAnalyzer (which you've mentioned several times) creates a WhitespaceTokenizer (which in turn extends CharTokenizer, which extends Tokenizer).

This all makes me a bit dizzy, since I don't really understand (and hope I don't have to learn) all the internal Lucene architecture. It would help enormously if you could tell me precisely what I have to do to make the escape character work with all the functionality of StandardAnalyzer retained. The WhitespaceAnalyzer - should it be used in lieu of the StandardTokenizer? If so, would any functionality be lost? (It seems like it would lose a ton of functionality to me.) Would it be better to modify StandardTokenizer.jj, and if so, where/how?

TIA, Terry