Does Escaping Really Work?

2002-11-26 Thread Terry Steichen
I'm confused about how to use escape characters in Lucene.  My Lucene configuration is 
1.3-dev1 and I use the StandardAnalyzer and QueryParser.  

My documents have a field called 'path' with a value like 1102/a55407-2002nov2.xml.  
This field is indexed but not tokenized.  Here are the various queries I've tried and 
their results:

1) When a dash is included in the query, Lucene interprets this as a space. 
(path:1102/a55402-2002nov2.xml is interpreted as  path:1102/a55402 
-body:2002nov2.xml)

2) When a backslash is inserted before the dash (and the query does *not* contain a 
wildcard), Lucene interprets this by inserting a space in lieu of the next character. 
('path:1102/a55402\-2002nov2.xml' interpreted as 'path:1102/a55402 2002nov2.xml 
[note the space where the dash was]')

3) When a backslash is inserted before the dash (and the query *does* contain a 
wildcard), Lucene interprets this literally, without any conversion. 
(path:1102/55407\-2002nov* is interpreted literally).

4) When a backslash is inserted before the dash and immediately followed by a 
wildcard, Lucene reports an error. ('path:1102/a55407-*' causes lexical error: 
Encountered EOF after :)

My overall observation is that it appears it is not possible to escape a dash - is 
this true?

A previous post (yesterday) suggests that it is also not possible to escape a 
backslash.  If that's also true, what characters can be escaped?


Regards,

Terry






Bug in current CVS source with DateField

2002-11-26 Thread Chris D

I found that the current code in CVS prevents a 
org.apache.lucene.search.DateFilter from functioning properly.

This fragment is taken from org.apache.lucene.document.DateField


   // Pad with leading zeros
   if (s.length() < DATE_LEN) {
     StringBuffer sb = new StringBuffer(s);
     while (sb.length() < DATE_LEN)
       sb.insert(0, ' ');
     s = sb.toString();
   }


The code is padding ' ' (space) instead of zeros.

Line 5 should be:  sb.insert(0, '0');

Making this change and recompiling gave the expected results.

Looking back,  the lucene-1.2 source uses the following fragment:

   while (s.length() < DATE_LEN)
     s = "0" + s;  // pad with leading zeros
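
To see why the pad character matters, here is a small standalone sketch. It is not the Lucene source: the pad() helper and the width of 4 are made up for illustration (DateField uses its own DATE_LEN), and it assumes that some terms were already written with zero padding, which is only one plausible way a mismatch could upset range comparisons.

```java
// Standalone sketch, not Lucene code: pad() and the width 4 are illustrative assumptions.
final class DatePaddingSketch {

    // Left-pad s with padChar until it is len characters long.
    static String pad(String s, int len, char padChar) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < len) {
            sb.insert(0, padChar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In base 36 (Character.MAX_RADIX) the value 10 is written "a" and 0 is "0".
        String zeroPaddedTen  = pad(Long.toString(10, Character.MAX_RADIX), 4, '0');
        String spacePaddedTen = pad(Long.toString(10, Character.MAX_RADIX), 4, ' ');
        String zeroPaddedZero = pad(Long.toString(0, Character.MAX_RADIX), 4, '0');

        // Same pad character on both sides: string order follows numeric order.
        System.out.println(zeroPaddedTen.compareTo(zeroPaddedZero) > 0);

        // Mixed pad characters: ' ' sorts before '0', so 10 appears to precede 0.
        System.out.println(spacePaddedTen.compareTo(zeroPaddedZero) < 0);
    }
}
```

Both comparisons hold, which suggests that terms padded with different characters cannot be compared reliably as strings.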







--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]



Re: Does Escaping Really Work?

2002-11-26 Thread Otis Gospodnetic
Didn't I just answer this last night?
WhitespaceAnalyzer?

Otis





optimize()

2002-11-26 Thread Leo Galambos
How does it affect overall performance, when I do not call optimize()?

THX

-g-







Re: optimize()

2002-11-26 Thread Otis Gospodnetic
This was just mentioned a few days ago. Check the archives.
It is not needed while indexing, but it is good to call once you are done
indexing, as the index reader then needs to open and search through fewer files.

Otis





Re: Does Escaping Really Work?

2002-11-26 Thread Terry Steichen
Well, pardon me for breathing, Otis.

I didn't make the connection (partly 'cause you changed the subject line).
But anyway, I don't understand your rather oblique answer - does escaping
work or not?  Are you saying that, in order for it to work (the way the docs
say it does), I need to insert this module in the chain? Or what?

Terry





RE: Does Escaping Really Work?

2002-11-26 Thread Spencer, Dave
My understanding is that escaping may not work (as Terry and I believe);
however, a workaround for most 'reasonable' cases is to use WhitespaceAnalyzer
when parsing a query.






Re: optimize()

2002-11-26 Thread Leo Galambos
Did you try any tests in this area? (figures, charts...)

AFAIK the reader reads an identical number of (giga)bytes. BTW, it could read
segments in many threads. I do not see why it would be slower (unless you
do many delete()-s). Whether the reader opens 1 or 50 files, it is still nothing.

-g-





Re: Does Escaping Really Work?

2002-11-26 Thread Terry Steichen
Dave,

I would say you seem to be right.  But this is getting very frustrating.
Here is what the Lucene docs say:

<docs quote>
Lucene supports escaping special characters that are part of the query
syntax. The current list special characters are

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these character use the \ before the character. For example to
search for (1+1):2 use the query:

 \(1\+1\)\:2

</docs quote>

Is the Lucene documentation in error?  Does it work but only using something
other than the standard configuration?  If so, precisely what non-standard
configuration is necessary?

Why can't these questions be answered simply and clearly?

Terry
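
For reference, the escaping rule in the quoted documentation can be sketched as a small standalone helper. This is a hypothetical utility, not part of the Lucene API discussed in this thread; the class and method names are invented, and it only builds the escaped string. Whether QueryParser and the analyzer then honour the backslashes is exactly the open question here.

```java
// Hypothetical helper, not a Lucene class: it only produces the escaped query text.
final class QueryEscapeSketch {

    // Special characters from the Lucene query syntax documentation.
    // The two-character operators && and || are covered by escaping '&' and '|' one by one.
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\";

    // Put a backslash in front of every special character in raw.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(1+1):2"));
        System.out.println(escape("1102/a55407-2002nov2.xml"));
    }
}
```

With the path value from this thread, only the dash gets a backslash, since '/' and '.' are not in the documented list.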






Re: Does Escaping Really Work?

2002-11-26 Thread Otis Gospodnetic
The documentation is not detailed enough.
Analyzers analyze their input (at indexing and searching time).
They are just Java classes that do not know about QueryParser.jj, which
is the only place where '\' is defined as an escape character (plus
the .java files generated by running QueryParser.jj through JavaCC).
Hence, I believe that if your Analyzer is not explicitly instructed to
leave '\' alone, you will think that escaping doesn't work.
WhitespaceAnalyzer, I believe, works because it doesn't throw out
characters like '\', as I think it only splits tokens on whitespace.

HTH.
Otis
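
The difference described above can be illustrated with two toy tokenizers. These are stand-ins written for this sketch, not Lucene's WhitespaceAnalyzer or StandardAnalyzer (whose real rules are more involved); the point is only that a tokenizer which splits on punctuation will break up a value such as the path from this thread, while one that splits on whitespace leaves it whole.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for illustration; these are not Lucene classes.
final class TokenizerContrastSketch {

    // Split text on the given regex and drop empty pieces.
    private static List<String> split(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Splits only on runs of whitespace, so punctuation survives inside tokens.
    static List<String> whitespaceOnly(String text) {
        return split(text, "\\s+");
    }

    // Splits on anything that is not a letter or digit, so '/', '-', '.' and '\' vanish.
    static List<String> lettersAndDigitsOnly(String text) {
        return split(text, "[^\\p{L}\\p{N}]+");
    }

    public static void main(String[] args) {
        String value = "1102/a55402-2002nov2.xml";
        System.out.println(whitespaceOnly(value));
        System.out.println(lettersAndDigitsOnly(value));
    }
}
```

The first call keeps the value as a single token; the second yields four separate tokens, which matches the kind of splitting reported at the start of the thread.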



RE: Does Escaping Really Work?

2002-11-26 Thread Spencer, Dave
I suspect to dig deeper we'll have to look
at QueryParser.jj.





RE: optimize()

2002-11-26 Thread Stephen Eaton
I don't know if this answers your question, but I had a lot of problems
with Lucene bombing out with out-of-memory errors.  I was not calling
optimize(); I tried it and, hey presto, no more problems.





Re: Does Escaping Really Work?

2002-11-26 Thread Otis Gospodnetic
I think all you have to do is write your own Analyzer.
You can copy one of the supplied ones, and remove the piece that calls
isLetter(char) or some similar function.  That may be in
StandardTokenizer; I can't look at the code now to confirm.
If you want to treat certain fields differently (e.g. exceptions to the
rule), you can see an example of such an Analyzer in jGuru's Lucene FAQ.

Good luck,
Otis
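
The idea of treating certain fields differently can be sketched without any Lucene classes. Everything below is a toy written for this illustration (it is not the jGuru FAQ example and not a Lucene Analyzer): a dispatcher picks a tokenizing function by field name and falls back to a default for all other fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy illustration of per-field dispatch; these are not Lucene classes.
final class PerFieldSketch {

    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Use a special tokenizer for one named field.
    void register(String field, Function<String, List<String>> tokenizer) {
        byField.put(field, tokenizer);
    }

    // Tokenize with the field's own tokenizer, or the fallback if none is registered.
    List<String> tokenize(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    // Keeps the whole value as one token, as an untokenized field would.
    static List<String> wholeValue(String text) {
        return Collections.singletonList(text);
    }

    // Lowercases and splits on anything that is not a letter or digit.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        PerFieldSketch sketch = new PerFieldSketch(PerFieldSketch::lowercaseWords);
        sketch.register("path", PerFieldSketch::wholeValue);
        System.out.println(sketch.tokenize("path", "1102/a55407-2002nov2.xml"));
        System.out.println(sketch.tokenize("body", "Does Escaping Really Work?"));
    }
}
```

In a real Analyzer the same decision would be made on the field name passed in at tokenizing time; the field name "path" here is taken from the thread, the rest is assumed.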

--- Terry Steichen [EMAIL PROTECTED] wrote:
 Yes, Otis - that does help.  But a little more advice would help even more.
 
 For example, I'm currently using the standard Lucene code without any
 customization.  That means I am using StandardAnalyzer.  Internally, what
 StandardAnalyzer does is create (1) a StandardTokenizer, (2) a StandardFilter,
 (3) a LowerCaseFilter, and (4) a StopFilter.  StandardTokenizer is generated
 from StandardTokenizer.jj, but when generated, it extends Tokenizer.
 
 Now WhitespaceAnalyzer (which you've mentioned several times) creates a
 WhitespaceTokenizer (which in turn extends CharTokenizer, which extends
 Tokenizer).
 
 This all makes me a bit dizzy, since I don't really understand (and hope I
 don't have to learn) all the internal Lucene architecture.  It would help
 enormously if you could tell me precisely what I have to do to make the
 escape character work with all the functionality of StandardAnalyzer
 retained.  The WhitespaceAnalyzer - should it be used in lieu of the
 StandardTokenizer?  If so, would any functionality be lost?  (It seems like
 it would lose a ton of functionality to me.)  Would it be better to modify
 StandardTokenizer.jj, and if so, where/how?
 
 TIA,
 
 Terry
 
--- Terry Steichen [EMAIL PROTECTED] wrote:
 I'm confused about how to use escape characters in Lucene. 
 My
   Lucene
 configuration is 1.3-dev1 and I use the StandardAnalyzer and
 QueryParser.

 My documents have a field called 'path' with a value like
 1102/a55407-2002nov2.xml.  This field is indexed but not
   tokenized.
  Here are the various queries I've tried and their results:

 1) When a dash is included in