RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Great info Morus,
 
After changing the query passed to QueryParser to escape the dash:
 
Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
                                "description",
                                analyzer);
Hits hits = searcher.search(query);
System.out.println("query.ToString = " + query.toString("description"));
// Note: this assertion passes only with the escape included, so the term is not kept as-is.
assertEquals("HW-NCI_TOPICS kept as-is",
             "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
assertEquals("doc found!", 1, hits.length());
 
I'm still getting this output:
 
 domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
 
query.ToString = +category:HW\-NCI_TOPICS +space
 
junit.framework.AssertionFailedError: doc found! expected:1 but was:0
 
It looks like bug http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 was fixed today:
 
--- Additional Comments From Otis Gospodnetic mailto:[EMAIL PROTECTED]  
2004-03-24 10:10 ---

Although "tft-monitor" should not really result in a phrase query "tft monitor", I
agree that this is better than converting it to "tft AND NOT monitor" (tft -monitor).
Moreover, I have seen query syntax where '-' characters are used for phrase
queries instead of or in addition to quotes, so one could use either morus-walter
or "morus walter".

I applied your change, as it doesn't look like it breaks anything, and I hope
nobody relied on ill behaviour where tft-monitor would result in AND NOT query.
---
But I assume this fix won't come out for some time.  Is there a way I can get this fix 
sooner?  
I'm up against a deadline and would very much like this functionality. 
 
Going one step further with the KeywordAnalyzer that I wrote, I changed this method to 
skip the escape character:
protected boolean isTokenChar(char c)
{
    if (c == '\\')
    {
        return false;
    }
    else
    {
        return true;
    }
}
The test then returns with a space:
 healthecare.domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
query.ToString = +category:HW -NCI_TOPICS +space
junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW\-NCI_TOPICS +space
Actual  :+category:HW -NCI_TOPICS +space   note space where escape was.
thanks,
chad.

-Original Message- 
From: Morus Walter [mailto:[EMAIL PROTECTED] 
Sent: Wed 3/24/2004 1:43 AM 
To: Lucene Users List 
Cc: 
Subject: RE: Query syntax on Keyword field question



Chad Small writes:
 Here is my attempt at a KeywordAnalyzer - although is not working?  Excuse 
the length of the message, but wanted to give actual code.
 
 With this output:
 
 Analzying HW-NCI_TOPICS
  org.apache.lucene.analysis.WhitespaceAnalyzer:
   [HW-NCI_TOPICS]
  org.apache.lucene.analysis.SimpleAnalyzer:
   [hw] [nci] [topics]
  org.apache.lucene.analysis.StopAnalyzer:
   [hw] [nci] [topics]
  org.apache.lucene.analysis.standard.StandardAnalyzer:
   [hw] [nci] [topics]
  healthecare.domain.lucenesearch.KeywordAnalyzer:
   [HW-NCI_TOPICS]
 
 query.ToString = category:HW -nci topics +space

 junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
 Expected:+category:HW-NCI_TOPICS +space
 Actual  :category:HW -nci topics +space
 

Well query parser does not allow `-' within words currently.
So before your analyzer is called, query parser reads one word HW, a `-'
operator, one word NCI_TOPICS.
The latter is analyzed as nci topics because it's not in field category
anymore, I guess.

I suggested to change this. See
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491

Either you escape the - using category:HW\-NCI_TOPICS in your query
(untested. and I don't know where the escape character will be removed)
or you apply my suggested change.
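
The escaping option can be sketched in plain Java. This is only a minimal illustration and not part of Lucene: the class DashEscapeSketch and the method escapeDash are hypothetical names, and whether the query parser later strips the backslash depends on the Lucene version (the suggestion above is marked as untested).

```java
// Hypothetical helper, not a Lucene class: puts a backslash in front of
// every '-' so the query parser can read it as part of the term.
final class DashEscapeSketch {

    static String escapeDash(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '-') {
                sb.append('\\'); // one backslash in the resulting query text
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Builds the query text used in this thread.
        String queryText = "+category:" + escapeDash("HW-NCI_TOPICS") + " AND SPACE";
        System.out.println(queryText);
    }
}
```

Note the two levels of escaping: a Java string literal needs "\\" to hold the single backslash that the query parser eventually sees.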

Another option for using keywords with query parser might be adding a
keyword syntax to the query parser.
Something like category:key(HW-NCI_TOPICS) or category=HW-NCI_TOPICS.

HTH
Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Hi Chad,

 But I assume this fix won't come out for some time.  Is there a way I can get this 
 fix sooner?  
 I'm up against a deadline and would very much like this functionality. 

Just get Lucene's sources, change the line, and recompile.
The difficult part is to get a copy of JavaCC 2 (3 won't do), but I think
this can be found in the archives.

  
 And to go one more step with the KeywordAnalyzer that I wrote, changing this method 
 to skip the escape:
 protected boolean isTokenChar(char c)
 {
     if (c == '\\')
     {
         return false;
     }
     else
     {
         return true;
     }
 }
 The test then returns with a space:
  healthecare.domain.lucenesearch.KeywordAnalyzer:
   [HW-NCI_TOPICS] 
 query.ToString = +category:HW -NCI_TOPICS +space
 junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
 Expected:+category:HW\-NCI_TOPICS +space
 Actual  :+category:HW -NCI_TOPICS +space   note space where escape was.

Sure. If \ isn't a token char, it ends the token.
So you will have to look for a different way of implementing the
analyzer. That shouldn't be difficult, since you have only one token.
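
Why the token breaks can be seen with a simplified plain-Java model of character-based tokenizing, where a token is a maximal run of characters for which isTokenChar returns true. The class TokenCharSketch and its method names are assumptions made for illustration and are not Lucene classes. The second method shows one possible alternative: drop the escape character but keep a single token.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenCharSketch {

    // Same rule as the isTokenChar quoted above: a backslash is not a token character.
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    // Simplified model: every non-token character ends the current token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Alternative sketch: remove backslashes but keep everything in one token.
    // Simplification: this also drops backslashes that were meant literally.
    static String stripEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("HW\\-NCI_TOPICS"));
        System.out.println(stripEscapes("HW\\-NCI_TOPICS"));
    }
}
```

The first call yields two tokens, HW and -NCI_TOPICS, which is consistent with the space seen in the test output; the second keeps HW-NCI_TOPICS whole.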

Maybe it should be the job of the query parser to remove the escape character
(would make more sense to me at least) but that would be another change
of the query parser...

Morus




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Thanks.  I was in the process of getting JavaCC 3.2 set up.  I'll have to hunt for 2.x.
 
chad.


RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
For others' reference, here is the URL for the old version:
 
https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=212


RE: Query syntax on Keyword field question

2004-03-24 Thread Otis Gospodnetic
JavaCC 3.2 works for me.

Otis




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
I'm getting this with 3.2:
 
javacc-check:
BUILD FAILED
file:D:/applications/lucene-1.3-final/build.xml:97:
  ##
  JavaCC not found.
  JavaCC Home: /applications/javacc-3.2/bin
  JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
  Please download and install JavaCC from:
  http://javacc.dev.java.net
  Then, create a build.properties file either in your home
  directory, or within the Lucene directory and set the javacc.home
  property to the path where JavaCC is installed. For example,
  if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
  javacc.home property to:
  javacc.home=/usr/local/java/javacc-3.2
  If you get an error like the one below, then you have not installed
  things correctly. Please check all your paths and try again.
  java.lang.NoClassDefFoundError: org.javacc.parser.Main
  ##
 
even though I put a build.properties file in my root lucene directory with this in it:
javacc.home=/applications/javacc-3.2/bin
 
hmm?


RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Chad Small writes:
 I'm getting this with 3.2:
  
 javacc-check:
 BUILD FAILED
 file:D:/applications/lucene-1.3-final/build.xml:97:
   ##
   JavaCC not found.
   JavaCC Home: /applications/javacc-3.2/bin
   JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
   Please download and install JavaCC from:
   http://javacc.dev.java.net
   Then, create a build.properties file either in your home
   directory, or within the Lucene directory and set the javacc.home
   property to the path where JavaCC is installed. For example,
   if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
   javacc.home property to:
   javacc.home=/usr/local/java/javacc-3.2
   If you get an error like the one below, then you have not installed
   things correctly. Please check all your paths and try again.
   java.lang.NoClassDefFoundError: org.javacc.parser.Main
   ##
  
 even though I put a build.properties file in my root lucene directory with this in 
 it:
 javacc.home=/applications/javacc-3.2/bin
  
I never tried JavaCC 3.2, but I thought there were issues with the query parser
and/or the standard analyzer.
Seems I'm wrong or out of date.

In your case the problem seems to be the installation of JavaCC.

I guess the /bin directory should not be part of javacc.home.

Morus




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Ahh, without the bin on javacc.home, 3.2 seems to work for me too.
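
Judging only from the build output in this thread, the build script appears to append bin/lib/javacc.jar to javacc.home, which is why a home ending in /bin produced a bin\bin path. A small plain-Java sketch of that path logic follows; the class and method names are hypothetical and the path rule is an assumption inferred from the log, not taken from build.xml.

```java
// Hypothetical sketch of how the JavaCC jar path seems to be derived.
final class JavaccHomeSketch {

    private static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    // Assumption from the build output: jar = javacc.home + "/bin/lib/javacc.jar".
    static String expectedJarPath(String javaccHome) {
        return stripTrailingSlash(javaccHome) + "/bin/lib/javacc.jar";
    }

    // A home that already ends in /bin is the misconfiguration seen in this thread.
    static boolean endsWithBin(String javaccHome) {
        return stripTrailingSlash(javaccHome).endsWith("/bin");
    }

    public static void main(String[] args) {
        System.out.println(expectedJarPath("/applications/javacc-3.2/bin"));
        System.out.println(expectedJarPath("/applications/javacc-3.2"));
    }
}
```

With that reading, the working build.properties entry would be javacc.home=/applications/javacc-3.2, without the trailing bin.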


lucene usage without website

2004-03-24 Thread Pleasant, Tracy

I want to create a knowledgebase, but it needs to be something that does
not require a server to run constantly (as with JSP). It just
needs to run on the Windows platform.  Lucene works well with Windows
using an applet, right?




RE: lucene usage without website

2004-03-24 Thread Cocula Remi
Lucene is not tied to a particular type of application. 
You can integrate its functionality into any program that can invoke Java APIs.

That said, I don't think that Lucene can be used from an applet, as the applet 
API does not permit reading and writing local files.






multiple indices searcher

2004-03-24 Thread hui
Hi,

The MultiSearcher in 1.3 final keeps throwing an exception when rewriting a query.

java.lang.UnsupportedOperationException
org.apache.lucene.search.Query:combine:139
org.apache.lucene.search.MultiSearcher:rewrite:203

I still use the Query object from before the rewrite, so the search seems
to be working fine.

Does anyone know how to avoid this problem?  Thanks. I have to call rewrite
in order to avoid the cached searcher's I/O problem.

Regards,
Hui






analyzer for word perfect?

2004-03-24 Thread Charlie Smith
Is there an analyzer for WordPerfect files?

I have a need to be able to index WP files as well as MS files, pdfs, etc.







Re: Query syntax on Keyword field question

2004-03-24 Thread Incze Lajos
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:
 Thank you, Erik and Incze.  I now understand the issue
 and I'm trying to create a KeywordAnalyzer as suggested
 in your book excerpt, Erik:
  
 http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
  
 However, not being all that familiar with the Analyzer framework,
 I'm not sure how to implement the KeywordAnalyzer even though
 it might be trivial :)  Any hints, code, or messages to look at?
  

Actually, what I've written is not an analyzer but a NotTokenizingTokenizer,
as I have a very special analyzer (different needs for different
field categories) and this is used in that (the code is far from
any kind of optimization, but you can see the logic):

---
package hu.emnl.lucene.analyzer;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

/** Emits the entire input as one single token, without splitting it. */
public class NotTokenizingTokenizer extends Tokenizer {

    public NotTokenizingTokenizer() {
        super();
    }

    public NotTokenizingTokenizer(Reader input) {
        super(input);
    }

    /** Returns the whole input as one token, or null if nothing is left to read. */
    public Token next() throws IOException {
        Token t = null;
        int c = input.read();
        if (c >= 0) {  // read() returns -1 at end of input
            StringBuffer sb = new StringBuffer();
            do {
                sb.append((char) c);
                c = input.read();
            } while (c >= 0);
            t = new Token(new String(sb), 0, sb.length());
        }
        return t;
    }
}
---
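
The core of the tokenizer above is simply "read everything the Reader delivers into one string". Here is a standalone sketch of that step with no Lucene dependency, reading in chunks instead of one character at a time. ReadAllSketch and readAll are hypothetical names, and unlike next() above this returns an empty string rather than null for empty input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadAllSketch {

    // Drains the reader into a single string, which would become the one token.
    static String readAll(Reader input) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = input.read(buffer)) != -1) { // -1 signals end of input
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + readAll(new StringReader("HW-NCI_TOPICS")) + "]");
    }
}
```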

incze




Searching for a phrase that contains quote character

2004-03-24 Thread danrapp
I'd like to search for a phrase that contains the quote character. I've tried 
escaping the quote character, but am receiving a ParseException from the 
QueryParser:

For example to search for the phrase:

 this is a test

I'm trying the following

 QueryParser.parse("field:\"This is a \\\"test", "field", new 
StandardAnalyzer());

This results in:

org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 31.  
Encountered: EOF after : 
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:111)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
...

What is the proper way to accomplish this?

--Dan




Cannot access hits

2004-03-24 Thread Russell S Koonts




Greetings.  I have recently had to re-install my web server.  Once
completed, however, I cannot get the Lucene search to work. It worked
before the crash and it works on my laptop.  When conducting searches now,
I get the following message:

org.apache.cocoon.ProcessingException: Cannot access hits:
java.io.IOException: Permission denied

for full message see:

http://archives.mc.duke.edu/search?queryString=Davison

Can anyone suggest a place to start looking to add the correct permissions?

Thank you,

Russell





Changes to QueryParser.jj: Status?

2004-03-24 Thread Ravi Rao
Dear All,

Some time ago there was a discussion on modifying the definitions of
tokens in QueryParser so that the character '-' (dash), and others,
will be treated as part of a word.

Can someone please tell me the status of that discussion.  Will these
changes actually be reflected in the code...soon?

Thanks,
-- 
Ravi/

PS: The title of the thread in the previous discussion was
'Problem with search results'

Ravi(ndra) Rao
AlterPoint Inc.






Re: Cannot access hits

2004-03-24 Thread Otis Gospodnetic
The source of your problem is a simple UNIX permission issue:

java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:688)
at org.apache.lucene.store.FSDirectory$1.obtain(Unknown Source)

Figure out what directory Java's java.io.tmpdir system property points
to, and make sure that directory is writable by the user that runs that
Tomcat server.
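
A quick way to check this is a small standalone program like the sketch below. The class name TmpDirCheck is hypothetical; System.getProperty and the java.io.File methods are standard Java. The result is only meaningful if the program runs as the same user, and with the same JVM settings, as the server.

```java
import java.io.File;

final class TmpDirCheck {

    // Resolves the directory that the java.io.tmpdir system property points to.
    static File tmpDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        File dir = tmpDir();
        System.out.println("java.io.tmpdir = " + dir.getAbsolutePath());
        System.out.println("is directory:   " + dir.isDirectory());
        System.out.println("is writable:    " + dir.canWrite());
    }
}
```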

Otis






Re: analyzer for word perfect?

2004-03-24 Thread Otis Gospodnetic
I just finished writing a chapter for Lucene in Action that deals with
that.

PDF: pdfbox.org
MS Word/Excel: jakarta.apache.org/poi
WP: http://www.google.com/search?q=java+word+perfect+parser

Note that what you need are parsers.  The term Analyzer has a special
meaning in the Lucene realm.

Otis





Re: Changes to QueryParser.jj: Status?

2004-03-24 Thread Otis Gospodnetic
I committed those changes to CVS today.  There is a bug entry in
Bugzilla from Morus Walter, which is now marked as fixed.

Otis




RE: Zero hits for queries ending with a number

2004-03-24 Thread Morris Mizrahi
Thanks to Otis, Morus, and Erik for their responses to my question.

I see that my question is also related to the posting: Query syntax on
Keyword field question.

I tried all of your suggestions. 
When I debugged StandardAnalyzer by looking at
a) the tokens generated by the analyzer and
b) the parsed query (using its toString method),
I saw that it does properly pass in the
string with the number attached to it. I don't understand why Field.Text
did not work with StandardAnalyzer.

I tried WhitespaceAnalyzer and that did not work.

I have tried implementing a custom analyzer like KeywordAnalyzer, and
using PerFieldAnalyzerWrapper.

I think the custom analyzer I created is not properly doing what a
KeywordAnalyzer would do.

Erik, could you please post what KeywordAnalyzer should look like?

I can't wait until the book you guys are developing comes out.

Thanks very much.

   Morris


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 13, 2004 3:14 AM
To: Lucene Users List
Subject: Re: Zero hits for queries ending with a number

On Mar 13, 2004, at 6:02 AM, Morus Walter wrote:
 Otis Gospodnetic writes:
 Field.Keyword is suitable for storing data like Url.  Give that a
try.

 Hmm. I don't think keyword fields can be used with query parser,
 which is probably one of the problems here.
 He did try keyword fields.

Look in the archives for KeywordAnalyzer (custom) and 
PerFieldAnalyzerWrapper (built-in) using a combination of these you 
can use keyword fields.  Or, first try just using WhitespaceAnalyzer.

It is almost always the analyzer that is the cause of confusion - folks 
just get lulled into forgetting about its role because Lucene is so 
easy to use... until this type of issue bites you.

It is a wacky combination though - and notorious for causing confusion.

Perhaps someone could create a wiki page for this scenario where we can 
flesh out examples/solutions?

Erik





possible parse problem

2004-03-24 Thread Surowiec, William
I get distinctly different results (java exception versus request 
completion) for two queries:
 
this AND is
this OR is
 
I realize these are dumb queries, but they illustrate the problem. 
The first gets:
 
error: java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.Vector.elementAt(Vector.java:434)
    at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:493)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:525)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:464)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
    at ((MY CODE))
 
the second finds no results.

I used the latest stable release, 1.3 final, downloaded today.
 
Please accept this as an observation on a surprise, not a complaint.
 
Thanks
 
Bill





Re: Zero hits for queries ending with a number

2004-03-24 Thread Erik Hatcher
On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote:
 I think the custom analyzer I created is not properly doing what a
 KeywordAnalyzer would do.

 Erik, could you please post what KeywordAnalyzer should look like?

It should simply tokenize the entire input as a single token.  Incze 
Lajos posted a NotTokenizingTokenizer earlier today, in fact, that does 
the trick.

	Erik



RE: Zero hits for queries ending with a number

2004-03-24 Thread Morris Mizrahi
Thanks Erik and Incze.
Sorry for this lengthy post.

Here is the class:
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardFilter;

import java.io.Reader;

import java.util.Hashtable;

public class KeywordAnalyzer extends Analyzer {
    public static final String[] STOP_WORDS =
        StopAnalyzer.ENGLISH_STOP_WORDS;
    private Hashtable stopTable;

    public KeywordAnalyzer() {
        this(STOP_WORDS);
    }

    public KeywordAnalyzer(String[] stopWords) {
        stopTable = StopFilter.makeStopTable(stopWords);
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        // NotTokenizingTokenizer is the class Incze posted; it emits the whole input as one token.
        TokenStream result = new NotTokenizingTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopTable);

        return result;
    }
}


I have retried everything with the new KeywordAnalyzer class,
PerFieldAnalyzerWrapper, and with Field.Keyword. I don't get results for
any searches; it doesn't even matter whether there is a number at the
end or not.
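
One possible cause, not confirmed anywhere in this thread, is a mismatch between the two sides: Field.Keyword indexes the value as-is, while the analyzer above lower-cases the query term (and may drop it entirely if it equals a stop word). The plain-Java model below only illustrates that mismatch. The class name, method names, and the example URL are hypothetical, and the model ignores whatever the query parser does with characters such as ':' before the analyzer sees the text.

```java
import java.util.Locale;

final class KeywordMismatchSketch {

    // Model of Field.Keyword: the value is indexed unchanged as a single term.
    static String indexedTerm(String value) {
        return value;
    }

    // Model of only the lower-casing step of the analyzer chain shown above.
    static String queryTerm(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Term lookup is an exact comparison, so any case difference means no hit.
    static boolean matches(String storedValue, String queriedValue) {
        return indexedTerm(storedValue).equals(queryTerm(queriedValue));
    }

    public static void main(String[] args) {
        String url = "http://example.com/Page1"; // hypothetical value
        System.out.println("mixed case matches: " + matches(url, url));
        System.out.println("lower case matches: "
                + matches(url.toLowerCase(Locale.ROOT), url));
    }
}
```

If this is the cause, keeping only the non-tokenizing step for keyword fields, with no lower-casing or stop-word filtering, would keep both sides identical.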

Using query.toString("url"):

Query query = QueryParser.parse(terms, "contents", analyzer);
logger.info("search method: query.toString for url= " +
    query.toString("url"));

I can see what the analyzer is searching for.

How do I determine what is the value stored in the index by
Field.Keyword?

I've tried:

doc.add(Field.Keyword("url", url));
System.out.println("url: doc toString method= " +
    doc.toString());

But I don't know if this is the correct value that is compared with what
the analyzer sends in.

Thanks for the help.

Morris







How to order search results by Field value?

2004-03-24 Thread Chad Small
Was there any conclusion to this message:
 
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6762
 
regarding ordering by a Field?  I have a similar need and didn't see the resolution 
in that thread.  Is there a current patch against 1.3-final that I could look at?  
 
My other option, I guess, is just to code a comparator on a collection built from 
the Hits.
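
The comparator fallback could look roughly like the sketch below. It is plain Java with no Lucene dependency: HitRecord is a hypothetical stand-in for whatever is copied out of each hit (here one field value plus the score), and sortByField is an assumed helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortHitsSketch {

    // Hypothetical stand-in for the data copied out of one search hit.
    static final class HitRecord {
        final String fieldValue;
        final float score;

        HitRecord(String fieldValue, float score) {
            this.fieldValue = fieldValue;
            this.score = score;
        }
    }

    // Returns a new list ordered by field value; the input list is left untouched.
    static List<HitRecord> sortByField(List<HitRecord> hits) {
        List<HitRecord> sorted = new ArrayList<>(hits);
        Collections.sort(sorted, (a, b) -> a.fieldValue.compareTo(b.fieldValue));
        return sorted;
    }

    public static void main(String[] args) {
        List<HitRecord> hits = Arrays.asList(
                new HitRecord("pear", 0.9f),
                new HitRecord("apple", 0.5f),
                new HitRecord("mango", 0.7f));
        for (HitRecord h : sortByField(hits)) {
            System.out.println(h.fieldValue + " (score " + h.score + ")");
        }
    }
}
```

The trade-off is that every hit has to be copied into the list before sorting, which may be expensive for large result sets.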
 
thanks,
chad.


Re: How to order search results by Field value?

2004-03-24 Thread Joachim Schreiber
Chad,


 Was there any conclusion to message:


http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6762

 Regarding ordering by a Field?  I have a similar need and didn't see the
resolution in that thread.  Is there a current patch against 1.3-final that I
could look at?

You can see the resolution in the latest CVS ;-)

yo



 My other option, I guess, is just to code a comparator on a collection
build off of the Hits.

 thanks,
 chad.



