RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Great info Morus,
 
After making the "escape the dash" change to the QueryParser call:
 
Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
                                "description",
                                analyzer);
Hits hits = searcher.search(query);
System.out.println("query.ToString = " + query.toString("description"));
// Note: this assertion passes with the escape left in, so the term is not kept as-is.
assertEquals("HW-NCI_TOPICS kept as-is",
             "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
assertEquals("doc found!", 1, hits.length());
 
I'm still getting this output:
 
 domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
 
query.ToString = +category:HW\-NCI_TOPICS +space
 
junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>
 
It looks like bug http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 
was fixed today:
 
--- Additional Comments From Otis Gospodnetic mailto:[EMAIL PROTECTED]  
2004-03-24 10:10 ---

Although tft-monitor should not really result in a phrase query "tft monitor", I
agree that this is better than converting it to tft AND NOT monitor (tft -monitor).
Moreover, I have seen query syntax where '-' characters are used for phrase
queries instead of or in addition to quotes, so one could use either morus-walter
or "morus walter".

I applied your change, as it doesn't look like it breaks anything, and I hope
nobody relied on the ill behaviour where tft-monitor would result in an AND NOT query.
---
But I assume this fix won't come out for some time.  Is there a way I can get this fix 
sooner?  
I'm up against a deadline and would very much like this functionality. 
 
And to go one more step with the KeywordAnalyzer that I wrote, I changed this method to 
skip the escape:
protected boolean isTokenChar(char c)
{
   // Treat the backslash as a non-token character.
   return c != '\\';
}
The test then returns with a space:
 healthecare.domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
query.ToString = +category:HW -NCI_TOPICS +space
junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW\-NCI_TOPICS +space
Actual  :+category:HW -NCI_TOPICS +space   note space where escape was.
thanks,
chad.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Hi Chad,

> But I assume this fix won't come out for some time.  Is there a way I can get this 
> fix sooner?  
> I'm up against a deadline and would very much like this functionality. 

Just get the Lucene sources, change the line and recompile.
The difficult part is to get a copy of JavaCC 2 (3 won't do), but I think
this can be found in the archives.

  
> And to go one more step with the KeywordAnalyzer that I wrote, changing this method 
> to skip the escape:
> protected boolean isTokenChar(char c)
> {
>    if (c == '\\')
>    {
>       return false;
>    }
>    else
>    {
>       return true;
>    }
> }
> The test then returns with a space:
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
> query.ToString = +category:HW -NCI_TOPICS +space
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
> Expected:+category:HW\-NCI_TOPICS +space
> Actual  :+category:HW -NCI_TOPICS +space   note space where escape was.

Sure. If \ isn't a token char, it ends the token.
So you will have to look for a different way of implementing the
analyzer. Shouldn't be that difficult since you have only one token.

Maybe it should be the job of the query parser to remove the escape character
(would make more sense to me at least) but that would be another change
of the query parser...
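
One way to picture the single-token approach is the plain-Java sketch below.
It is not Lucene code: the class SingleTokenSketch and the method
toSingleToken are hypothetical names, and a real analyzer would put
equivalent logic inside a Tokenizer subclass. The idea is to keep the whole
input as one token and drop each backslash while keeping the character it
escapes, instead of letting the backslash end the token.

```java
public class SingleTokenSketch {

    // Hypothetical helper: returns the whole input as one token,
    // removing each backslash but keeping the character it escapes.
    static String toSingleToken(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                // Skip the escape character and copy the escaped one literally.
                sb.append(raw.charAt(++i));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Java literal "HW\\-NCI_TOPICS" is the text HW\-NCI_TOPICS.
        System.out.println(toSingleToken("HW\\-NCI_TOPICS")); // prints HW-NCI_TOPICS
    }
}
```

With this shape the escaped dash never splits the term, so the keyword
reaches the index lookup as HW-NCI_TOPICS.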

Morus




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Thanks.  I was in the process of getting JavaCC 3.2 set up.  I'll have to hunt for 2.x.
 
chad.


RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
For others' reference, here is the URL for the old version:
 
https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=212


RE: Query syntax on Keyword field question

2004-03-24 Thread Otis Gospodnetic
JavaCC 3.2 works for me.

Otis




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
I'm getting this with 3.2:
 
javacc-check:
BUILD FAILED
file:D:/applications/lucene-1.3-final/build.xml:97:
  ##
  JavaCC not found.
  JavaCC Home: /applications/javacc-3.2/bin
  JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
  Please download and install JavaCC from:
  http://javacc.dev.java.net
  Then, create a build.properties file either in your home
  directory, or within the Lucene directory and set the javacc.home
  property to the path where JavaCC is installed. For example,
  if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
  javacc.home property to:
  javacc.home=/usr/local/java/javacc-3.2
  If you get an error like the one below, then you have not installed
  things correctly. Please check all your paths and try again.
  java.lang.NoClassDefFoundError: org.javacc.parser.Main
  ##
 
even though I put a build.properties file in my root lucene directory with this in it:
javacc.home=/applications/javacc-3.2/bin
 
hmm?


RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Chad Small writes:
> I'm getting this with 3.2:
>  
> javacc-check:
> BUILD FAILED
> file:D:/applications/lucene-1.3-final/build.xml:97:
>   ##
>   JavaCC not found.
>   JavaCC Home: /applications/javacc-3.2/bin
>   JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
>   Please download and install JavaCC from:
>   http://javacc.dev.java.net
>   Then, create a build.properties file either in your home
>   directory, or within the Lucene directory and set the javacc.home
>   property to the path where JavaCC is installed. For example,
>   if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
>   javacc.home property to:
>   javacc.home=/usr/local/java/javacc-3.2
>   If you get an error like the one below, then you have not installed
>   things correctly. Please check all your paths and try again.
>   java.lang.NoClassDefFoundError: org.javacc.parser.Main
>   ##
>  
> even though I put a build.properties file in my root lucene directory with this in 
> it:
> javacc.home=/applications/javacc-3.2/bin
  
I never tried javacc 3.2 but I thought there were issues with query parser
and/or standard analyzer.
Seems I'm wrong or outdated.

In your case the problem seems to be installation of javacc.

I guess the /bin directory should not be part of javacc.home.
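
The doubled bin\bin in the error output suggests that the build appends
bin/lib/javacc.jar to whatever javacc.home is set to. The sketch below
assumes exactly that; it is inferred from the message above, not read from
build.xml, and JavaccHomeSketch and jarPath are hypothetical names.

```java
public class JavaccHomeSketch {

    // Assumption: the build looks for <javacc.home>/bin/lib/javacc.jar.
    static String jarPath(String javaccHome) {
        return javaccHome + "/bin/lib/javacc.jar";
    }

    public static void main(String[] args) {
        // With /bin included in javacc.home the bin segment is doubled.
        System.out.println(jarPath("/applications/javacc-3.2/bin"));
        // prints /applications/javacc-3.2/bin/bin/lib/javacc.jar

        // With the installation root, the path has a single bin segment.
        System.out.println(jarPath("/applications/javacc-3.2"));
        // prints /applications/javacc-3.2/bin/lib/javacc.jar
    }
}
```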

Morus




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Ahh, without the bin on javacc.home, 3.2 seems to work for me too.


Re: Query syntax on Keyword field question

2004-03-24 Thread Incze Lajos
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:
> Thank you, Erik and Incze.  I now understand the issue
> and I'm trying to create a KeywordAnalyzer as suggested
> in your book excerpt, Erik:
>  
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
>  
> However, not being all that familiar with the Analyzer framework,
> I'm not sure how to implement the KeywordAnalyzer even though
> it might be trivial :)  Any hints, code, or messages to look at?
  

Actually, what I've written was not an analyzer, but a NotTokenizingTokenizer,
as I have a very special analyzer (different needs for different
field categories) and this is used in that (the code is far from the
phase of any kind of optimization, but you can see the logic):

---
package hu.emnl.lucene.analyzer;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

public class NotTokenizingTokenizer extends Tokenizer {

    public NotTokenizingTokenizer() {
        super();
    }

    public NotTokenizingTokenizer(Reader input) {
        super(input);
    }

    // Returns the entire input as a single token, or null once the input is exhausted.
    public Token next() throws IOException {
        Token t = null;
        int c = input.read();
        if (c >= 0) {
            StringBuffer sb = new StringBuffer();
            do {
                sb.append((char) c);
                c = input.read();
            } while (c >= 0);
            t = new Token(new String(sb), 0, sb.length());
        }
        return t;
    }
}

incze




RE: Query syntax on Keyword field question

2004-03-23 Thread Chad Small
I have since learned that using a TermQuery instead of the MultiFieldQueryParser 
works for the keyword field in question below (HW-NCI_TOPICS).
 
apiQuery = new BooleanQuery();
// required = true, prohibited = false
apiQuery.add(new TermQuery(new Term("category", "HW-NCI_TOPICS")), true, false);
 
This finds a match.
 
I found a message that talked about having to use the Query API when searching 
Keyword fields in the index.  Is this true?
 
Is there not a way to get the MultiFieldQueryParser to find a match on this keyword?
 
thanks,
chad.

-Original Message- 
From: Chad Small 
Sent: Tue 3/23/2004 10:57 AM 
To: [EMAIL PROTECTED] 
Cc: 
Subject: Query syntax on Keyword field question



Hello,

How can I format a query to get a hit?

I'm using the StandardAnalyzer() at both index and search time.

If I'm indexing a field like this:

luceneDocument.add(Field.Keyword("category", "HW-NCI_TOPICS"));

I've tried the following with no success:

//  String searchArgs = "HW\\-NCI_TOPICS";
//  String searchArgs = "HW\\-NCI_TOPICS".toLowerCase();
//  String searchArgs = "+HW+NCI+TOPICS";
  //this works with .Text field
//  String searchArgs = "+hw+nci+topics";
//  String searchArgs = "hw nci topics";

thanks,
chad.



Re: Query syntax on Keyword field question

2004-03-23 Thread Erik Hatcher
QueryParser and Field.Keyword fields are a strange mix.  For some 
background, check the archives as this has been covered pretty 
extensively.

A quick answer is yes you can use MFQP and QP with keyword fields, 
however you need to be careful which analyzer you use.  
PerFieldAnalyzerWrapper is a good solution - you'll just need to use an 
analyzer for your keyword field which simply tokenizes the whole string 
as one chunk.  Perhaps such an analyzer should be made part of the 
core?
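
The per-field idea can be pictured with the plain-Java sketch below. It
does not use Lucene's classes: PerFieldSketch, keyword and simple are
hypothetical names that only imitate the behaviour described here, with a
default analyzer for most fields and a whole-string analyzer for the
keyword field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers a field-specific analyzer that overrides the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Picks the analyzer registered for the field, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Keyword-style analysis: the whole string is one token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Simplified default: lower-case, then split on anything but a-z and 0-9.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        PerFieldSketch analyzer = new PerFieldSketch(PerFieldSketch::simple);
        analyzer.addAnalyzer("category", PerFieldSketch::keyword);
        System.out.println(analyzer.analyze("category", "HW-NCI_TOPICS"));    // [HW-NCI_TOPICS]
        System.out.println(analyzer.analyze("description", "HW-NCI_TOPICS")); // [hw, nci, topics]
    }
}
```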

	Erik



RE: Query syntax on Keyword field question

2004-03-23 Thread Chad Small
Thank you, Erik and Incze.  I now understand the issue and I'm trying to create a 
KeywordAnalyzer as suggested in your book excerpt, Erik:
 
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
 
However, not being all that familiar with the Analyzer framework, I'm not sure how to 
implement the KeywordAnalyzer even though it might be trivial :)  Any hints, code, 
or messages to look at?
 
from message link above
Ok, here is the section from Lucene in Action.  I'll leave the 
development of KeywordAnalyzer as an exercise for the reader (although 
its implementation is trivial, one of the simplest analyzers possible - 
only emit one token of the entire contents).  I hope this helps.

Erik


thanks again,
chad.

-Original Message- 
From: Incze Lajos [mailto:[EMAIL PROTECTED] 
Sent: Tue 3/23/2004 8:08 PM 
To: Lucene Users List 
Cc: 
Subject: Re: Query syntax on Keyword field question



On Tue, Mar 23, 2004 at 08:10:15PM -0500, Erik Hatcher wrote:
> QueryParser and Field.Keyword fields are a strange mix.  For some
> background, check the archives as this has been covered pretty
> extensively.
>
> A quick answer is yes you can use MFQP and QP with keyword fields,
> however you need to be careful which analyzer you use. 
> PerFieldAnalyzerWrapper is a good solution - you'll just need to use an
> analyzer for your keyword field which simply tokenizes the whole string
> as one chunk.  Perhaps such an analyzer should be made part of the
> core?
>
>   Erik

I've implemented such an analyzer, but it's only a partial solution
if your keyword field contains spaces, as the QP would split
the query, e.g.:

NOTTOKNIZED:(term with spaces*)

would give you no hit even with a non-tokenized field containing
"term with spaces and other useful things". The full solution
would be to be able to tell the QP not to split at spaces,
either by a 'do not split till apos' syntax, or by the good ol'
backslash: do\ not\ notice\ these\ spaces.
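
To make the backslash proposal concrete, here is a plain-Java sketch of a
splitter that breaks a query at spaces unless the space is escaped. It only
illustrates the wished-for behaviour and is not how QueryParser works; the
names EscapedSpaceSplitSketch and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedSpaceSplitSketch {

    // Splits at unescaped spaces; a backslash keeps the next character literally.
    static List<String> split(String query) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char ch = query.charAt(i);
            if (ch == '\\' && i + 1 < query.length()) {
                // Escaped character: copy it without ending the term.
                current.append(query.charAt(++i));
            } else if (ch == ' ') {
                // Unescaped space: close the current term, if any.
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(split("term with spaces"));
        // prints [term, with, spaces]
        System.out.println(split("do\\ not\\ notice\\ these\\ spaces"));
        // prints [do not notice these spaces]
    }
}
```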

incze


RE: Query syntax on Keyword field question

2004-03-23 Thread Chad Small
Here is my attempt at a KeywordAnalyzer, although it is not working.  Excuse the length 
of the message, but I wanted to give actual code.
 
package domain.lucenesearch;
 
import java.io.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.TokenStream;
 
public class KeywordAnalyzer extends Analyzer
{
   public TokenStream tokenStream(String s, Reader reader)
   {
      return new KeywordTokenizer(reader);
   }
 
   private class KeywordTokenizer extends CharTokenizer
   {
      public KeywordTokenizer(Reader in)
      {
         super(in);
      }
      /**
       * Collects all characters.
       */
      protected boolean isTokenChar(char c)
      {
         return true;
      }
   }
}

However, this test fails:
 
import java.io.StringReader;
import junit.framework.TestCase;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class KeywordAnalyzerTest extends TestCase
{
   RAMDirectory directory;
   private IndexSearcher searcher;
 
   public void setUp() throws Exception
   {
      directory = new RAMDirectory();
      IndexWriter writer = new IndexWriter(directory,
                                           new StandardAnalyzer(),
                                           true);
      Document doc = new Document();
      doc.add(Field.Keyword("category", "HW-NCI_TOPICS"));
      doc.add(Field.Text("description", "Illidium Space Modulator"));
      writer.addDocument(doc);
      writer.close();
      searcher = new IndexSearcher(directory);
   }
 
   public void testPerFieldAnalyzer() throws Exception
   {
      analyze("HW-NCI_TOPICS");
 
      PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
      analyzer.addAnalyzer("category", new KeywordAnalyzer());   //|#1
      Query query = QueryParser.parse("category:HW-NCI_TOPICS AND SPACE",
                                      "description",
                                      analyzer);
      Hits hits = searcher.search(query);
      System.out.println("query.ToString = " + query.toString("description"));
      assertEquals("HW-NCI_TOPICS kept as-is",
                   "+category:HW-NCI_TOPICS +space", query.toString("description"));
      assertEquals("doc found!", 1, hits.length());
   }
 
   // Prints the tokens each analyzer produces for the given text.
   private void analyze(String text) throws Exception
   {
      Analyzer[] analyzers = new Analyzer[]{
         new WhitespaceAnalyzer(),
         new SimpleAnalyzer(),
         new StopAnalyzer(),
         new StandardAnalyzer(),
         new KeywordAnalyzer(),
         //new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS)
      };
      System.out.println("Analzying \"" + text + "\"");
      for (int i = 0; i < analyzers.length; i++)
      {
         Analyzer analyzer = analyzers[i];
         System.out.println("\t" + analyzer.getClass().getName() + ":");
         System.out.print("\t\t");
         TokenStream stream = analyzer.tokenStream("category", new StringReader(text));
         while (true)
         {
            Token token = stream.next();
            if (token == null) break;
            System.out.print("[" + token.termText() + "] ");
         }
         System.out.println("\n");
      }
   }
}
 
With this output:
 
Analzying HW-NCI_TOPICS
 org.apache.lucene.analysis.WhitespaceAnalyzer:
  [HW-NCI_TOPICS] 
 org.apache.lucene.analysis.SimpleAnalyzer:
  [hw] [nci] [topics] 
 org.apache.lucene.analysis.StopAnalyzer:
  [hw] [nci] [topics] 
 org.apache.lucene.analysis.standard.StandardAnalyzer:
  [hw] [nci] [topics] 
 healthecare.domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
 
query.ToString = category:HW -nci topics +space

junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW-NCI_TOPICS +space
Actual  :category:HW -nci topics +space
 
See anything?
thanks,
chad.


RE: Query syntax on Keyword field question

2004-03-23 Thread Morus Walter
Chad Small writes:
> Here is my attempt at a KeywordAnalyzer - although is not working?  Excuse the 
> length of the message, but wanted to give actual code.
>  
> With this output:
>  
> Analzying HW-NCI_TOPICS
>  org.apache.lucene.analysis.WhitespaceAnalyzer:
>   [HW-NCI_TOPICS] 
>  org.apache.lucene.analysis.SimpleAnalyzer:
>   [hw] [nci] [topics] 
>  org.apache.lucene.analysis.StopAnalyzer:
>   [hw] [nci] [topics] 
>  org.apache.lucene.analysis.standard.StandardAnalyzer:
>   [hw] [nci] [topics] 
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
>  
> query.ToString = category:HW -nci topics +space
> 
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
> Expected:+category:HW-NCI_TOPICS +space
> Actual  :category:HW -nci topics +space
  

Well, the query parser currently does not allow `-' within words.
So before your analyzer is called, the query parser reads one word HW, a `-'
operator, and one word NCI_TOPICS.
The latter is analyzed as "nci topics" because it's not in the field category
anymore, I guess.

I suggested changing this. See
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491

Either you escape the - using category:HW\-NCI_TOPICS in your query
(untested, and I don't know where the escape character will be removed)
or you apply my suggested change.
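
A small plain-Java sketch of the escaping option: it puts a backslash in
front of every `-' in a keyword value before the value is placed into a
query string. EscapeDashSketch and escapeDashes are hypothetical names, and
whether the parser later strips the backslash again is exactly the open
point mentioned above.

```java
public class EscapeDashSketch {

    // Hypothetical helper: prefixes every '-' in the value with a backslash.
    static String escapeDashes(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            if (ch == '-') {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String queryText = "+category:" + escapeDashes("HW-NCI_TOPICS") + " AND SPACE";
        System.out.println(queryText); // prints +category:HW\-NCI_TOPICS AND SPACE
    }
}
```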

Another option for using keywords with query parser might be adding a
keyword syntax to the query parser.
Something like category:key(HW-NCI_TOPICS) or category=HW-NCI_TOPICS.

HTH
Morus
