RE: Query syntax on Keyword field question
Great info, Morus.

After making the "escape the dash" change in the QueryParser call:

    Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE", "description", analyzer);
    Hits hits = searcher.search(query);
    System.out.println("query.ToString = " + query.toString("description"));
    // note that this passes with the escape put in, so not as-is
    assertEquals("HW-NCI_TOPICS kept as-is", "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
    assertEquals("doc found!", 1, hits.length());

I'm still getting this output:

    domain.lucenesearch.KeywordAnalyzer:
        [HW-NCI_TOPICS]
    query.ToString = +category:HW\-NCI_TOPICS +space
    junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>

It looks like the bug, http://issues.apache.org/bugzilla/show_bug.cgi?id=27491, was fixed today:

--- Additional Comments From Otis Gospodnetic [EMAIL PROTECTED] 2004-03-24 10:10 ---
> Although tft-monitor should not really result in a phrase query "tft monitor",
> I agree that this is better than converting it to tft AND NOT monitor
> (tft -monitor). Moreover, I have seen query syntax where '-' characters are
> used for phrase queries instead of or in addition to quotes, so one could use
> either morus-walter or "morus walter".
> I applied your change, as it doesn't look like it breaks anything, and I hope
> nobody relied on ill behaviour where tft-monitor would result in AND NOT query.
---

But I assume this fix won't come out for some time. Is there a way I can get this fix sooner? I'm up against a deadline and would very much like this functionality.

And to go one more step with the KeywordAnalyzer that I wrote, changing this method to skip the escape:

    protected boolean isTokenChar(char c) {
        if (c == '\\') {
            return false;
        } else {
            return true;
        }
    }

The test then returns with a space:

    healthecare.domain.lucenesearch.KeywordAnalyzer:
        [HW-NCI_TOPICS]
    query.ToString = +category:HW -NCI_TOPICS +space
    junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
    Expected:+category:HW\-NCI_TOPICS +space
    Actual  :+category:HW -NCI_TOPICS +space

Note the space where the escape was.

thanks,
chad.

-----Original Message-----
From: Morus Walter [mailto:[EMAIL PROTECTED]]
Sent: Wed 3/24/2004 1:43 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: Query syntax on Keyword field question
Hi Chad,

> But I assume this fix won't come out for some time. Is there a way I can
> get this fix sooner? I'm up against a deadline and would very much like
> this functionality.

Just get the Lucene sources, change the line and recompile.
The difficult part is to get a copy of JavaCC 2 (3 won't do), but I think this can be found in the archives.

> And to go one more step with the KeywordAnalyzer that I wrote, changing
> this method to skip the escape:
>
>     protected boolean isTokenChar(char c) {
>         if (c == '\\') {
>             return false;
>         } else {
>             return true;
>         }
>     }
>
> The test then returns with a space:
>
>     healthecare.domain.lucenesearch.KeywordAnalyzer:
>         [HW-NCI_TOPICS]
>     query.ToString = +category:HW -NCI_TOPICS +space
>     junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>     Expected:+category:HW\-NCI_TOPICS +space
>     Actual  :+category:HW -NCI_TOPICS +space
>
> note space where escape was.

Sure. If \ isn't a token char, it ends the token. So you will have to look for a different way of implementing the analyzer. That shouldn't be difficult, since you have only one token.
Maybe it should be the job of the query parser to remove the escape character (that would make more sense to me, at least), but that would be another change to the query parser...

Morus
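One way to read the "different way of implementing the analyzer" suggestion above is to keep the whole input as one token and remove the escape characters from its text afterwards, instead of declaring \ a non-token character. The following is only a sketch in plain Java: `KeywordUnescaper` and `unescape` are hypothetical names, not part of Lucene, and how the result would be wired into an analyzer is left open.

```java
/** Hypothetical helper, not part of Lucene: removes backslash escapes from one keyword token. */
final class KeywordUnescaper {
    private KeywordUnescaper() {}

    /**
     * Drops each escaping backslash and keeps the character it protects,
     * so the text HW\-NCI_TOPICS becomes HW-NCI_TOPICS without being split.
     * A trailing lone backslash is dropped.
     */
    static String unescape(String token) {
        StringBuilder out = new StringBuilder(token.length());
        boolean escaped = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!escaped && c == '\\') {
                escaped = true;   // skip the backslash itself
                continue;
            }
            out.append(c);        // ordinary or escaped character, kept literally
            escaped = false;
        }
        return out.toString();
    }
}
```

Because the backslash is removed rather than treated as a token boundary, the token stays in one piece, which is the behaviour the failing test was looking for.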
RE: Query syntax on Keyword field question
thanks. I was in the process of getting javacc3.2 setup. I'll have to hunt for 2.x.

chad.

-----Original Message-----
From: Morus Walter [mailto:[EMAIL PROTECTED]]
Sent: Wed 3/24/2004 8:00 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question
RE: Query syntax on Keyword field question
For others' reference, here is the URL for the old version:

https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=212

-----Original Message-----
From: Chad Small
Sent: Wed 3/24/2004 8:07 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question
RE: Query syntax on Keyword field question
JavaCC 3.2 works for me.

Otis

--- Chad Small [EMAIL PROTECTED] wrote:
> thanks. I was in the process of getting javacc3.2 setup. I'll have to
> hunt for 2.x.
>
> chad.
RE: Query syntax on Keyword field question
I'm getting this with 3.2:

    javacc-check:

    BUILD FAILED
    file:D:/applications/lucene-1.3-final/build.xml:97:
    ##
    JavaCC not found.
    JavaCC Home: /applications/javacc-3.2/bin
    JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar

    Please download and install JavaCC from:

    http://javacc.dev.java.net

    Then, create a build.properties file either in your home
    directory, or within the Lucene directory and set the javacc.home
    property to the path where JavaCC is installed. For example,
    if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
    javacc.home property to:

    javacc.home=/usr/local/java/javacc-3.2

    If you get an error like the one below, then you have not installed
    things correctly. Please check all your paths and try again.

    java.lang.NoClassDefFoundError: org.javacc.parser.Main
    ##

even though I put a build.properties file in my root lucene directory with this in it:

    javacc.home=/applications/javacc-3.2/bin

hmm?

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Wed 3/24/2004 8:29 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question
RE: Query syntax on Keyword field question
Chad Small writes:
> I'm getting this with 3.2:
>
> javacc-check:
> BUILD FAILED
> file:D:/applications/lucene-1.3-final/build.xml:97:
> ##
> JavaCC not found.
> JavaCC Home: /applications/javacc-3.2/bin
> JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
> [...]
> even though I put a build.properties file in my root lucene directory
> with this in it:
> javacc.home=/applications/javacc-3.2/bin

I never tried javacc 3.2, but I thought there were issues with the query parser and/or standard analyzer. Seems I'm wrong or outdated.

In your case the problem seems to be the installation of javacc.
I guess the /bin directory should not be part of javacc.home.

Morus
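Following that guess, the build.properties from the quoted message would point at the JavaCC install root rather than at its bin subdirectory. This is a sketch that reuses the path from this thread; the actual location depends on where JavaCC was unpacked.

```properties
# build.properties in the Lucene source directory (or in your home directory).
# Judging by the "JavaCC JAR" line in the error output, the build appends
# bin/lib/javacc.jar to this value itself, so /bin must not be included here.
javacc.home=/applications/javacc-3.2
```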
RE: Query syntax on Keyword field question
Ahh, without the bin on the javacc.home, 3.2 seems to work for me too.

-----Original Message-----
From: Chad Small
Sent: Wed 3/24/2004 8:34 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question
Re: Query syntax on Keyword field question
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:
> Thank you, Erik and Incze. I now understand the issue, and I'm trying to
> create a KeywordAnalyzer as suggested in your book excerpt, Erik:
>
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
>
> However, not being all that familiar with the Analyzer framework, I'm not
> sure how to implement the KeywordAnalyzer, even though it might be
> trivial :) Any hints, code, or messages to look at?

Actually, what I've written was not an analyzer but a NotTokenizingTokenizer, as I have a very special analyzer (different needs for different field categories) and this is used in that. The code is far from any kind of optimization, but you can see the logic:

---
    package hu.emnl.lucene.analyzer;

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.Tokenizer;

    public class NotTokenizingTokenizer extends Tokenizer {

        public NotTokenizingTokenizer() {
            super();
        }

        public NotTokenizingTokenizer(Reader input) {
            super(input);
        }

        // Returns the whole input as one token, then null once the input is exhausted.
        public Token next() throws IOException {
            Token t = null;
            int c = input.read();
            if (c >= 0) {
                StringBuffer sb = new StringBuffer();
                do {
                    sb.append((char) c);
                    c = input.read();
                } while (c >= 0);   // read() returns -1 at end of input
                t = new Token(new String(sb), 0, sb.length());
            }
            return t;
        }
    }
---

incze
RE: Query syntax on Keyword field question
I have since learned that using a TermQuery instead of the MultiFieldQueryParser works for the keyword field in question below (HW-NCI_TOPICS):

    apiQuery = new BooleanQuery();
    apiQuery.add(new TermQuery(new Term("category", "HW-NCI_TOPICS")), true, false);

This finds a match.

I found a message that talked about having to use the Query API when searching Keyword fields in the index. Is this true? Is there not a way to get the MultiFieldQueryParser to find a match on this keyword?

thanks,
chad.

-----Original Message-----
From: Chad Small
Sent: Tue 3/23/2004 10:57 AM
To: [EMAIL PROTECTED]
Subject: Query syntax on Keyword field question

> Hello,
>
> How can I format a query to get a hit? I'm using the StandardAnalyzer()
> at both index and search time.
>
> If I'm indexing a field like this:
>
>     luceneDocument.add(Field.Keyword("category", "HW-NCI_TOPICS"));
>
> I've tried the following with no success:
>
>     // String searchArgs = "HW\\-NCI_TOPICS";
>     // String searchArgs = "HW\\-NCI_TOPICS".toLowerCase();
>     // String searchArgs = "+HW+NCI+TOPICS"; //this works with .Text field
>     // String searchArgs = "+hw+nci+topics";
>     // String searchArgs = "hw nci topics";
>
> thanks,
> chad.
Re: Query syntax on Keyword field question
QueryParser and Field.Keyword fields are a strange mix. For some background, check the archives, as this has been covered pretty extensively.

A quick answer is yes, you can use MFQP and QP with keyword fields; however, you need to be careful which analyzer you use. PerFieldAnalyzerWrapper is a good solution: you'll just need to use an analyzer for your keyword field which simply tokenizes the whole string as one chunk. Perhaps such an analyzer should be made part of the core?

Erik

On Mar 23, 2004, at 12:58 PM, Chad Small wrote:
> I have since learned that using the TermQuery instead of the
> MultiFieldQueryParser works for the keyword field in question below
> (HW-NCI_TOPICS).
> [...]
> Is there not a way to get the MultiFieldQueryParser to find a match on
> this keyword?
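The "whole string as one chunk" idea comes down to reading the field's Reader to the end and emitting exactly one token. The sketch below shows only that reading step in plain Java, without Lucene's Analyzer or Tokenizer classes; `WholeInputToken` and `readSingleToken` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a keyword tokenizer; it does not use Lucene's API. */
final class WholeInputToken {
    private WholeInputToken() {}

    /** Reads the reader to the end and returns its contents as one token, or null if it is empty. */
    static String readSingleToken(Reader input) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[256];
        try {
            int n;
            while ((n = input.read(buffer)) != -1) {   // -1 signals end of input
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() == 0 ? null : sb.toString();
    }
}
```

No character is treated as a separator and nothing is lowercased, so a value such as HW-NCI_TOPICS comes out exactly as it went in, which is what a term stored with Field.Keyword needs in order to match.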
RE: Query syntax on Keyword field question
Thank you, Erik and Incze. I now understand the issue, and I'm trying to create a KeywordAnalyzer as suggested in your book excerpt, Erik:

http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727

However, not being all that familiar with the Analyzer framework, I'm not sure how to implement the KeywordAnalyzer, even though it might be trivial :) Any hints, code, or messages to look at?

From the message linked above:
> Ok, here is the section from Lucene in Action. I'll leave the
> development of KeywordAnalyzer as an exercise for the reader (although
> its implementation is trivial, one of the simplest analyzers possible -
> only emit one token of the entire contents). I hope this helps.
>
> Erik

thanks again,
chad.

-----Original Message-----
From: Incze Lajos [mailto:[EMAIL PROTECTED]]
Sent: Tue 3/23/2004 8:08 PM
To: Lucene Users List
Subject: Re: Query syntax on Keyword field question

On Tue, Mar 23, 2004 at 08:10:15PM -0500, Erik Hatcher wrote:
> QueryParser and Field.Keyword fields are a strange mix. For some
> background, check the archives as this has been covered pretty
> extensively. A quick answer is yes you can use MFQP and QP with keyword
> fields, however you need to be careful which analyzer you use.
> PerFieldAnalyzerWrapper is a good solution - you'll just need to use an
> analyzer for your keyword field which simply tokenizes the whole string
> as one chunk. Perhaps such an analyzer should be made part of the core?
>
> Erik

I've implemented such an analyzer, but it's only a partial solution if your keyword field contains spaces, as the QP would split the query. E.g.:

    NOTTOKNIZED:(term with spaces*)

would give you no hit even with a not-tokenized field value "term with spaces and other useful things". The full solution would be to be able to tell the QP not to split at spaces, either by a 'do not split till apos' syntax, or by the good ol' backslash: do\ not\ notice\ these\ spaces.

incze
RE: Query syntax on Keyword field question
Here is my attempt at a KeywordAnalyzer, although it is not working? Excuse the length of the message, but I wanted to give actual code.

    package domain.lucenesearch;

    import java.io.*;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.TokenStream;

    public class KeywordAnalyzer extends Analyzer {

        public TokenStream tokenStream(String s, Reader reader) {
            return new KeywordTokenizer(reader);
        }

        private class KeywordTokenizer extends CharTokenizer {
            public KeywordTokenizer(Reader in) {
                super(in);
            }

            /**
             * Collects all characters.
             */
            protected boolean isTokenChar(char c) {
                return true;
            }
        }
    }

However, this test fails:

    import java.io.StringReader;
    import junit.framework.TestCase;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.RAMDirectory;

    public class KeywordAnalyzerTest extends TestCase {
        RAMDirectory directory;
        private IndexSearcher searcher;

        public void setUp() throws Exception {
            directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);

            Document doc = new Document();
            doc.add(Field.Keyword("category", "HW-NCI_TOPICS"));
            doc.add(Field.Text("description", "Illidium Space Modulator"));
            writer.addDocument(doc);
            writer.close();

            searcher = new IndexSearcher(directory);
        }

        public void testPerFieldAnalyzer() throws Exception {
            analyze("HW-NCI_TOPICS");

            PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("category", new KeywordAnalyzer()); //|#1

            Query query = QueryParser.parse("category:HW-NCI_TOPICS AND SPACE",
                                            "description", analyzer);
            Hits hits = searcher.search(query);
            System.out.println("query.ToString = " + query.toString("description"));
            assertEquals("HW-NCI_TOPICS kept as-is",
                         "category:HW-NCI_TOPICS +space", query.toString("description"));
            assertEquals("doc found!", 1, hits.length());
        }

        // Prints the tokens each analyzer produces for the given text.
        private void analyze(String text) throws Exception {
            Analyzer[] analyzers = new Analyzer[]{
                new WhitespaceAnalyzer(),
                new SimpleAnalyzer(),
                new StopAnalyzer(),
                new StandardAnalyzer(),
                new KeywordAnalyzer(),
                //new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS)
            };

            System.out.println("Analzying \"" + text + "\"");
            for (int i = 0; i < analyzers.length; i++) {
                Analyzer analyzer = analyzers[i];
                System.out.println("\t" + analyzer.getClass().getName() + ":");
                System.out.print("\t\t");
                TokenStream stream = analyzer.tokenStream("category", new StringReader(text));
                while (true) {
                    Token token = stream.next();
                    if (token == null) break;
                    System.out.print("[" + token.termText() + "] ");
                }
                System.out.println("\n");
            }
        }
    }

With this output:

    Analzying HW-NCI_TOPICS
        org.apache.lucene.analysis.WhitespaceAnalyzer:
            [HW-NCI_TOPICS]
        org.apache.lucene.analysis.SimpleAnalyzer:
            [hw] [nci] [topics]
        org.apache.lucene.analysis.StopAnalyzer:
            [hw] [nci] [topics]
        org.apache.lucene.analysis.standard.StandardAnalyzer:
            [hw] [nci] [topics]
        healthecare.domain.lucenesearch.KeywordAnalyzer:
            [HW-NCI_TOPICS]

    query.ToString = category:HW -nci topics +space

    junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
    Expected:+category:HW-NCI_TOPICS +space
    Actual  :category:HW -nci topics +space

See anything?

thanks,
chad.

-----Original Message-----
From: Chad Small
Sent: Tue 3/23/2004 8:48 PM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question
RE: Query syntax on Keyword field question
Chad Small writes:
> Here is my attempt at a KeywordAnalyzer - although is not working? Excuse
> the length of the message, but wanted to give actual code.
>
> With this output:
>
> Analzying HW-NCI_TOPICS
> org.apache.lucene.analysis.WhitespaceAnalyzer:
>     [HW-NCI_TOPICS]
> org.apache.lucene.analysis.SimpleAnalyzer:
>     [hw] [nci] [topics]
> org.apache.lucene.analysis.StopAnalyzer:
>     [hw] [nci] [topics]
> org.apache.lucene.analysis.standard.StandardAnalyzer:
>     [hw] [nci] [topics]
> healthecare.domain.lucenesearch.KeywordAnalyzer:
>     [HW-NCI_TOPICS]
>
> query.ToString = category:HW -nci topics +space
>
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
> Expected:+category:HW-NCI_TOPICS +space
> Actual  :category:HW -nci topics +space

Well, the query parser does not currently allow `-' within words.
So before your analyzer is called, the query parser reads one word HW, a `-' operator, and one word NCI_TOPICS.
The latter is analyzed as "nci topics" because it's not in the field category anymore, I guess.

I suggested changing this. See
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491

Either you escape the - using category:HW\-NCI_TOPICS in your query (untested, and I don't know where the escape character will be removed), or you apply my suggested change.

Another option for using keywords with the query parser might be adding a keyword syntax to the query parser. Something like category:key(HW-NCI_TOPICS) or category=HW-NCI_TOPICS.

HTH
Morus
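The escaping option above can be applied when the query string is assembled, so that callers do not have to remember the backslash. The sketch below only builds the string; `KeywordQueryText`, `escapeDashes` and `fieldClause` are hypothetical names, not Lucene API, and whether the parser or the analyzer later removes the backslash remains the open question noted in the message.

```java
/** Hypothetical helper for building query strings; not part of Lucene's QueryParser. */
final class KeywordQueryText {
    private KeywordQueryText() {}

    /** Puts a backslash in front of every '-' so the parser does not read it as an operator. */
    static String escapeDashes(String keyword) {
        return keyword.replace("-", "\\-");
    }

    /** Builds field:value with the dashes in the value escaped. */
    static String fieldClause(String field, String keyword) {
        return field + ":" + escapeDashes(keyword);
    }
}
```

With this, `KeywordQueryText.fieldClause("category", "HW-NCI_TOPICS")` yields the text category:HW\-NCI_TOPICS, which can then be combined with the rest of the query, for example by appending " AND SPACE".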