Re: How to order search results by Field value?

2004-03-24 Thread Joachim Schreiber
Chad,


> Was there any conclusion to message:
>
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6762
>
> Regarding "Ordering by a Field"?  I have a similar need and didn't see the
> resolution in that thread.  Is there a current patch for 1.3-final that I
> could see?

You can see the resolution in the latest CVS ;-)

yo


>
> My other option, I guess, is just to code a comparator on a collection
> built from the Hits.
>
> thanks,
> chad.
>



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



How to order search results by Field value?

2004-03-24 Thread Chad Small
Was there any conclusion to message:
 
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6762
 
Regarding "Ordering by a Field"?  I have a similar need and didn't see the resolution 
in that thread.  Is there a current patch for 1.3-final that I could see?
 
My other option, I guess, is just to code a comparator on a collection built from 
the Hits.
 
thanks,
chad.
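
The comparator fallback mentioned above can be sketched in plain Java. This is only an illustration under stated assumptions: `HitRow` and `sortByField` are hypothetical names, and the rows are assumed to have already been copied out of the Hits object (one stored field value and one score per hit). It uses no Lucene API and is not the sorting support that went into CVS.

```java
import java.util.ArrayList;
import java.util.List;

class SortHitsByField {
    /** Hypothetical holder for one hit: a stored field value plus its relevance score. */
    static final class HitRow {
        final String fieldValue;
        final float score;

        HitRow(String fieldValue, float score) {
            this.fieldValue = fieldValue;
            this.score = score;
        }
    }

    /** Returns a new list ordered by field value; ties are broken by descending score. */
    static List<HitRow> sortByField(List<HitRow> rows) {
        List<HitRow> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> {
            int byField = a.fieldValue.compareTo(b.fieldValue);
            return byField != 0 ? byField : Float.compare(b.score, a.score);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<HitRow> rows = new ArrayList<>();
        rows.add(new HitRow("pear", 0.9f));
        rows.add(new HitRow("apple", 0.2f));
        rows.add(new HitRow("apple", 0.7f));
        for (HitRow row : sortByField(rows)) {
            System.out.println(row.fieldValue + " " + row.score);
        }
    }
}
```

Copying every hit into a list means every matching document has to be loaded, so this approach probably only suits small result sets.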


RE: Zero hits for queries ending with a number

2004-03-24 Thread Morris Mizrahi
Thanks Erik and Incze.
Sorry for this lengthy post.

Here is the class:
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardFilter;

import java.io.Reader;
import java.util.Hashtable;

public class KeywordAnalyzer extends Analyzer {
    public static final String[] STOP_WORDS =
        StopAnalyzer.ENGLISH_STOP_WORDS;
    private Hashtable stopTable;

    public KeywordAnalyzer() {
        this(STOP_WORDS);
    }

    public KeywordAnalyzer(String[] stopWords) {
        stopTable = StopFilter.makeStopTable(stopWords);
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new NotTokenizingTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopTable);

        return result;
    }
}


I have retried everything with the new KeywordAnalyzer class,
PerFieldAnalyzerWrapper, and with Field.Keyword. I don't get results for
any search; it doesn't even matter whether there is a number at the
end or not.

Using query.toString("url"):

Query query = QueryParser.parse(terms, "contents", analyzer);   
logger.info("search method: query.toString for url= " +
query.toString("url"));

I can see what the analyzer is searching for.

How do I determine what value Field.Keyword actually stores in the
index?

I've tried:

doc.add(Field.Keyword("url", url)); 
System.out.println("url: doc toString method= " +
doc.toString());

But I don't know if this is the correct value that is compared with what
the analyzer sends in.

Thanks for the help.

Morris




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 24, 2004 4:45 PM
To: Lucene Users List
Subject: Re: Zero hits for queries ending with a number

On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote:
> I think the custom analyzer I created is not properly doing what a
> KeywordAnalyzer would do.
>
> Erik, could you please post what KeywordAnalyzer should look like?

It should simply "tokenize" the entire input as a single token.  Incze 
Lajos posted a NotTokenizingTokenizer earlier today, in fact, that does 
the trick.

Erik





Re: Zero hits for queries ending with a number

2004-03-24 Thread Erik Hatcher
On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote:
> I think the custom analyzer I created is not properly doing what a
> KeywordAnalyzer would do.
>
> Erik, could you please post what KeywordAnalyzer should look like?

It should simply "tokenize" the entire input as a single token.  Incze 
Lajos posted a NotTokenizingTokenizer earlier today, in fact, that does 
the trick.

	Erik



Re: possible parse problem

2004-03-24 Thread Otis Gospodnetic
Known issue, already filed as a bug, and may even have a patch in
Bugzilla.

Otis

--- "Surowiec, William" <[EMAIL PROTECTED]> wrote:
> I get distinctly different results (java exception versus request 
> completion) for two queries:
>  
> this AND is
> this OR is
>  
> I realize these are "dumb" queries, but they illustrate the problem. 
> The first gets:
>  
> error: java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.util.Vector.elementAt(Vector.java:434)
>     at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
>     at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:493)
>     at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:525)
>     at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:464)
>     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
>     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
>     at ((MY CODE))
>  
> the second finds no results.
> 
> Used the latest stable release downloaded today, 1.3 final.
>  
> Please accept this as an observation on a surprise, not a complaint.
>  
> Thanks
>  
> Bill
> 
> "This communication is intended solely for the addressee and is
> confidential and not for third party unauthorized distribution."
> 
> 





possible parse problem

2004-03-24 Thread Surowiec, William
I get distinctly different results (java exception versus request 
completion) for two queries:
 
this AND is
this OR is
 
I realize these are "dumb" queries, but they illustrate the problem. 
The first gets:
 
error: java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.Vector.elementAt(Vector.java:434)
    at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:493)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:525)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:464)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
    at ((MY CODE))
 
the second finds no results.

Used the latest stable release downloaded today, 1.3 final.
 
Please accept this as an observation on a surprise, not a complaint.
 
Thanks
 
Bill


"This communication is intended solely for the addressee and is
confidential and not for third party unauthorized distribution."



RE: Zero hits for queries ending with a number

2004-03-24 Thread Morris Mizrahi
Thanks to Otis, Morus, and Erik for their responses to my question.

I see that my question is also related to the posting: "Query syntax on
Keyword field question".

I tried all of your suggestions. 
To debug StandardAnalyzer, I looked at:
a) the tokens generated by the analyzer and
b) the parsed query (using the toString method).
Both show that it does properly pass in the string with the number
attached to it. I don't understand why Field.Text did not work with
StandardAnalyzer.

I tried WhitespaceAnalyzer and that did not work.

I have tried implementing a custom analyzer like KeywordAnalyzer, and
using PerFieldAnalyzerWrapper.

I think the custom analyzer I created is not properly doing what a
KeywordAnalyzer would do.

Erik, could you please post what KeywordAnalyzer should look like?

I can't wait until the book you guys are developing comes out.

Thanks very much.

   Morris


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 13, 2004 3:14 AM
To: Lucene Users List
Subject: Re: Zero hits for queries ending with a number

On Mar 13, 2004, at 6:02 AM, Morus Walter wrote:
> Otis Gospodnetic writes:
>> Field.Keyword is suitable for storing data like Url.  Give that a try.
>>
> Hmm. I don't think keyword fields can be used with query parser,
> which is probably one of the problems here.
> He did try keyword fields.

Look in the archives for KeywordAnalyzer (custom) and 
PerFieldAnalyzerWrapper (built-in); using a combination of these, you 
can use keyword fields.  Or, first try just using WhitespaceAnalyzer.

It is almost always the analyzer that is the cause of confusion - folks 
just get lulled into forgetting about its role because Lucene is so 
easy to use... until this type of issue bites you.

It is a wacky combination though - and notorious for causing confusion.

Perhaps someone could create a wiki page for this scenario where we can 
flesh out examples/solutions?

Erik





Re: Changes to QueryParser.jj: Status?

2004-03-24 Thread Otis Gospodnetic
I committed those changes to CVS today.  There is a bug entry in
Bugzilla from Morus Walter, which is now marked as fixed.

Otis

--- Ravi Rao <[EMAIL PROTECTED]> wrote:
> Dear All,
> 
> Some time ago there was a discussion on modifying the definitions of
> tokens in QueryParser so that the character '-' (dash), and others,
> will be treated as part of a word.
> 
> Can someone please tell me the status of that discussion.  Will these
> changes actually be reflected in the code...soon?
> 
> Thanks,
> -- 
> Ravi/
> 
> PS: The title of the thread in the previous discussion was
> 'Problem with search results'
> 
> Ravi(ndra) Rao
> AlterPoint Inc.
> 
> 
> 
> 





Re: analyzer for word perfect?

2004-03-24 Thread Otis Gospodnetic
I just finished writing a chapter for Lucene in Action that deals with
that.

PDF: pdfbox.org
MS Word/Excel: jakarta.apache.org/poi
WP: http://www.google.com/search?q=java+word+perfect+parser

Note that what you need are parsers.  The term Analyzer has a special
meaning in the Lucene realm.

Otis


--- Charlie Smith <[EMAIL PROTECTED]> wrote:
> Is there an analyzer for WordPerfect files?
> 
> I have a need to be able to index WP files as well as MS files, pdfs,
> etc.
> 
> 
> 
> 
> 





Re: Cannot access hits

2004-03-24 Thread Otis Gospodnetic
The source of your problem is a simple UNIX permission issue:

java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:688)
at org.apache.lucene.store.FSDirectory$1.obtain(Unknown Source)

Figure out what directory Java's java.io.tmpdir system property points
to, and make sure that directory is writable by the user that runs that
Tomcat server.

Otis
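
A quick way to follow the advice above is to print the property and probe the directory from Java. This is a minimal sketch, not part of the original reply: `TmpDirCheck` and `canCreateFileIn` are hypothetical names, and the check only says something useful if it runs as the same user (and with the same JVM settings) as the Tomcat server.

```java
import java.io.File;
import java.io.IOException;

class TmpDirCheck {
    /** Returns true if a file can actually be created (and removed again) in the given directory. */
    static boolean canCreateFileIn(File dir) {
        if (!dir.isDirectory()) {
            return false;
        }
        try {
            // Creating a real file is a stronger check than File.canWrite().
            File probe = File.createTempFile("lucene-probe", ".tmp", dir);
            return probe.delete();
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp.getAbsolutePath());
        System.out.println("file creation works: " + canCreateFileIn(tmp));
    }
}
```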



--- Russell S Koonts <[EMAIL PROTECTED]> wrote:
> 
> 
> 
> 
> Greetings.  I have recently had to re-install my web server.  Once
> completed, however, I cannot get the Lucene search to work. It worked
> before the crash and it works on my laptop.  When conducting searches
> now,
> I get the following message:
> 
> org.apache.cocoon.ProcessingException: Cannot access hits:
> java.io.IOException: Permission denied
> 
> for full message see:
> 
> http://archives.mc.duke.edu/search?queryString=Davison
> 
> Can anyone suggest a place to start looking to add the correct
> permissions?
> 
> Thank you,
> 
> Russell
> 
> 
> 





Changes to QueryParser.jj: Status?

2004-03-24 Thread Ravi Rao
Dear All,

Some time ago there was a discussion on modifying the definitions of
tokens in QueryParser so that the character '-' (dash), and others,
will be treated as part of a word.

Can someone please tell me the status of that discussion.  Will these
changes actually be reflected in the code...soon?

Thanks,
-- 
Ravi/

PS: The title of the thread in the previous discussion was
'Problem with search results'

Ravi(ndra) Rao
AlterPoint Inc.






Cannot access hits

2004-03-24 Thread Russell S Koonts




Greetings.  I have recently had to re-install my web server.  Once
completed, however, I cannot get the Lucene search to work. It worked
before the crash and it works on my laptop.  When conducting searches now,
I get the following message:

org.apache.cocoon.ProcessingException: Cannot access hits:
java.io.IOException: Permission denied

for full message see:

http://archives.mc.duke.edu/search?queryString=Davison

Can anyone suggest a place to start looking to add the correct permissions?

Thank you,

Russell





Searching for a phrase that contains quote character

2004-03-24 Thread danrapp
I'd like to search for a phrase that contains the quote character. I've tried 
escaping the quote character, but am receiving a ParseException from the 
QueryParser:

For example to search for the phrase:

 this is a "test"

I'm trying the following

 QueryParser.parse("field:\"This is a \\\"test\\\"\"", "field", new 
StandardAnalyzer());

This results in:

org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 31.  
Encountered:  after : ""
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:111)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
...

What is the proper way to accomplish this?

--Dan
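
When debugging this kind of escaping, it can help to separate the two layers involved: the Java string-literal escapes, which the compiler removes, and the backslashes that remain for QueryParser. The sketch below only prints the characters the parser receives for the literal in the message above; `QueryStringEscapes` is a hypothetical name, and the example makes no claim about whether the 1.3 QueryParser grammar accepts an escaped quote inside a phrase.

```java
class QueryStringEscapes {
    /** The string literal from the message above, unchanged. */
    static String queryText() {
        return "field:\"This is a \\\"test\\\"\"";
    }

    public static void main(String[] args) {
        // Prints the text after the Java compiler has processed the
        // literal's escapes, i.e. exactly what QueryParser is handed.
        System.out.println(queryText());
    }
}
```

Each `\\\"` in the source becomes a backslash followed by a quote in the runtime string, so the parser sees one backslash before each inner quote.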




Re: Query syntax on Keyword field question

2004-03-24 Thread Incze Lajos
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:
> Thank you, Erik and Incze.  I now understand the issue
> and I'm trying to create a "KeywordAnalyzer" as suggested
> in your book excerpt, Erik:
>  
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6727
>  
> However, not being all that familiar with the Analyzer framework,
> I'm not sure how to implement the "KeywordAnalyzer" even though
> it might be "trivial" :)  Any hints, code, or messages to look at?
>  

Actually, what I wrote was not an analyzer but a NotTokenizingTokenizer,
as I have a very special analyzer (different needs for different
field categories) and this is used in it. The code is far from
any kind of optimization, but you can see the logic:

---
package hu.emnl.lucene.analyzer;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

public class NotTokenizingTokenizer extends Tokenizer {

    public NotTokenizingTokenizer() {
        super();
    }

    public NotTokenizingTokenizer(Reader input) {
        super(input);
    }

    // Reads the whole input and returns it as one token. Returns null
    // when nothing is left to read, which ends the token stream.
    public Token next() throws IOException {
        Token t = null;
        int c = input.read();
        if (c >= 0) {
            StringBuffer sb = new StringBuffer();
            do {
                sb.append((char) c);
                c = input.read();
            } while (c >= 0);
            t = new Token(new String(sb), 0, sb.length());
        }
        return t;
    }
}
---
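
As noted, the tokenizer above is not optimized: it reads one character per call. The reading logic on its own could be sketched with a buffer as follows. This is plain Java with no Lucene classes; `ReadAll` and `readFully` are hypothetical names, and wrapping the IOException in an unchecked exception is a choice made for this sketch only (the tokenizer's next() can simply declare IOException).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadAll {
    /** Reads every remaining character from the reader into one String. */
    static String readFully(Reader input) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            // read(char[]) returns -1 once the end of the input is reached.
            while ((n = input.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readFully(new StringReader("HW-NCI_TOPICS")));
    }
}
```

A tokenizer built on this would still need to return null when the result is empty, as the original does, so that an exhausted reader produces no token.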

incze




analyzer for word perfect?

2004-03-24 Thread Charlie Smith
Is there an analyzer for WordPerfect files?

I have a need to be able to index WP files as well as MS files, pdfs, etc.







multiple indices searcher

2004-03-24 Thread hui
Hi,

MultiSearcher in 1.3 final keeps throwing an exception when rewriting a query.

java.lang.UnsupportedOperationException
org.apache.lucene.search.Query:combine:139
org.apache.lucene.search.MultiSearcher:rewrite:203

I still use the Query object from before the rewrite, so the search seems
to work fine.

Does anyone know how to avoid this problem?  Thx. I have to call "rewrite"
in order to avoid the cached searcher's I/O problem.

Regards,
Hui






RE: lucene usage without website

2004-03-24 Thread Cocula Remi
Lucene is not dedicated to a particular type of application. 
You can integrate its functionality into any program that can invoke Java APIs.

However, I don't think that Lucene can be invoked from an applet, as the applet 
API does not permit reading and writing local files.



-Original Message-
From: Pleasant, Tracy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 24, 2004 17:41
To: Lucene Users List
Subject: lucene usage without website



I want to create a knowledgebase but it needs to be something that does
not require a server to run constantly (like with using jsp). It just
needs to run on the Windows platform.  Lucene works well with Windows
using an applet, right?




lucene usage without website

2004-03-24 Thread Pleasant, Tracy

I want to create a knowledgebase but it needs to be something that does
not require a server to run constantly (like with using jsp). It just
needs to run on the Windows platform.  Lucene works well with Windows
using an applet, right?




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Ahh, without the bin on the javacc.home - 3.2 seems to work for me too.

-Original Message- 
From: Chad Small 
Sent: Wed 3/24/2004 8:34 AM 
To: Lucene Users List 
Cc: 
Subject: RE: Query syntax on Keyword field question



I'm getting this with 3.2:

javacc-check:
BUILD FAILED
file:D:/applications/lucene-1.3-final/build.xml:97:
  ##
  JavaCC not found.
  JavaCC Home: /applications/javacc-3.2/bin
  JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
  Please download and install JavaCC from:
  
  Then, create a build.properties file either in your home
  directory, or within the Lucene directory and set the javacc.home
  property to the path where JavaCC is installed. For example,
  if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
  javacc.home property to:
  javacc.home=/usr/local/java/javacc-3.2
  If you get an error like the one below, then you have not installed
  things correctly. Please check all your paths and try again.
  java.lang.NoClassDefFoundError: org.javacc.parser.Main
  ##

even though I put a build.properties file in my root lucene directory with 
this in it:
javacc.home=/applications/javacc-3.2/bin

hmm?

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wed 3/24/2004 8:29 AM
To: Lucene Users List
Cc:
Subject: RE: Query syntax on Keyword field question
   
   

JavaCC 3.2 works for me.
   
Otis
   
--- Chad Small <[EMAIL PROTECTED]> wrote:
> thanks.  I was in the process of getting javacc3.2 setup.  I'll have
> to hunt for 2.x.
>
> chad.
>
>   -Original Message-
>   From: Morus Walter [mailto:[EMAIL PROTECTED]
>   Sent: Wed 3/24/2004 8:00 AM
>   To: Lucene Users List
>   Cc:
>   Subject: RE: Query syntax on Keyword field question
> 
> 
>
>   Hi Chad,
> 
>   > But I assume this fix won't come out for some time.  Is 
there a
> way I can get this fix sooner?
>   > I'm up against a deadline and would very much like this
> functionality.
> 
>   Just get Lucene's sources, change the line and recompile.
>   The difficult part is to get a copy of JavaCC 2 (3 won't do), 
but I
> think
>   this can be found in the archives.
> 
>   >
>   > And to go one more step with the KeywordAnalyzer that I 
wrote,
> changing this method to skip the escape:
>   > protected boolean isTokenChar(char c)
>   > {
>   >  if (c == '\\')
>   >  {
>   > return false;
>   >  }
>   >  else
>   >  {
>   > return true;
>   >  }
>   >   }
>   > The test then returns with a space:
>   >  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   >   [HW-NCI_TOPICS]
>   > query.ToString = +category:"HW -NCI_TOPICS" +space
>   > junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>   > Expected:+category:HW\-NCI_TOPICS +space
>   > Actual  :+category:"HW -NCI_TOPICS" +space    escape was.
> 
>   Sure. If \ isn't a token char, it ends the token.
>   So you will have to look for a different way of implementing 
the
>   analyzer. Shouldn't be that difficult since you have only one 
token.
> 
>   Maybe it should be the job of the query parser to remove the 
escape
> character
>   (would make more sense to me at le

RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Chad Small writes:
> I'm getting this with 3.2:
>  
> javacc-check:
> BUILD FAILED
> file:D:/applications/lucene-1.3-final/build.xml:97:
>   ##
>   JavaCC not found.
>   JavaCC Home: /applications/javacc-3.2/bin
>   JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
>   Please download and install JavaCC from:
>   
>   Then, create a build.properties file either in your home
>   directory, or within the Lucene directory and set the javacc.home
>   property to the path where JavaCC is installed. For example,
>   if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
>   javacc.home property to:
>   javacc.home=/usr/local/java/javacc-3.2
>   If you get an error like the one below, then you have not installed
>   things correctly. Please check all your paths and try again.
>   java.lang.NoClassDefFoundError: org.javacc.parser.Main
>   ##
>  
> even though I put a build.properties file in my root lucene directory with this in 
> it:
> javacc.home=/applications/javacc-3.2/bin
>  
I never tried javacc 3.2 but I thought there were issues with query parser
and/or standard analyzer.
Seems I'm wrong or outdated.

In your case the problem seems to be installation of javacc.

I guess the /bin directory should not be part of javacc.home.

Morus




RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
I'm getting this with 3.2:
 
javacc-check:
BUILD FAILED
file:D:/applications/lucene-1.3-final/build.xml:97:
  ##
  JavaCC not found.
  JavaCC Home: /applications/javacc-3.2/bin
  JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
  Please download and install JavaCC from:
  
  Then, create a build.properties file either in your home
  directory, or within the Lucene directory and set the javacc.home
  property to the path where JavaCC is installed. For example,
  if you installed JavaCC in /usr/local/java/javacc-3.2, then set the
  javacc.home property to:
  javacc.home=/usr/local/java/javacc-3.2
  If you get an error like the one below, then you have not installed
  things correctly. Please check all your paths and try again.
  java.lang.NoClassDefFoundError: org.javacc.parser.Main
  ##
 
even though I put a build.properties file in my root lucene directory with this in it:
javacc.home=/applications/javacc-3.2/bin
 
hmm?

-Original Message- 
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Wed 3/24/2004 8:29 AM 
To: Lucene Users List 
Cc: 
Subject: RE: Query syntax on Keyword field question



JavaCC 3.2 works for me.

Otis

--- Chad Small <[EMAIL PROTECTED]> wrote:
> thanks.  I was in the process of getting javacc3.2 setup.  I'll have
> to hunt for 2.x.
> 
> chad.
>
>   -Original Message-
>   From: Morus Walter [mailto:[EMAIL PROTECTED]
>   Sent: Wed 3/24/2004 8:00 AM
>   To: Lucene Users List
>   Cc:
>   Subject: RE: Query syntax on Keyword field question
>  
>  
>
>   Hi Chad,
>  
>   > But I assume this fix won't come out for some time.  Is there a
> way I can get this fix sooner?
>   > I'm up against a deadline and would very much like this
> functionality.
>  
>   Just get Lucene's sources, change the line and recompile.
>   The difficult part is to get a copy of JavaCC 2 (3 won't do), but I
> think
>   this can be found in the archives.
>  
>   >
>   > And to go one more step with the KeywordAnalyzer that I wrote,
> changing this method to skip the escape:
>   > protected boolean isTokenChar(char c)
>   > {
>   >  if (c == '\\')
>   >  {
>   > return false;
>   >  }
>   >  else
>   >  {
>   > return true;
>   >  }
>   >   }
>   > The test then returns with a space:
>   >  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   >   [HW-NCI_TOPICS]
>   > query.ToString = +category:"HW -NCI_TOPICS" +space
>   > junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>   > Expected:+category:HW\-NCI_TOPICS +space
>   > Actual  :+category:"HW -NCI_TOPICS" +space    escape was.
>  
>   Sure. If \ isn't a token char, it ends the token.
>   So you will have to look for a different way of implementing the
>   analyzer. Shouldn't be that difficult since you have only one token.
>  
>   Maybe it should be the job of the query parser to remove the escape
> character
>   (would make more sense to me at least) but that would be another
> change
>   of the query parser...
>  
>   Morus
>  
>

RE: Query syntax on Keyword field question

2004-03-24 Thread Otis Gospodnetic
JavaCC 3.2 works for me.

Otis

--- Chad Small <[EMAIL PROTECTED]> wrote:
> thanks.  I was in the process of getting javacc3.2 setup.  I'll have
> to hunt for 2.x.
>  
> chad.
> 
>   -Original Message- 
>   From: Morus Walter [mailto:[EMAIL PROTECTED] 
>   Sent: Wed 3/24/2004 8:00 AM 
>   To: Lucene Users List 
>   Cc: 
>   Subject: RE: Query syntax on Keyword field question
>   
>   
> 
>   Hi Chad,
>   
>   > But I assume this fix won't come out for some time.  Is there a
> way I can get this fix sooner? 
>   > I'm up against a deadline and would very much like this
> functionality.
>   
>   Just get Lucene's sources, change the line and recompile.
>   The difficult part is to get a copy of JavaCC 2 (3 won't do), but I
> think
>   this can be found in the archives.
>   
>   > 
>   > And to go one more step with the KeywordAnalyzer that I wrote,
> changing this method to skip the escape:
>   > protected boolean isTokenChar(char c)
>   > {
>   >  if (c == '\\')
>   >  {
>   > return false;
>   >  }
>   >  else
>   >  {
>   > return true;
>   >  }
>   >   }
>   > The test then returns with a space:
>   >  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   >   [HW-NCI_TOPICS]
>   > query.ToString = +category:"HW -NCI_TOPICS" +space
>   > junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>   > Expected:+category:HW\-NCI_TOPICS +space
>   > Actual  :+category:"HW -NCI_TOPICS" +space    escape was.
>   
>   Sure. If \ isn't a token char, it ends the token.
>   So you will have to look for a different way of implementing the
>   analyzer. Shouldn't be that difficult since you have only one token.
>   
>   Maybe it should be the job of the query parser to remove the escape
> character
>   (would make more sense to me at least) but that would be another
> change
>   of the query parser...
>   
>   Morus
>   
> 



RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
For others' reference, here is the URL for the old version:
 
https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=212

-Original Message- 
From: Chad Small 
Sent: Wed 3/24/2004 8:07 AM 
To: Lucene Users List 
Cc: 
Subject: RE: Query syntax on Keyword field question



thanks.  I was in the process of getting javacc3.2 setup.  I'll have to hunt 
for 2.x.

chad.

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Wed 3/24/2004 8:00 AM
To: Lucene Users List
Cc:
Subject: RE: Query syntax on Keyword field question
   
   

Hi Chad,
   
> But I assume this fix won't come out for some time.  Is there a way 
I can get this fix sooner?
> I'm up against a deadline and would very much like this 
functionality.
   
Just get Lucene's sources, change the line and recompile.
The difficult part is to get a copy of JavaCC 2 (3 won't do), but I 
think
this can be found in the archives.
   
>
> And to go one more step with the KeywordAnalyzer that I wrote, 
changing this method to skip the escape:
> protected boolean isTokenChar(char c)
> {
>  if (c == '\\')
>  {
> return false;
>  }
>  else
>  {
> return true;
>  }
>   }
> The test then returns with a space:
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS]
> query.ToString = +category:"HW -NCI_TOPICS" +space
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
> Expected:+category:HW\-NCI_TOPICS +space
> Actual  :+category:"HW -NCI_TOPICS" +space   

RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
thanks.  I was in the process of getting javacc3.2 setup.  I'll have to hunt for 2.x.
 
chad.

-Original Message- 
From: Morus Walter [mailto:[EMAIL PROTECTED] 
Sent: Wed 3/24/2004 8:00 AM 
To: Lucene Users List 
Cc: 
Subject: RE: Query syntax on Keyword field question



Hi Chad,

> But I assume this fix won't come out for some time.  Is there a way I can 
get this fix sooner? 
> I'm up against a deadline and would very much like this functionality.

Just get Lucene's sources, change the line and recompile.
The difficult part is to get a copy of JavaCC 2 (3 won't do), but I think
this can be found in the archives.

> 
> And to go one more step with the KeywordAnalyzer that I wrote, changing this 
method to skip the escape:
> protected boolean isTokenChar(char c)
> {
>  if (c == '\\')
>  {
> return false;
>  }
>  else
>  {
> return true;
>  }
>   }
> The test then returns with a space:
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS]
> query.ToString = +category:"HW -NCI_TOPICS" +space
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
> Expected:+category:HW\-NCI_TOPICS +space
> Actual  :+category:"HW -NCI_TOPICS" +space   

RE: Query syntax on Keyword field question

2004-03-24 Thread Morus Walter
Hi Chad,

> But I assume this fix won't come out for some time.  Is there a way I can get this 
> fix sooner?  
> I'm up against a deadline and would very much like this functionality. 

Just get Lucene's sources, change the line and recompile.
The difficult part is to get a copy of JavaCC 2 (3 won't do), but I think
this can be found in the archives.

>  
> And to go one more step with the KeywordAnalyzer that I wrote, changing this method 
> to skip the escape:
> protected boolean isTokenChar(char c)
> {
>  if (c == '\\')
>  {
> return false;
>  }
>  else
>  {
> return true;
>  }
>   }
> The test then returns with a space:
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
> query.ToString = +category:"HW -NCI_TOPICS" +space
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
> Expected:+category:HW\-NCI_TOPICS +space
> Actual  :+category:"HW -NCI_TOPICS" +space   
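
The space in the "Actual" query above is consistent with how a character-based tokenizer treats a non-token character: the current token ends there, and the next one starts afterwards. The sketch below imitates that behaviour in plain Java. It assumes the analyzer is built on a tokenizer that consults isTokenChar for each character; `TokenCharDemo` and `tokenize` are hypothetical names, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCharDemo {
    /** Mirrors the isTokenChar variant quoted above: everything except a backslash is a token character. */
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    /** Splits text into maximal runs of token characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-token character closes the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal "HW\\-NCI_TOPICS" is the text HW\-NCI_TOPICS.
        System.out.println(tokenize("HW\\-NCI_TOPICS")); // prints [HW, -NCI_TOPICS]
    }
}
```

Two tokens for one field value would explain why the query is printed as the phrase "HW -NCI_TOPICS" rather than as a single term.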

RE: Query syntax on Keyword field question

2004-03-24 Thread Otis Gospodnetic
If you can't wait for a release, you'll have to check out Lucene
directly from CVS, or get one of the nightly builds.

Otis

--- Chad Small <[EMAIL PROTECTED]> wrote:
> Great info Morus,
>  
> After making the "escape the dash" change to the QueryParser:
>  
> Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND
> SPACE",
>   "description",
>   analyzer);
>   Hits hits = searcher.search(query);
>   System.out.println("query.ToString = " +
> query.toString("description"));
>   assertEquals("HW-NCI_TOPICS kept as-is",
>"+category:HW\\-NCI_TOPICS +space",
> query.toString("description"));  <--note that this passes with
> the escape put in, so not "as-is".
>   assertEquals("doc found!", 1, hits.length());
>  
> I'm still getting this output:
>  
>  domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
>  
> query.ToString = +category:HW\-NCI_TOPICS +space
>  
> junit.framework.AssertionFailedError: doc found! expected:<1> but
> was:<0>
>  
> It look like bug,
> http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
>  , was fixed
> today:
>  
> --- Additional Comments From Otis Gospodnetic
>   2004-03-24 10:10 ---
> 
> Although tft-monitor should not really result in a phrase query "tft
> monitor", I
> agree that this is better than converting it to tft AND NOT monitor
> (tft -monitor).
> Moreover, I have seen query syntax where '-' characters are used for
> phrase
> queries instead or in addition to quotes, so one could use either
> morus-walter
> or "morus walter".
> 
> I applied your change, as it doesn't look like it breaks anything,
> and I hope
> nobody relied on ill behaviour where tft-monitor would result in AND
> NOT query.
> ---
> But I assume this fix won't come out for some time.  Is there a way I
> can get this fix sooner?  
> I'm up against a deadline and would very much like this
> functionality. 
>  
> And to go one more step with the KeywordAnalyzer that I wrote,
> changing this method to skip the escape:
> protected boolean isTokenChar(char c)
> {
>  if (c == '\\')
>  {
> return false;
>  }
>  else
>  {
> return true;
>  }
>   }
> The test then returns with a space:
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
> query.ToString = +category:"HW -NCI_TOPICS" +space
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
> Expected:+category:HW\-NCI_TOPICS +space
> Actual  :+category:"HW -NCI_TOPICS" +space    escape was.
> thanks,
> chad.
> 
>   -Original Message- 
>   From: Morus Walter [mailto:[EMAIL PROTECTED] 
>   Sent: Wed 3/24/2004 1:43 AM 
>   To: Lucene Users List 
>   Cc: 
>   Subject: RE: Query syntax on Keyword field question
>   
>   
> 
>   Chad Small writes:
>   > Here is my attempt at a KeywordAnalyzer - although is not working?
>   > Excuse the length of the message, but wanted to give actual code.
>   > 
>   > With this output:
>   > 
>   > Analzying "HW-NCI_TOPICS"
>   >  org.apache.lucene.analysis.WhitespaceAnalyzer:
>   >   [HW-NCI_TOPICS]
>   >  org.apache.lucene.analysis.SimpleAnalyzer:
>   >   [hw] [nci] [topics]
>   >  org.apache.lucene.analysis.StopAnalyzer:
>   >   [hw] [nci] [topics]
>   >  org.apache.lucene.analysis.standard.StandardAnalyzer:
>   >   [hw] [nci] [topics]
>   >  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   >   [HW-NCI_TOPICS]
>   > 
>   > query.ToString = category:HW -"nci topics" +space
>   >
>   > junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>   > Expected:+category:HW-NCI_TOPICS +space
>   > Actual  :category:HW -"nci topics" +space
>   > 
>   
>   Well query parser does not allow `-' within words currently.
>   So before your analyzer is called, query parser reads one word HW, a
> `-'
>   operator, one word NCI_TOPICS.
>   The latter is analyzed as "nci topics" because it's not in field
> category
>   anymore, I guess.
>   
>   I suggested to change this. See
>   http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
>   
>   Either you escape the - using category:HW\-NCI_TOPICS in your query
>   (untested. and I don't know where the escape character will be
> removed)
>   or you apply my suggested change.
>   
>   Another option for using keywords with query parser might be adding
> a
>   keyword syntax to the query parser.
>   Something like category:key("HW-NCI_TOPICS") or
> category="HW-NCI_TOPICS".
>   
>   HTH
>   Morus
>   
> 

RE: Query syntax on Keyword field question

2004-03-24 Thread Chad Small
Great info Morus,
 
After making the "escape the dash" change to the QueryParser:
 
  Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
                                  "description",
                                  analyzer);
  Hits hits = searcher.search(query);
  System.out.println("query.ToString = " + query.toString("description"));
  // Note: this assertion passes only with the escape put in, so not "as-is".
  assertEquals("HW-NCI_TOPICS kept as-is",
               "+category:HW\\-NCI_TOPICS +space",
               query.toString("description"));
  assertEquals("doc found!", 1, hits.length());
 
I'm still getting this output:
 
 domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
 
query.ToString = +category:HW\-NCI_TOPICS +space
 
junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>
 
It looks like bug http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
was fixed today:
 
--- Additional Comments From Otis Gospodnetic 2004-03-24 10:10 ---

Although tft-monitor should not really result in a phrase query "tft monitor", I
agree that this is better than converting it to tft AND NOT monitor (tft -monitor).
Moreover, I have seen query syntax where '-' characters are used for phrase
queries instead or in addition to quotes, so one could use either morus-walter
or "morus walter".

I applied your change, as it doesn't look like it breaks anything, and I hope
nobody relied on ill behaviour where tft-monitor would result in AND NOT query.
---
But I assume this fix won't come out for some time.  Is there a way I can get
this fix sooner?
I'm up against a deadline and would very much like this functionality.
 
And to go one more step with the KeywordAnalyzer that I wrote, changing this
method to skip the escape:
  protected boolean isTokenChar(char c)
  {
    // The backslash escape is not a token character; everything else is.
    if (c == '\\')
    {
      return false;
    }
    else
    {
      return true;
    }
  }
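
A likely reason for the space in the output that follows: once the backslash is no longer a token character, it acts as a separator, so the escaped term is cut into two tokens, and a term that analyzes to two tokens is turned into a phrase query. The stand-alone sketch below only imitates that splitting with plain Java. The class and method names are hypothetical, it does not call Lucene, and it assumes the analyzer receives the term with the backslash still in it.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch: mimics a character tokenizer driven by an
// isTokenChar-style predicate. It does not use Lucene itself.
class BackslashSplitSketch {

    // Same rule as the method above: every character except '\' is a token character.
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    // Collect maximal runs of token characters; other characters act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Calling BackslashSplitSketch.tokenize on the escaped term HW\-NCI_TOPICS gives the two tokens HW and -NCI_TOPICS, which matches the two words inside the quoted phrase in the test output.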
The test then returns with a space:
 healthecare.domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
query.ToString = +category:"HW -NCI_TOPICS" +space
junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW\-NCI_TOPICS +space
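Actual  :+category:"HW -NCI_TOPICS" +space

thanks,
chad.

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Wed 3/24/2004 1:43 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

Chad Small writes:
> Here is my attempt at a KeywordAnalyzer - although is not working?  Excuse
> the length of the message, but wanted to give actual code.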
> 
> With this output:
> 
> Analzying "HW-NCI_TOPICS"
>  org.apache.lucene.analysis.WhitespaceAnalyzer:
>   [HW-NCI_TOPICS]
>  org.apache.lucene.analysis.SimpleAnalyzer:
>   [hw] [nci] [topics]
>  org.apache.lucene.analysis.StopAnalyzer:
>   [hw] [nci] [topics]
>  org.apache.lucene.analysis.standard.StandardAnalyzer:
>   [hw] [nci] [topics]
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS]
> 
> query.ToString = category:HW -"nci topics" +space
>
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
> Expected:+category:HW-NCI_TOPICS +space
> Actual  :category:HW -"nci topics" +space
> 

Well query parser does not allow `-' within words currently.
So before your analyzer is called, query parser reads one word HW, a `-'
operator, one word NCI_TOPICS.
The latter is analyzed as "nci topics" because it's not in field category
anymore, I guess.
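
The splitting described above can be pictured with a toy model. This is not Lucene's JavaCC grammar, only a rough sketch under the assumption that an unescaped '-' always starts a new clause and that a backslash keeps the following character inside the current word. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene's query grammar. It treats an unescaped
// '-' as the start of a new "-word" clause, as described in the text above.
class ToyDashSplitter {

    static List<String> split(String queryText) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < queryText.length(); i++) {
            char c = queryText.charAt(i);
            if (c == '\\' && i + 1 < queryText.length()) {
                // Escaped character: keep the backslash and the next character together.
                current.append(c).append(queryText.charAt(++i));
            } else if (c == '-') {
                // Unescaped dash: close the current word and start a new clause.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                current.append(c);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

In this model category:HW-NCI_TOPICS comes out as category:HW followed by -NCI_TOPICS, so the second piece no longer carries the category field, while category:HW\-NCI_TOPICS stays in one piece.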

I suggested to change this. See
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491

Either you escape the - using category:HW\-NCI_TOPICS in your query
(untested. and I don't know where the escape character will be removed)
or you apply my suggested change.
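
If the escape route is taken, the backslash can be added programmatically before the text is handed to the query parser. The helper below is hypothetical and not part of Lucene. It is a minimal sketch that assumes the input term is not already escaped, and it says nothing about where the parser removes the backslash again, which is the open question in this thread.

```java
// Hypothetical helper, not part of Lucene: prefixes each '-' with a backslash
// so that a keyword such as HW-NCI_TOPICS is written as one word in the query.
class DashEscaper {

    // Assumes the term is not already escaped; an existing "\-" would gain a second backslash.
    static String escapeDashes(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '-') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

For example, the query text could be built as "+category:" + DashEscaper.escapeDashes("HW-NCI_TOPICS") + " AND SPACE", which yields the same escaped string used in the test above.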

Another option for using keywords with query parser might be adding a
keyword syntax to the query parser.
Something like category:key("HW-NCI_TOPICS") or category="HW-NCI_TOPICS".

HTH
Morus
