Re: Query#rewrite Question

2004-11-11 Thread Erik Hatcher
On Nov 10, 2004, at 9:51 PM, Satoshi Hasegawa wrote: Our program accepts input in the form of Lucene query syntax from the user, but we wish to perform additional tasks such as thesaurus expansion. So I want to manipulate the Query object that results from parsing. You may want to consider using

Re: Locking issue

2004-11-11 Thread Erik Hatcher
On Nov 11, 2004, at 1:47 AM, [EMAIL PROTECTED] wrote: Yes, I tried that too and it worked. The issue is that our Operations folks plan to install this on a pretty busy box and I was hoping that Lucene wouldn't cause issues if it only had a small slice of the CPU. I don't think that Lucene is

Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-11 Thread Sanyi
Hi! First of all, I've read about BooleanQuery$TooManyClauses, so I know that it has a 1024-clause limit by default, which is good enough for me, but I still think it behaves strangely. Example: I have an index with about 20 million documents. Let's say that there are about 3000 variants in the

Re[2]: Faster highlighting with TermPositionVectors (update)

2004-11-11 Thread Maxim Patramanskij
Hello Mark. I'm just wondering about the following piece of code from your latest TokenSources class: public static TokenStream getAnyTokenStream(IndexReader reader,int docId, String field,Analyzer analyzer) throws IOException { TokenStream ts=null;

Re[2]: Faster highlighting with TermPositionVectors (update)

2004-11-11 Thread mark harwood
Thanks, Max. Another schoolboy error in TokenSources.java :) More haste, less speed required on my part. I have updated my code and will post to website tonight. This change doesn't appear to have made a noticeable difference in performance but the code is cleaner. Cheers Mark

Re: Search scalability

2004-11-11 Thread Otis Gospodnetic
If you load it explicitly, then all 800 MB will make it into RAM. It's easy to try, the API for this is super simple. Otis --- [EMAIL PROTECTED] wrote: Does it take 800MB of RAM to load that index into a RAMDirectory? Or are only some of the files loaded into RAM? --- Otis Gospodnetic

RE: Search scalability

2004-11-11 Thread Ravi
Thanks a lot. I'll use RAMDirectory and post my results.

Re: Academic Question About Indexing

2004-11-11 Thread Luke Shannon
40 million! Wow. OK, this is the kind of answer I was looking for. The site I am working on indexes maybe 1000 at any given time. I think I am OK with a single index. Thanks.

Re: Academic Question About Indexing

2004-11-11 Thread Gard Arneson Haugen
Could I ask how fast the search goes against this index, both for simple words and for more advanced phrase and boolean searches? And is there something smart you have done to make this go fast, either in the infrastructure or in the system itself? Best regards, Gard Arneson Haugen

RE: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-11 Thread Will Allen
Any wildcard search will automatically expand your query to the number of terms it finds in the index that match the wildcard. For example, wild* would become wild OR wilderness OR wildman etc. for each of the terms that exist in your index. It is because of this that you quickly reach the
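The expansion described in this reply can be illustrated with a small, self-contained toy model. This is not Lucene's own code: the class name PrefixExpansionSketch, the expand method, and the use of IllegalStateException (standing in here for BooleanQuery.TooManyClauses) are assumptions made purely for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of prefix expansion against a sorted term list; not Lucene's classes. */
class PrefixExpansionSketch {
    /** Mirrors the default limit of 1024 clauses mentioned in this thread. */
    static final int MAX_CLAUSES = 1024;

    /**
     * Collects every indexed term that starts with the prefix, one clause per term.
     * Fails as soon as the expansion would exceed maxClauses.
     */
    static List<String> expand(SortedSet<String> indexTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix, so all matches are contiguous.
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last matching term
            }
            if (clauses.size() == maxClauses) {
                // Assumption: stands in for the TooManyClauses error discussed above.
                throw new IllegalStateException("more than " + maxClauses + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("wild", "wilderness", "wildman", "wind"));
        System.out.println(String.join(" OR ", expand(terms, "wild", MAX_CLAUSES)));
    }
}
```

Note that the expansion depends only on how many terms in the index share the prefix, not on what else appears in the query, which is why combining wild* with a rare word does not reduce the clause count in this model.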

HTMLParser.getReader returning null

2004-11-11 Thread Luke Shannon
Hello; Things were working fine. I have been re-organizing my code to drop into QA when I noticed I was no longer getting search results for my HTML files. When I checked things out I confirmed I was still creating the Documents but realized no content was being indexed. HTMLParser parser = new

RE: Academic Question About Indexing

2004-11-11 Thread Will Allen
I have a servlet that instantiates a MultiSearcher on 6 indexes (du -h): 7.2G ./0, 7.2G ./1, 7.2G ./2, 7.2G ./3, 7.2G ./4, 7.2G ./5, 43G . I recreate the index from scratch each month based upon a 50 GB zip file with all of the 40 million documents. I wanted to keep my indexing

RE: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-11 Thread Sanyi
Yes, I understand all of this, but I don't want to set it to MaxInt, since it can easily lead to (even accidental) DoS attacks. What I'm saying is that there is no reason for the optimizer to expand wild* to more than 1024 variations when I search for somerareword AND wild*, since somerareword

Re: Query#rewrite Question

2004-11-11 Thread Paul Elschot
On Thursday 11 November 2004 03:51, Satoshi Hasegawa wrote: Hello, Our program accepts input in the form of Lucene query syntax from the user, but we wish to perform additional tasks such as thesaurus expansion. So I want to manipulate the Query object that results from parsing. My

Re: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-11 Thread Daniel Naber
On Thursday 11 November 2004 20:57, Sanyi wrote: What I'm saying is that there is no reason for the optimizer to expand wild* to more than 1024 variations That's the point: there is no query optimizer in Lucene. Regards Daniel -- http://www.danielnaber.de

weird things in 1.4.2 build

2004-11-11 Thread Hetan Shah
Hi guys, Thanks for the fantastic mailing list, where all the questions get answered. I have upgraded my installation from 1.3.final to 1.4.2, and now when I try to index the files using IndexHTML the command just hangs at the prompt, or parses some 4 - 5 files and then simply hangs.

getting error message

2004-11-11 Thread Hetan Shah
Does anyone know what the following error message means? TIA. -H root cause java.lang.NullPointerException at org.apache.jsp.searchResults_jsp._jspService(searchResults_jsp.java:627) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137) at

Lucene : avoiding locking

2004-11-11 Thread Luke Shannon
Hi All; I have hit a snag in my Lucene integration and don't know what to do. My company has a content management product. Each time someone changes the directory structure or a file within it, that portion of the site needs to be re-indexed so the changes are reflected in future searches

Re: Lucene : avoiding locking

2004-11-11 Thread yahootintin-lucene
I'm working on a similar project... Make sure that only one call to the index method is occurring at a time. Synchronizing that method should do it.
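A minimal sketch of that suggestion, assuming a single indexer object shared by all callers inside one JVM. The class SerializedIndexer, its index method, and the counters are hypothetical names used only to show the effect; the actual indexing work is left as a placeholder comment.

```java
/** Hypothetical wrapper showing a synchronized index method; not from the original post. */
class SerializedIndexer {
    private int active = 0;     // threads currently inside index()
    private int maxActive = 0;  // highest value 'active' ever reached
    private int completed = 0;  // finished index() calls

    /** Only one thread at a time can run this on a given instance. */
    synchronized void index(String changedPath) {
        active++;
        maxActive = Math.max(maxActive, active);
        try {
            // Placeholder: open the index, apply the changes for changedPath, close it.
            completed++;
        } finally {
            active--;
        }
    }

    synchronized int maxActive() { return maxActive; }

    synchronized int completed() { return completed; }

    /** Starts threadCount threads that all call index() on one shared instance. */
    static SerializedIndexer runWorkers(int threadCount) {
        final SerializedIndexer indexer = new SerializedIndexer();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final String path = "section-" + i; // hypothetical path name
            workers[i] = new Thread(() -> indexer.index(path));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return indexer;
    }

    public static void main(String[] args) {
        SerializedIndexer indexer = runWorkers(4);
        System.out.println("completed=" + indexer.completed()
                + " maxActive=" + indexer.maxActive());
    }
}
```

This only serializes callers that share the same instance in the same JVM. It does not help if a separate process, or another reader or writer opened elsewhere in the application, is holding the index lock.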

Re: Lucene : avoiding locking

2004-11-11 Thread Luke Shannon
I will try that now. Thank you.

Re: Lucene : avoiding locking

2004-11-11 Thread Luke Shannon
Synchronizing the method didn't seem to help. The lock is being detected right here in the code: while (uidIter.term() != null && uidIter.term().field() == uid && uidIter.term().text().compareTo(uid) < 0) { //delete stale docs if (deleting) { reader.delete(uidIter.term());

Re: Query#rewrite Question

2004-11-11 Thread Satoshi Hasegawa
Thank you, Erik and Paul. I'm not sure what SpanQuery is, but anyway we've decided to freeze the version of Lucene we use.

lucene file locking question

2004-11-11 Thread John Wang
Hi folks: My application builds a super-index around the lucene index, e.g. stores some additional information outside of lucene. I am using my own locking outside of the lucene index via FileLock object in the jdk1.4 nio package. My code does the following: FileLock
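The excerpt is cut off before the poster's own code, so the following is only a generic sketch of guarding a critical section with java.nio's FileLock, not a reconstruction of that code. The class ExternalLockSketch, the withLock helper, and the lock file passed in are hypothetical; try-with-resources and UncheckedIOException are newer than JDK 1.4 and are used here only for brevity.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/** Hypothetical helper: run a task while holding an exclusive lock on a lock file. */
class ExternalLockSketch {
    /**
     * Returns true if the lock was acquired and the work ran,
     * false if another process currently holds the lock.
     */
    static boolean withLock(File lockFile, Runnable work) {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            // tryLock() does not block; it returns null when another process holds the lock.
            // If this same JVM already holds an overlapping lock, it throws
            // OverlappingFileLockException instead.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                work.run(); // e.g. update the index and the external data together
                return true;
            } finally {
                lock.release();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File lockFile = new File(System.getProperty("java.io.tmpdir"), "super-index.lock");
        boolean ran = withLock(lockFile, () -> System.out.println("holding the lock"));
        System.out.println("ran=" + ran);
    }
}
```

File locks of this kind are held on behalf of the whole JVM, so they coordinate separate processes but are not a substitute for synchronization between threads in the same process.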

Re: lucene file locking question

2004-11-11 Thread yahootintin-lucene
Disabling locking is only recommended for read-only indexes that aren't being modified. I think there is a comment in the code about a good example of this being an index you read off of a CD-ROM.

Re: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-11 Thread Sanyi
That's the point: there is no query optimizer in Lucene. Sorry, I'm not very much into Lucene's internal classes, I'm just telling you the viewpoint of a user. You know my users aren't technicians, so answers like yours won't make them happy. They will only see that I randomly don't allow

Phrase search for more than 4 words throws exception in QueryParser

2004-11-11 Thread Sanyi
Hi! How do I perform phrase searches for more than four words? This works well with 1.4.2: aa bb cc dd I pass the query as a command line parameter on XP: \aa bb cc dd\ QueryParser translates it to: text:aa text:bb text:cc text:dd Runs, searches, finds proper matches. This throws an exception in

Re: Phrase search for more than 4 words throws exception in QueryParser

2004-11-11 Thread Morus Walter
Sanyi writes: How to perform phrase searches for more than four words? This works well with 1.4.2: aa bb cc dd I pass the query as a command line parameter on XP: \aa bb cc dd\ QueryParser translates it to: text:aa text:bb text:cc text:dd Runs, searches, finds proper matches. This