Re: QueryParser
On Tue, 2002-11-19 at 23:07, stephane vaucher wrote:

I've tested the following. I don't know if I'm hitting expected behaviour, but it seems suspicious:

public void testPhraseBoost() throws Exception {
    assertQueryEquals("(a AND b) OR c", null, "(+a +b) c");
    assertQueryEquals("(a AND b)^2 c", null, "(+a +b)^2.0 c");
}

-- junit result
There was 1 failure:
1) testPhraseBoost(org.apache.lucene.queryParser.TestQueryParser)junit.framework.AssertionFailedError: Query /(a AND b)^2 c/ yielded /+a +b/, expecting /(+a +b)^2.0 c/ at org.apache.lucene.queryParser.TestQueryParser.assertQueryEquals(Unknown

Hi,

You might like to see this thread on lucene-dev from a couple of months ago: http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01843.html

I provided a test patch and what I thought was a fix, but as Doug explains in his replies, setBoost() isn't currently implemented for boolean queries.

Regards,
-- Lee Mallabone.
RE: Several fields with the same name
Maybe you can show the actual output of this piece of code. What do you get? Show...

--- Rob Outar [EMAIL PROTECTED] wrote:

Otis,

Tried this:

f = doc.get(key);
while (f != null) {
    l.add(f);
    // get next value for same key
    f = doc.get(key);
    System.out.println(f);
}

I got an OutOfMemoryError after a while, so it looks like it keeps returning the same value rather than null.

Thanks,
Rob

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 06, 2002 2:57 PM
To: Lucene Users List
Subject: Re: Several fields with the same name

Looking at the source, it looks like you can just call it multiple times until it returns null.

Otis

--- Rob Outar [EMAIL PROTECTED] wrote:

Hello all,

I have a relationship where for one key there are many values, basically a one-to-many relationship. For example, with key = name, the values are bob, jim, etc. When a client wants all the values that have been associated with the field name, how would I get that?

The javadoc for Document.get(String name) states: "Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields may exist with this name, this method returns the last one added."

I don't need the last field's value, I need all values associated with that field. Any help would be appreciated.

Thanks,
Rob
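For anyone following this thread: the loop above cannot terminate, because get(key) keeps no position between calls and so returns the same value every time. A minimal sketch of the alternative, walking every field once and keeping the values whose name matches. SimpleDoc and its methods are hypothetical stand-ins written for this illustration, not Lucene classes; the assumption is that the same filtering idea can be applied to whatever your Lucene version offers for enumerating a document's fields (Document.fields() in the 1.x line, as far as I know).

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a document holding several fields with the same name (hypothetical, not a Lucene class). */
class SimpleDoc {
    /** One name/value pair. */
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final List<Field> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new Field(name, value));
    }

    /** Mirrors the javadoc quoted in the thread: value of the last field added under this name, or null. */
    String get(String name) {
        for (int i = fields.size() - 1; i >= 0; i--) {
            if (fields.get(i).name.equals(name)) {
                return fields.get(i).value;
            }
        }
        return null;
    }

    /** Walks every field once and keeps the values whose name matches, in insertion order. */
    List<String> getAll(String name) {
        List<String> values = new ArrayList<>();
        for (Field f : fields) {
            if (f.name.equals(name)) {
                values.add(f.value);
            }
        }
        return values;
    }
}
```

With "bob" and "jim" both added under name, getAll("name") returns both values, while get("name") keeps returning the same single value no matter how often it is called.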
Problem building Lucene
Hi:

I downloaded the Lucene source and have been trying to build it using ant. I am getting the following error message:

Buildfile: build.xml
init:
javacc_check:
compile:
[javacc] java was not found in /usr/local/apps/java/bin/sparc/native_threads/java
BUILD FAILED
/users/science/user/lucene/lucene-1.2-src/build.xml:96: java failed with return code 1

The JavaCC version is 2.1. Platform is Sun SPARC Solaris. The JAVA_HOME environment variable has been set to /usr/local/apps/java.

Any help will be most appreciated.

Thanks,
ND
Book
I would like to buy a book about Lucene. Who could write it ? : )
A little date help
Hello all,

I am indexing the date using the java.io.File.lastModified() method:

doc.add(new Field(MODIFIED_DT, DateField.timeToString(f.lastModified()), true, true, true));

I am trying to search on this field, but I am having a hard time formatting the date correctly. I am not sure what date format lastModified() uses, so coming up with a query in milliseconds for the above date field is difficult. Has anyone run into this problem? Is there an easier way to do this?

Let me know,
Rob
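A note on the format question: File.lastModified() returns a long, the number of milliseconds since 1970-01-01 00:00:00 GMT, so there is no date format involved at that point. A minimal sketch of turning a calendar date into that kind of value and back, assuming Java 8 or later for java.time and assuming UTC day boundaries are what you want. ModifiedDates and its method names are made up for this example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Converts between calendar dates and the epoch milliseconds that File.lastModified() returns. */
class ModifiedDates {
    /** Milliseconds since 1970-01-01T00:00:00Z for the start of the given day (UTC assumed). */
    static long startOfDayMillis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    /** ISO-8601 form of a lastModified() value, handy for checking what was actually indexed. */
    static String describe(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }
}
```

The long from startOfDayMillis would then go through the same DateField.timeToString(...) call used at indexing time, so that the query term and the indexed term are encoded identically.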
Re: Book
William,

On Wednesday 20 November 2002 21:14, you wrote:

I would like to buy a book about Lucene. Who could write it ? : )

AFAIK there is no book, but some articles might help: http://citeseer.nj.nec.com/cs?q=doug+cutting&submit=Search+Documents&cs=1

"Optimizations for Dynamic Inverted Index Maintenance" and "An Object-Oriented Architecture for Text Retrieval" are the ones I like best.

Have fun,
Ype
RE: Stress/scalability testing Lucene
Ah, for some reason I thought none of the Lucene methods were thread safe, or is this only in the case of reading and writing at the same time? I thought I read this in the FAQ.

Roy.

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 20, 2002 5:04 PM
To: Lucene Users List
Subject: Re: Stress/scalability testing Lucene

Justin Greene wrote:

We created a thread pool to read and parse the email messages. 10 threads seems to be the magic number here for us. We then created a queue of messages to be indexed onto which we push the parsed messages and have a single thread adding messages to the index.

IndexWriter.addDocument(Document) is thread safe, so you don't need a separate indexing thread. So long as your analyzer is thread safe, you can index each message in the thread that parses it, for even greater parallelism.

Doug
RE: Stress/scalability testing Lucene
Reading and writing at the same time is okay. Only one thread can modify the index at a time.

Otis
Re: Searching Ranges
doug,

if you happen to remember this thread, i wanted to know if you had any thoughts on improving this search in the situation below. my temp fix does not work in all situations, so i am back to square one.

i have completely gutted the RangeQuery and created an additional RangeScorer to help eliminate some of the overhead incurred in the special situation below, but the search times are still unacceptable. currently i have reduced the logic down to simply iterating over the set of terms within the range, fetching the termDocs for each, and maintaining an array of the results. although my implementation is substantially faster than before, it is still very slow.

my thought was that i might be able to accomplish a more efficient range query at the Reader level. any thoughts? i am certain that some of the redundant iteration can be eliminated, i am just not sure how.

thanks
alex

Alex Winston wrote:

lets say that i have a document named d1, which contains a field named references. within the references field i maintain a list of terms that represent my range from 001-005; more specifically, the field would contain the terms 001 002 003 004 005. i would now like to search this range to determine if it falls within the range 003-010, so my query would look like references:[003 010].
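To make the approach described in this message concrete, here is a minimal sketch of "iterate over the terms in the range and collect the documents for each". It is a toy model, not Lucene code: RangeSketch, its TreeMap term dictionary, and the BitSet result are all assumptions made for illustration. It relies on the terms being zero-padded to a fixed width, as in the 001 to 005 example, so that string order matches numeric order.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Toy term dictionary: sorted term text mapped to the ids of documents containing it (hypothetical). */
class RangeSketch {
    private final TreeMap<String, BitSet> postings = new TreeMap<>();

    /** Records that the given document contains the given term. */
    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    /** Unions the postings of every term between lower and upper, both inclusive. */
    BitSet search(String lower, String upper) {
        BitSet hits = new BitSet();
        for (Map.Entry<String, BitSet> e : postings.subMap(lower, true, upper, true).entrySet()) {
            hits.or(e.getValue());
        }
        return hits;
    }
}
```

A document carrying the terms 001 through 005 is found by search("003", "010") because the terms 003, 004 and 005 fall inside the query range. Collecting into a bit set means a document is recorded once even when several of its terms match, which is one place where redundant work per matching term can be avoided.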
Re: Stress/scalability testing Lucene
Otis Gospodnetic wrote:

Reading and writing at the same time is okay. Only one thread can modify the index at a time.

Almost. Only one process can modify it at a time; other processes will be prevented by the write.lock file. Multiple threads can modify an index simultaneously. The bulk of the work of the updates will be serialized by synchronization in IndexWriter, but the analysis of the text into tokens is parallelizable.

Doug
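Doug's point in this thread, that each parsing thread can add its own document to one shared writer, can be sketched roughly as follows. This is a stand-in model, not Lucene code: SharedWriter plays the role of a writer whose add method is internally synchronized, the trim/lowercase step stands in for the expensive parse and analysis work, and the 30-second timeout is an arbitrary choice for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Stand-in for a shared index writer whose add method may be called from many threads (hypothetical). */
class SharedWriter {
    private final List<String> docs = new ArrayList<>();

    /** Only this short append is serialized; callers do their parsing outside the lock. */
    synchronized void addDocument(String doc) {
        docs.add(doc);
    }

    synchronized int size() {
        return docs.size();
    }
}

class ParallelIndexing {
    /** Parses each raw message on a pool thread and adds it from that same thread. */
    static int indexAll(List<String> rawMessages, int threads) {
        SharedWriter writer = new SharedWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String raw : rawMessages) {
            pool.execute(() -> {
                String parsed = raw.trim().toLowerCase(); // stands in for the parse/analysis step
                writer.addDocument(parsed);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return writer.size();
    }
}
```

Compared with the queue-plus-single-indexing-thread design Justin described, there is no hand-off queue here: the only serialized part is the call into the writer, and the per-message preparation runs in parallel.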
Help on creating and maintaining an index that changes
Hi,

I'm a Lucene newbie. I wanted to ask for an expert opinion on how to attack this issue.

I have a set of documents (a catalog) that many clients want to register with the search server. While those clients are reachable their catalog should be available, but if they log off or disappear then I want to remove their catalog from the index.

Currently, I have this implemented with two hashmaps. Each catalog is assigned a unique key in one hashmap, and its contents are parsed into keywords and put into the master hashmap, which indexes into the other one. When a client leaves I remove their catalog from the first hashmap, and I don't clean up the references in the master hashmap. If a search hits a key that is null in the first hashmap, I remove the reference from the master hashmap at that point.

I want to do something similar with Lucene, but I don't know how to approach it. I thought maybe keeping the first hashmap as is, and building a Directory in Lucene that replaces the master hashmap. When I get hits back from Lucene I look them up in the first hashmap, and return those.

How do I put the needed information into the Directory so I can look the hits up in the first hashmap? I would need the unique id identifying the client, and a key that identifies the document that the client has. Then how do I clean up the Directory when a client is not available? How do I remove a document from Lucene's Directory?

thanks in advance,
charlie
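For readers trying to picture the two-hashmap scheme described in this message, here is a minimal sketch of it, including the lazy cleanup on search. All names (CatalogRegistry, register, unregister, search) are invented for this illustration, and splitting the catalog text on whitespace is an assumed, deliberately crude way of extracting keywords.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Two-map registry: live catalogs by client id, plus a master keyword map pointing at client ids. */
class CatalogRegistry {
    private final Map<String, String> catalogs = new HashMap<>();      // client id -> catalog text
    private final Map<String, Set<String>> keywords = new HashMap<>(); // keyword -> client ids

    void register(String clientId, String catalogText) {
        catalogs.put(clientId, catalogText);
        for (String word : catalogText.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            keywords.computeIfAbsent(word, w -> new HashSet<>()).add(clientId);
        }
    }

    /** Removes only the catalog; stale keyword references are cleaned up lazily by search. */
    void unregister(String clientId) {
        catalogs.remove(clientId);
    }

    /** Returns the ids of live clients matching the keyword, pruning dead references on the way. */
    List<String> search(String keyword) {
        String key = keyword.toLowerCase();
        List<String> live = new ArrayList<>();
        Set<String> ids = keywords.get(key);
        if (ids == null) {
            return live;
        }
        for (Iterator<String> it = ids.iterator(); it.hasNext();) {
            String id = it.next();
            if (catalogs.containsKey(id)) {
                live.add(id);
            } else {
                it.remove(); // the client left earlier; drop the stale reference now
            }
        }
        if (ids.isEmpty()) {
            keywords.remove(key);
        }
        return live;
    }
}
```

One consequence of the lazy cleanup, visible in this sketch: if a client unregisters and later registers again with a different catalog, keywords from its old catalog still point at the now-live id. A Lucene-based version would want the client id stored with each document so that a client's documents can be found and removed together; which delete methods are available for that is worth checking in the IndexReader javadoc for your version.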
Re: Searching for keyword 0 [zero] using TermQuery
Is 0 in the list of your stop words?

Otis

--- Eric Fixler [EMAIL PROTECTED] wrote:

Hello.

I have a field in an index that stores item IDs, which can be zero. I use a TermQuery to search for these, and everything works fine except when I'm searching for things with id 0; these entries return no results.

The index appears to have the correct data and the query looks proper as far as I can tell. Is this a known issue? Can anyone suggest a possible workaround?

thanks
eric
Re: Book
I wrote a few articles that I'm trying to publish somewhere now. Cheaper than a book :)

Otis

--- William W [EMAIL PROTECTED] wrote:

I would like to buy a book about Lucene. Who could write it ? : )
Fun project?
I wish I had time to work on this for fun, but I was thinking about what could be a fun Lucene project...

One could build a peer-to-peer document search application. Each client would index the documents on its hard drive, or the documents in a particular directory. When the user at the computer does a search, it will look at the documents on its own hard drive, but also send out a request for the search on the P2P network.

First though, are there any P2P Java frameworks out there? One could build one, perhaps with OpenJMS, but it would be nice if one already existed.

Hmm... if anyone else thinks this would be cool I'd be willing to work on this with you.

thanks,
Robert A. Decker
http://www.robdecker.com/
http://www.planetside.com/
Re: Searching for keyword 0 [zero] using TermQuery
Hi. Thanks for the reply. I'm just using StandardAnalyzer with the no-args constructor; 0 does not appear to be one of the STOP_WORDS.

eric

On Wednesday, November 20, 2002, at 08:56 PM, Otis Gospodnetic wrote:

Is 0 in the list of your stop words?
Re: Searching for keyword 0 [zero] using TermQuery : PROBLEM SOLVED
My bug; the check to see whether form fields were not empty was a little bit over-aggressive. As always, thanks for the help (and thanks for Lucene!)

eric