Re: Default search to AND rather than OR

2006-05-17 Thread [EMAIL PROTECTED]
You're a star, Erik! Thanks heaps. -- View this message in context: http://www.nabble.com/Default-search-to-AND-rather-than-OR-t1640214.html#a4443102 Sent from the Lucene - Java Developer forum at Nabble.com.

Re: Default search to AND rather than OR

2006-05-17 Thread Erik Hatcher
Owen, Have a look at the QueryParser API. setDefaultOperator :) However, be aware that you need to construct an instance of QueryParser and use the instance parse(String) method and not the static parse(String,String,Analyzer) method. Erik On May 17, 2006, at 9:04 PM, [EMAIL PRO

RE: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-17 Thread Robert Engels
Does it use a ThreadLocal for the FieldsReader? If so, that is somewhat less efficient (than using a ThreadLocal on the streams in the FieldsReader - as the modified code I supplied does). In either case it is better than the synchronization on the document() call. It is just not needed. -Ori
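
Robert is contrasting two ways of making stored-document reads safe under concurrency: serializing every call on one shared stream, or giving each thread its own copy of the stream through a ThreadLocal. A minimal plain-Java sketch of both follows. `SeekableSource`, `SynchronizedReader`, `ThreadLocalReader` and `readsConsistently` are hypothetical names, and this is not Lucene's FieldsReader code. `SeekableSource` only mimics the relevant property of an input stream: a mutable file pointer that makes a shared seek()+read() unsafe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalReaderSketch {

    // Hypothetical stand-in for a file stream: seek() moves a per-instance
    // pointer, so sharing one instance between threads needs a lock.
    static final class SeekableSource {
        private final String[] records; // immutable data, safe to share
        private int pointer;            // mutable state, not safe to share

        SeekableSource(String[] records) { this.records = records; }
        void seek(int pos) { pointer = pos; }
        String read() { return records[pointer]; }

        // Shares the data but gets its own pointer (like cloning a stream).
        SeekableSource copy() { return new SeekableSource(records); }
    }

    interface DocReader { String document(int n); }

    // Option 1: one shared stream; every call is serialized on this object's lock.
    static final class SynchronizedReader implements DocReader {
        private final SeekableSource source;
        SynchronizedReader(SeekableSource source) { this.source = source; }

        public synchronized String document(int n) {
            source.seek(n);
            return source.read();
        }
    }

    // Option 2: each thread lazily gets its own copy of the stream,
    // so seek()+read() runs without taking any lock.
    static final class ThreadLocalReader implements DocReader {
        private final ThreadLocal<SeekableSource> perThread;
        ThreadLocalReader(SeekableSource master) {
            this.perThread = ThreadLocal.withInitial(master::copy);
        }

        public String document(int n) {
            SeekableSource s = perThread.get();
            s.seek(n);
            return s.read();
        }
    }

    // Reads every record from several threads at once; true if every thread
    // always got back the record it asked for.
    static boolean readsConsistently(DocReader reader, int numDocs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    for (int i = 0; i < numDocs; i++) {
                        if (!reader.document(i).equals("doc-" + i)) return false;
                    }
                    return true;
                }));
            }
            boolean ok = true;
            for (Future<Boolean> f : results) ok &= f.get();
            return ok;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] docs = new String[1000];
        for (int i = 0; i < docs.length; i++) docs[i] = "doc-" + i;
        System.out.println(readsConsistently(
                new ThreadLocalReader(new SeekableSource(docs)), docs.length, 4)); // true
        System.out.println(readsConsistently(
                new SynchronizedReader(new SeekableSource(docs)), docs.length, 4)); // true
    }
}
```

Both variants return correct records; the difference is that the synchronized one makes concurrent readers wait on each other, while the ThreadLocal one trades a stream copy per reading thread for never blocking on another thread's read.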

Default search to AND rather than OR

2006-05-17 Thread [EMAIL PROTECTED]
I have users that want the search to work like Google. So when a user enters a query such as "Tram Pole", the result should be documents that contain both "tram" and "pole". At the moment, the default seems to be documents that contain "tram" or "pole". I thought of parsing the query before I su
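
The pre-processing route the poster mentions could be as simple as marking every bare term as required before the string reaches the parser. This is a hypothetical helper (`RequireAllTerms.requireAll` is not part of Lucene), assuming the standard QueryParser syntax in which a leading `+` makes a term required. Erik's setDefaultOperator suggestion achieves the same effect without rewriting the string.

```java
import java.util.ArrayList;
import java.util.List;

public class RequireAllTerms {

    // Hypothetical helper: prefixes every bare term with "+" so that a parser
    // whose default operator is OR still requires all of the terms.
    // Terms that already start with "+" or "-" are left untouched.
    static String requireAll(String userQuery) {
        List<String> out = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) continue; // blank input splits to [""]
            if (term.startsWith("+") || term.startsWith("-")) {
                out.add(term);
            } else {
                out.add("+" + term);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(requireAll("Tram Pole"));      // +Tram +Pole
        System.out.println(requireAll("  tram  -pole ")); // +tram -pole
    }
}
```

It deliberately ignores quoted phrases, parentheses, field prefixes and words such as AND or OR, which is one reason to prefer configuring the parser itself.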

Re: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-17 Thread Grant Ingersoll
Somewhat related, the Lazy Field loading patch uses a ThreadLocal on the FieldsReader to handle this. It is issue 545 in Jira. Yonik Seeley wrote: On 5/17/06, Robert Engels <[EMAIL PROTECTED]> wrote: If you run concurrent searches over a million documents, returning only the matching 500 of eac

RE: non indexed field searching?

2006-05-17 Thread Robert Engels
Having an indexed field seems to incur significant overhead when merging, and if the index is highly interactive, merging occurs quite often. Maybe I am incorrect regarding the overhead of indexed fields? I have attempted to keep the number of indexed fields to a minimum. I think it b

Re: non indexed field searching?

2006-05-17 Thread Erik Hatcher
On May 17, 2006, at 11:20 AM, Robert Engels wrote: I reviewed the solr source (a LOT of the code is amazingly similar to our own search server). I don't see anything related to searching using non-indexed fields. Could you maybe point me at the class(es) that implement this functionality

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Marvin Humphrey
On May 17, 2006, at 2:04 PM, Doug Cutting wrote: Detecting invalidly encoded text later doesn't help anything in and of itself; lifting the requirement that everything be converted to Unicode early on opens up some options. How useful are those options? Are they worth the price? Conv

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Doug Cutting
Marvin Humphrey wrote: I *think* that whether it was invalidly encoded or not wouldn't impact searching -- it doesn't in KinoSearch. It should only affect display. I think Java's approach of converting everything to unicode internally is useful. One must still handle dirty input, but it
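
Doug's point that converting to Unicode up front still requires handling dirty input maps onto a choice the standard Java library already offers: decode strictly and reject malformed bytes, or decode leniently and substitute U+FFFD. A small sketch of both follows, using only `java.nio.charset`. The class and method names are hypothetical, and this is not a description of how Lucene or KinoSearch handle encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DirtyInputSketch {

    // Strict decoding: throws CharacterCodingException on malformed UTF-8,
    // so invalid input is detected at the point of conversion.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Lenient decoding: malformed sequences become U+FFFD, so processing
    // can continue but the original bytes are lost.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 'c', (byte) 'a', (byte) 'f', (byte) 0xE9}; // Latin-1 bytes
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
        System.out.println(decodeLenient(bad).endsWith("\uFFFD")); // true
    }
}
```

The strict path reports the problem where the bytes enter the system; the lenient path keeps processing running, but the replaced bytes cannot be recovered later.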

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Marvin Humphrey
On May 17, 2006, at 11:08 AM, Doug Cutting wrote: Marvin Humphrey wrote: What I'd like to do is augment my existing patch by making it possible to specify a particular encoding, both for Lucene and Luke. What ensures that all documents in fact use the same encoding? In KinoSearch at this

[jira] Commented: (LUCENE-544) MultiFieldQueryParser field boost multiplier

2006-05-17 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-544?page=comments#action_12412232 ] Otis Gospodnetic commented on LUCENE-544: Karl - can you submit this as a patch/diff, please? I could use this myself, so I'd love to commit this. svn diff -bBt ...

Re: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-17 Thread Yonik Seeley
On 5/17/06, Robert Engels <[EMAIL PROTECTED]> wrote: If you run concurrent searches over a million documents, returning only the matching 500 of each. There's the difference... we pretty much never retrieve 500 documents. We retrieve exactly the number needed to display a page of search resul

RE: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-17 Thread Robert Engels
If you run concurrent searches over a million documents, returning only the matching 500 of each, you will still encounter significant blocking if there are not many matches between the document sets - the blocking is even worse due to the isDeleted() synchronization, as query performance will b

RE: non indexed field searching?

2006-05-17 Thread Chris Hostetter
: I don't see anything related to searching using non-indexed fields. Could : you maybe point me at the class(es) that implement this functionality? I think Erik was referring more specifically to the statement... : > it is just : > very difficult to perform some complex queries efficiently without

Re: weird behavior of IndexReader.indexExists()

2006-05-17 Thread Chris Hostetter
: I put a lock in the IndexReader.indexExists function, and tested it for a few days : It worked fine. I never had that mystery problem. : : How can I put the patch in a JIRA issue? Please take a look at the recently added FAQ "How do I contribute an improvement?"... http://wiki.apache.org/jakarta-lucene/L

Re: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-17 Thread Yonik Seeley
On 5/17/06, Robert Engels <[EMAIL PROTECTED]> wrote: Since reading a document is a relatively expensive operation Expensive relative to searching operations that cover 1 million documents (i.e. you don't want to call doc() a million times) Solr has a document cache, and I've found that it does
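
A document cache like the one Yonik mentions can be approximated with an access-ordered `LinkedHashMap` that evicts its least recently used entry. This is a generic LRU sketch, not Solr's implementation; `DocumentCache` and the `loader` function (standing in for whatever actually reads stored fields) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class DocumentCacheSketch {

    // Hypothetical LRU cache mapping doc ids to loaded documents.
    // An access-ordered LinkedHashMap moves each hit to the tail, and the
    // eldest (least recently used) entry is evicted once capacity is exceeded.
    static final class DocumentCache<D> {
        private final Map<Integer, D> map;
        private final IntFunction<D> loader; // hypothetical: reads stored fields for a doc id
        private int misses;

        DocumentCache(int capacity, IntFunction<D> loader) {
            this.loader = loader;
            this.map = new LinkedHashMap<Integer, D>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, D> eldest) {
                    return size() > capacity;
                }
            };
        }

        // Synchronized because an access-ordered LinkedHashMap mutates
        // its internal order even on get().
        synchronized D get(int docId) {
            D doc = map.get(docId);
            if (doc == null) {
                misses++;
                doc = loader.apply(docId);
                map.put(docId, doc);
            }
            return doc;
        }

        synchronized int misses() { return misses; }
    }

    public static void main(String[] args) {
        DocumentCache<String> cache = new DocumentCache<>(2, id -> "stored fields of doc " + id);
        cache.get(1); // miss
        cache.get(2); // miss
        cache.get(1); // hit; doc 1 becomes most recently used
        cache.get(3); // miss; evicts doc 2
        cache.get(2); // miss again
        System.out.println(cache.misses()); // 4
    }
}
```

A cache only pays off when the same documents are requested repeatedly, for example when users page back and forth through the same results, which fits Yonik's point that only a page of documents is fetched at a time.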

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Doug Cutting
Marvin Humphrey wrote: What I'd like to do is augment my existing patch by making it possible to specify a particular encoding, both for Lucene and Luke. What ensures that all documents in fact use the same encoding? The current approach of converting everything to Unicode and then writing U

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Marvin Humphrey
On May 16, 2006, at 11:58 PM, Paul Elschot wrote: Try and invoke luke with a lucene jar of your choice on the classpath before luke itself: java -cp lucene-core-1.9-rc1-dev.jar:lukeall.jar org.getopt.luke.Luke I tried this on an index built with KinoSearch 0.05, which pre-dates the addi

RE: non indexed field searching?

2006-05-17 Thread Robert Engels
That is exactly what I was thinking. Without starting the "Lucene is not a database..." discussion... It is how a db works. Use whatever indexes you can to restrict the document set, then "scan" the documents looking for matches. If you stored EVERY field in the document (including indexed ones),
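
Robert's database analogy, narrowing the candidates with an index and then scanning stored values for the remaining condition, can be sketched with in-memory structures. Everything here is hypothetical (a toy `postings` map and a list of stored field maps, not Lucene classes), and it assumes Java 9+ for `Map.of`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

public class FilterThenScanSketch {

    // Hypothetical "index": "field:value" term -> sorted doc ids.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // Hypothetical stored fields: doc id -> (field name -> value).
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Stores every field, but indexes only the one named by indexedField.
    int add(Map<String, String> fields, String indexedField) {
        int id = stored.size();
        stored.add(fields);
        String value = fields.get(indexedField);
        if (value != null) {
            postings.computeIfAbsent(indexedField + ":" + value, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Step 1: the index narrows the candidate set.
    // Step 2: only those candidates' stored fields are scanned with the residual test.
    List<Integer> search(String indexedTerm, Predicate<Map<String, String>> residual) {
        List<Integer> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(indexedTerm, Collections.emptySortedSet())) {
            if (residual.test(stored.get(id))) hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        FilterThenScanSketch idx = new FilterThenScanSketch();
        idx.add(Map.of("type", "invoice", "amount", "250"), "type");
        idx.add(Map.of("type", "invoice", "amount", "40"), "type");
        idx.add(Map.of("type", "memo", "amount", "900"), "type");
        // "amount" is stored but not indexed, so the range test is a scan.
        System.out.println(idx.search("type:invoice",
                doc -> Integer.parseInt(doc.get("amount")) > 100)); // [0]
    }
}
```

The scan cost grows with the number of candidates the index lets through, so the approach depends on the indexed condition being selective.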

Re: non indexed field searching?

2006-05-17 Thread Yonik Seeley
On 5/17/06, Robert Engels <[EMAIL PROTECTED]> wrote: I reviewed the solr source (a LOT of the code is amazingly similar to our own search server). I don't see anything related to searching using non-indexed fields. Could you maybe point me at the class(es) that implement this functionality? So

RE: non indexed field searching?

2006-05-17 Thread Robert Engels
I reviewed the solr source (a LOT of the code is amazingly similar to our own search server). I don't see anything related to searching using non-indexed fields. Could you maybe point me at the class(es) that implement this functionality? -Original Message- From: Erik Hatcher [mailto:[EM

Re: weird behavior of IndexReader.indexExists()

2006-05-17 Thread wenjie zheng
I put a lock in the IndexReader.indexExists function, and tested it for a few days. It worked fine. I never had that mystery problem. How can I put the patch in a JIRA issue? Thanks, Wenjie On 5/9/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Somebody (ah, Andy) already mentioned that you may be getting

RE: FieldsReader synchronized access vs. ThreadLocal ?

2006-05-17 Thread Robert Engels
The test results seem hard to believe. Doubling the CPUs only increased throughput by 20%??? Seems rather low for a primarily "read only" test. Peter did not seem to answer many of the follow-up questions (at least I could not find the answers) regarding whether or not the CPU usage was 100%. If

Re: java.lang.IndexOutOfBoundsException when querying Lucene

2006-05-17 Thread Alexandru Popescu
Hi Daniel! Thanks for the answer. I know what the query is and I definitely can provide it. Unfortunately, it is created using the JCR API and I am not sure how relevant it is. Here is an example: //element(*, cmed:nodetype)[EMAIL PROTECTED]:prop1 = 'some']/[EMAIL PROTECTED]:prop2 = 'else'] an