Re: URGENT: Help indexing large document set
On Wednesday 24 November 2004 00:37, John Wang wrote:

> Hi: I am trying to index 1M documents, with batches of 500 documents. Each document has a unique text key, which is added as a Field.Keyword(name, value). For each batch of 500, I need to make sure I am not adding a document with a key that is already in the current index. To do this, I am calling IndexSearcher.docFreq for each document and delete the document currently in the index with the same key:
>
>     while (keyIter.hasNext()) {
>         String objectID = (String) keyIter.next();
>         term = new Term(key, objectID);
>         int count = localSearcher.docFreq(term);

To speed this up a bit, make sure that the iterator gives the terms in sorted order. I'd use an IndexReader instead of a searcher, but that will probably not make a difference.

Adding the documents can be done with multiple threads. Last time I checked, there was a moderate speedup using three threads instead of one on a single-CPU machine. Tuning the values of minMergeDocs and maxMergeDocs may also help to increase the performance of adding documents.

Regards,
Paul Elschot

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
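Paul's tip about sorted term order is easy to arrange on the calling side: collect each batch's keys in a sorted set before the docFreq/delete pass, so the iterator hands them out in lexicographic order. A minimal sketch with plain JDK collections (the key values are illustrative):

```java
import java.util.Iterator;
import java.util.TreeSet;

public class SortedBatchKeys {
    public static void main(String[] args) {
        // Collect the batch's unique keys in a TreeSet so that the
        // iterator used for the docFreq/delete pass yields them in
        // sorted (lexicographic) order, which reduces seeking in the
        // term dictionary.
        TreeSet keys = new TreeSet();
        keys.add("doc-042");
        keys.add("doc-007");
        keys.add("doc-100");

        Iterator keyIter = keys.iterator();
        while (keyIter.hasNext()) {
            // In the real code this is where docFreq/delete would run.
            System.out.println(keyIter.next());
        }
        // prints doc-007, doc-042, doc-100 -- sorted order
    }
}
```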
Re: lucene Scorers
On Wednesday 24 November 2004 01:31, Ken McCracken wrote:

> Hi, thanks for the pointers in your replies. Would it be possible to include some sort of accrual scorer interface somewhere in the Lucene Query APIs? This could be passed into a query similar to MaxDisjunctionQuery, and combine the sum, max, tieBreaker, etc., according to the implementor's discretion, to compute the overall score for a document.

The DisjunctionScorer is currently not part of Lucene. You might try and subclass Similarity to provide what you need and pass that to your Query.

I'm using a few subclasses of DisjunctionScorer to provide the actual score value, among others for max and sum. For each of these scorers, I use a separate Query and Weight. This gives a parallel class hierarchy for Query, Weight and Scorer. I guess it's time to have a look at Design Patterns and/or Refactoring on how to get rid of the parallel class hierarchy. That could also involve some sort of accrual scorer and Lucene's Similarity.

Regards,
Paul Elschot

> -Ken
>
> On Sat, 13 Nov 2004 12:07:05 +0100, Paul Elschot [EMAIL PROTECTED] wrote:
> > On Friday 12 November 2004 22:56, Chuck Williams wrote:
> > > I had a similar need and wrote MaxDisjunctionQuery and MaxDisjunctionScorer. Unfortunately these are not available as a patch, but I've included the original message below that has the code (modulo line breaks added by simple text email format). This code is functional -- I use it in my app. It is optimized for its stated use, which involves a small number of clauses. You'd want to improve the incremental sorting (e.g., using the bucket technique of BooleanQuery) if you need it for large numbers of clauses.
> >
> > When you're interested, you can also have a look here for yet another DisjunctionScorer: http://issues.apache.org/bugzilla/show_bug.cgi?id=31785
> > It has the advantage that it implements skipTo() so that it can be used as a subscorer of ConjunctionScorer, i.e. it can be faster in situations like this: aa AND (bb OR cc), where bb and cc are treated by the DisjunctionScorer. When aa is a filter, this can also be used to implement a filtering query.
>
> Re. Paul's suggested steps below, I did not integrate this with the query parser as I didn't need that functionality (since I'm generating the multi-field expansions, for which max is a much better scoring choice than sum).
>
> Chuck
>
> Included message:
>
> -----Original Message-----
> From: Chuck Williams [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 11, 2004 9:55 PM
> To: [EMAIL PROTECTED]
> Subject: Contribution: better multi-field searching
>
> The files included below (MaxDisjunctionQuery.java and MaxDisjunctionScorer.java) provide a new mechanism for searching across multiple fields.

The maximum indeed works well, also when the fields differ a lot in length.

Regards,
Paul
RE: fetching similar wordlist as given word
> can I get the similar wordlist as output, so that I can show the end user in the column --- do you mean "foam"?
> How can I get a similar word list in the given content?

This is a non-trivial problem, because the definition of "similar" is subject to interpretation. I would look into various dictionary implementations, and see if you can find a good Java-based dictionary that can suggest alternatives based on an input string. Once you have that, you should be able to use IndexSearcher.docFreq to find out how many docs contain each alternate word, and compare that with the number of docs that contain the initial word ... if one of the alternates has a significantly higher number of matches, then you suggest it.

NOTE: The DICT protocol defines a client/server approach to providing spell correction and definitions. Maybe you can leverage some of the spell correction code mentioned in the "Server Software Written in Java" section of this doc: http://www.dict.org/links.html

In particular, you might want to take a look at JavaDict's Database.match function using the LevenshteinStrategy: http://ktulu.com.ar/javadict/docs/ar/com/ktulu/dict/Database.html#match(java.lang.String,%20ar.com.ktulu.dict.strategies.Strategy)
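The LevenshteinStrategy mentioned above ranks candidate words by edit distance. As a rough sketch of what such a strategy computes (not JavaDict's actual code), here is the classic dynamic-programming edit distance between two strings:

```java
public class Levenshtein {
    // Classic dynamic-programming edit distance: the minimum number of
    // single-character insertions, deletions, and substitutions needed
    // to turn string a into string b.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // delete
                                            d[i][j - 1] + 1),  // insert
                                   d[i - 1][j - 1] + cost);    // substitute
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "form" -> "foam" is one substitution
        System.out.println(distance("form", "foam")); // prints 1
    }
}
```

Candidates within a small distance of the input word would then be checked against IndexSearcher.docFreq as described above.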
Re: Help on the Query Parser
On Wednesday 24 November 2004 08:16, Morus Walter wrote:

> Lucene itself doesn't handle wildcards within phrases.

This can be added using PhrasePrefixQuery (which is slightly misnamed): http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/PhrasePrefixQuery.html

Regards,
Daniel
Re: Index in RAM - is it really worthy?
Thanks everybody for the responses.

What else can essentially improve query performance? (I do not speak now about such things as keeping the index optimized etc. - that's clear.) As I experienced on my 2-CPU box, during query execution both processors were really busy. The question is: would it accelerate things if I got a 4-CPU box, 10 CPUs...? I mean a real performance boost (at least a factor of 10), not just a percentage. Would it help if I played with a different query formulation, i.e. a and (b or c) instead of (b or c) and a?

Regards,
j.

Kevin A. Burton [EMAIL PROTECTED], 22.11.2004 21:40
To: Lucene Users List [EMAIL PROTECTED]
Subject: Re: Index in RAM - is it really worthy?

> Otis Gospodnetic wrote:
> > For the Lucene book I wrote some test cases that compare FSDirectory and RAMDirectory. What I found was that with certain settings FSDirectory was almost as fast as RAMDirectory. Personally, I would push FSDirectory and hope that the OS and the filesystem do their share of work and caching for me before looking for ways to optimize my code.
>
> Also another note is that doing an index merge in memory is probably faster if you just use a RAMDirectory and perform addIndexes to it. This would almost certainly be faster than optimizing on disk, but I haven't benchmarked it.
>
> Kevin
>
> --
> Use Rojo (RSS/Atom aggregator). Visit http://rojo.com. Ask me for an invite! Also see irc.freenode.net #rojo if you want to chat. Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html If you're interested in RSS, Weblogs, Social Networking, etc... then you should work for Rojo! If you recommend someone and we hire them you'll get a free iPod!
> Kevin A. Burton, Location - San Francisco, CA, AIM/YIM - sfburtonator, Web - http://peerfear.org/ GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
RE: Re: Help on the Query Parser
Hi Daniel,

I couldn't figure out how to use the PhrasePrefixQuery with a phrase like "java* developer". It only provides methods to add terms. Can a term contain a wildcard character in Lucene?

Thanks,
Terence

On Wednesday 24 November 2004 08:16, Morus Walter wrote:
> > Lucene itself doesn't handle wildcards within phrases.
>
> This can be added using PhrasePrefixQuery (which is slightly misnamed): http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/PhrasePrefixQuery.html
>
> Regards,
> Daniel

-- Get your free email account from http://www.trekspace.com Your Internet Virtual Desktop!
RE: Re: Help on the Query Parser
Hi Morus,

I want to search for strings like the following:
- java developer
- javascript developer

By searching for java*, it will return more than I want. That's why I am thinking of "java* developer".

Terence

> Terence Lai writes:
> > Looks like the wildcard query disappeared. In fact, I am expecting text:"java* developer" to be returned. It seems to me that the QueryParser cannot handle a wildcard within a quoted string.
>
> That's not just QueryParser. Lucene itself doesn't handle wildcards within phrases. You could have a query text:"java* developer" if '*' isn't removed by the analyzer. But it would only search for the token 'java*', not any expansion of that. I guess this is not what you want.
>
> Morus
Re: modifying existing index
I am able to delete the index now using the following:

    if (indexDir.exists()) {
        IndexReader reader = IndexReader.open(indexDir);
        uidIter = reader.terms(new Term("id", ""));
        while (uidIter.term() != null && uidIter.term().field() == "id") {
            reader.delete(uidIter.term());
            uidIter.next();
        }
        reader.close();
    }

where "id" is the keyword field. But here also all the documents are deleted. How can I modify my code and delete a particular document with a given id? I am creating the index in the following way:

    Document doc = new Document();
    doc.add(Field.Text("text", text));
    doc.add(Field.Keyword("id", Long.toString(id)));
    doc.add(Field.Keyword("title", title));
    doc.add(Field.Keyword("keywords", keywords));
    doc.add(Field.Keyword("type", type));
    writer.addDocument(doc);

----- Original Message -----
From: Chuck Williams [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 1:06 PM
Subject: RE: modifying existing index

> A good way to do this is to add a keyword field with whatever unique id you have for the document. Then you can delete the term containing a unique id to delete the document from the index (look at IndexReader.delete(Term)). You can look at the demo class IndexHTML to see how it does incremental indexing for an example.
>
> Chuck

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 23, 2004 11:34 PM
To: Lucene Users List
Subject: Re: modifying existing index

> I have gone through IndexReader. I found the method delete(int docNum), but from where will I get the document number? Is this predefined, or do we have to give a number prior to indexing?

----- Original Message -----
From: Luke Francl [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 1:26 AM
Subject: Re: modifying existing index

> On Tue, 2004-11-23 at 13:59, Santosh wrote:
> > I am using lucene for indexing; when I am creating the index the documents are added. But when I want to modify a single existing document and re-index again, it is taken as a new document and added one more time, so I am getting the same document twice in the results. To overcome this I am deleting the existing index and recreating the whole index. But is it possible to index the modified document again and overwrite the existing document without deleting and recreating? Can I do this? If so, how?
>
> You do not need to recreate the whole index. Just mark the document as deleted using the IndexReader and then add it again with the IndexWriter. Remember to close your IndexReader and IndexWriter after doing this. The deleted document will be removed the next time you optimize your index.
>
> Luke Francl
RE: modifying existing index
I haven't tried it, but believe this should work:

    IndexReader reader;

    void delete(long id) {
        reader.delete(new Term("id", Long.toString(id)));
    }

This also has the benefit that it does a binary search rather than a sequential search.

You will want to pad your ids with leading zeroes if you are going to do incremental indexing (both when storing them and when looking them up). Sorting is by lexicographic order, not numerical order, and incremental indexing is much faster if the ids are kept sorted (as is done in IndexHTML).

Chuck

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 9:54 AM
To: Lucene Users List
Subject: Re: modifying existing index

> I am able to delete the index now using the following ... [earlier messages in this thread quoted in full - snipped]
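Chuck's point about lexicographic versus numerical order is easy to demonstrate, as is the fix: pad ids with leading zeroes to a fixed width (the width of 10 digits here is an illustrative choice) so that the two orders agree. A small JDK-only sketch:

```java
import java.text.DecimalFormat;
import java.util.TreeSet;

public class PaddedIds {
    // Zero-pad numeric ids to a fixed width so that lexicographic
    // (term) order matches numeric order. DecimalFormat is used here
    // because it is available on the JDK 1.4 this list targets.
    public static String pad(long id) {
        return new DecimalFormat("0000000000").format(id);
    }

    public static void main(String[] args) {
        // Unpadded: lexicographic order puts "10" before "9".
        TreeSet unpadded = new TreeSet();
        unpadded.add("9");
        unpadded.add("10");
        System.out.println(unpadded.first()); // prints 10

        // Padded: sorted term order agrees with numeric order.
        TreeSet padded = new TreeSet();
        padded.add(pad(9));
        padded.add(pad(10));
        System.out.println(padded.first()); // prints 0000000009
    }
}
```

The same pad() would be applied both when storing the id field and when building the Term used for deletion, so lookups and deletions hit the same term.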
Re: URGENT: Help indexing large document set
Thanks Paul!

Using your suggestion, I have changed the update check code to use only the IndexReader:

    try {
        localReader = IndexReader.open(path);
        while (keyIter.hasNext()) {
            key = (String) keyIter.next();
            term = new Term(key, key);
            TermDocs tDocs = localReader.termDocs(term);
            if (tDocs != null) {
                try {
                    while (tDocs.next()) {
                        localReader.delete(tDocs.doc());
                    }
                } finally {
                    tDocs.close();
                }
            }
        }
    } finally {
        if (localReader != null) {
            localReader.close();
        }
    }

Unfortunately it didn't seem to make any dramatic difference. I also see the CPU is only 30-50% busy, so I am guessing it's spending a lot of time in I/O. Any way of making the CPU work harder? Is a batch size of 500 too small for 1 million documents? Currently I am seeing a linear speed degradation of 0.3 milliseconds per document.

Thanks,
-John

On Wed, 24 Nov 2004 09:05:39 +0100, Paul Elschot [EMAIL PROTECTED] wrote:
> [Paul's reply quoted in full - snipped]
Re: Too many open files issue
I have also seen this problem. In the Lucene code, I don't see where the reader specified when creating a field is closed. That holds on to the file. I am looking at DocumentWriter.invertDocument().

Thanks,
-John

On Mon, 22 Nov 2004 16:21:35 -0600, Chris Lamprecht [EMAIL PROTECTED] wrote:
> A useful resource for increasing the number of file handles on various operating systems is the Volano Report: http://www.volano.com/report/
>
> > I had requested help on an issue we have been facing with the "Too many open files" Exception garbling the search indexes and crashing the search on the web site.
RE: URGENT: Help indexing large document set
Does keyIter return the keys in sorted order? This should reduce seeks, especially if the keys are dense.

Also, you should be able to call localReader.delete(term) instead of iterating over the docs (of which I presume there is only one, since keys are unique). This won't improve performance, as IndexReader.delete(Term) does exactly what your code does, but it will be cleaner.

A linear slowdown with the number of docs doesn't make sense, so something else must be wrong. I'm not sure what the default buffer size is (it appears it used to be 128 but is dynamic now, I think). You might find the slowdown stops after a certain point, especially if you increase your batch size.

Chuck

-----Original Message-----
From: John Wang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 12:21 PM
To: Lucene Users List
Subject: Re: URGENT: Help indexing large document set

> Thanks Paul! Using your suggestion, I have changed the update check code to use only the IndexReader ... [John's message and Paul's earlier reply quoted in full - snipped]
Re: Index in RAM - is it really worthy?
When comparing RAMDirectory and FSDirectory it is important to mention what OS you are using. Linux will cache the most recent disk accesses in memory. Here is a good article that describes its strategy: http://forums.gentoo.org/viewtopic.php?t=175419

The 2% difference you are seeing is the memory copy. With other OSes you may see a speedup when using the RAMDirectory, because not all OSes maintain a disk cache in memory and so must access the disk to read the index.

Another consideration is that there is currently a 2GB limitation on the size of the RAMDirectory. An index over 2GB causes an overflow in the int used to create the buffer [see int len = (int) is.length(); in RAMDirectory].

I ended up using a RAMDirectory for a very different reason. The index is 1 to 2MB and is rebuilt every few hours. It takes 3 to 4 minutes to query the database and rebuild the index, but the search should be available 100% of the time. Since the index is so small I do the following:

on server startup:
- look for the semaphore; if it is there, delete the index
- if there is no index, build it to FSDirectory
- load the index from FSDirectory into RAMDirectory

on reindex:
- create the semaphore
- rebuild the index to FSDirectory
- delete the semaphore
- load the index from FSDirectory into RAMDirectory

to search:
- search the RAMDirectory

RAMDirectory could be replaced by a regular FSDirectory, but it seemed silly to copy the index from disk to disk when it ultimately needs to be in memory. FSDirectory could be replaced by a RAMDirectory, but this would mean the server takes 3 to 4 minutes longer to start up every time. By persisting the index, this time is only necessary if indexing was interrupted.

Jonathan

On Mon, 22 Nov 2004 12:39:07 -0800, Kevin A. Burton [EMAIL PROTECTED] wrote:

> Otis Gospodnetic wrote:
> > For the Lucene book I wrote some test cases that compare FSDirectory and RAMDirectory. What I found was that with certain settings FSDirectory was almost as fast as RAMDirectory. Personally, I would push FSDirectory and hope that the OS and the filesystem do their share of work and caching for me before looking for ways to optimize my code.
>
> Yes... I performed the same benchmark, and in my situation RAMDirectory for searches was about 2% slower. I'm willing to bet that it has to do with the fact that it's a Hashtable and not a HashMap (which isn't synchronized). Also, adding a constructor for the term size could make loading a RAMDirectory faster, since you could prevent rehashing.
>
> If you're on a modern machine your filesystem cache will end up buffering your disk anyway, which I'm sure was happening in my situation.
>
> Kevin
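Jonathan's "semaphore" is just a marker file that survives a crash, so its presence at startup means the last rebuild was interrupted and the on-disk index should be discarded. A minimal sketch of that protocol with plain java.io (the file and directory names are illustrative, and the actual Lucene rebuild is elided):

```java
import java.io.File;
import java.io.IOException;

public class RebuildMarker {
    // A marker file created before a rebuild and deleted after it.
    // If it still exists at startup, the rebuild was interrupted and
    // the on-disk index cannot be trusted.
    private final File marker;

    public RebuildMarker(File indexDir) {
        this.marker = new File(indexDir, "rebuild.marker");
    }

    public boolean rebuildWasInterrupted() {
        return marker.exists();
    }

    public void beginRebuild() throws IOException {
        marker.createNewFile();   // create semaphore
        // ... rebuild the index to FSDirectory here ...
    }

    public void endRebuild() {
        marker.delete();          // delete semaphore on success
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "index-demo");
        dir.mkdirs();
        RebuildMarker m = new RebuildMarker(dir);
        if (m.rebuildWasInterrupted()) {
            System.out.println("previous rebuild interrupted: delete and rebuild the index");
        }
        m.beginRebuild();
        // ... rebuild runs here ...
        m.endRebuild();
    }
}
```

After a clean endRebuild(), startup skips the rebuild and simply loads the persisted FSDirectory index into the RAMDirectory.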
RE: Using multiple analysers within a query
Hi again,

Thanks to everyone who replied. The PerFieldAnalyzerWrapper was a good suggestion, and one I had overlooked, but for our particular requirements it wouldn't quite work, so I went with overriding getFieldQuery(). You were right, Paul: in 1.4.2 a whole heap of QueryParser changes were made, mostly removing the analyzer parameter from methods.

In the end I built my changes on top of the NewMultiFieldQueryParser which was shared here recently and works wonders -- thanks Bill Janssen and sergiu gordea. I added support for slops and boosts to build together with the multi-field array, and then overrode getFieldQuery to check the queryText for a start char ('=' for example) and, if found, remove it and switch to a non-tokenising analyser. Then I found that because that analyser always returns a single token (TermQuery), it would send spaces through into the final query string, causing problems. So also in getFieldQuery I check whether it needs breaking up and converting into a PhraseQuery. Seems to work, just needs thorough testing. If anyone would like a copy I could post it up here.

Regards,
--Leto

(excuse the disclaimer...)

> We have the need for analysed and 'not analysed/not tokenised' clauses within one query. Imagine an unparsed query like:
>
>     +title:"Hello World" +path:Resources\Live\1
>
> In the above example we would want the first clause to use StandardAnalyser and the second to use an analyser which returns the term as a single token. So a parsed result might look like:
>
>     +(title:hello title:world) +path:Resources\Live\1
>
> Would anyone have any suggestions on how this could be done? I was thinking maybe the QueryParser would have to be changed/extended to accept a separator other than colon ':', something like '=' for example, to indicate this clause is not to be tokenised. Or perhaps this can all be done using a single analyser?
CONFIDENTIALITY NOTICE AND DISCLAIMER: Information in this transmission is intended only for the person(s) to whom it is addressed and may contain privileged and/or confidential information. If you are not the intended recipient, any disclosure, copying or dissemination of the information is unauthorised and you should delete/destroy all copies and notify the sender. No liability is accepted for any unauthorised use of the information contained in this transmission. This disclaimer has been automatically added.
RE: Using multiple analysers within a query
Actually, I just realised a PhraseQuery is incorrect... I only want a single TermQuery, but it just needs to be quoted, d'oh.

-----Original Message-----
> Then I found that because that analyser always returns a single token (TermQuery), it would send spaces through into the final query string, causing problems. So also in getFieldQuery I check whether it needs breaking up and converting into a PhraseQuery.