Re: Concurrent read and write
Lucene's write lock is simple. There are four places that require the write lock: IndexReader.delete(int), IndexReader.setNorm(int,String,byte), IndexReader.undeleteAll(), and the IndexWriter constructor (IndexWriter.<init>). Once you've used one of the methods above, you should close that IndexReader or IndexWriter instance to release the write lock (and to avoid the "Lock obtain timed out" exception).

For example, the following sequence should be okay:

    IndexReader reader = IndexReader.open( DIR );
    reader.delete( new Term( A, B ) );
    reader.close();
    IndexWriter writer = new IndexWriter( DIR, a, b );
    writer.addDocument( oneDocument );
    writer.close();

But the following sequence causes a "Lock obtain timed out" exception:

    IndexReader reader = IndexReader.open( DIR );
    reader.delete( new Term( A, B ) );
    IndexWriter writer = new IndexWriter( DIR, a, b );

This is because the write lock obtained by IndexReader.delete() is only released by IndexReader.close(), which has not been called yet, while the new IndexWriter() statement requires that same write lock.

On Fri, 21 Jan 2005 08:20:12 -0800 (PST), Otis Gospodnetic [EMAIL PROTECTED] wrote:
> Hello Ashley,
>
> You can read/search while modifying the index, but you have to ensure that only one thread or only one process is modifying an index at any given time. Both IndexReader and IndexWriter can be used to modify an index: the former to delete Documents and the latter to add them. You have to ensure these two operations don't overlap.
> cf. http://www.lucenebook.com/search?query=concurrent
>
> Otis
>
> --- Ashley Steigerwalt [EMAIL PROTECTED] wrote:
>> I am a little fuzzy on the thread-safeness of Lucene, or maybe just Java. From what I understand, and correct me if I'm wrong, Lucene takes care of concurrency issues and it is ok to run a query while writing to an index.
>>
>> My question is, does this still hold true if the reader and writer are being executed as separate programs? I have a cron job that will update the index periodically. I also have a search application on a web form. Is this going to cause trouble if someone runs a query while the indexer is updating?
>>
>> Ashley

--
Cheolgoo, Kang
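The lock discipline described in this thread can be sketched with a toy model. The code below is not Lucene code: ToyWriteLock, ToyModifier and WriteLockDemo are hypothetical names, and failing immediately (instead of waiting and timing out) is an assumption made to keep the sketch short. It only illustrates why a second modifier cannot start until the first one has been closed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy stand-in for an index's single write lock (hypothetical, not a Lucene class).
class ToyWriteLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Returns true if the lock was free and is now held by the caller.
    boolean tryObtain() { return held.compareAndSet(false, true); }

    void release() { held.set(false); }
}

// Toy stand-in for anything that modifies the index: a deleting reader or a writer.
class ToyModifier implements AutoCloseable {
    private final ToyWriteLock lock;
    private boolean holdsLock = false;

    ToyModifier(ToyWriteLock lock) { this.lock = lock; }

    // The first modification takes the write lock and fails if someone else holds it.
    // Assumption: a real lock would wait and then time out; this one fails at once.
    void modify() {
        if (!holdsLock) {
            if (!lock.tryObtain()) {
                throw new IllegalStateException("Lock obtain timed out");
            }
            holdsLock = true;
        }
    }

    // In this model, closing is the only operation that gives the lock back.
    @Override
    public void close() {
        if (holdsLock) {
            lock.release();
            holdsLock = false;
        }
    }
}

class WriteLockDemo {
    // Runs "delete, then add"; the deleter is closed first only when closeFirst is true.
    static boolean deleteThenAdd(boolean closeFirst) {
        ToyWriteLock lock = new ToyWriteLock();
        ToyModifier deleter = new ToyModifier(lock);
        deleter.modify();                 // plays the role of reader.delete(...)
        if (closeFirst) deleter.close();  // plays the role of reader.close()
        ToyModifier adder = new ToyModifier(lock);
        try {
            adder.modify();               // plays the role of new IndexWriter(...)
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            adder.close();
            deleter.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("close first: " + deleteThenAdd(true));
        System.out.println("no close:    " + deleteThenAdd(false));
    }
}
```

The same ordering applies across processes, such as a cron-driven indexer and a separate program that deletes documents: whichever one modifies the index has to close before the other starts modifying.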
Re: retrieving added document
On Tue, 23 Nov 2004 22:47:21 +0100, Paul [EMAIL PROTECTED] wrote:
> Hi,
> I'm creating a document and adding it with a writer to the index. For some reason I need to add data to this specific document later on (minutes, not hours or days). Is it possible to retrieve it and add additional data?

No, you cannot add data to (or otherwise modify) a previously added document. It's easy, though, to delete the old one from the index and add a new document with the additional data included.

> I found the document(int n) method within the IndexReader (btw: the description makes no sense to me: "Returns the stored fields of the nth Document in this index." - but it returns a Document and not a list of fields..) but where do I get that number from? (and the numbers change, I know..)

Usually you search using IndexSearcher, and its resulting Hits gives you the doc-id (the number) of each hit in that index via Hits.id(int). And the Document contains the list of (stored) fields.

> thanks for any help
> Paul

--
Cheolgoo, Kang
Re: modifying existing index
On Wed, 24 Nov 2004 13:04:20 +0530, Santosh [EMAIL PROTECTED] wrote:
> I have gone through IndexReader and found the method delete(int docNum), but where do I get the document number from? Is it predefined, or do we have to assign a number prior to indexing?

The number (aka doc-id) is assigned by Lucene; it is an internal sequential integer. This number is usually retrieved via Hits.id(int) from your search:

    Hits myHits = myIndexSearcher.search( myQuery );
    for ( int i = 0; i < myHits.length(); i++ ) {
        int docId = myHits.id( i );
        Document doc = myHits.doc( i );
        // myHits.id( i ) retrieves the doc-id of the i-th hit, and
        // myHits.doc( i ) returns the i-th document in the result
        // myHits (doc() takes the hit index, not the doc-id).
    }

HTH

> ----- Original Message -----
> From: Luke Francl [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Wednesday, November 24, 2004 1:26 AM
> Subject: Re: modifying existing index
>
> On Tue, 2004-11-23 at 13:59, Santosh wrote:
>> I am using Lucene for indexing. When I create the index, the documents are added. But when I modify a single existing document and index it again, it is treated as a new document and added one more time, so I get the same document twice in the results.
>>
>> To overcome this I am deleting the existing index and recreating the whole index. But is it possible to index the modified document again and overwrite the existing document without deleting and recreating the index? Can I do this? If so, how?
>
> You do not need to recreate the whole index. Just mark the document as deleted using the IndexReader and then add it again with the IndexWriter. Remember to close your IndexReader and IndexWriter after doing this.
>
> The deleted document will be removed the next time you optimize your index.
>
> Luke Francl

--
Cheolgoo, Kang
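The update pattern from this thread (mark the old document as deleted, add the new version, let optimize drop the deleted one) can be sketched with a small in-memory model. This is not Lucene code: ToyIndex, its methods, and the key used to find the old version are hypothetical names chosen for illustration, and a doc-id here is simply a list position.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of an index (hypothetical, not a Lucene class).
// A doc-id is the position of a document in the list.
class ToyIndex {
    private final List<String[]> docs = new ArrayList<>();    // each entry: {key, body}
    private final List<Boolean> deleted = new ArrayList<>();  // delete marks, by doc-id

    // Adds a document and returns its doc-id.
    int add(String key, String body) {
        docs.add(new String[] { key, body });
        deleted.add(false);
        return docs.size() - 1;
    }

    // Marks every live document with this key as deleted; returns how many were marked.
    int delete(String key) {
        int count = 0;
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id) && docs.get(id)[0].equals(key)) {
                deleted.set(id, true);
                count++;
            }
        }
        return count;
    }

    // "Update" means: delete the old version, then add the new one.
    int update(String key, String newBody) {
        delete(key);
        return add(key, newBody);
    }

    // Returns the doc-ids of live documents with this key (what a search would hit).
    List<Integer> search(String key) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id) && docs.get(id)[0].equals(key)) hits.add(id);
        }
        return hits;
    }

    // Drops deleted documents; the surviving documents are renumbered.
    void optimize() {
        List<String[]> live = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id)) live.add(docs.get(id));
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
        for (int i = 0; i < live.size(); i++) deleted.add(false);
    }

    // Number of slots in use, including documents that are only marked as deleted.
    int maxDoc() { return docs.size(); }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a.txt", "first version");
        index.add("b.txt", "other document");
        index.update("a.txt", "second version");
        System.out.println(index.search("a.txt")); // a single hit, not a duplicate
        index.optimize();
        System.out.println(index.search("a.txt")); // same document under a new doc-id
    }
}
```

The model also shows why a doc-id should not be kept around as a permanent identifier: once deleted documents are dropped, the remaining ones are renumbered, so a key stored in the document itself is the safer handle for later updates.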
Re: Encrypted indexes
I think it's possible to encrypt a field with a symmetric encryption algorithm, in the same way a compressed field is handled, and algorithms such as DES can be used with little performance loss. If the ability to block reverse engineering is critical, you should use PKI, which would cost considerably more performance than the symmetric methods.

On Wed, 13 Oct 2004 15:33:53 +0200, petite_abeille [EMAIL PROTECTED] wrote:
> On Oct 13, 2004, at 15:26, Nader Henein wrote:
>> Well, are you storing any data for retrieval from the index, because you could encrypt the actual data and then encrypt the search string public key style.
>
> Alternatively, write your index to an encrypted volume... something along the line of FileVault and PGP Disk [1] [2].
>
> PA.
>
> [1] http://www.apple.com/macosx/features/filevault/
> [2] http://www.pgp.com/products/desktop/index.html

Cheolgoo, Kang
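As a rough illustration of the symmetric approach, the sketch below encrypts a field value before it would be stored and decrypts it after retrieval, using only the standard javax.crypto API. Several things here are assumptions and not part of the thread: the class name FieldCrypto, the use of AES-GCM in place of the DES mentioned above, and the choice to keep the result as a Base64 string with the IV prepended. Note that a value encrypted this way can be stored and read back, but its plain-text terms cannot be searched.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper for encrypting stored field values (not part of Lucene).
class FieldCrypto {
    private static final int IV_BYTES = 12;   // IV length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh 128-bit AES key; key storage is out of scope for this sketch.
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns Base64(IV || ciphertext), suitable for a stored, unindexed field.
    static String encrypt(SecretKey key, String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);  // a new random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(); fails if the stored value was modified or the key is wrong.
    static String decrypt(SecretKey key, String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}

class FieldCryptoDemo {
    public static void main(String[] args) {
        SecretKey key = FieldCrypto.newKey();
        String stored = FieldCrypto.encrypt(key, "confidential body text");
        System.out.println("stored:    " + stored);
        System.out.println("decrypted: " + FieldCrypto.decrypt(key, stored));
    }
}
```

Because each value gets its own random IV, encrypting the same text twice yields different stored strings, which is one more reason the encrypted form is unsuitable for term matching.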
Re: View lucene index file
Check out this page: http://jakarta.apache.org/lucene/docs/contributions.html
There are tools like LIMO and Luke.

Cheo

On Thu, 9 Sep 2004 23:38:17 -0400, Anne Y. Zhang [EMAIL PROTECTED] wrote:
> I am using Nutch. Is there any way I can view the Lucene index file? It seems that Lucene writes the index as a binary file. Could anybody explain how Lucene does the indexing and where the index files are located?
>
> Thank you very much!
>
> Ya

--
Cheolgoo, Kang
Re: Problems with special characters
How about creating a special-char-converting reader like this?

    import java.io.IOException;
    import java.io.Reader;

    public class LuceneReader extends Reader {
        private final Reader source;
        private int buffer = -1;   // held-back special char, or -1 if none

        public LuceneReader( Reader sourceReader ) {
            this.source = sourceReader;
        }

        public int read() throws IOException {
            if ( buffer != -1 ) {          // emit the char held back last time
                int result = buffer;
                buffer = -1;
                return result;
            }
            int result = source.read();    // -1 signals end of stream
            if ( result != -1 && isSpecialCharacter( (char) result ) ) {
                buffer = result;           // hold it back and emit '\' first
                return '\\';
            }
            return result;
        }

        // Reader requires the next two methods; the bulk read is built on read().
        public int read( char[] cbuf, int off, int len ) throws IOException {
            int n = 0;
            while ( n < len ) {
                int c = read();
                if ( c == -1 ) break;
                cbuf[off + n++] = (char) c;
            }
            return ( n == 0 && len > 0 ) ? -1 : n;
        }

        public void close() throws IOException {
            source.close();
        }

        private boolean isSpecialCharacter( char c ) {
            return ( c == '+' /* all special characters */ );
        }
    }

The LuceneReader.read() above checks the char about to be returned. If it's one of the special characters, it buffers the char and returns '\' first. I wrote this off the top of my head, so of course it's not complete, but it can be your starting point.

Cheolgoo

On Fri, 2 Jul 2004 12:44:48 +0200, Marten Senkel [EMAIL PROTECTED] wrote:
> I had a similar problem. I don't know whether there is a more intelligent solution, but the quickest I had in mind was to convert the special characters I needed to look up into a fixed random character string.
>
> For example: prior to indexing I replace all occurrences of '+' by 'PLUSsdfaEGsgfAE'. When searching I intercept the terms the user entered, replace '+' by the same random character string and search for it instead of the original special character.
>
> This works, of course, only if one constructs the query by oneself, giving the user only some basic checkbox options to specify 'AND' or 'OR' queries, for example. If you use something like this, users wouldn't be able to write 'advanced' searches like +foo +bar themselves, as the command sign '+' would be converted as well.
>
> A fix for that problem could be to convert 'C+' to a random string and replace only 'C+' by the random string when searching ... this would leave the command '+' intact.
>
> It's a very basic, quick and dirty solution, I know, but it worked well for me.
>
> Marten
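Marten's placeholder idea, including the refinement that leaves the query operator '+' intact, might look roughly like the sketch below. The class name SpecialCharCodec and the rule used to tell a literal '+' from an operator (a '+' counts as literal only when it appears after the first character of a term, as in "C+" or "C++") are assumptions for illustration and a slight generalization of the 'C+' case from the thread. It also assumes the placeholder never occurs in real text and passes through the analyzer the same way at index and query time.

```java
// Hypothetical helper for the placeholder approach (plain string handling, no Lucene calls).
class SpecialCharCodec {
    // Placeholder taken from the thread; it must never occur in real text.
    static final String PLUS = "PLUSsdfaEGsgfAE";

    // Apply to document text before indexing: every '+' becomes the placeholder.
    static String encodeForIndex(String text) {
        return text.replace("+", PLUS);
    }

    // Apply to the user's query: a '+' inside a term (as in "C++") is literal and
    // becomes the placeholder; a '+' at the start of a term stays a query operator.
    static String encodeQuery(String query) {
        StringBuilder out = new StringBuilder();
        boolean insideTerm = false;  // true once a term has started
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '+' && insideTerm) {
                out.append(PLUS);
            } else {
                out.append(c);
                insideTerm = !Character.isWhitespace(c) && c != '+';
            }
        }
        return out.toString();
    }
}

class SpecialCharCodecDemo {
    public static void main(String[] args) {
        System.out.println(SpecialCharCodec.encodeForIndex("C++ in a nutshell"));
        System.out.println(SpecialCharCodec.encodeQuery("+nutshell +C++"));
    }
}
```

One limitation follows directly from the rule: text that begins a term with '+' (for example "+5") is encoded at index time but treated as an operator at query time, so such terms would not match.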