Re: Atomicity in Lucene operations
Hello, Nader: I am very interested in how you implement the atomicity. Could you send me a copy of your code? Thanks in advance.

Roy

On Sat, 16 Oct 2004 01:20:09 +0400, Nader Henein [EMAIL PROTECTED] wrote:

We use Lucene over 4 replicated indices and we have to maintain atomicity on deletion and updates with multiple fallback points. I'll send you the write-up; it's too big to CC the entire board.

Nader Henein

Christian Rodriguez wrote:

Hello guys, I need additions and deletions of documents to the index to be ATOMIC (they either happen to completion or not at all). On top of this, I need updates (which I currently implement as a deletion of the document followed by an addition) to be ATOMIC and DURABLE (once I return from the update function, it's because the operation happened to completion and stays in the index). Notice that I don't really need all the ACID properties for all the operations.

I have tried to solve the problem by using the Lucene + BDB package written by Andi Vajda and using transactions, but the BDB database gets corrupted if I insert random System.exit() calls to simulate a crash of the application before aborting or committing transactions. So I have two questions:

1. Has anyone been able to use Lucene + BDB WITH transactions and simulate random crashes at different points in the process of adding items, and found it to be robust (especially: have you always been able to recover after a crash, with uncommitted txns rolled back and committed ones present in the DB)?

2. Can anyone suggest other solutions (besides using BDB) that may work? For example: are any of these operations already atomic in Lucene (using an FSDirectory)?

Thanks for any help you can give me!
Xtian

-- Roy
**May I open-source your mind?**
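The durability Christian is asking about is typically built on a write-new-file-then-atomic-rename step: prepare the complete new state in a temporary file, then rename it over the live file so a crash leaves either the old version or the new one, never a partial write. This is the general technique, not Lucene's or BDB's specific implementation; a minimal stdlib Java sketch (file and directory names here are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicWrite {

    // Write the new content to a temp file first, then atomically rename it
    // over the live file. A crash during the write damages only the temp
    // file; readers always see either the old version or the new one.
    static void atomicReplace(Path target, byte[] content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content); // a crash here leaves target untouched
        // On POSIX filesystems, ATOMIC_MOVE maps to rename(2), which
        // atomically replaces an existing target. This is the commit point.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempDirectory("idx").resolve("segments");
        atomicReplace(f, "v1".getBytes());
        atomicReplace(f, "v2".getBytes());
        // On POSIX, prints "v2": the second replace fully superseded the first
        System.out.println(new String(Files.readAllBytes(f)));
    }
}
```

The key property is that the rename is the single commit point: everything before it is invisible to readers, everything after it is complete.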
Re: Atomicity in Lucene operations
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I can give you the documents, and if you still want the code I'll slap together a rough copy for you and ship it across.

Nader Henein

Roy Shan wrote:

Hello, Nader: I am very interested in how you implement the atomicity. Could you send me a copy of your code? Thanks in advance. Roy
Re: StopWord elimination pls. HELP
On Sunday 17 October 2004 05:23, Miro Max wrote:

d.add(Field.Text("cont", cont));
writer.addDocument(d);

to get results from a database into the Lucene index. But when I check println(d) I can see the German stopwords too. How can I eliminate this?

Field.Text(field, cont), where cont is a String, will also store the original text in addition to indexing it. toString() will then show the stored text. In the index itself you won't have any stopwords.

Regards
Daniel

--
http://www.danielnaber.de
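Daniel's point is that stopword removal happens during analysis, before tokens reach the index, while the stored field keeps the raw text. What the analyzer's stop filter does can be pictured with plain Java; this is only an illustration of the idea, not Lucene's StopFilter itself, and the three-word German stop list is a hypothetical sample (Lucene's GermanAnalyzer ships a much larger one):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopFilterDemo {

    // A tiny stand-in for what a stop filter does during analysis:
    // lowercase each token, and drop any token on the stop list so it
    // never reaches the index. The stored field is unaffected.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (!stopWords.contains(tok)) {
                kept.add(tok);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sample stop list; real analyzers use a full one.
        Set<String> germanStops = new HashSet<>(Arrays.asList("der", "die", "und"));
        System.out.println(filter("Der Hund und die Katze", germanStops));
        // prints [hund, katze]
    }
}
```

So println(d) showing stopwords is expected: it prints the stored text, while only the filtered tokens are searchable.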
index, reindexing problem
Hello.

I have a problem when reindexing some documents after an index has been created. I get the following error:

caught a class java.io.IOException with message: Lock obtain timed out: [EMAIL PROTECTED]:\DOCUME~1\..lucene-0b877c2d5472a608d6ec3ee6174018de-write.lock

This is how I do it.

1. Make the index (_indexDir is the location of the index):

writer = new IndexWriter(_indexDir, new StandardAnalyzer(), true);
// ... do the indexing here
writer.optimize();
writer.close();

This works fine.

2. This is where I get the error (reindexing an existing document):

writer = new IndexWriter(_indexDir, new StandardAnalyzer(), false);
Directory directory;
IndexReader reader;
// if the file is in the index already, remove it
directory = FSDirectory.getDirectory(_indexDir, false);
reader = IndexReader.open(directory);
try {
    Term term = new Term("deleteid", deleteID.toLowerCase());
    if (reader.docFreq(term) >= 1) {
        deletedItems = reader.delete(term); // <- this is where the locking error occurs
    }
} catch (Exception e) {
    System.out.println("caught a " + e.getClass() + "\n with message: " + e.getMessage());
} finally {
    reader.close();
    directory.close();
}
// continue with reindexing the new document ...

I hope anyone can help me with this problem.

Best regards,
Mats Lindberg
RE: index, reindexing problem
I had this same problem a while back. It should be resolved if you move the writer = new IndexWriter(...) call until after the reader.close(). I.e., complete all the deletions and close the reader before creating the writer.

Chuck

-----Original Message-----
From: MATL (Mats Lindberg) [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 17, 2004 5:36 AM
To: [EMAIL PROTECTED]
Subject: index, reindexing problem

Hello. I have a problem when reindexing some documents after an index has been created. I get the following error: caught a class java.io.IOException with message: Lock obtain timed out ...
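Chuck's fix, in outline, looks like the following (a sketch against the Lucene 1.4-era API from Mats's message; it needs the Lucene jar and is not runnable standalone). The cause of the lock timeout is that both IndexWriter and a deleting IndexReader take the index write lock, so the reader's delete blocks while the writer holds it. Doing all deletions first, closing the reader, and only then opening the writer avoids the contention:

```java
// 1. Delete first, with no IndexWriter open on the index.
Directory directory = FSDirectory.getDirectory(_indexDir, false);
IndexReader reader = IndexReader.open(directory);
try {
    Term term = new Term("deleteid", deleteID.toLowerCase());
    if (reader.docFreq(term) >= 1) {
        deletedItems = reader.delete(term); // no writer holds the lock now
    }
} finally {
    reader.close();      // releases the write lock the reader acquired
    directory.close();
}

// 2. Only now create the writer and add the updated document.
IndexWriter writer = new IndexWriter(_indexDir, new StandardAnalyzer(), false);
// ... add the new version of the document here
writer.optimize();
writer.close();
```

The general rule: at any moment, at most one object (writer or deleting reader) may hold the index's write lock.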
Re: Google Desktop Could be Better
Bill Tschumy writes:

I've looked at PDFBox, but the jar file is so big that I hate to burden my users by incorporating it.

Bill, my system (see http://www.parc.com/janssen/pubs/TR-03-16.pdf) uses pdftotext underneath. I've been very satisfied with that. Another Java solution would be to use Multivalent (multivalent.sourceforge.net). Multivalent, by the way, advertises the following: "Extract text from all formats. Full-text search with Lucene."

Bill
simultaneous search and indexing
Hi, I'm using a servlet to search my index and I wish to be able to create an index at the same time. Do I have to use threads? I'm a beginner.

Thx
Re: simultaneous search and indexing
You can do both at the same time; it's thread-safe. You will face different issues depending on the frequency of your indexing and the load on the search, but that shouldn't come into play till your index gets nice and heavy. So basically, code on.

Nader Henein

Miro Max wrote:

Hi, I'm using a servlet to search my index and I wish to be able to create an index at the same time. Do I have to use threads? I'm a beginner.
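Nader's one-index, search-while-writing setup can be pictured with plain Java threads. The sketch below is only an analogy for the pattern, not Lucene code: a background "indexing" thread adds documents to a concurrent store while the foreground "servlet" thread searches it. (The class and method names are hypothetical; Lucene manages its own file-level locking, so application code does not need this machinery for index safety.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentIndexDemo {

    // A concurrent map stands in for the index: one writer thread,
    // many reader threads, no extra application-level locking.
    private final Map<Integer, String> index = new ConcurrentHashMap<>();

    void addDocument(int id, String text) {
        index.put(id, text);
    }

    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> e : index.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        ConcurrentIndexDemo idx = new ConcurrentIndexDemo();
        ExecutorService writer = Executors.newSingleThreadExecutor();

        // Background indexing job, like a batch index build.
        Future<?> job = writer.submit(() -> {
            for (int i = 0; i < 1000; i++) {
                idx.addDocument(i, "doc " + i);
            }
        });

        // Foreground searches run concurrently with the writes,
        // like servlet requests arriving during indexing.
        while (!job.isDone()) {
            idx.search("doc");
        }
        writer.shutdown();
        System.out.println(idx.search("999")); // prints [999]
    }
}
```

The design point mirrors Nader's advice: keep a single writer and let readers run freely against it; contention only becomes a tuning concern as write frequency and search load grow.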