Re: Problem while indexing
Amit, I don't know exactly what your problem is, but I'm using a configuration not too different from yours with no problems - so at least you know it's possible. I have an index of about 125MB which I use on various machines, including an old Windows 98/SE 400MHz notebook. I use the default mergeFactor (10, I think) and do a daily merge (the daily addition represents about 200 documents added to a total of over 58,000). Each document (XML format) has about 15 fields of various types. I'm using release 1.3 dev 1.

At one point I too had a problem of too many open files - it turned out that I wasn't closing the IndexReader. Fixed that, and the number of open files usually stays below 500 (without Lucene, there are typically about 300-400 open files just for the system).

HTH,
Terry

- Original Message -
From: Amit Kapur [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 12:13 AM
Subject: Problem while indexing

Hi all,

I am facing the problems mentioned below while indexing. If anyone has any help to offer, I would be obliged.

    couldn't rename segments.new to segments
    F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files)

I am trying to index documents using Lucene, generating about 30 MB of index (optimized), which may be raised to about 100 MB or more (but that would be on a high-end server machine).

Description of current case:
#---Each Document has four fields (one Text field and 3 other Keyword fields).
#---The analyzer is based on a StopFilter and a PorterStemFilter.
#---I am using a Compaq PIII, 128 MB RAM, 650 MHz.
#---mergeFactor is set to 25, and I am optimizing the index after adding about 20 documents.
#---Using Lucene release 1.2

Problem faced: After adding about 4000 documents, generating an index of 30 MB, I got an error saying "couldn't rename segments.new to segments", after which the IndexReader or the IndexWriter for the current index could not be opened.

Then I changed a couple of settings:
#---mergeFactor=20 and optimize was called after every 10 documents.
#---Using Lucene release 1.3

Problem faced: After adding about 1500 documents, generating an index of 10 MB, I got an error saying "F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files)", after which the IndexWriter for the current index could not be opened.

Now my requirement calls for a much, much larger index, and I am at the point where these errors are coming unpredictably. Please, if anyone could guide me on this ASAP. Thanx in advance.

Regards,
Amit

PS: I have already read articles in the mail archive, http://www.mail-archive.com/[EMAIL PROTECTED]/msg02815.html.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
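Terry's diagnosis above - an unclosed IndexReader leaking OS file handles - is the usual cause of "Too many open files". A minimal sketch of the safe pattern against the Lucene 1.x API; the class name and index path are illustrative, not from the original posts:

```java
import org.apache.lucene.index.IndexReader;

public class ReaderHygiene {
    // Open, use, and always close the reader. Each open IndexReader
    // holds file handles for every segment file in the index, so
    // leaked readers accumulate handles until the OS limit is hit.
    public static int countDocs(String indexPath) throws Exception {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            return reader.numDocs();
        } finally {
            reader.close(); // release the segment file handles
        }
    }
}
```

A high mergeFactor (25, as above) also multiplies the number of segment files held open at once, which is likely why dropping back toward the default of 10 helps on a machine with a low file-handle limit.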
RE: Indexing Growth
Would there be any abnormal effects if, after adding a document, you called optimize()? I am still seeing large growth from setting a field. When I set a field I:

1. Get the document.
2. Remove the field.
3. Write the document to the index.
4. Get the document again.
5. Add the new field object.
6. Write the document to the index.
7. Call optimize.

From writing out my steps, it looks like I should write a set method instead of treating set as removeField() plus addField(). I thought combining those two would equal set, which it does, but it seems horribly inefficient. In any case, would the above cause the index to grow from, say, 10.5 MB to 31 MB?

Is there an efficient way to implement a set? For example, if there was a field/value pair of book/hamlet, but now we wanted to set book = none? Please keep in mind there could be multiple fields named book, so it is not simply a matter of removing the field book and then re-adding it. Anyhow, let me know your thoughts.

Thanks,
Rob

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2003 11:35 AM
To: Lucene Users List
Subject: RE: Indexing Growth

Funny how this is the outcome of 90% of the problems people have with software - their own mistakes :)

Regarding reindexing - no need for any explicit calls. When you add a document to the index, it is indexed right away. You will have to detect the index change (methods for that are there) and re-open the IndexSearcher in order to see newly added/indexed documents.

Otis

--- Rob Outar [EMAIL PROTECTED] wrote:

I found the freakin problem; I am going to kill my co-worker when he gets in. He was removing a field and adding the same field back for each document in the index, in a piece of code I did not notice until now. He is so dead. I commented out that piece of code, queried to my heart's content, and the index has not changed. Heck, the tool is like super fast now. One last concern is about the re-indexing thing: when does that occur? optimize()? I am curious what method would cause a reindex. I want to thank all of you for your help, it was truly appreciated!

Thanks,
Rob
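The seven steps above fetch and rewrite the document twice. One way to collapse them into a single delete-and-re-add pass, sketched against the Lucene 1.x API - the "uid" field name and the surrounding class are assumptions for illustration, not part of the original posts:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class SetField {
    // "Set" a field in one pass: rebuild the document with the new
    // value, delete the old copy by its unique id, and add the new
    // copy once. The "uid" field is assumed to uniquely identify
    // each document in this index.
    public static void setField(String indexPath, String uid,
                                String name, String value) throws Exception {
        Document updated = new Document();
        // ... copy every field except `name` from the old document,
        // then add the replacement value exactly once:
        updated.add(Field.Keyword(name, value));

        IndexReader reader = IndexReader.open(indexPath);
        reader.delete(new Term("uid", uid)); // remove the stale copy
        reader.close();                      // close before opening the writer

        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.addDocument(updated);
        writer.close(); // no optimize() here; batch that separately
    }
}
```

This writes the document once instead of twice and defers optimize() to a batch boundary, which avoids most of the transient index growth.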
Re: Problem while indexing
Thanx Terry Well today again I have made a few changes in the archietecture of my component where am using Lucene, and changed the way I am using the IndexReader and as u said made sure that all readers are closed, mergefactor is back to default (10). The test run is on and its working pretty well for now, have managed to have about 1000 documents, index of about 10MB, n counting :) .. hope this time things are better, well thanx for your word, it made me feel that this has been rightly done before and can be done even now. I appreciate the way you replied. Thanx would get back to you later ... Cheers!! Amit - Original Message - From: Terry Steichen To: Lucene Users Group Sent: Thursday, April 03, 2003 7:38 PM Subject: Re: Problem while indexing Amit, I don't exactly know what your problem is, but I'm using a configuration not too different from yours with no problems - so at least you know it's possible. I have an index of about 125MB which I use on various machines, including an old Windows98/SE 400MHz notebook. I used the default MergeFactor (10, I think) and do a daily merge (the daily addition represents about 200 documents added to a total of over 58,000). Each document (XML format) has about 15 fields of various types. I'm using release 1.3 dev 1. At one point I too had a problem of too many open files - turned out that I wasn't closing the IndexReader. Fixed that, and the number of open files usually stays below 500 (without Lucene, there are typically about 300-400 open files just for the system). 
HTH, Terry - Original Message - From: Amit Kapur [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, April 03, 2003 12:13 AM Subject: Problem while indexing hi all I m facing problems like mentioned below while indexing, If anyone has any help to offer i would to obliged couldn't rename segments.new to segments F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files) I am trying to index documents using Lucene generating about 30 MB of index (Optimized) which can be raised to about 100 MB or More ( but that would be on a high end server machine). Description of Current Case: #---Each Document has four fields (One Text field, and 3 other Keyword Fields). #---The analyzer is based on a StopFilter and a PorterStemFilter. #---I am using a Compaq PIII, 128 MB RAM, 650 MHz. #---mergeFactor is set to 25, and I am optimizing the index after adding about 20 Documents. #---Using Lucene Release 1.2 Problem Faced After adding about 4000 Documents generating an index of 30 MB, I initially got an error saying, couldn't rename segments.new to segments after which the IndexReader or the IndexWriter to the current index couldnot be opened. Then I changed a couple of settings, #---mergeFactor=20 and Optimize was called after ever 10 documents. #---Using Lucene Release 1.3 Problem Faced After adding about 1500 Documents generating an index of 10 MB, I initially got an error saying, F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files) after which the IndexWriter to the current index couldnot be opened. Now my requirement needs to have a much much larger index (practically) and I am actually at the point where these errors are coming unpredictably. Please if anyone could guide me on this ASAP. Thanx in advance Regards Amit PS: I have already read articles in the mail archieve http://www.mail-archive.com/[EMAIL PROTECTED]/msg02815.html. 
RE: Indexing Growth
I took out the optimize() after the write, and the index is growing, but at like a 1KB rate - now there are tons of 1KB files. I assume an optimize() would fix this? What is a good rule of thumb for calling optimize()? Will Lucene ever invoke an optimize() on its own?

Thanks,
Rob Outar
OneSAF AI -- SAIC Software\Data Engineer
321-235-7660
[EMAIL PROTECTED]

-----Original Message-----
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 10:53 AM
To: Lucene Users List
Subject: RE: Indexing Growth
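On the rule-of-thumb question above: Lucene never calls optimize() on its own; it only merges segments as mergeFactor dictates, which is what produces the many small files. A common policy is to add in batches and optimize once per batch. A tiny sketch of such a gate - the threshold and class name are illustrative assumptions:

```java
public class OptimizePolicy {
    // optimize() rewrites the whole index into one segment, so it is
    // expensive; gate it on a document count instead of calling it
    // after every add.
    public static boolean shouldOptimize(int docsAdded, int batchSize) {
        return docsAdded > 0 && docsAdded % batchSize == 0;
    }
}
```

With batchSize = 1000, documents 1000, 2000, and so on would trigger an optimize; the small per-segment files accumulated in between are collapsed into a single segment each time.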
Re: about increment update
Thank you, Otis. Yes, the reader should be closed, but that isn't the cause of this exception - the errors happen before the file is deleted.

Kerr.

close(): Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 12:14 PM
Subject: Re: about increment update

Maybe this is missing?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()

Otis

--- kerr [EMAIL PROTECTED] wrote:

Hello everyone. Here I try to incrementally update the index, following the idea of deleting each modified file first and re-adding it. Here is the source. When I execute it, the index directory gets a write.lock file created at the line reader.delete(i);, and I catch a java.io.IOException with message: Index locked for write. After that, when I execute the line IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false); I catch a java.io.IOException with message: Index locked for write. If I delete the write.lock file, the error happens again. Can anyone help? Thanks.

Kerr.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import java.io.File;
import java.util.Date;

public class UpdateIndexFiles {
  public static void main(String[] args) {
    try {
      Date start = new Date();
      Directory directory = FSDirectory.getDirectory("index", false);
      IndexReader reader = IndexReader.open(directory);
      System.out.println(reader.isLocked(directory));
      //reader.unlock(directory);
      IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
      String base = "";
      if (args.length == 0) {
        base = "D:\\Tomcat\\webapps\\ROOT\\test";
      } else {
        base = args[0];
      }
      removeModifiedFiles(reader);
      updateIndexDocs(reader, writer, new File(base));
      writer.optimize();
      writer.close();
      Date end = new Date();
      System.out.print(end.getTime() - start.getTime());
      System.out.println(" total milliseconds");
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
      e.printStackTrace();
    }
  }

  public static void removeModifiedFiles(IndexReader reader) throws Exception {
    Document adoc;
    String path;
    File aFile;
    for (int i = 0; i < reader.numDocs(); i++) {
      adoc = reader.document(i);
      path = adoc.get("path");
      aFile = new File(path);
      if (reader.lastModified(path) < aFile.lastModified()) {
        System.out.println(reader.isLocked(path));
        reader.delete(i);
      }
    }
  }

  public static void updateIndexDocs(IndexReader reader, IndexWriter writer, File file) throws Exception {
    if (file.isDirectory()) {
      String[] files = file.list();
      for (int i = 0; i < files.length; i++)
        updateIndexDocs(reader, writer, new File(file, files[i]));
    } else {
      if (!reader.indexExists(file)) {
        System.out.println("adding " + file);
        writer.addDocument(FileDocument.Document(file));
      } else {}
    }
  }
}
Re: JSP files
Additionally - you can use a crawler to crawl your site, then index the resulting files. Lucene comes with a crawler called LARM, but the current make file doesn't build it properly. I ended up using a different crawler called Sphinx: http://www-2.cs.cmu.edu/~rcm/websphinx/

Pinky, you don't want to index the JSP directly, as you would be missing the content inserted by the server when the pages are accessed. Typically, indexing dynamic pages is problematic since the content will change frequently... That being said, the java.net library provides classes for retrieving the content of a URL as an input stream. You can write a class to traverse your site, downloading the URLs and indexing them. It will be slower, of course, than reading HTML from disk files.

-Tom

--- Pinky Iyer [EMAIL PROTECTED] wrote:

Hi all! Is there any separate parser for JSP files? Any option other than modifying the IndexHTML.java class is appreciated. I already tried modifying that class; HTML parsing is fine, but JSP parsing yields all the JSP tags as well in the summary...

Thanks!
Pinky
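Tom's suggestion of indexing the server-rendered output rather than the raw JSP can be sketched with the standard library. The class name and the helper are illustrative; only java.net.URL and the java.io stream classes are real API here:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class PageFetcher {
    // Read an entire stream into a String; works for any InputStream,
    // including one obtained from URL.openStream().
    public static String readAll(InputStream in) throws Exception {
        BufferedReader r = new BufferedReader(new InputStreamReader(in));
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Fetch the HTML the server renders for a page; the result can then
    // be stripped of tags and handed to an IndexWriter like static HTML.
    public static String fetch(String pageUrl) throws Exception {
        return readAll(new URL(pageUrl).openStream());
    }
}
```

Fetching each page over HTTP means the JSP is executed by the container first, so the index sees the same content a visitor would, at the cost of one request per page.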
Querying Question
Hi all, I am a little fuzzy on complex querying using AND, OR, etc. For example, I have the following name/value pairs:

file 1 = name = checkpoint, value = filename_1
file 2 = name = checkpoint, value = filename_2
file 3 = name = checkpoint, value = filename_3
file 4 = name = checkpoint, value = filename_4

I ran the following query:

name:"checkpoint" AND value:"filenane_1"

Instead of getting back file 1, I got back all four files. Then, after trying different things, I did:

+(name:"checkpoint") AND +(value:"filenane_1")

and it then returned file 1. Our project queries solely on name/value pairs, and we need the ability to query using AND, OR, NOT, etc. What is the correct syntax for such queries? The code I use is:

QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
this.query = p.parse(query.toLowerCase());
Hits hits = this.searcher.search(this.query);

Thanks as always,
Rob
RE: Querying Question
That query.toLowerCase() lowercased your query to become:

name:"checkpoint" and value:"filenane_1"

The keyword AND must be uppercase when the query parser gets hold of it. If your RepositoryIndexAnalyzer lowercases its tokens, you don't need to do query.toLowerCase(). If it doesn't lowercase its tokens, you may want to modify it so that it does.

Eric

-----Original Message-----
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:11 PM
To: Lucene Users List
Subject: Querying Question
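The failure mode Eric describes is easy to see in isolation: lowercasing the whole query string also lowercases the boolean operator, and QueryParser only recognizes AND in uppercase, so "and" becomes an ordinary search term and the two clauses fall back to the default OR-like behavior. A plain-Java illustration (class name assumed):

```java
public class LowercaseTrap {
    // Lowercasing the entire query string destroys the AND operator.
    // Lowercase only the terms (or let the analyzer do it) instead.
    public static String naiveLowercase(String query) {
        return query.toLowerCase();
    }
}
```

For example, naiveLowercase("name:checkpoint AND value:filename_1") yields "name:checkpoint and value:filename_1", in which "and" is just another token rather than an operator.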
RE: Querying Question
Here is RepositoryIndexAnalyzer:

/**
 * Creates a TokenStream which tokenizes all the text in the provided Reader.
 * Default implementation forwards to tokenStream(Reader) for compatibility
 * with older versions. Override to allow the Analyzer to choose a strategy
 * based on document and/or field.
 * @param field is the name of the field
 * @param reader is the data
 * @return a token stream
 * @build 10
 */
public TokenStream tokenStream(String field, final Reader reader) {
    // do not tokenize any field
    TokenStream t = new CharTokenizer(reader) {
        protected boolean isTokenChar(char c) {
            return true;
        }
    };
    // case insensitive search
    t = new LowerCaseFilter(t);
    return t;
}

But earlier, when I did a query, case became an issue. I am not sure why, as the analyzer should have lowercased the token, but it did not.

Thanks,
Rob

-----Original Message-----
From: Eric Isakson [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:23 PM
To: Lucene Users List
Subject: RE: Querying Question
RE: Querying Question
You should not tokenize the file name; instead you should use

doc.add(new Field(name, value, true, true, false));

or

doc.add(Field.Keyword(name, value));

Aviran

-----Original Message-----
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:27 PM
To: Lucene Users List
Subject: RE: Querying Question

I use the following type of Field:

doc.add(new Field(name, value, true, true, true));

Thanks,
Rob

-----Original Message-----
From: Aviran Mordo [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:19 PM
To: 'Lucene Users List'
Subject: RE: Querying Question

Did you index the value field as a keyword?

Aviran
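The distinction Aviran is drawing, in Lucene 1.x terms: Field.Text runs the value through the analyzer, while Field.Keyword stores and indexes the value as a single untokenized term, which is what exact name/value matching wants. A sketch of a keyword-only document (class name and fields are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class KeywordDoc {
    // Build a document whose name/value pair is indexed as a keyword:
    // stored, indexed, NOT tokenized. The whole value is one term, so
    // a query like value:"filename_1" matches exactly or not at all.
    public static Document build(String name, String value) {
        Document doc = new Document();
        doc.add(Field.Keyword(name, value));
        return doc;
    }
}
```

Field.Keyword(name, value) is equivalent to new Field(name, value, true, true, false), where the final false disables tokenization.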
Re: about increment update
Try this:

1. Open reader.
2. removeModifiedFiles(reader)
3. reader.close()
4. Open writer.
5. updateIndexDocs()
6. writer.close()

i.e. don't have both the reader and the writer open at the same time.

BTW, I suspect you might be removing index entries only for files that have been modified, but adding all files - another "index keeps growing" problem! Could be wrong.

-- Ian.
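Ian's sequencing applied to kerr's updater might look like the following control-flow sketch (Lucene 1.x API; the elided bodies correspond to kerr's removeModifiedFiles and updateIndexDocs logic):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class UpdateSequence {
    public static void update(String indexPath) throws Exception {
        // 1. Open the reader alone and delete stale documents.
        //    close() flushes the deletions to disk and releases the
        //    write lock that was blocking the IndexWriter before.
        IndexReader reader = IndexReader.open(indexPath);
        // ... reader.delete(i) for each modified document ...
        reader.close();

        // 2. Only now open the writer and add the fresh documents.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        // ... writer.addDocument(...) for each changed file ...
        writer.optimize();
        writer.close();
    }
}
```

Because a deleting reader and a writer both take the index write lock, interleaving them (as the original code did) produces exactly the "Index locked for write" IOException kerr reported.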