Re: OutOfMemoryError with Lucene 1.4 final
You probably need to increase the amount of RAM available to your JVM. See the parameters:

-Xmx : maximum memory usable by the JVM
-Xms : initial memory allocated to the JVM

My params are: -Xmx2048m -Xms128m (2G max, 128M initial).

On Fri, 10 Dec 2004 11:17:29 -0600, Sildy Augustine [EMAIL PROTECTED] wrote:
> I think you should close your files in a finally clause in case of exceptions with the file system, and also print out the exception. You could be running out of file handles.
> [quoted original message and code trimmed; see "OutOfMemoryError with Lucene 1.4 final" below]
Re: OutOfMemoryError with Lucene 1.4 final
I am not sure, but I can guess at three possibilities:

(1) You use Field.Text("contents", stringBuffer.toString()). This stores the whole text string in the Document object, and it might be long... I do not know the details of how Lucene implements this. You could try an unstored field first and see if the same problem happens. BTW, how large are your documents? My index has 1M docs with a max length under 1M, usually several KB.

(2) Another possibility is that record 898 is a very long document; maybe Java's String object has a maximum length? Trace the code and see exactly when the exception occurs.

(3) Moreover, the Java VM has its own maximum-memory setting, which has nothing to do with the hardware you are running on. I have hit this before when using a directory's list-of-files function, which easily exceeded the maximum memory when there were 1M docs under the same dir (a stupid mistake I made). But when I expanded the VM's memory, it appeared OK. :)

On Fri, 10 Dec 2004, Jin, Ying wrote:
> [quoted original message and code trimmed; see "OutOfMemoryError with Lucene 1.4 final" below]
Re: OutOfMemoryError with Lucene 1.4 final
Great!!! It works perfectly after I set the -Xms and -Xmx JVM command-line parameters: java -Xms128m -Xmx128m. It turns out that my JVM was running out of memory. And Otis is right about my reader closing too: reader.close() will close the reader and release any system resources associated with it. I really appreciate everyone's help! Ying
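A minimal sketch of the resource handling discussed in this thread: closing the outermost BufferedReader in a finally block also closes the wrapped InputStreamReader and FileInputStream, so a separate is.close() is unnecessary, and the reader is released even when readLine() throws. The class and method names are illustrative assumptions.

import java.io.*;

public class ReadFileSafely {
    public static String readAll(File f) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(f)));
        try {
            StringBuffer sb = new StringBuffer();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            reader.close(); // closes the underlying stream as well
        }
    }
}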
RE: OutOfMemoryError with Lucene 1.4 final
I think you should close your files in a finally clause in case of exceptions with the file system, and also print out the exception. You could be running out of file handles.

-----Original Message-----
From: Jin, Ying
Sent: Friday, December 10, 2004 11:15 AM
Subject: OutOfMemoryError with Lucene 1.4 final

> [quoted original message and code trimmed; see "OutOfMemoryError with Lucene 1.4 final" below]
OutOfMemoryError with Lucene 1.4 final
Hi, Everyone,

We're trying to index ~1500 archives but get an OutOfMemoryError about halfway through the indexing process. I've tried running the program on two different Red Hat Linux servers: one with 256M memory and 365M swap space, the other with 512M memory and 1G swap space. However, both got the OutOfMemoryError at the same place (at record 898). Here is my code for indexing:

===
Document doc = new Document();
doc.add(Field.UnIndexed("path", f.getPath()));
doc.add(Field.Keyword("modified", DateField.timeToString(f.lastModified())));
doc.add(Field.UnIndexed("eprintid", id));
doc.add(Field.Text("metadata", metadata));

FileInputStream is = new FileInputStream(f); // the text file
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuffer stringBuffer = new StringBuffer();
String line = "";
try {
    while ((line = reader.readLine()) != null) {
        stringBuffer.append(line);
    }
    doc.add(Field.Text("contents", stringBuffer.toString()));
    // release the resources
    is.close();
    reader.close();
} catch (java.io.IOException e) {}
=====

Is there anything wrong with my code, or do I need more memory? Thanks for any help!

Ying
RE: OutOfMemoryError with Lucene 1.4 final
Ying,

You should follow the finally-block advice below. In addition, I think you can just close the reader, and it will close the underlying stream (I'm not sure about that, double-check it). You are not running out of file handles, though; your JVM is running out of memory. You can play with:

1) the -Xms and -Xmx JVM command-line parameters
2) IndexWriter's parameters: mergeFactor and minMergeDocs - check the Javadocs for more info. They will let you control how much memory your indexing process uses.

Otis

--- Sildy Augustine [EMAIL PROTECTED] wrote:
> I think you should close your files in a finally clause in case of exceptions with the file system, and also print out the exception. You could be running out of file handles.
> [quoted original message and code trimmed; see "OutOfMemoryError with Lucene 1.4 final" above]
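A hedged sketch of the IndexWriter tuning Otis suggests. In Lucene 1.4, mergeFactor and minMergeDocs are public int fields on IndexWriter (later releases use setter methods instead); the index path and the values shown are illustrative assumptions, not recommendations.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class TunedIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical index path; true = create a new index.
        IndexWriter writer =
                new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
        // How many segments are merged at once: higher speeds up bulk
        // indexing but uses more memory and open files.
        writer.mergeFactor = 10;
        // How many documents are buffered in RAM before a segment is
        // flushed to disk: lower this if indexing runs out of memory.
        writer.minMergeDocs = 10;
        // ... add documents here ...
        writer.optimize();
        writer.close();
    }
}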
Re: OutOfMemoryError with Lucene 1.4 final
OK, I see. It seems most people think it is the third possibility.

On Fri, 10 Dec 2004, Xiangyu Jin wrote:
> [previous message listing the three possibilities trimmed; see above]
RE: Re: Re: OutOfMemoryError
Terence,

Calling close() on IndexSearcher will not release the memory immediately. It will only release resources (e.g. other Java objects used by IndexSearcher), and it is up to the JVM's garbage collector to actually reclaim/release the previously used memory. There are command-line parameters you can use to tune garbage collection. Here is one example, which works with Sun's JVM:

java -XX:+UseParallelGC -XX:PermSize=20M -XX:MaxNewSize=32M -XX:NewSize=32M ...

The above is just an example - you need to play with the options and see what works for you. There are other options, too:

-Xnoclassgc       disable class garbage collection
-Xincgc           enable incremental garbage collection
-Xloggc:<file>    log GC status to a file with time stamps
-Xbatch           disable background compilation
-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size
-Xss<size>        set java thread stack size
-Xprof            output cpu profiling data
-Xrunhprof[:help]|[:<option>=<value>, ...]
                  perform JVMPI heap, cpu, or monitor profiling

Otis

--- Terence Lai [EMAIL PROTECTED] wrote:
> [message trimmed; see "RE: Re: Re: OutOfMemoryError" below]
RE: Re: OutOfMemoryError
Use the life-cycle hooks mentioned in another email (activate/passivate), and when you detect that the server is about to unload your class, call close() on the IndexSearcher. I haven't used Lucene in an EJB environment, so I don't know the details, unfortunately. :(

Your simulation may also be too fast for the JVM. As I mentioned in the previous email, close() doesn't release the memory; it's the JVM that has to reclaim it. Your for loop is very fast (no pauses anywhere, probably), so maybe the garbage collector doesn't have time to reclaim the needed memory. I don't know enough about the low-level JVM internals to be certain about this statement, but you could try adding some Thread.sleep calls in your test code.

Otis

--- Terence Lai [EMAIL PROTECTED] wrote:
> [message and quoted test program trimmed; see "RE: Re: OutOfMemoryError" below]
RE: Re: OutOfMemoryError
Terence,

> 2) I have a background process to update the index files. If I keep the IndexSearcher opened, I am not sure whether it will pick up the changes from the index updates done in the background process.

This is a frequently asked question. Basically, you have to make use of IndexReader's method for checking the index version. You can do it as often as you want, it's really up to you, and when you detect that the index has been modified, throw away the old IndexSearcher and make a new one. If you are sure nobody is using your old IndexSearcher, you can close() it, but if somebody (e.g. another thread) is still using it and you close() it, you will get an error.

Otis

> [rest of the quoted exchange and test program trimmed; see "OutOfMemoryError" below]
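A hedged sketch of the version check Otis describes, assuming Lucene 1.x's static IndexReader.getCurrentVersion(); the index path, class name, and the single-threaded close are illustrative assumptions.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearcherCache {
    private static final String INDEX_PATH = "/data/index"; // assumption
    private IndexSearcher searcher;
    private long version = -1;

    // Returns a searcher that is reopened only when the index changed.
    public synchronized IndexSearcher getSearcher() throws IOException {
        long current = IndexReader.getCurrentVersion(INDEX_PATH);
        if (searcher == null || current != version) {
            if (searcher != null) {
                // Only safe if no other thread is still using the old
                // searcher, as the thread above warns.
                searcher.close();
            }
            searcher = new IndexSearcher(INDEX_PATH);
            version = current;
        }
        return searcher;
    }
}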
RE: OutOfMemoryError
Terence,

This may help: http://issues.apache.org/bugzilla/show_bug.cgi?id=30628

I had the problem above, but I managed to resolve it by not closing the IndexSearcher. Instead I now reuse the same IndexSearcher all of the time within my JSP code, as an application-scope variable. GC keeps memory in check on my system now, and search is faster too. Also, make sure that you are using 1.4.1, as it fixes a sort caching problem in 1.4.

if (application.getAttribute("searcher") != null) {
    searcher = (IndexSearcher) application.getAttribute("searcher");
} else {
    searcher = new IndexSearcher(IndexReader.open(indexName));
    application.setAttribute("searcher", searcher);
}

On Tue, 2004-08-17 at 23:39, Terence Lai wrote:
> [message and quoted test program trimmed; see "RE: OutOfMemoryError" below]

--
John Moylan
RT ePublishing, Montrose House, Donnybrook, Dublin 4
Re: OutOfMemoryError
Reuse your IndexSearcher! :) Also, I think somebody has written some EJB stuff to work with Lucene. The project is on SF.net.

Otis

--- Terence Lai [EMAIL PROTECTED] wrote:
> [original "OutOfMemoryError" message and test program trimmed; see below]
RE: Re: OutOfMemoryError
Hi Otis,

The reason I ran into this problem is that I partition my search documents into multiple index directories ordered by document modified date. My application only returns the latest 500 documents that match the criteria. By partitioning the documents into different directories, we get a huge performance gain. Consider the following partitions:

- partition 1 (earliest documents are in this partition)
- partition 2
- partition 3
- partition 4 (latest documents are in this partition)

If I only need the latest 500 documents, I start searching from partition 4. If I get 500 matching documents, I don't need to search the remaining partitions. Otherwise, I perform another search on partition 3, and so forth, until I get 500 documents or have gone through all the partitions. I can also make use of MultiSearcher and ParallelMultiSearcher in my search. (A sketch of this newest-first scan follows below.)

Now, the problems I am having if I keep the IndexSearcher open are the following:

1) As the number of documents increases, the number of index partition directories will also increase, since I set an upper limit on the number of documents in each partition; when a partition reaches the limit, I create a new one. As the number of IndexSearchers increases, the application will eventually run out of memory if I cannot close the IndexSearchers and release the memory.

2) I have a background process to update the index files. If I keep the IndexSearcher opened, I am not sure whether it will pick up the changes from the index updates done in the background process.

Any idea how I can work around this problem?

Thanks,
Terence

> [quoted reply and test program trimmed; see "Re: OutOfMemoryError" above]
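A hedged sketch of the newest-first partition scan described above: search partitions from latest to earliest and stop once 500 matches have accumulated. The partition paths, the 500-document limit, and collecting Document objects into a List are illustrative assumptions.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class PartitionedSearch {
    private static final int LIMIT = 500; // stop once this many docs found

    // partitions are index directory paths ordered oldest -> newest.
    public static List searchLatest(String[] partitions, Query query)
            throws IOException {
        List results = new ArrayList();
        // Walk partitions newest-first; stop when we have enough hits.
        for (int p = partitions.length - 1;
                p >= 0 && results.size() < LIMIT; p--) {
            IndexSearcher searcher = new IndexSearcher(partitions[p]);
            try {
                Hits hits = searcher.search(query);
                for (int i = 0;
                        i < hits.length() && results.size() < LIMIT; i++) {
                    results.add(hits.doc(i));
                }
            } finally {
                searcher.close();
            }
        }
        return results;
    }
}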
RE: Re: OutOfMemoryError
Hi,

I tried to reuse the IndexSearcher, but I have another question. What happens if an application server unloads the class after it is idle for a while, and then re-instantiates the object when it receives a new request? Every time the server re-instantiates the class, a new IndexSearcher instance will be created. If the IndexSearcher.close() method does not release all the memory, and the server keeps unloading and re-instantiating the class, it will eventually hit the OutOfMemoryError.

The test program from my previous email simulates this condition. The reason I instantiate/close the IndexSearcher inside the loop is to simulate the scenario where the server unloads and re-instantiates the object. I think the same issue will happen if the application is written as a servlet.

Although the singleton pattern might resolve the problem described above, it isn't permitted by the J2EE spec, according to some newsletters. In other words, I can't use the singleton pattern in EJB. Please correct me if I am wrong on this.

Thanks,
Terence

> [quoted reply and test program trimmed; see "Re: OutOfMemoryError" above]
Re: Re: OutOfMemoryError
> I tried to reuse the IndexSearcher, but I have another question. What happens if an application server unloads the class after it is idle for a while, and then re-instantiates the object when it receives a new request?

The EJB spec takes this into account, as there are hook methods you can define that get called when your EJB object is about to be passivated or activated. Search for something like passivate/activate and/or ejbLoad/ejbSave. This is where you should close/open your single index searcher object.

--
Cheers,
David
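A minimal sketch of the hook methods David mentions, assuming a session bean whose container calls ejbActivate()/ejbPassivate(); the index path is a hypothetical assumption, and error handling is reduced to the essentials.

import java.io.IOException;
import javax.ejb.EJBException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
import org.apache.lucene.search.IndexSearcher;

public class SearchBean implements SessionBean {
    // transient: the searcher is not serialized during passivation.
    private transient IndexSearcher searcher;
    private static final String INDEX_PATH = "/data/index"; // assumption

    public void ejbActivate() {
        try {
            searcher = new IndexSearcher(INDEX_PATH); // reopen on activation
        } catch (IOException e) {
            throw new EJBException(e);
        }
    }

    public void ejbPassivate() {
        try {
            if (searcher != null) {
                searcher.close(); // release resources before passivation
                searcher = null;
            }
        } catch (IOException e) {
            // Nothing sensible to do here; at least don't swallow silently.
            System.err.println("close failed: " + e);
        }
    }

    public void ejbRemove() { ejbPassivate(); }
    public void setSessionContext(SessionContext ctx) { }
}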
RE: Re: Re: OutOfMemoryError
Hi David,

In my test program, I invoke the IndexSearcher.close() method at the end of the loop. However, it doesn't seem to release the memory. My concern is that even if I put the IndexSearcher.close() statement in the hook methods, it may not release all the memory until the application server is shut down. Every time the EJB object is re-activated, a new IndexSearcher is opened. If the resources allocated to the previous IndexSearcher cannot be fully released, the system will use up more and more memory. Eventually, it may run into the OutOfMemoryError.

I am not very familiar with EJB, so my interpretation could be wrong. I am going to try the hook methods. Thanks for pointing this out to me.

Terence

> [quoted reply trimmed; see "Re: Re: OutOfMemoryError" above]
OutOfMemoryError
Hi All,

I am getting an OutOfMemoryError when I deploy my EJB application. To debug the problem, I wrote the following test program:

public static void main(String[] args) {
    try {
        Query query = getQuery();
        for (int i = 0; i < 1000; i++) {
            search(query);
            if (i % 50 == 0) {
                System.out.println("Sleep...");
                Thread.currentThread().sleep(5000);
                System.out.println("Wake up!");
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private static void search(Query query) throws IOException {
    FSDirectory fsDir = null;
    IndexSearcher is = null;
    Hits hits = null;
    try {
        fsDir = FSDirectory.getDirectory("C:\\index", false);
        is = new IndexSearcher(fsDir);
        SortField sortField =
            new SortField("profile_modify_date", SortField.STRING, true);
        hits = is.search(query, new Sort(sortField));
    } finally {
        if (is != null) {
            try { is.close(); } catch (Exception ex) { }
        }
        if (fsDir != null) {
            try { is.close(); } catch (Exception ex) { }
        }
    }
}

In the test program, I wrote a loop that keeps calling the search method. Every time it enters the search method, it instantiates the IndexSearcher. Before exiting the method, I close the IndexSearcher and FSDirectory. I also made the thread sleep for 5 seconds every 50 searches, hoping this would give Java some time to do garbage collection. Unfortunately, when I observe the memory usage of my process, it keeps increasing until I get the java.lang.OutOfMemoryError.

Note that I invoke IndexSearcher.search(Query query, Sort sort) to process the search. If I don't specify the Sort field (i.e. using IndexSearcher.search(query)), I don't have this problem, and the memory usage stays at a very static level.

Does anyone experience a similar problem? Did I do something wrong in the test program? I thought that by closing the IndexSearcher and the FSDirectory, the memory would be released during garbage collection.

Thanks,
Terence
RE: OutOfMemoryError
Sorry, I should have made it clearer in my last email. I have implemented an EJB session bean that executes the Lucene search. At the beginning, the session bean works fine and returns the correct search results to me. But as more and more search requests are processed, the server ends up with the OutOfMemoryError. If I restart the server, everything works fine again.

Terence

> [quoted "OutOfMemoryError" message and test program trimmed; see above]
Re: OutOfMemoryError
On Wednesday 18 August 2004 00:30, Terence Lai wrote:
> if (fsDir != null) {
>     try { is.close(); } catch (Exception ex) { }
> }

You close "is" here again, not fsDir. Also, it's a good idea to never ignore exceptions; you should at least print them out, even if it's just a close() that fails.

Regards
Daniel

--
http://www.danielnaber.de
RE: Re: OutOfMemoryError
Thanks for pointing this out. Even after I fixed the code to close fsDir, and added ex.printStackTrace(System.out), I am still hitting the OutOfMemoryError.

Terence

> [quoted reply trimmed; see "Re: OutOfMemoryError" above]
Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)
petite_abeille wrote:
> On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
>> He mentioned that I might be able to squeeze 5-10% out of index merges this way.
> Talking of which... what strategy(ies) do people use to minimize downtime when updating an index?

This should probably be a wiki page. Anyway, two thoughts I had on the subject a while back:

You maintain two disks (not RAID... you get reliability through software). Searches are load-balanced between the disks for performance reasons; if one fails, you just stop using it. When you want to do an index merge, you read from disk0 and write to disk1. Then you take disk0 out of search rotation, add disk1, and copy the contents of disk1 back to disk0. Users shouldn't notice much of a performance impact during the merge, because it will be VERY fast and it's just reads from disk0.

Kevin

--
Kevin A. Burton, Location - San Francisco, CA
index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
> He mentioned that I might be able to squeeze 5-10% out of index merges this way.

Talking of which... what strategy(ies) do people use to minimize downtime when updating an index? My current strategy is as follows:

(1) Use a temporary RAMDirectory for ongoing updates.
(2) Perform a copy-on-write when flushing the RAMDirectory into the persistent index.

The second step means that I create an offline copy of a live index before invoking addIndexes() and then substitute the old index with the new, updated one. While this effectively increases the time it takes to update an index, it nonetheless reduces the *perceived* downtime for it.

Thoughts? Alternative strategies? TIA.

Cheers,
PA.
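A hedged sketch of the two-step strategy PA describes, assuming Lucene 1.x's RAMDirectory and IndexWriter.addIndexes(Directory[]). The class shape, the path argument, and the point at which the offline copy is swapped in for the live index (not shown) are illustrative assumptions.

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class BufferedUpdates {
    private RAMDirectory buffer;
    private IndexWriter ramWriter;

    public BufferedUpdates() throws IOException {
        buffer = new RAMDirectory();
        ramWriter = new IndexWriter(buffer, new StandardAnalyzer(), true);
    }

    // (1) Ongoing updates go to the in-memory index.
    public void add(Document doc) throws IOException {
        ramWriter.addDocument(doc);
    }

    // (2) Flush: merge the RAM buffer into an offline copy of the live
    // index; the caller then swaps the copy in for the old index.
    public void flush(String offlineCopyPath) throws IOException {
        ramWriter.close();
        IndexWriter disk = new IndexWriter(offlineCopyPath,
                new StandardAnalyzer(), false);
        disk.addIndexes(new Directory[] { buffer });
        disk.close();
        // Start a fresh in-memory buffer for the next batch.
        buffer = new RAMDirectory();
        ramWriter = new IndexWriter(buffer, new StandardAnalyzer(), true);
    }
}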
Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)
I'm actually pretty lazy about index updates, and haven't had the need for efficiency, since my requirement is that new documents should be available on a next-working-day basis. I reindex everything from scratch every night (400,000 docs) and store it in a timestamped index. When the reindexing is done, I alert a controller about the new active index. I keep a few versions of the index in case of a failure somewhere, and I can always send a message to the controller to use an old index.

cheers,
sv

On Tue, 13 Apr 2004, petite_abeille wrote:
> [message trimmed; see "index update" above]
OutOfMemoryError when using wildcard queries
Hi,

I am using Lucene 1.2 and getting an OutOfMemoryError when searching using some wildcard queries. Is there some provision that restricts the number of terms for wildcard queries?

Thanks,
Akila
RE: OutOfMemoryError when using wildcard queries
No, but the JVM does have a memory limit. By default it's 64 megs, I believe. To increase it, use the -Xmx option when you run java. For example, to give the JVM 100 megs of RAM, you would write:

java -Xmx100m YourClassHere

-----Original Message-----
From: Akila
Sent: Wednesday, October 15, 2003 9:15 AM
To: Lucene Users List
Subject: OutOfMemoryError when using wildcard queries

> [original question trimmed; see above]
Re: OutOfMemoryError when using wildcard queries
> No, but the JVM does have a memory limit. By default it's 64 megs, I believe. To increase it, use the -Xmx option when you run java.
> Dan

Akila,

I may be wrong, but I remember a discussion a while back about limiting the number of terms a wildcard query is expanded into. I'm not sure whether it has been implemented or not. Searching the mailing list should give you some pointers; then you could port the patch back to 1.2 for your own needs.

/vh
Re: too many hits - OutOfMemoryError
Thanks for the info, but unfortunately it is still getting an OutOfMemoryError. Here's my code:

--
final BitSet bits = new BitSet();
HitCollector hc = new HitCollector() {
    public void collect(int doc, float score) {
        System.out.println("collect");
        if (score > THRESHOLD) {
            bits.set(doc);
        }
    }
};
mSearcher.search(query, hc);
System.out.println("results (" + bits.cardinality() + "):\n");
--

When I search with a low-hit query, "collect" is printed many times. When I search with a query I know will hit most of the 1.8 million records, the collect print does not even appear; it eats up the 700+MB I allocated and then throws an OutOfMemoryError. Did I do something wrong?

Thanks for your help,
Cory

At 09:43 PM 5/27/2003 +0200, you wrote:
> > Hits hits = searcher.search(myQuery);
>
> BitSet results = new BitSet();
> searcher.search(myQuery, new HitCollector() {
>     public void collect(int doc, float score) {
>         if (score > THRESHOLD) results.set(doc);
>     }
> });
>
> --
> Eric Jain
Re: too many hits - OutOfMemoryError
> When I search with a query I know will hit most of the 1.8 million records, the collect print does not even print, it eats up the 700+MB I allocated and then throws an OutOfMemoryError.

Are you using wildcard queries?

--
Eric Jain
Re: too many hits - OutOfMemoryError
Yes. Is that the problem?

At 05:13 PM 5/28/2003 +0200, you wrote:
> Are you using wildcard queries?
>
> --
> Eric Jain
Re: too many hits - OutOfMemoryError
> Yes. Is that the problem?

I believe a term with a wildcard is expanded into all possible matching terms in memory before the search is run, so if the term is 'a*' and you have a million different terms starting with 'a' occurring in your documents, it's quite possible to run out of memory. Does anyone know if there is a setting that limits the number of terms for wildcard queries?

--
Eric Jain
Re: too many hits - OutOfMemoryError
Cory,

When performing wildcard queries, the bulk of the memory is used during wildcard term expansion. The memory requirement is proportional to the number of matching terms, not the number of hits. You should make sure you are using the latest Lucene; there was a fix in 1.3 to reduce the memory requirements of all queries. But wildcard queries that expand to many terms are always going to be memory intensive in Lucene.

We ran into this problem and decided to put a check on the number of expanded terms and abort the query if the number got too high. If you're ambitious, you could modify the Lucene source to serialize the query process for queries with a large number of terms, but that would be a bit of work. If you absolutely require these huge wildcard queries, then you may have to look into it. Non-wildcard queries that return a large result set should not be a memory problem, though.

Dave

Cory Albright wrote on 05/28/03 11:16 AM:
> [message trimmed; see "Re: too many hits - OutOfMemoryError" above]
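A hedged sketch of the kind of guard Dave describes: count the terms a wildcard would expand to before running the query, and abort if the count is too high. The use of WildcardTermEnum, the limit, and the exception type are assumptions against Lucene 1.3/1.4-era APIs, not the approach Dave's team actually used; double-check against your version.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardTermEnum;

public class WildcardGuard {
    private static final int MAX_TERMS = 1024; // illustrative limit

    // Throws if the wildcard term expands to too many index terms.
    public static void checkExpansion(IndexReader reader, Term wildcard)
            throws IOException {
        WildcardTermEnum terms = new WildcardTermEnum(reader, wildcard);
        try {
            int count = 0;
            while (terms.term() != null) {
                if (++count > MAX_TERMS) {
                    throw new RuntimeException("Wildcard '" + wildcard.text()
                            + "' expands to more than " + MAX_TERMS
                            + " terms; please refine the query.");
                }
                if (!terms.next()) {
                    break;
                }
            }
        } finally {
            terms.close();
        }
    }
}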
Re: too many hits - OutOfMemoryError
Thanks for the help! Yes, it works fine without a wildcard search, which I believe will be OK for our app at this point.

Thanks again,
Cory

At 11:50 AM 5/28/2003 -0400, you wrote:
> [message trimmed; see "Re: too many hits - OutOfMemoryError" above]
Re: too many hits - OutOfMemoryError
We ran into this problem and decided to put a check on the number of expanded terms and abort the query if the number got too high. Is it possible to perform this check without having to modify Lucene's source code? -- Eric Jain
Re: too many hits - OutOfMemoryError
Unfortunately, no. The modifications are not very extreme, though. If you're interested in seeing our approach, let me know. DaveB
Re: too many hits - OutOfMemoryError
Hi, I have been following this discussion, and as I am anticipating such a problem when my index size grows, I would like to hear your approach on limiting the query expansion. Regards, Harpreet
too many hits - OutOfMemoryError
Hi - I have created an index of 1.8 million documents, each document containing 5-10 fields. When I run a search that I know has a small number of hits, it works great. However, if I run a search that I know will hit most of the documents, I get an OutOfMemoryError. I am using the basic search call: Hits hits = searcher.search(myQuery); Is there a better way to handle this problem? Without knowing the number of hits that a given query will return, how do I prevent this problem? Thank you, Cory Albright
Re: too many hits - OutOfMemoryError
Out of curiosity, how much free RAM does the computer normally have? And have you tried increasing the amount available to the JVM? David Medinets http://www.codebits.com
Re: too many hits - OutOfMemoryError
The computer is a 1.7GHz P4 with 1.25GB RAM. I tried the JVM arg: -Xmx700M (as I had a little over 700MB free). Cory Albright
Re: too many hits - OutOfMemoryError
Instead of

Hits hits = searcher.search(myQuery);

try something like:
===
final BitSet results = new BitSet();
searcher.search(myQuery, new HitCollector() {
    public void collect(int doc, float score) {
        if (score > THRESHOLD) results.set(doc);  // THRESHOLD: your score cutoff
    }
});
===
-- Eric Jain
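Once the ids are collected, the stored documents can be fetched lazily instead of letting Hits cache them. A small follow-on sketch, assuming a Java 1.4 BitSet (for nextSetBit) and the searcher and results from Eric's snippet:
===
// Walk the collected doc ids and load one stored document at a time.
for (int i = results.nextSetBit(0); i >= 0; i = results.nextSetBit(i + 1)) {
    org.apache.lucene.document.Document d = searcher.doc(i);
    // ... process d ...
}
===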
Re: OutOfMemoryError with boolean queries
Robert, I'm moving this to lucene-user, which is a more appropriate list for this type of problem. You are not saying whether you are using some of those handy -X (-Xms -Xmx) command line switches when you invoke your application that dies with OutOfMemoryError. If you are not, try that; it may help. I recall a few other people reporting the same problem, and using -Xms and -Xmx solved their problem. If your machine doesn't have the RAM it needs this won't help, of course :) Otis --- Robert Wennström [EMAIL PROTECTED] wrote: Hi, I'm experiencing OutOfMemoryErrors when searching using many logical ANDs combined with prefix queries. The reason is clearly too many returned hits waiting for combined evaluation. I was wondering if there are any thoughts of changing the search approach to something less memory consuming. This is quite a big problem even with small sets of documents. Example: my 55000 document index runs out of memory when searching for a* AND e* Could you estimate the difficulty of changing the behaviour to search for e* in just the hits matching a*? I'm about to put a BitSet at the innermost hit collection to sort out AND-clauses that haven't been matched by previous AND-clauses. Is there a better approach? Thanks for a great Java project, guys. Robert Wennström [developer, netadmin] robert -at- agent25.com www.agent25.com
RE: OutOfMemoryError with boolean queries
Sorry. I wasn't verbose enough. I use the default memory settings. But my issue was the core structure of Lucene taking up (it seems to me) more memory than it would have to, if it had a different approach. Correct me if I'm wrong, but it seems to me that BooleanQuery stores all hits (as Bucket objects) from all terms in the query, even for a simple war* AND wash* AND sad*. Instead of looking for wash* just in the war* hits (and then looking for sad* in the remaining hits) it makes three separate searches, which is a waste of memory.

- test output begins -
Index size = 55000
Query: a*
Total memory before: 2031616
Searching for: a* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
53527 total matching documents (1984ms)
Query: e*
Total memory before: 55128064
Searching for: e* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
52456 total matching documents (984ms)
Query: a* AND e*
Total memory before: 55128064
Searching for: +a* +e* (org.apache.lucene.search.BooleanQuery)
Total memory after: 124882944
51267 total matching documents (2468ms)
- test output ends -

In my perfect world, the memory allocation when searching for a* AND e* would not increase at all after the two separate searches for a* and e*, because it would only need space for the a* hits, ignoring e* hits that have no previous hit. My biggest index is at 2.34 million documents during testing, but should grow with approximately 1 docs/day in production. With that figure I wish for the best possible memory handling. At the moment we use a search engine that, given the right question (or wrong), consumes memory like a starving wolf and crashes the whole thing. The search engine should be able to play with about 1GB RAM on the machine. I just don't want the same possibility of a crash with Lucene. I want to know whether the Lucene developers feel there are things to optimize, or whether they feel everything is as it should be from the start? thanks /RW
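Robert's BitSet idea can be prototyped on top of the public API without touching BooleanQuery. A sketch, assuming an IndexSearcher named searcher, with "contents" and the prefixes as placeholders; note it still enumerates both prefix expansions, so it only avoids the per-hit Bucket accumulation, not the term expansion cost:
===
import java.util.BitSet;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.PrefixQuery;

// First clause: remember which documents match a*.
final BitSet first = new BitSet();
searcher.search(new PrefixQuery(new Term("contents", "a")), new HitCollector() {
    public void collect(int doc, float score) { first.set(doc); }
});

// Second clause: manual AND - keep only documents that already matched a*.
final BitSet both = new BitSet();
searcher.search(new PrefixQuery(new Term("contents", "e")), new HitCollector() {
    public void collect(int doc, float score) {
        if (first.get(doc)) both.set(doc);
    }
});
===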
Re: OutOfMemoryError
hi Ian, hi Winton, hi all, sorry, I meant a heap size of 100MB. I'm starting java with -Xmx100m. I'm not setting -Xms. For what I know now, I had a bug in my own code. Still, I don't understand where these OutOfMemoryErrors came from. I will try to index again in one thread without RAMDirectory, just to check that the program is sane. The problem that the files get too big while merging remains. I wonder why there is no way to tell Lucene not to create files that are bigger than the system limit. How am I supposed to know after how many documents this limit is reached? Lucene creates the documents - I just know the average size of a piece of text that is the input for a document. Or am I missing something?! chantal On Wednesday, 28 November 2001 20:14, you wrote: Were you using -mx and -ms (setting heap size)? Cheers, Winton
Re: OutOfMemoryError
Doug sent the message below to the list on 3-Nov in response to a query about file size limits. There may have been more related stuff on the thread as well. -- Ian. *** Anyway, is there any way to control how big the indexes grow? The easiest thing is to set IndexWriter.maxMergeDocs. Since you hit 2GB at 8M docs, set this to 7M. That will keep Lucene from trying to merge an index that won't fit in your filesystem. (It will actually effectively round this down to the next lower power of Index.mergeFactor. So with the default mergeFactor=10, maxMergeDocs=7M will generate a series of 1M document indexes, since merging 10 of these would exceed the max.) Slightly more complex: you could further minimize the number of segments if, when you've added seven million documents, you optimize the index and start a new index. Then use MultiSearcher to search. Even more complex and optimal: write a version of FSDirectory that, when a file exceeds 2GB, creates a subdirectory and represents the file as a series of files. (I've done this before, and found that, on at least the version of Solaris that I was using, the files had to be a few 100k less than 2GB for programs like 'cp' and 'ftp' to operate correctly on them.) Doug
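In code, Doug's suggestion comes down to two assignments. A sketch against the Lucene 1.x API, where mergeFactor and maxMergeDocs were public fields on IndexWriter; the index path and analyzer are placeholders:
===
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.mergeFactor = 10;       // the default
writer.maxMergeDocs = 7000000; // stop merges before a segment can hit the 2GB file limit
===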
Re: OutOfMemoryError
Chantal, Java often has misleading error messages. For example, on Solaris machines the default ulimit used to be 24 - that's 24 open file handles! Yeesh. This will cause an OutOfMemoryError. So don't assume it's actually a memory problem, particularly if a memory problem doesn't particularly make sense. Just a thought. Steven J. Owens [EMAIL PROTECTED]
Re: OutOfMemoryError
I wrote: Java often has misleading error messages. For example, on Solaris machines the default ulimit used to be 24 - that's 24 open file handles! Yeesh. This will cause an OutOfMemoryError. Jeff Trent replied: Wow. I did not know that! I also don't see an option to increase that limit from java -X. Do you know how to increase that limit? That "used to be" - I think it's larger on newer machines. I don't think there's a java command line option to set this; it's a system limit. The Solaris command to check it is ulimit. To set it for a given login process (assuming sufficient privileges) use ulimit number (e.g. ulimit 128). ulimit -a prints out all limits. Steven J. Owens [EMAIL PROTECTED]
OutOfMemoryError
hi to all, please help! I think I mixed my brain up already with this stuff... I'm trying to index about 29 text files, where the biggest one is ~700MB and the smallest ~300MB. I once managed to run the whole indexing, with a merge factor = 10 and maxMergeDocs=1. This took more than 35 hours I think (I don't know exactly) and it didn't use much RAM (though it could have). Unfortunately I had a call to optimize at the end, and during optimization an IOException (File too big) occurred while merging. As I run the program on a multi-processor machine, I have now changed the code to index each file in a single thread and write to one single IndexWriter. The merge factor is still at 10. maxMergeDocs is at 1.000.000. I set the maximum heap size to 1MB. I tried to use RAMDirectory (as mentioned in the mailing list) and just use IndexWriter.addDocument(). At the moment it seems not to make any difference: after a while _all_ the threads exit one after another (not all at once!) with an OutOfMemoryError. The priority of all of them is at the minimum. Even if the multithreading doesn't increase performance, I would be glad if I could just once get it running again. I would be even happier if someone could give me a hint what would be the best way to index this amount of data. (The average size of an entry that gets parsed for a Document is about 1KB.) thanx for any help! chantal
Re: OutOfMemoryError
I've loaded a large (but not as large as yours) index with mergeFactor set to 1000. It was substantially faster than with the default setting. Making it higher didn't seem to make things much faster, but did cause it to use more memory. In addition, I loaded the data in chunks in separate processes and optimized the index after each chunk, again in a separate process. All done straight to disk, no messing about with RAMDirectories. I didn't play with maxMergeDocs, and I'm not sure what you mean by maximum heap size, but 1MB doesn't sound very large. -- Ian. [EMAIL PROTECTED]
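Ian's recipe, roughly, in Lucene 1.x terms. A sketch only: indexDir, the analyzer, and the docs array are placeholders, and mergeFactor was a public field at the time:
===
// One chunk per process: append to the existing index, bulk-load, optimize, exit.
IndexWriter writer = new IndexWriter(indexDir, analyzer, false); // false = append
writer.mergeFactor = 1000; // far fewer on-disk merges, at the cost of RAM and file handles
for (int i = 0; i < docs.length; i++) {
    writer.addDocument(docs[i]); // docs: this chunk's prepared Document objects
}
writer.optimize(); // merge the chunk's segments back down
writer.close();
===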