Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Justin Swanhart
You probably need to increase the amount of RAM available to your JVM.  

See the parameters:
-Xmx   :Maximum memory usable by the JVM
-Xms   :Initial memory allocated to JVM

My params are: -Xmx2048m -Xms128m  (2G max, 128M initial)


On Fri, 10 Dec 2004 11:17:29 -0600, Sildy Augustine
[EMAIL PROTECTED] wrote:
 I think you should close your files in a finally clause in case of
 exceptions with file system and also print out the exception.
 
 You could be running out of file handles.
 
 
 
 -Original Message-
 From: Jin, Ying [mailto:[EMAIL PROTECTED]
 Sent: Friday, December 10, 2004 11:15 AM
 To: [EMAIL PROTECTED]
 Subject: OutOfMemoryError with Lucene 1.4 final
 
 Hi, Everyone,
 
 We're trying to index ~1500 archives but get OutOfMemoryError about
 halfway through the index process. I've tried to run program under two
 different Redhat Linux servers: One with 256M memory and 365M swap
 space. The other one with 512M memory and 1G swap space. However, both
 got OutOfMemoryError at the same place (at record 898).
 
 Here is my code for indexing:
 
 ===
 
 Document doc = new Document();
 
 doc.add(Field.UnIndexed("path", f.getPath()));
 
 doc.add(Field.Keyword("modified",
 
 DateField.timeToString(f.lastModified())));
 
 doc.add(Field.UnIndexed("eprintid", id));
 
 doc.add(Field.Text("metadata", metadata));
 
 FileInputStream is = new FileInputStream(f);  // the text file
 
 BufferedReader reader = new BufferedReader(new
 InputStreamReader(is));
 
 StringBuffer stringBuffer = new StringBuffer();
 
 String line = "";
 
 try{
 
   while((line = reader.readLine()) != null){
 
 stringBuffer.append(line);
 
   }
 
   doc.add(Field.Text("contents", stringBuffer.toString()));
 
   // release the resources
 
   is.close();
 
   reader.close();
 
 }catch(java.io.IOException e){}
 
 =
 
 Is there anything wrong with my code or I need more memory?
 
 Thanks for any help!
 
 Ying
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Xiangyu Jin

I am not sure, but I guess there are three possibilities:

(1) I see that you use
Field.Text("contents", stringBuffer.toString())
This stores the whole text string in the Document object,
and it might be long ...

I do not know the details of how Lucene implements this.
You could try an unstored field first to see
if the same problem happens.

BTW, how large are your documents? My collection has 1M docs with a
maximum length under 1M, usually only a few KB.

(2) Another possibility is that record 898 is a very long
document; maybe Java's String object has a maximum length?
Just trace the code and see where the exception occurs.

(3) Moreover, the Java VM also has a setting for its maximum memory,
which has nothing to do with the hardware you are running on.
I ran into this before when I used the directory's
ListOfFile function, which easily exceeded the maximum memory when
there were 1M docs under the same dir (a stupid mistake I made).
But once I expanded the VM's memory, it appeared to be fine.

:)





On Fri, 10 Dec 2004, Jin, Ying wrote:

 Hi, Everyone,



 We're trying to index ~1500 archives but get OutOfMemoryError about
 halfway through the index process. I've tried to run program under two
 different Redhat Linux servers: One with 256M memory and 365M swap
 space. The other one with 512M memory and 1G swap space. However, both
 got OutOfMemoryError at the same place (at record 898).



 Here is my code for indexing:

 ===

 Document doc = new Document();

 doc.add(Field.UnIndexed("path", f.getPath()));
 
 doc.add(Field.Keyword("modified",
 
 DateField.timeToString(f.lastModified())));
 
 doc.add(Field.UnIndexed("eprintid", id));
 
 doc.add(Field.Text("metadata", metadata));



 FileInputStream is = new FileInputStream(f);  // the text file

 BufferedReader reader = new BufferedReader(new
 InputStreamReader(is));



 StringBuffer stringBuffer = new StringBuffer();

 String line = "";

 try{

   while((line = reader.readLine()) != null){

 stringBuffer.append(line);

   }

   doc.add(Field.Text("contents", stringBuffer.toString()));

   // release the resources

   is.close();

   reader.close();

 }catch(java.io.IOException e){}

 =

 Is there anything wrong with my code or I need more memory?



 Thanks for any help!

 Ying



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Jin, Ying
Great!!! It works perfectly after I set the -Xms and -Xmx JVM command-line
parameters with:
java -Xms128m -Xmx128m

It turns out that my JVM was running out of memory. And Otis is right about
my reader closing too:
reader.close() will close the reader and release any system resources
associated with it.

I really appreciate everyone's help!
Ying



RE: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Sildy Augustine
I think you should close your files in a finally clause in case of
exceptions with the file system, and also print out the exception.

You could be running out of file handles.
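
A minimal sketch of that restructuring, using the same "contents" field as the
posted snippet (addContents is a hypothetical helper, not part of the original
code):

import java.io.*;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Hypothetical helper: read one text file into the "contents" field,
// printing any IOException and closing the reader in a finally block.
static void addContents(Document doc, File f) {
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
        StringBuffer contents = new StringBuffer();
        String line;
        while ((line = reader.readLine()) != null) {
            contents.append(line).append('\n');
        }
        doc.add(Field.Text("contents", contents.toString()));
    } catch (IOException e) {
        e.printStackTrace();   // print the exception instead of swallowing it
    } finally {
        if (reader != null) {
            try { reader.close(); } catch (IOException e) { /* ignored */ }
        }
    }
}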

-Original Message-
From: Jin, Ying [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 10, 2004 11:15 AM
To: [EMAIL PROTECTED]
Subject: OutOfMemoryError with Lucene 1.4 final

Hi, Everyone,

 

We're trying to index ~1500 archives but get OutOfMemoryError about
halfway through the index process. I've tried to run program under two
different Redhat Linux servers: One with 256M memory and 365M swap
space. The other one with 512M memory and 1G swap space. However, both
got OutOfMemoryError at the same place (at record 898). 

 

Here is my code for indexing:

===

Document doc = new Document();

doc.add(Field.UnIndexed("path", f.getPath()));

doc.add(Field.Keyword("modified",

DateField.timeToString(f.lastModified())));

doc.add(Field.UnIndexed("eprintid", id));

doc.add(Field.Text("metadata", metadata));

 

FileInputStream is = new FileInputStream(f);  // the text file

BufferedReader reader = new BufferedReader(new
InputStreamReader(is));

 

StringBuffer stringBuffer = new StringBuffer();

String line = "";

try{

  while((line = reader.readLine()) != null){

stringBuffer.append(line);

  }

  doc.add(Field.Text("contents", stringBuffer.toString()));

  // release the resources

  is.close();

  reader.close();

}catch(java.io.IOException e){}

=

Is there anything wrong with my code or I need more memory?

 

Thanks for any help!

Ying


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Jin, Ying
Hi, Everyone,

 

We're trying to index ~1500 archives but get an OutOfMemoryError about
halfway through the indexing process. I've tried to run the program under two
different Red Hat Linux servers: one with 256M memory and 365M swap
space, the other with 512M memory and 1G swap space. However, both
got the OutOfMemoryError at the same place (at record 898).

 

Here is my code for indexing:

===

Document doc = new Document();

doc.add(Field.UnIndexed("path", f.getPath()));

doc.add(Field.Keyword("modified",

DateField.timeToString(f.lastModified())));

doc.add(Field.UnIndexed("eprintid", id));

doc.add(Field.Text("metadata", metadata));

 

FileInputStream is = new FileInputStream(f);  // the text file

BufferedReader reader = new BufferedReader(new
InputStreamReader(is));

 

StringBuffer stringBuffer = new StringBuffer();

String line = "";

try{

  while((line = reader.readLine()) != null){

stringBuffer.append(line);

  }

  doc.add(Field.Text("contents", stringBuffer.toString()));

  // release the resources

  is.close();

  reader.close();

}catch(java.io.IOException e){}

=

Is there anything wrong with my code, or do I need more memory?

 

Thanks for any help!

Ying



RE: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Otis Gospodnetic
Ying,

You should follow this finally block advice below.  In addition, I
think you can just close the reader, and it will close the underlying
stream (I'm not sure about that, double-check it).

You are not running out of file handles, though.  Your JVM is running
out of memory.  You can play with:

1) -Xms and -Xmx JVM command-line parameters
2) IndexWriter's parameters: mergeFactor and minMergeDocs - check the
Javadocs for more info.  They will let you control how much memory your
indexing process uses (a rough sketch follows).
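
For option 2, a rough sketch (Lucene 1.4 exposes these as public fields on
IndexWriter; the path, analyzer and values below are only placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Smaller minMergeDocs means fewer documents are buffered in RAM before a
// segment is flushed to disk; mergeFactor controls how many segments get
// merged at once.
IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.minMergeDocs = 10;
writer.mergeFactor = 10;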

Otis


--- Sildy Augustine [EMAIL PROTECTED] wrote:

 I think you should close your files in a finally clause in case of
 exceptions with file system and also print out the exception. 
 
 You could be running out of file handles.
 
 -Original Message-
 From: Jin, Ying [mailto:[EMAIL PROTECTED] 
 Sent: Friday, December 10, 2004 11:15 AM
 To: [EMAIL PROTECTED]
 Subject: OutOfMemoryError with Lucene 1.4 final
 
 Hi, Everyone,
 
  
 
 We're trying to index ~1500 archives but get OutOfMemoryError about
 halfway through the index process. I've tried to run program under
 two
 different Redhat Linux servers: One with 256M memory and 365M swap
 space. The other one with 512M memory and 1G swap space. However,
 both
 got OutOfMemoryError at the same place (at record 898). 
 
  
 
 Here is my code for indexing:
 
 ===
 
 Document doc = new Document();
 
 doc.add(Field.UnIndexed("path", f.getPath()));
 
 doc.add(Field.Keyword("modified",
 
 DateField.timeToString(f.lastModified())));
 
 doc.add(Field.UnIndexed("eprintid", id));
 
 doc.add(Field.Text("metadata", metadata));
 
  
 
 FileInputStream is = new FileInputStream(f);  // the text file
 
 BufferedReader reader = new BufferedReader(new
 InputStreamReader(is));
 
  
 
 StringBuffer stringBuffer = new StringBuffer();
 
 String line = "";
 
 try{
 
   while((line = reader.readLine()) != null){
 
 stringBuffer.append(line);
 
   }
 
   doc.add(Field.Text("contents", stringBuffer.toString()));
 
   // release the resources
 
   is.close();
 
   reader.close();
 
 }catch(java.io.IOException e){}
 
 =
 
 Is there anything wrong with my code or I need more memory?
 
  
 
 Thanks for any help!
 
 Ying
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Xiangyu Jin

OK, I see. It seems most people think it is the third possibility.

On Fri, 10 Dec 2004, Xiangyu  Jin wrote:


 I am not sure. But guess there are three possilities,

 (1). see that you use
 Field.Text("contents", stringBuffer.toString())
 This will store all your string of text into document object.
 And it might be long ...

 I do not know the detail how Lucene implemented.
 I think you can try use unstored first to see
 if the same problem happen.

 BTW, how large is your document. Mine has 1M docs and
 max-length less than 1 M, usually has length about several k.

 (2) I guess another possiblilty is that record 898 is a very long
 document, maybe java' s string object has a maxlength?
 Just trace the code, see when the exception occur.

 (3) Moreover, if you run it on a java VM, it also has a setting of
 its virtual mem. It has nothing to do with the hardware
 you are running. I has met this before when I use the directory's
 ListOfFile function, where it easily exceed the max mem, if
 there are 1M docs under the same dir (a stupid mistake I made).
 But if I expand the VM's mem, it is then appears ok.

 :)





 On Fri, 10 Dec 2004, Jin, Ying wrote:

  Hi, Everyone,
 
 
 
  We're trying to index ~1500 archives but get OutOfMemoryError about
  halfway through the index process. I've tried to run program under two
  different Redhat Linux servers: One with 256M memory and 365M swap
  space. The other one with 512M memory and 1G swap space. However, both
  got OutOfMemoryError at the same place (at record 898).
 
 
 
  Here is my code for indexing:
 
  ===
 
  Document doc = new Document();
 
  doc.add(Field.UnIndexed("path", f.getPath()));
 
  doc.add(Field.Keyword("modified",
 
  DateField.timeToString(f.lastModified())));
 
  doc.add(Field.UnIndexed("eprintid", id));
 
  doc.add(Field.Text("metadata", metadata));
 
 
 
  FileInputStream is = new FileInputStream(f);  // the text file
 
  BufferedReader reader = new BufferedReader(new
  InputStreamReader(is));
 
 
 
  StringBuffer stringBuffer = new StringBuffer();
 
  String line = "";
 
  try{
 
while((line = reader.readLine()) != null){
 
  stringBuffer.append(line);
 
}
 
    doc.add(Field.Text("contents", stringBuffer.toString()));
 
// release the resources
 
is.close();
 
reader.close();
 
  }catch(java.io.IOException e){}
 
  =
 
  Is there anything wrong with my code or I need more memory?
 
 
 
  Thanks for any help!
 
  Ying
 
 

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Re: Re: OutOfMemoryError

2004-08-19 Thread Otis Gospodnetic
Terence,

Calling close() on IndexSearcher will not release the memory
immediately.  It will only release resources (e.g. other Java objects
used by IndexSearcher), and it is up to the JVM's garbage collector to
actually reclaim/release the previously used memory.  There are
command-line parameters you can use to tune garbage collection.  Here
is one example:

java -XX:+UseParallelGC -XX:PermSize=20M -XX:MaxNewSize=32M
-XX:NewSize=32M .

This works with Sun's JVM.  The above is just an example - you need to
play with the options and see what works for you.  There are other
options, too:

-Xnoclassgc       disable class garbage collection
-Xincgc           enable incremental garbage collection
-Xloggc:<file>    log GC status to a file with time stamps
-Xbatch           disable background compilation
-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size
-Xss<size>        set java thread stack size
-Xprof            output cpu profiling data
-Xrunhprof[:help]|[:<option>=<value>, ...]
                  perform JVMPI heap, cpu, or monitor profiling


Otis



--- Terence Lai [EMAIL PROTECTED] wrote:

 Hi David,
 
 In my test program, I invoke the IndexSearcher.close() method at the
 end of the loop. However, it doesn't seems to release the memory. My
 concern is that even though I put the IndexSearcher.close() statement
 in the hook methods, it may not release all the memory until the
 application server is shut down. Every time the EJB object is
 re-actived, a new IndexSearcher is open. If the resources allocated
 to the previous IndexSearcher cannot be fully released, the system
 will use up more memory. Eventually, it may run into the
 OutOfMemoryError.
 
 I am not very familiar with EJB. My interpretation could be wrong. I
 am going to try the hook methods. Thanks for pointing this out to me.
 
 Terence
 
   I tried to reuse the IndexSearcher, but I have another question.
 What
   happen if an application server unloads the class after it is
 idle for a
   while, and then re-instantiate the object back when it recieves a
 new
   request?
  
  The EJB spec takes this into account, as there are hook methods you
 can 
  define that get called when your EJB object is about to be
 passivated or 
  activated.  Search for something like passivate/active and/or 
  ejbLoad/ejbSave.  This is where you should close/open your single
 index 
  searcher object.
  
  -- 
  Cheers,
  David
  
  This message is intended only for the named recipient.  If you are
 not the 
  intended recipient you are notified that disclosing, copying,
 distributing 
  or taking any action  in reliance on the contents of this
 information is 
  strictly prohibited.
  
  
 
 -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail:
 [EMAIL PROTECTED]
  
 
 
 
 
 --
 Get your free email account from http://www.trekspace.com
   Your Internet Virtual Desktop!
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Re: OutOfMemoryError

2004-08-19 Thread Otis Gospodnetic
Use the life-cycle hooks mentioned in another email
(activate/passivate) and when you detect that the server is about to
unload your class, call close() on IndexSearcher.  I haven't used
Lucene in an EJB environment, so I don't know the details,
unfortunately. :(

Your simulation may be too fast for the JVM.  Like I mentioned in the
previous email, close() doesn't release the memory, it's the JVM that
has to reclaim it.  Your for loop is very fast (no pauses anywhere,
probably), so maybe the garbage collector doesn't have time to reclaim
the needed memory.  I don't know enough about the low-level JVM stuff
to be certain about this statement, but you could try adding some
Thread.sleep calls in your test code.

Otis

--- Terence Lai [EMAIL PROTECTED] wrote:

 Hi,
 
 I tried to reuse the IndexSearcher, but I have another question. What
 happen if an application server unloads the class after it is idle
 for a while, and then re-instantiate the object back when it recieves
 a new request?
 
 Everytime the server re-instantiates the class, a new IndexSearcher
 instance will be created. If the IndexSearcher.close() method does
 not release all the memory and the server keeps unloading and
 re-instantiating the class, it will eventually hit the
 OutOfMemoryError issue. The test program from my previous email is
 simulating this condition. The reason why I instantiate/close the
 IndexSearcher inside the loop is to simulate the scenario when the
 server unloads and re-instantiates the object. I think that the same
 issue will happen if the application is written in servlet.
 
 Although the singleton pattern may resolve the problem that I
 described above; however, it isn't permitted by the J2EE spec
 according to some news letters. In order words, I can't use singleton
 pattern in EJB. Please correct me if I am wrong on this.
 
 Thanks,
 Terence
 
  Reuse your IndexSearcher! :)
  
  Also, I think somebody has written some EJB stuff to work with
 Lucene. 
  The project is on SF.net.
  
  Otis
  
  --- Terence Lai [EMAIL PROTECTED] wrote:
  
   Hi All,
   
   I am getting a OutOfMemoryError when I deploy my EJB application.
 To
   debug the problem, I wrote the following test program:
   
   public static void main(String[] args) {
   try {
   Query query = getQuery();
   
   for (int i=0; i<1000; i++) {
   search(query);
   
   if ( i%50 == 0 ) {
   System.out.println("Sleep...");
   Thread.currentThread().sleep(5000);
   System.out.println("Wake up!");
   }
   }
   } catch (Exception e) {
   e.printStackTrace();
   }
   }
   
   private static void search(Query query) throws IOException {
   FSDirectory fsDir = null;
   IndexSearcher is = null;
   Hits hits = null;
   
   try {
   fsDir = FSDirectory.getDirectory("C:\\index", false);
   is = new IndexSearcher(fsDir);
   SortField sortField = new
   SortField("profile_modify_date",
   SortField.STRING, true);
   
   hits = is.search(query, new Sort(sortField));
   } finally {
   if (is != null) {
   try {
   is.close();
   } catch (Exception ex) {
   }
   }
   
   if (fsDir != null) {
   try {
   is.close();
   } catch (Exception ex) {
   }
   }
   }
   
   }
   
   In the test program, I wrote a loop to keep calling the search
   method. Everytime it enters the search method, I would
 instantiate
   the IndexSearcher. Before I exit the method, I close the
   IndexSearcher and FSDirectory. I also made the Thread sleep for 5
   seconds in every 50 searches. Hopefully, this will give some time
 for
   the java to do the Garbage Collection. Unfortunately, when I
 observe
   the memory usage of my process, it keeps increasing until I got
 the
   java.lang.OutOfMemoryError.
   
   Note that I invoke the IndexSearcher.search(Query query, Sort
 sort)
   to process the search. If I don't specify the Sort field(i.e.
 using
   IndexSearcher.search(query)), I don't have this problem, and the
   memory usage keeps at a very static level.
   
   Does anyone experience a similar problem? Did I do something
 wrong in
   the test program. I throught by closing the IndexSearcher and the
   FSDirectory, the memory will be able to release during the
 Garbage
   Collection.
   
   Thanks,
   Terence
   
   
   
   
   --
   Get your free email account from http://www.trekspace.com
 Your Internet Virtual Desktop

RE: Re: OutOfMemoryError

2004-08-19 Thread Otis Gospodnetic
Terence,

 2) I have a background process to update the index files. If I keep
 the IndexSearcher opened, I am not sure whether it will pick up the
 changes from the index updates done in the background process.

This is a frequently asked question.  Basically, you have to make use
of IndexReader's method for checking the index version.  You can do it
as often as you want, it's really up to you, and when you detect that
the index has been modified, throw away the old IndexSearcher and make
a new one.  If you are sure nobody is using your old IndexSearcher, you
can close() it, but if somebody (e.g. another thread) is still using it
and you close() it, you will get an error.
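
A rough sketch of that pattern (this assumes IndexReader.getCurrentVersion(),
which recent Lucene releases provide; the holder class and path are made up,
and hand-over of the old searcher between threads is left out):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Re-opens the searcher only when the on-disk index version has changed.
class SearcherHolder {
    private final String indexDir;
    private IndexSearcher searcher;
    private long version;

    SearcherHolder(String indexDir) throws IOException {
        this.indexDir = indexDir;
        this.searcher = new IndexSearcher(indexDir);
        this.version = IndexReader.getCurrentVersion(indexDir);
    }

    synchronized IndexSearcher getSearcher() throws IOException {
        long current = IndexReader.getCurrentVersion(indexDir);
        if (current != version) {
            searcher.close();               // only safe if no other thread still uses it
            searcher = new IndexSearcher(indexDir);
            version = current;
        }
        return searcher;
    }
}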

Otis



  Reuse your IndexSearcher! :)
  
  Also, I think somebody has written some EJB stuff to work with
 Lucene. 
  The project is on SF.net.
  
  Otis
  
  --- Terence Lai [EMAIL PROTECTED] wrote:
  
   Hi All,
   
   I am getting a OutOfMemoryError when I deploy my EJB application.
 To
   debug the problem, I wrote the following test program:
   
   public static void main(String[] args) {
   try {
   Query query = getQuery();
   
   for (int i=0; i<1000; i++) {
   search(query);
   
   if ( i%50 == 0 ) {
   System.out.println("Sleep...");
   Thread.currentThread().sleep(5000);
   System.out.println("Wake up!");
   }
   }
   } catch (Exception e) {
   e.printStackTrace();
   }
   }
   
   private static void search(Query query) throws IOException {
   FSDirectory fsDir = null;
   IndexSearcher is = null;
   Hits hits = null;
   
   try {
   fsDir = FSDirectory.getDirectory("C:\\index", false);
   is = new IndexSearcher(fsDir);
   SortField sortField = new
   SortField("profile_modify_date",
   SortField.STRING, true);
   
   hits = is.search(query, new Sort(sortField));
   } finally {
   if (is != null) {
   try {
   is.close();
   } catch (Exception ex) {
   }
   }
   
   if (fsDir != null) {
   try {
   is.close();
   } catch (Exception ex) {
   }
   }
   }
   
   }
   
   In the test program, I wrote a loop to keep calling the search
   method. Everytime it enters the search method, I would
 instantiate
   the IndexSearcher. Before I exit the method, I close the
   IndexSearcher and FSDirectory. I also made the Thread sleep for 5
   seconds in every 50 searches. Hopefully, this will give some time
 for
   the java to do the Garbage Collection. Unfortunately, when I
 observe
   the memory usage of my process, it keeps increasing until I got
 the
   java.lang.OutOfMemoryError.
   
   Note that I invoke the IndexSearcher.search(Query query, Sort
 sort)
   to process the search. If I don't specify the Sort field(i.e.
 using
   IndexSearcher.search(query)), I don't have this problem, and the
   memory usage keeps at a very static level.
   
   Does anyone experience a similar problem? Did I do something
 wrong in
   the test program. I throught by closing the IndexSearcher and the
   FSDirectory, the memory will be able to release during the
 Garbage
   Collection.
   
   Thanks,
   Terence
   
   
   
   
   --
   Get your free email account from http://www.trekspace.com
 Your Internet Virtual Desktop!
   
  
 -
   To unsubscribe, e-mail:
 [EMAIL PROTECTED]
   For additional commands, e-mail:
 [EMAIL PROTECTED]
   
   
  
  
 
 -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail:
 [EMAIL PROTECTED]
  
 
 
 
 
 --
 Get your free email account from http://www.trekspace.com
   Your Internet Virtual Desktop!
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: OutOfMemoryError

2004-08-18 Thread John Moylan
Terence, 

This may help:
http://issues.apache.org/bugzilla/show_bug.cgi?id=30628
I had the problem above, but I managed to resolve it by not closing
the IndexSearcher. Instead I now reuse the same IndexSearcher all of the
time within my JSP code as an application variable. GC keeps memory in
check on my system now, and search is faster too.

Also, make sure that you are using 1.4.1 as it fixes a sort caching
problem in 1.4

  if ((application.getAttribute("searcher")) != null) {
    searcher = (IndexSearcher) application.getAttribute("searcher");
  }
  else {
    searcher = new IndexSearcher(IndexReader.open(indexName));
    application.setAttribute("searcher", searcher);
  }

   


On Tue, 2004-08-17 at 23:39, Terence Lai wrote:
 Sorry. I should make it more clear in my last email. I have implemented an EJB 
 Session Bean executing the Lucene search. At the beginning, the session been is 
 working fine. It returns the correct search results to me. As more and more search 
 requests being processed, the server ends up having the OutOfMemoryError. If I 
 restart the server, every thing works fine again.
 
 Terence
 
  Hi All,
  
  I am getting a OutOfMemoryError when I deploy my EJB application. To debug the 
  problem, 
  I wrote the following test program:
  
  public static void main(String[] args) {
  try {
  Query query = getQuery();
  
  for (int i=0; i<1000; i++) {
  search(query);
  
  if ( i%50 == 0 ) {
  System.out.println("Sleep...");
  Thread.currentThread().sleep(5000);
  System.out.println("Wake up!");
  }
  }
  } catch (Exception e) {
  e.printStackTrace();
  }
  }
  
  private static void search(Query query) throws IOException {
  FSDirectory fsDir = null;
  IndexSearcher is = null;
  Hits hits = null;
  
  try {
  fsDir = FSDirectory.getDirectory("C:\\index", false);
  is = new IndexSearcher(fsDir);
  SortField sortField = new SortField("profile_modify_date",
  SortField.STRING, true);
  
  hits = is.search(query, new Sort(sortField));
  } finally {
  if (is != null) {
  try {
  is.close();
  } catch (Exception ex) {
  }
  }
  
  if (fsDir != null) {
  try {
  is.close();
  } catch (Exception ex) {
  }
  }
  }
  
  }
  
  In the test program, I wrote a loop to keep calling the search method. Everytime 
  it enters the search method, I would instantiate the IndexSearcher. Before I exit 
  the method, I close the IndexSearcher and FSDirectory. I also made the Thread 
  sleep 
  for 5 seconds in every 50 searches. Hopefully, this will give some time for the 
  java to do the Garbage Collection. Unfortunately, when I observe the memory usage 
  of my process, it keeps increasing until I got the java.lang.OutOfMemoryError.
  
  Note that I invoke the IndexSearcher.search(Query query, Sort sort) to process the 
  search. If I don't specify the Sort field(i.e. using IndexSearcher.search(query)), 
  I don't have this problem, and the memory usage keeps at a very static level.
  
  Does anyone experience a similar problem? Did I do something wrong in the test 
  program. 
  I throught by closing the IndexSearcher and the FSDirectory, the memory will be 
  able to release during the Garbage Collection.
  
  Thanks,
  Terence
  
  
  
  
  --
  Get your free email account from http://www.trekspace.com
Your Internet Virtual Desktop!
  
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
  
 
 
 
 
 --
 Get your free email account from http://www.trekspace.com
   Your Internet Virtual Desktop!
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
-- 
John Moylan
RT ePublishing,
Montrose House,
Donnybrook,
Dublin 4
T: +353 1 2083564
E: [EMAIL PROTECTED]


**
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution, or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful.
Please note that emails

Re: OutOfMemoryError

2004-08-18 Thread Otis Gospodnetic
Reuse your IndexSearcher! :)

Also, I think somebody has written some EJB stuff to work with Lucene. 
The project is on SF.net.

Otis

--- Terence Lai [EMAIL PROTECTED] wrote:

 Hi All,
 
 I am getting a OutOfMemoryError when I deploy my EJB application. To
 debug the problem, I wrote the following test program:
 
 public static void main(String[] args) {
 try {
 Query query = getQuery();
 
 for (int i=0; i<1000; i++) {
 search(query);
 
 if ( i%50 == 0 ) {
 System.out.println("Sleep...");
 Thread.currentThread().sleep(5000);
 System.out.println("Wake up!");
 }
 }
 } catch (Exception e) {
 e.printStackTrace();
 }
 }
 
 private static void search(Query query) throws IOException {
 FSDirectory fsDir = null;
 IndexSearcher is = null;
 Hits hits = null;
 
 try {
 fsDir = FSDirectory.getDirectory("C:\\index", false);
 is = new IndexSearcher(fsDir);
 SortField sortField = new
 SortField("profile_modify_date",
 SortField.STRING, true);
 
 hits = is.search(query, new Sort(sortField));
 } finally {
 if (is != null) {
 try {
 is.close();
 } catch (Exception ex) {
 }
 }
 
 if (fsDir != null) {
 try {
 is.close();
 } catch (Exception ex) {
 }
 }
 }
 
 }
 
 In the test program, I wrote a loop to keep calling the search
 method. Everytime it enters the search method, I would instantiate
 the IndexSearcher. Before I exit the method, I close the
 IndexSearcher and FSDirectory. I also made the Thread sleep for 5
 seconds in every 50 searches. Hopefully, this will give some time for
 the java to do the Garbage Collection. Unfortunately, when I observe
 the memory usage of my process, it keeps increasing until I got the
 java.lang.OutOfMemoryError.
 
 Note that I invoke the IndexSearcher.search(Query query, Sort sort)
 to process the search. If I don't specify the Sort field(i.e. using
 IndexSearcher.search(query)), I don't have this problem, and the
 memory usage keeps at a very static level.
 
 Does anyone experience a similar problem? Did I do something wrong in
 the test program. I throught by closing the IndexSearcher and the
 FSDirectory, the memory will be able to release during the Garbage
 Collection.
 
 Thanks,
 Terence
 
 
 
 
 --
 Get your free email account from http://www.trekspace.com
   Your Internet Virtual Desktop!
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Re: OutOfMemoryError

2004-08-18 Thread Terence Lai
Hi Otis,

The reason I ran into this problem is that I partition my search documents into
multiple index directories ordered by document modified date. My application only
returns the latest 500 documents that match the criteria. By partitioning the
documents into different directories, we get a huge performance gain. Consider the
following partitions:

- partition 1 (earliest documents are in this partition)
- partition 2
- partition 3
- partition 4 (latest documents are in this partition)

If I only need the latest 500 documents, I start searching from partition 4. If I get
500 matching documents, I don't need to search the remaining partitions. Otherwise, I
perform another search on partition 3, and so forth until I get 500 documents or I have
gone through all the partitions. I can also make use of MultiSearcher and
ParallelMultiSearcher in my search, as sketched below.
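
Something along these lines, using the MultiSearcher constructor that takes a
Searchable[] (the partition paths are made up):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

// One searcher per partition, combined behind a single MultiSearcher.
Searchable[] partitions = new Searchable[] {
    new IndexSearcher("/indexes/partition4"),   // latest documents
    new IndexSearcher("/indexes/partition3"),
    new IndexSearcher("/indexes/partition2"),
    new IndexSearcher("/indexes/partition1")    // earliest documents
};
MultiSearcher searcher = new MultiSearcher(partitions);
// searcher.search(query, sort) now sees all partitions at once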

Now, the problems I am having if I keep the IndexSearcher open are the
following:

1) As the number of documents increases, the number of index partition directories
will also increase, since I set an upper limit on the number of documents in each
partition. When a partition reaches the limit, I create a new one. As the number of
IndexSearchers increases, the process will eventually run out of memory if I cannot
close the IndexSearchers and release the memory.

2) I have a background process that updates the index files. If I keep the IndexSearcher
open, I am not sure whether it will pick up the changes from the index updates done
in the background process.

Any idea how I can work around this problem?

Thanks,
Terence
 Reuse your IndexSearcher! :)
 
 Also, I think somebody has written some EJB stuff to work with Lucene. 
 The project is on SF.net.
 
 Otis
 
 --- Terence Lai [EMAIL PROTECTED] wrote:
 
  Hi All,
  
  I am getting a OutOfMemoryError when I deploy my EJB application. To
  debug the problem, I wrote the following test program:
  
  public static void main(String[] args) {
  try {
  Query query = getQuery();
  
  for (int i=0; i<1000; i++) {
  search(query);
  
  if ( i%50 == 0 ) {
  System.out.println("Sleep...");
  Thread.currentThread().sleep(5000);
  System.out.println("Wake up!");
  }
  }
  } catch (Exception e) {
  e.printStackTrace();
  }
  }
  
  private static void search(Query query) throws IOException {
  FSDirectory fsDir = null;
  IndexSearcher is = null;
  Hits hits = null;
  
  try {
  fsDir = FSDirectory.getDirectory("C:\\index", false);
  is = new IndexSearcher(fsDir);
  SortField sortField = new
  SortField("profile_modify_date",
  SortField.STRING, true);
  
  hits = is.search(query, new Sort(sortField));
  } finally {
  if (is != null) {
  try {
  is.close();
  } catch (Exception ex) {
  }
  }
  
  if (fsDir != null) {
  try {
  is.close();
  } catch (Exception ex) {
  }
  }
  }
  
  }
  
  In the test program, I wrote a loop to keep calling the search
  method. Everytime it enters the search method, I would instantiate
  the IndexSearcher. Before I exit the method, I close the
  IndexSearcher and FSDirectory. I also made the Thread sleep for 5
  seconds in every 50 searches. Hopefully, this will give some time for
  the java to do the Garbage Collection. Unfortunately, when I observe
  the memory usage of my process, it keeps increasing until I got the
  java.lang.OutOfMemoryError.
  
  Note that I invoke the IndexSearcher.search(Query query, Sort sort)
  to process the search. If I don't specify the Sort field(i.e. using
  IndexSearcher.search(query)), I don't have this problem, and the
  memory usage keeps at a very static level.
  
  Does anyone experience a similar problem? Did I do something wrong in
  the test program. I throught by closing the IndexSearcher and the
  FSDirectory, the memory will be able to release during the Garbage
  Collection.
  
  Thanks,
  Terence
  
  
  
  
  --
  Get your free email account from http://www.trekspace.com
Your Internet Virtual Desktop!
  
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
  
  
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED

RE: Re: OutOfMemoryError

2004-08-18 Thread Terence Lai
Hi,

I tried to reuse the IndexSearcher, but I have another question. What happens if an
application server unloads the class after it is idle for a while, and then
re-instantiates the object when it receives a new request?

Every time the server re-instantiates the class, a new IndexSearcher instance will be
created. If the IndexSearcher.close() method does not release all the memory and the
server keeps unloading and re-instantiating the class, it will eventually hit the
OutOfMemoryError. The test program from my previous email simulates this condition:
the reason I instantiate/close the IndexSearcher inside the loop is to simulate the
server unloading and re-instantiating the object. I think the same issue would happen
if the application were written as a servlet.

The singleton pattern might resolve the problem described above; however, it isn't
permitted by the J2EE spec according to some newsletters. In other words, I can't use
the singleton pattern in EJB. Please correct me if I am wrong on this.

Thanks,
Terence

 Reuse your IndexSearcher! :)
 
 Also, I think somebody has written some EJB stuff to work with Lucene. 
 The project is on SF.net.
 
 Otis
 
 --- Terence Lai [EMAIL PROTECTED] wrote:
 
  Hi All,
  
  I am getting a OutOfMemoryError when I deploy my EJB application. To
  debug the problem, I wrote the following test program:
  
  public static void main(String[] args) {
  try {
  Query query = getQuery();
  
  for (int i=0; i<1000; i++) {
  search(query);
  
  if ( i%50 == 0 ) {
  System.out.println("Sleep...");
  Thread.currentThread().sleep(5000);
  System.out.println("Wake up!");
  }
  }
  } catch (Exception e) {
  e.printStackTrace();
  }
  }
  
  private static void search(Query query) throws IOException {
  FSDirectory fsDir = null;
  IndexSearcher is = null;
  Hits hits = null;
  
  try {
  fsDir = FSDirectory.getDirectory("C:\\index", false);
  is = new IndexSearcher(fsDir);
  SortField sortField = new
  SortField("profile_modify_date",
  SortField.STRING, true);
  
  hits = is.search(query, new Sort(sortField));
  } finally {
  if (is != null) {
  try {
  is.close();
  } catch (Exception ex) {
  }
  }
  
  if (fsDir != null) {
  try {
  is.close();
  } catch (Exception ex) {
  }
  }
  }
  
  }
  
  In the test program, I wrote a loop to keep calling the search
  method. Everytime it enters the search method, I would instantiate
  the IndexSearcher. Before I exit the method, I close the
  IndexSearcher and FSDirectory. I also made the Thread sleep for 5
  seconds in every 50 searches. Hopefully, this will give some time for
  the java to do the Garbage Collection. Unfortunately, when I observe
  the memory usage of my process, it keeps increasing until I got the
  java.lang.OutOfMemoryError.
  
  Note that I invoke the IndexSearcher.search(Query query, Sort sort)
  to process the search. If I don't specify the Sort field(i.e. using
  IndexSearcher.search(query)), I don't have this problem, and the
  memory usage keeps at a very static level.
  
  Does anyone experience a similar problem? Did I do something wrong in
  the test program. I throught by closing the IndexSearcher and the
  FSDirectory, the memory will be able to release during the Garbage
  Collection.
  
  Thanks,
  Terence
  
  
  
  
  --
  Get your free email account from http://www.trekspace.com
Your Internet Virtual Desktop!
  
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
  
  
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 




--
Get your free email account from http://www.trekspace.com
  Your Internet Virtual Desktop!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Re: OutOfMemoryError

2004-08-18 Thread David Sitsky
 I tried to reuse the IndexSearcher, but I have another question. What
 happen if an application server unloads the class after it is idle for a
 while, and then re-instantiate the object back when it recieves a new
 request?

The EJB spec takes this into account, as there are hook methods you can 
define that get called when your EJB object is about to be passivated or 
activated.  Search for something like passivate/active and/or 
ejbLoad/ejbSave.  This is where you should close/open your single index 
searcher object.
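
A bare-bones sketch of what that looks like for a stateful session bean using
the EJB 2.x SessionBean callbacks (the index path is a placeholder):

import java.io.IOException;
import javax.ejb.EJBException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
import org.apache.lucene.search.IndexSearcher;

// The searcher is opened when the bean is activated and closed when it is
// passivated, instead of once per request.
public class LuceneSearchBean implements SessionBean {

    private transient IndexSearcher searcher;

    public void ejbActivate() {
        try {
            searcher = new IndexSearcher("/path/to/index");   // placeholder path
        } catch (IOException e) {
            throw new EJBException(e.getMessage());
        }
    }

    public void ejbPassivate() {
        try {
            if (searcher != null) {
                searcher.close();
                searcher = null;
            }
        } catch (IOException e) {
            throw new EJBException(e.getMessage());
        }
    }

    public void ejbCreate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext ctx) {}
}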

-- 
Cheers,
David

This message is intended only for the named recipient.  If you are not the 
intended recipient you are notified that disclosing, copying, distributing 
or taking any action  in reliance on the contents of this information is 
strictly prohibited.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Re: Re: OutOfMemoryError

2004-08-18 Thread Terence Lai
Hi David,

In my test program, I invoke the IndexSearcher.close() method at the end of the loop.
However, it doesn't seem to release the memory. My concern is that even though I put
the IndexSearcher.close() statement in the hook methods, it may not release all the
memory until the application server is shut down. Every time the EJB object is
re-activated, a new IndexSearcher is opened. If the resources allocated to the previous
IndexSearcher cannot be fully released, the system will use up more memory.
Eventually, it may run into the OutOfMemoryError.

I am not very familiar with EJB. My interpretation could be wrong. I am going to try 
the hook methods. Thanks for pointing this out to me.

Terence

  I tried to reuse the IndexSearcher, but I have another question. What
  happen if an application server unloads the class after it is idle for a
  while, and then re-instantiate the object back when it recieves a new
  request?
 
 The EJB spec takes this into account, as there are hook methods you can 
 define that get called when your EJB object is about to be passivated or 
 activated.  Search for something like passivate/active and/or 
 ejbLoad/ejbSave.  This is where you should close/open your single index 
 searcher object.
 
 -- 
 Cheers,
 David
 
 This message is intended only for the named recipient.  If you are not the 
 intended recipient you are notified that disclosing, copying, distributing 
 or taking any action  in reliance on the contents of this information is 
 strictly prohibited.
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 




--
Get your free email account from http://www.trekspace.com
  Your Internet Virtual Desktop!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



OutOfMemoryError

2004-08-17 Thread Terence Lai
Hi All,

I am getting a OutOfMemoryError when I deploy my EJB application. To debug the 
problem, I wrote the following test program:

public static void main(String[] args) {
try {
Query query = getQuery();

for (int i=0; i<1000; i++) {
search(query);

if ( i%50 == 0 ) {
System.out.println("Sleep...");
Thread.currentThread().sleep(5000);
System.out.println("Wake up!");
}
}
} catch (Exception e) {
e.printStackTrace();
}
}

private static void search(Query query) throws IOException {
FSDirectory fsDir = null;
IndexSearcher is = null;
Hits hits = null;

try {
fsDir = FSDirectory.getDirectory("C:\\index", false);
is = new IndexSearcher(fsDir);
SortField sortField = new SortField("profile_modify_date",
SortField.STRING, true);

hits = is.search(query, new Sort(sortField));
} finally {
if (is != null) {
try {
is.close();
} catch (Exception ex) {
}
}

if (fsDir != null) {
try {
is.close();
} catch (Exception ex) {
}
}
}

}

In the test program, I wrote a loop that keeps calling the search method. Every time it
enters the search method, I instantiate the IndexSearcher. Before I exit the method, I
close the IndexSearcher and FSDirectory. I also make the thread sleep for 5 seconds
every 50 searches, hoping this gives the JVM some time to do garbage collection.
Unfortunately, when I observe the memory usage of my process, it keeps increasing until
I get the java.lang.OutOfMemoryError.

Note that I invoke IndexSearcher.search(Query query, Sort sort) to process the search.
If I don't specify the Sort field (i.e. using IndexSearcher.search(query)), I don't
have this problem, and the memory usage stays at a very stable level.

Has anyone experienced a similar problem? Did I do something wrong in the test program?
I thought that by closing the IndexSearcher and the FSDirectory, the memory would be
released during garbage collection.

Thanks,
Terence




--
Get your free email account from http://www.trekspace.com
  Your Internet Virtual Desktop!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: OutOfMemoryError

2004-08-17 Thread Terence Lai
Sorry, I should have made it clearer in my last email. I have implemented an EJB session
bean that executes the Lucene search. At the beginning, the session bean works fine and
returns the correct search results to me. As more and more search requests are
processed, the server ends up with the OutOfMemoryError. If I restart the server,
everything works fine again.

Terence

 Hi All,
 
 I am getting a OutOfMemoryError when I deploy my EJB application. To debug the 
 problem, 
 I wrote the following test program:
 
 public static void main(String[] args) {
 try {
 Query query = getQuery();
 
 for (int i=0; i<1000; i++) {
 search(query);
 
 if ( i%50 == 0 ) {
 System.out.println("Sleep...");
 Thread.currentThread().sleep(5000);
 System.out.println("Wake up!");
 }
 }
 } catch (Exception e) {
 e.printStackTrace();
 }
 }
 
 private static void search(Query query) throws IOException {
 FSDirectory fsDir = null;
 IndexSearcher is = null;
 Hits hits = null;
 
 try {
 fsDir = FSDirectory.getDirectory("C:\\index", false);
 is = new IndexSearcher(fsDir);
 SortField sortField = new SortField("profile_modify_date",
 SortField.STRING, true);
 
 hits = is.search(query, new Sort(sortField));
 } finally {
 if (is != null) {
 try {
 is.close();
 } catch (Exception ex) {
 }
 }
 
 if (fsDir != null) {
 try {
 is.close();
 } catch (Exception ex) {
 }
 }
 }
 
 }
 
 In the test program, I wrote a loop to keep calling the search method. Everytime 
 it enters the search method, I would instantiate the IndexSearcher. Before I exit 
 the method, I close the IndexSearcher and FSDirectory. I also made the Thread sleep 
 for 5 seconds in every 50 searches. Hopefully, this will give some time for the 
 java to do the Garbage Collection. Unfortunately, when I observe the memory usage 
 of my process, it keeps increasing until I got the java.lang.OutOfMemoryError.
 
 Note that I invoke the IndexSearcher.search(Query query, Sort sort) to process the 
 search. If I don't specify the Sort field(i.e. using IndexSearcher.search(query)), 
 I don't have this problem, and the memory usage keeps at a very static level.
 
 Does anyone experience a similar problem? Did I do something wrong in the test 
 program. 
 I throught by closing the IndexSearcher and the FSDirectory, the memory will be 
 able to release during the Garbage Collection.
 
 Thanks,
 Terence
 
 
 
 
 --
 Get your free email account from http://www.trekspace.com
   Your Internet Virtual Desktop!
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 




--
Get your free email account from http://www.trekspace.com
  Your Internet Virtual Desktop!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryError

2004-08-17 Thread Daniel Naber
On Wednesday 18 August 2004 00:30, Terence Lai wrote:

   if (fsDir != null) {
 try {
   is.close();
 } catch (Exception ex) {
 }
   }

You close is here again, not fsDir. Also, it's a good idea to never ignore 
exceptions, you should at least print them out, even if it's just a 
close() that fails.
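
For reference, the posted method with those two changes applied would look
roughly like this (same hard-coded path and sort field as the original):

private static void search(Query query) throws IOException {
    FSDirectory fsDir = null;
    IndexSearcher is = null;
    Hits hits = null;

    try {
        fsDir = FSDirectory.getDirectory("C:\\index", false);
        is = new IndexSearcher(fsDir);
        SortField sortField = new SortField("profile_modify_date",
                                            SortField.STRING, true);
        hits = is.search(query, new Sort(sortField));
    } finally {
        if (is != null) {
            try { is.close(); } catch (Exception ex) { ex.printStackTrace(); }
        }
        if (fsDir != null) {
            try { fsDir.close(); } catch (Exception ex) { ex.printStackTrace(); }
        }
    }
}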

Regards
 Daniel

-- 
http://www.danielnaber.de


RE: Re: OutOfMemoryError

2004-08-17 Thread Terence Lai
Thanks for pointing this out. Even after I fixed the code to close fsDir and also added
ex.printStackTrace(System.out), I am still hitting the OutOfMemoryError.

Terence

 On Wednesday 18 August 2004 00:30, Terence Lai wrote:
   if (fsDir != null) {
     try {
       is.close();
     } catch (Exception ex) {
     }
   }
 You close is here again, not fsDir. Also, it's a good idea to never ignore
 exceptions; you should at least print them out, even if it's just a close()
 that fails.
 Regards, Daniel
 -- http://www.danielnaber.de
 




--
Get your free email account from http://www.trekspace.com
  Your Internet Virtual Desktop!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-14 Thread Kevin A. Burton
petite_abeille wrote:

On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:

He mentioned that I might be able to squeeze 5-10% out of index 
merges this way.


Talking of which... what strategy(ies) do people use to minimize 
downtime when updating an index?

This should probably be a wiki page.

Anyway... two thoughts I had on the subject a while back:

You maintain two disks (not RAID ... you get reliability through software).

Searches are load balanced between disks for performance reasons.  If 
one fails you just stop using it.

When you want to do an index merge you read from disk0 and write to
disk1.  Then you take disk0 out of search rotation, add disk1, and
copy the contents of disk1 back to disk0.  Users shouldn't notice much of
a performance issue during the merge because it will be VERY fast and
it's just reads from disk0.

Kevin

--

Please reply using PGP.

   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-13 Thread petite_abeille
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:

He mentioned that I might be able to squeeze 5-10% out of index merges 
this way.
Talking of which... what strategy(ies) do people use to minimize 
downtime when updating an index?

My current strategy is as follows:

(1) use a temporary RAMDirectory for ongoing updates.
(2) perform a copy on write when flushing the RAMDirectory into the 
persistent index.

The second step means that I create an offline copy of the live index
before invoking addIndexes() and then substitute the old index with the
new, updated one. While this effectively increases the time it takes to
update an index, it nonetheless reduces the *perceived* downtime.
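
A rough sketch of that flow with the Lucene API (paths and analyzer are
placeholders; making the offline copy and swapping it back in are plain
file-system steps and are not shown):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// (1) buffer ongoing updates in a RAMDirectory
RAMDirectory ramDir = new RAMDirectory();
IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
// ... addDocument() calls go to ramWriter while the persistent index stays live ...
ramWriter.close();

// (2) flush the buffered updates into the offline copy of the index
Directory copyDir = FSDirectory.getDirectory("/indexes/offline-copy", false);
IndexWriter mainWriter = new IndexWriter(copyDir, new StandardAnalyzer(), false);
mainWriter.addIndexes(new Directory[] { ramDir });
mainWriter.close();
// ... then substitute the old index with the updated copy ...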

Thoughts? Alternative strategies?

TIA.

Cheers,

PA.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-13 Thread Stephane James Vaucher
I'm actually pretty lazy about index updates, and haven't had the need for 
efficiency, since my requirement is that new documents should be 
available on a next working day basis.

I reindex everything from scratch every night (400,000 docs) and store it
in a timestamped index. When the reindexing is done, I alert a controller
of the new active index. I keep a few versions of the index in case of 
a failure somewhere and I can always send a message to the controller to 
use an old index.

cheers,
sv

On Tue, 13 Apr 2004, petite_abeille wrote:

 
 On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
 
  He mentioned that I might be able to squeeze 5-10% out of index merges 
  this way.
 
 Talking of which... what strategy(ies) do people use to minimize 
 downtime when updating an index?
 
 My current strategy is as follow:
 
 (1) use a temporary RAMDirectory for ongoing updates.
 (2) perform a copy on write when flushing the RAMDirectory into the 
 persistent index.
 
 The second step means that I create an offline copy of a live index 
 before invoking addIndexes() and then substitute the old index with the 
 new, updated, one. While this effectively increase the time it takes to 
 update an index, it nonetheless reduce the *perceived* downtime for it.
 
 Thoughts? Alternative strategies?
 
 TIA.
 
 Cheers,
 
 PA.
 
 
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



OutOfMemoryError when using wildcard queries

2003-10-15 Thread Âkila
Hi,

I am using Lucene 1.2 and getting an OutOfMemoryError when searching with
some wildcard queries.
Is there some provision that restricts the number of terms for wildcard
queries?

Thanks,
Akila
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: OutOfMemoryError when using wildcard queries

2003-10-15 Thread Dan Quaroni
No, but the JVM does have a memory limit.  By default it's 64 megs, I
believe.  To increase it, use the -Xmx option when you run java.

For example, to give the JVM 100 megs of ram, you would write:

java -Xmx100m YourClassHere

-Original Message-
From: Âkila [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 15, 2003 9:15 AM
To: Lucene Users List
Subject: OutOfMemoryError when using wildcard queries


Hi,

Am using Lucene 1.2 and getting  OutOfMemoryError when searching using 
some wildcard queries.
Is there some provision that restricts the number of terms for wildcard 
queries?

Thanks,
Akila


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryError when using wildcard queries

2003-10-15 Thread Victor Hadianto
 No, but the JVM does have a memory limit.  By default it's 64 megs, I
 believe.  To increase it, use the -Xmx option when you run java.

Dan & Akila,

I may be wrong, but I remember a while back there was a discussion about
limiting the number of terms expanded by a wildcard query. I'm not sure
whether it has been implemented or not. Searching the mailing list should
give you some pointers. Then you could port the patch back to 1.2 for
your own needs.

/vh


 For example, to give the JVM 100 megs of ram, you would write:

 java -Xmx100m YourClassHere

 -Original Message-
 From: Âkila [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 15, 2003 9:15 AM
 To: Lucene Users List
 Subject: OutOfMemoryError when using wildcard queries


 Hi,

 Am using Lucene 1.2 and getting  OutOfMemoryError when searching using
 some wildcard queries.
 Is there some provision that restricts the number of terms for wildcard
 queries?

 Thanks,
 Akila


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-29 Thread Cory Albright
Thanks for the info, but unfortunately it is still getting an OutOfMemoryError.
Here's my code:
--
final BitSet bits = new BitSet();
HitCollector hc = new HitCollector() {
    public void collect(int doc, float score) {
        System.out.println("collect");
        if (score > THRESHOLD) {
            bits.set(doc);
        }
    }
};
mSearcher.search(query, hc);
System.out.println("  results (" + bits.cardinality() + "):\n");
-
When I search with a low-hit query, collect is printed many times.
When I search with a query I know will hit most of the 1.8 million records, 
the collect print
does not even print, it eats up the 700+MB I allocated and then throws an 
OutOfMemoryError.  Did
I do something wrong?

Thanks for you help,

Cory









At 09:43 PM 5/27/2003 +0200, you wrote:
 Hits hits = searcher.search(myQuery);

BitSet results = new BitSet();

searcher.search(myQuery, new HitCollector()
{
  public void collect(int doc, float score)
  {
    if (score > THRESHOLD)
  results.set(doc);
  }
});
--
Eric Jain
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: too many hits - OutOfMemoryError

2003-05-29 Thread Eric Jain
 When I search with a query I know will hit most of the 1.8 million
 records, the collect print
 does not even print, it eats up the 700+MB I allocated and then
 throws an OutOfMemoryError.

Are you using wildcard queries?

--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-29 Thread Cory Albright
Yes. Is that the problem?

At 05:13 PM 5/28/2003 +0200, you wrote:
 When I search with a query I know will hit most of the 1.8 million
 records, the collect print
 does not even print, it eats up the 700+MB I allocated and then
 throws an OutOfMemoryError.
Are you using wildcard queries?

--
Eric Jain
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: too many hits - OutOfMemoryError

2003-05-29 Thread Eric Jain
 Yes. Is that the problem?

I believe a term with a wildcard is expanded into all possible terms in
memory before searching for it, so if the term is 'a*', and you have a
million different terms starting with 'a' occurring in your documents,
it's quite possible to run out of memory.
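
As an illustration, here is a minimal sketch (against the Lucene 1.x
IndexReader/TermEnum API; the index path, field name and class name are
placeholder assumptions) that counts how many indexed terms a prefix such as
'a*' would expand to, before you actually submit the wildcard query:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class PrefixTermCounter {

    // Counts the indexed terms in 'field' that start with 'prefix', i.e. the
    // number of terms a query like prefix* would be expanded into.
    public static int countTerms(IndexReader reader, String field, String prefix)
            throws IOException {
        TermEnum terms = reader.terms(new Term(field, prefix));
        int count = 0;
        try {
            do {
                Term t = terms.term();
                // Stop once we leave the field or the prefix range.
                if (t == null || !t.field().equals(field)
                        || !t.text().startsWith(prefix)) {
                    break;
                }
                count++;
            } while (terms.next());
        } finally {
            terms.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path
        System.out.println("a* would expand to "
                + countTerms(reader, "contents", "a") + " terms");
        reader.close();
    }
}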

Does anyone know if there is a setting that limits the number of terms
for wildcard queries?

--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-29 Thread David_Birthwell

Cory,

When performing wildcard queries, the bulk of the memory is used during
wildcard term expansion.  The memory requirement is proportional to the
number of matching terms, not the number of hits.

You should make sure you are using the latest Lucene.  There was a fix in
1.3 to reduce the memory requirements of all queries.

But wildcard queries that expand to many terms are always going to be
memory intensive in Lucene.  We ran into this problem and decided to put a
check on the number of expanded terms and abort the query if the number got
too high.  If you're ambitious, you could modify the Lucene source to
serialize the query process for queries with a large number of terms, but
that would be a bit of work.  If you absolutely require these huge wildcard
queries, then you may have to look into it, though.
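
One way to approximate such a check at the application level (this is not the
internal patch described above, which hooks into Lucene's own term expansion)
is to count the matching terms first, reusing the PrefixTermCounter sketch from
earlier in this thread, and refuse to run the query past a chosen limit. Class
names and the limit are assumptions, not Lucene API:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Searcher;

public class GuardedPrefixSearch {

    private static final int MAX_EXPANDED_TERMS = 10000; // tune to your heap

    public static Hits search(Searcher searcher, IndexReader reader,
                              String field, String prefix) throws IOException {
        // Count the expansion up front instead of letting the query blow up.
        int expanded = PrefixTermCounter.countTerms(reader, field, prefix);
        if (expanded > MAX_EXPANDED_TERMS) {
            throw new IOException("'" + prefix + "*' would expand to "
                    + expanded + " terms; refusing to search");
        }
        return searcher.search(new PrefixQuery(new Term(field, prefix)));
    }
}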

Non-wildcard queries that return a large result set should not be a memory
problem, though.

Dave




   
 
  Cory Albright [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Date: 05/28/03 11:16 AM
  Subject: Re: too many hits - OutOfMemoryError
  Please respond to: Lucene Users List


Yes. Is that the problem?

At 05:13 PM 5/28/2003 +0200, you wrote:
  When I search with a query I know will hit most of the 1.8 million
  records, "collect" does not even print; it eats up the 700+MB I allocated
  and then throws an OutOfMemoryError.

Are you using wildcard queries?

--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-29 Thread Cory Albright
Thanks for the help!  Yes, it works fine without a wildcard search, which
I believe at this point will be ok for our app.
Thanks again,

Cory

At 11:50 AM 5/28/2003 -0400, you wrote:

Cory,

When performing wildcard queries, the bulk of the memory is used during
wildcard term expansion.  The memory requirement is proportional to the
number of matching terms, not the number of hits.
You should make sure you are using the latest Lucene.  There was a fix in
1.3 to reduce the memory requirements of all queries.
But wildcard queries that expand to many terms are always going to be
memory intensive in Lucene.  We ran into this problem and decided to put a
check on the number of expanded terms and abort the query if the number got
too high.  If you're ambitious, you could modify the Lucene source to
serialize the query process for queries with a large number of terms, but
that would be a bit of work.  If you absolutely require these huge wildcard
queries, then you may have to look into it, though.
Non-wildcard queries that return a large result set should not be a memory
problem, though.
Dave





  Cory Albright [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Date: 05/28/03 11:16 AM
  Subject: Re: too many hits - OutOfMemoryError
  Please respond to: Lucene Users List


Yes. Is that the problem?

At 05:13 PM 5/28/2003 +0200, you wrote:
  When I search with a query I know will hit most of the 1.8 million
  records, "collect" does not even print; it eats up the 700+MB I allocated
  and then throws an OutOfMemoryError.

Are you using wildcard queries?

--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: too many hits - OutOfMemoryError

2003-05-29 Thread Eric Jain
 We ran into this problem and decided to put a check
 on the number of expanded terms and abort the query
 if the number got too high.

Is it possible to perform this check without having to modify Lucene's
source code?


--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-29 Thread David_Birthwell

Unfortunately, no.
The modifications are not very extreme, though.
If you're interested in seeing our approach, let me know.

DaveB




   
 
  Eric Jain [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Date: 05/28/03 12:22 PM
  Subject: Re: too many hits - OutOfMemoryError
  Please respond to: Lucene Users List
 




 We ran into this problem and decided to put a check
 on the number of expanded terms and abort the query
 if the number got too high.

Is it possible to perform this check without having to modify Lucene's
source code?


--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-29 Thread Harpreet S Walia
Hi,

I have been following this discussion, and as I am anticipating such a
problem when my index size grows, I would like to hear your approach to
limiting the query expansion.

Regards,
Harpreet

- Original Message -
From: [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, May 28, 2003 10:21 PM
Subject: Re: too many hits - OutOfMemoryError



 Unfortunately, no.
 The modifications are not very extreme, though.
 If you're interested in seeing our approach, let me know.

 DaveB





   Eric Jain [EMAIL PROTECTED]
   To: Lucene Users List [EMAIL PROTECTED]
   Date: 05/28/03 12:22 PM
   Subject: Re: too many hits - OutOfMemoryError
   Please respond to: Lucene Users List






  We ran into this problem and decided to put a check
  on the number of expanded terms and abort the query
  if the number got too high.

 Is it possible to perform this check without having to modify Lucene's
 source code?


 --
 Eric Jain


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]







 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



too many hits - OutOfMemoryError

2003-05-27 Thread Cory Albright
Hi -
	I have created an index of 1.8 million documents, each document containing
5-10 fields.  When I run a search that I know has a small number of hits,
it works great.  However, if I run a search that I know will hit most of
the documents, I get an OutOfMemoryError.  I am using the basic search call:

	Hits hits = searcher.search(myQuery);

	Is there a better way to handle this problem?  Without knowing the number
of hits that a given query will return, how do I guard against this problem?

Thank you,

Cory Albright

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: too many hits - OutOfMemoryError

2003-05-27 Thread David Medinets
Out of curiosity, how much free RAM does the computer normally have? And
have you tried increasing the amount available to the JVM?

David Medinets
http://www.codebits.com



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: too many hits - OutOfMemoryError

2003-05-27 Thread Cory Albright
The computer is a 1.7GHz P4 with 1.25GB RAM.  I tried the JVM
arg -Xmx700M (as I had a little over 700MB free).

Cory Albright

At 02:37 PM 5/27/2003 -0400, you wrote:
Out of curiosity, how much free RAM does the computer normally have? And
have you tried increasing the amount available to the JVM?
David Medinets
http://www.codebits.com


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: too many hits - OutOfMemoryError

2003-05-27 Thread Eric Jain
 Hits hits = searcher.search(myQuery);

final BitSet results = new BitSet();

searcher.search(myQuery, new HitCollector()
{
  public void collect(int doc, float score)
  {
    if (score > THRESHOLD)
      results.set(doc);
  }
});

--
Eric Jain


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryError with boolean queries

2003-03-19 Thread Otis Gospodnetic
Robert,

I'm moving this to lucene-user, which is a more appropriate list for
this type of a problem.
You are not saying whether you are using some of those handy -X (-Xms
-Xmx) command line switches when you invoke your application that dies
with OutOfMemoryError.
If you are not, try that, it may help.  I recall a few other people
reporting the same problem and using -Xms and -Xmx solved their
problem.

If your machine doesn't have the RAM it needs this won't help, of
course :)

Otis


--- Robert_Wennström [EMAIL PROTECTED] wrote:
 Hi,
 
 I'm experiencing OutOfMemoryErrors when searching using many logical
 ANDs
 combined with prefix queries.
 The reason is clearly too many returned hits waiting for combined
 evaluation.
 
 I was wondering if there are any thoughts of changing the search
 approach to
 something less memory consuming.
 This is quite a big problem even with small sets of documents.
 
 Example:
 my 55000 document index runs out of memory when searching for  a* AND
 e*
 
 Could you estimate how difficult it would be to change the behaviour to
 search for e* in just the hits matching a*?
 I'm about to put a BitSet at the innermost hit collection to sort out
 AND-clauses that haven't been matched by previous AND-clauses.
 Is there a better approach?
 
 
 Thanks for a great java project guys
 
 
 Robert Wennström [developer, netadmin]
 robert -at- agent25.com
 www.agent25.com
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 


__
Do you Yahoo!?
Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
http://platinum.yahoo.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: OutOfMemoryError with boolean queries

2003-03-19 Thread Robert Wennström
Sorry. I wasn't verbose enough.

I use the default memory settings. But my issue was the core structure of Lucene
taking up (it seems to me) more memory than it would have to, if it had a
different approach.
Correct me if I'm wrong, but it seems to me that BooleanQuery stores all hits
(as Bucket objects) from all terms in the query even if it is a simple  war* AND
wash* AND sad*. Instead of looking for wash* just in the war* hits (and then
looking for sad* in the remaining hits) it makes three separate searches, which
would be a waste of memory.

- test output begins -

Index size = 55000
Query: a*
Total memory before: 2031616
Searching for: a* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
53527 total matching documents (1984ms)
Query: e*
Total memory before: 55128064
Searching for: e* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
52456 total matching documents (984ms)
Query: a* AND e*
Total memory before: 55128064
Searching for: +a* +e* (org.apache.lucene.search.BooleanQuery)
Total memory after: 124882944
51267 total matching documents (2468ms)

- test output ends -

In my perfect world the memory allocation when searching for a* AND e* should
not increase at all beyond the two separate searches for a* and e*, because it
would just allocate space for the a* hits and ignore e* hits that have no
previous match.
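
As a sketch of that behaviour outside Lucene's core (a workaround, not a change
to BooleanQuery itself; each prefix query still expands its own terms, but the
two result sets are only ever held as BitSets), the queries can be run one at a
time and intersected by document number, using the same HitCollector API shown
elsewhere in this thread:

import java.io.IOException;
import java.util.BitSet;

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class ChainedAndSearch {

    // Runs 'first', remembers its matches in a BitSet, then runs 'second' and
    // keeps only the documents the first query already matched.
    public static BitSet intersect(Searcher searcher, Query first, Query second)
            throws IOException {
        final BitSet firstHits = new BitSet();
        searcher.search(first, new HitCollector() {
            public void collect(int doc, float score) {
                firstHits.set(doc);
            }
        });

        final BitSet both = new BitSet();
        searcher.search(second, new HitCollector() {
            public void collect(int doc, float score) {
                if (firstHits.get(doc)) {  // only docs the first query matched
                    both.set(doc);
                }
            }
        });
        return both;
    }
}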


My biggest index lies at 2,34 million documents during testing, but should grow
with approximately 1 docs/day in production.
With that figure I wish for the best possible memory handling.


At the moment we use a search engine that, given the right question (or wrong),
consumes memory like a starving wolf and crashes the whole thing. The search
engine should be able to play with about 1GB RAM on the machine.
I just don't want the same possibilities of a crash with Lucene too.


I want to know whether the Lucene developers feel there are things left to
optimize, or whether everything already works the way it should from the start?


thanks
/RW


 -----Original Message-----
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
 Sent: 19 March 2003 16:19
 To: [EMAIL PROTECTED]
 Subject: Re: OutOfMemoryError with boolean queries
 
 
 Robert,
 
 I'm moving this to lucene-user, which is a more appropriate list for
 this type of a problem.
 You are not saying whether you are using some of those handy -X (-Xms
 -Xmx) command line switches when you invoke your application that dies
 with OutOfMemoryError.
 If you are not, try that, it may help.  I recall a few other people
 reporting the same problem and using -Xms and -Xmx solved their
 problem.
 
 If your machine doesn't have the RAM it needs this won't help, of
 course :)
 
 Otis
 
 
 --- Robert_Wennström [EMAIL PROTECTED] wrote:
  Hi,
  
  I'm experiencing OutOfMemoryErrors when searching using many logical
  ANDs
  combined with prefix queries.
  The reason is clearly too many returned hits waiting for combined
  evaluation.
  
  I was wondering if there are any thoughts of changing the search
  approach to
  something less memory consuming.
  This is quite a big problem even with small sets of documents.
  
  Example:
  my 55000 document index runs out of memory when searching 
 for  a* AND
  e*
  
  Could you estimate the difficulty to change the behaviour to search
  for e* in
  just the hits matching a* ?
  I'm about to put a BitSet at the innermost hit collection 
 to sort out
  AND-clauses that haven't been matched by previous AND-clauses.
  Is there a better approach ?
  
  
  Thanks for a great java project guys
  
  
  Robert Wennström [developer, netadmin]
  robert -at- agent25.com
  www.agent25.com
  
  
  
 -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
  
 
 
 __
 Do you Yahoo!?
 Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
 http://platinum.yahoo.com
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: OutOfMemoryError

2001-11-29 Thread Chantal Ackermann

hi Ian, hi Winton, hi all,

sorry I meant heap size of 100Mb. I'm  starting java with -Xmx100m. I'm not 
setting -Xms.

For what I know now, I had a bug in my own code. still I don't understand 
where these OutOfMemoryErrors came from. I will try to index again in one 
thread without RAMDirectory just to check if the program is sane.

the problem that the files get too big while merging remains. I wonder why
there is no way to tell Lucene not to create files that are bigger than the
system limit. How am I supposed to know after how many documents this limit
is reached? Lucene creates the documents - I just know the average size of a
piece of text that is the input for a document. Or am I missing something?!

chantal

Am Mittwoch, 28. November 2001 20:14 schrieben Sie:
 Were you using -mx and -ms (setting heap size ?)

   Cheers,
Winton

 As I run the program on a multi-processor machine I now changed the code
  to index each file in a single thread and write to one single
  IndexWriter. the merge factor is still at 10. maxMergeDocs is at
  1.000.000. I set the maximum heap size to 1MB.
 

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: OutOfMemoryError

2001-11-29 Thread Ian Lea

Doug sent the message below to the list on 3-Nov in response to
a query about file size limits.  There may have been more
related stuff on the thread as well.


--
Ian.



   *** Anyway, is there any way to control how big the indexes
 grow?

The easiest thing is to set IndexWriter.maxMergeDocs. Since you hit 2GB at
8M docs, set this to 7M.  That will keep Lucene from trying to merge an
index that won't fit in your filesystem.  (It will actually effectively
round this down to the next lower power of Index.mergeFactor.  So with the
default mergeFactor=10, maxMergeDocs=7M will generate a series of 1M
document indexes, since merging 10 of these would exceed the max.)

Slightly more complex: you could further minimize the number of segments,
if, when you've added seven million documents, optimize the index and start
a new index.  Then use MultiSearcher to search.
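
A rough sketch of both suggestions follows (paths are placeholders, and
mergeFactor/maxMergeDocs were public fields on IndexWriter in the 1.x releases
of that era; later releases use setter methods instead):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

public class TwoGBWorkaround {

    public static void main(String[] args) throws Exception {
        // Suggestion 1: cap segment merges so no merged index file outgrows
        // the filesystem limit (7M documents, per the figures above).
        IndexWriter writer = new IndexWriter("/path/to/index-1",
                new StandardAnalyzer(), true);
        writer.mergeFactor = 10;       // the default
        writer.maxMergeDocs = 7000000;
        // ... writer.addDocument(...) calls go here ...
        writer.close();

        // Suggestion 2: if each sub-index is kept under ~7M documents, it can
        // be optimized on its own and the parts searched together.
        Searchable[] parts = {
            new IndexSearcher("/path/to/index-1"),
            new IndexSearcher("/path/to/index-2"),
        };
        MultiSearcher searcher = new MultiSearcher(parts);
        // searcher.search(query) as usual, then:
        searcher.close();
    }
}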

Even more complex and optimal: write a version of FSDirectory that, when a
file exceeds 2GB, creates a subdirectory and represents the file as a series
of files.  (I've done this before, and found that, on at least the version
of Solaris that I was using, the files had to be a few 100k less than 2GB
for programs like 'cp' and 'ftp' to operate correctly on them.)

Doug




Chantal Ackermann wrote:
 
 hi Ian, hi Winton, hi all,
 
 sorry I meant heap size of 100Mb. I'm  starting java with -Xmx100m. I'm not
 setting -Xms.
 
 For what I know now, I had a bug in my own code. still I don't understand
 where these OutOfMemoryErrors came from. I will try to index again in one
 thread without RAMDirectory just to check if the program is sane.
 
 the problem that the files get too big while merging remains. I wonder why
 there is not the possibility to tell lucene not to create files that are
 bigger than the system limit. how am i supposed to know after how many
 documents this limit is reached? lucene creates the documents - i just know
 the average size of a piece of text that is the input for a document. or am I
 missing something?!
 
 chantal

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: OutOfMemoryError

2001-11-29 Thread Steven J. Owens

Chantal,
 For what I know now, I had a bug in my own code. still I don't understand 
 where these OutOfMemoryErrors came from. I will try to index again in one 
 thread without RAMDirectory just to check if the program is sane.

 Java often has misleading error messages.  For example, on
Solaris machines the default ulimit used to be 24 - that's 24 open
file handles!  Yeesh. This will cause an OutOfMemoryError.  So don't
assume it's actually a memory problem, particularly if a memory
problem doesn't particularly make sense.  Just a thought.

Steven J. Owens
[EMAIL PROTECTED]

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: OutOfMemoryError

2001-11-29 Thread Steven J. Owens

I wrote:
   Java often has misleading error messages.  For example, on
  solaris machines the default ulimit used to be 24 - that's 24 open
  file handles!  Yeesh. This will cause an OutOfMemoryError.  So don't

Jeff Trent replied:
 Wow.  I did not know that!
 
 I also don't see an option to increase that limit from java -X.  Do you know
 how to increase that limit?

 That used to be the case; I think it's larger on newer machines.  I
don't think there's a java command line option to set this; it's a
system limit.  The Solaris command to check it is ulimit.  To set it
for a given login process (assuming sufficient privileges) use ulimit
number (e.g. ulimit 128).  ulimit -a prints out all limits.

Steven J. Owens
[EMAIL PROTECTED]



--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




OutOfMemoryError

2001-11-28 Thread Chantal Ackermann

hi to all,

please help! I think I mixed my brain up already with this stuff...

I'm trying to index about 29 text files where the biggest one is ~700Mb and
the smallest ~300Mb. I once managed to build the whole index, with a merge
factor = 10 and maxMergeDocs=1. This took more than 35 hours I think
(don't know exactly) and it didn't use much RAM (though it could have).
Unfortunately I had a call to optimize at the end, and during optimization an
IOException (File too big) occurred (while merging).

As I run the program on a multi-processor machine I now changed the code to 
index each file in a single thread and write to one single IndexWriter. the 
merge factor is still at 10. maxMergeDocs is at 1.000.000. I set the maximum 
heap size to 1MB.

I tried to use RAMDirectory (as mentioned in the mailing list) and just use 
IndexWriter.addDocument(). At the moment it seems not to make any difference. 
After a while _all_ the threads exit one after another (not all at once!)
with an OutOfMemoryError. The priority of all of them is at the minimum.

even if the multithreading doesn't increase performance I would be glad if I 
could just once get it running again.

I would be even happier if someone could give me a hint what would be the 
best way to index this amount of data. (the average size of an entry that 
gets parsed for a Document is about 1Kb.)

thanx for any help!
chantal

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: OutOfMemoryError

2001-11-28 Thread Ian Lea

I've loaded a large (but not as large as yours) index with mergeFactor
set to 1000.  Was substantially faster than with default setting. 
Making it higher didn't seem to make things much faster but did cause
it to use more memory. In addition I loaded the data in chunks in
separate processes and optimized the index after each chunk, again
in a separate process.  All done straight to disk, no messing about
with RAMDirectories.
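
A sketch of that chunked approach (paths, chunk contents and class names are
placeholders; in the 1.x releases mergeFactor was a public field on
IndexWriter):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ChunkedIndexer {

    // Adds one chunk of text to the index; 'create' should be true only for
    // the very first chunk.
    public static void indexChunk(String indexDir, String[] chunk, boolean create)
            throws Exception {
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), create);
        writer.mergeFactor = 1000;  // fewer merges while loading, more RAM used
        for (int i = 0; i < chunk.length; i++) {
            Document doc = new Document();
            doc.add(Field.Text("contents", chunk[i]));
            writer.addDocument(doc);
        }
        writer.close();
    }

    // Run between chunks, as a separate step/process, as described above.
    public static void optimize(String indexDir) throws Exception {
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }
}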

Didn't play with maxMergeDocs and am not sure what you mean by
maximum heap size but 1MB doesn't sound very large.



--
Ian.
[EMAIL PROTECTED]


Chantal Ackermann wrote:
 
 hi to all,
 
 please help! I think I mixed my brain up already with this stuff...
 
 I'm trying to index about 29 textfiles where the biggest one is ~700Mb and
 the smallest ~300Mb. I achieved once to run the whole index, with a merge
 factor = 10 and maxMergeDocs=1. This took more than 35 hours I think
 (don't know exactly) and it didn't use much RAM (though it could have).
 unfortunately I had a call to optimize at the end and while optimization an
 IOException (File to big) occured (while merging).
 
 As I run the program on a multi-processor machine I now changed the code to
 index each file in a single thread and write to one single IndexWriter. the
 merge factor is still at 10. maxMergeDocs is at 1.000.000. I set the maximum
 heap size to 1MB.
 
 I tried to use RAMDirectory (as mentioned in the mailing list) and just use
 IndexWriter.addDocument(). At the moment it seems not to make any difference.
 after a while _all_ the threads exit one after another (not all at once!)
 with an OutOfMemoryError. the priority of all of them is at the minimum.
 
 even if the multithreading doesn't increase performance I would be glad if I
 could just once get it running again.
 
 I would be even happier if someone could give me a hint what would be the
 best way to index this amount of data. (the average size of an entry that
 gets parsed for a Document is about 1Kb.)
 
 thanx for any help!
 chantal

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]