Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Hi all, does anybody have experience dynamically varying maxBufferedDocs? In my app, I can never truncate docs, so I work with maxFieldLength set to Integer.MAX_VALUE. Some documents are large, over 100 MBytes. Most documents are tiny. So a fixed value of maxBufferedDocs to avoid OOMs

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Yonik Seeley
On 11/9/06, Chuck Williams wrote: My main concern is that the mergeFactor escalation merging logic will somehow behave poorly in the presence of dynamically varying initial segment sizes. Things will work as expected with varying segment sizes, but *not* varying

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Thanks Yonik! Poor wording on my part. I won't vary maxBufferedDocs, just am making flushRamSegments() public and calling it externally (properly synchronized), earlier than it would otherwise be called from ongoing addDocument-driven merging. Sounds like this should work. Chuck
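Calling the flush externally, ahead of the document-count limit, can be sketched as a small wrapper. This is a minimal sketch under stated assumptions, not Lucene code: `FlushableWriter`, `SizeAwareIndexer` and the byte budget are hypothetical, `flushRamSegments()` merely stands in for the method made public in the message, and a document's size is estimated from the UTF-8 length of its text. The add and the flush run inside one `synchronized` block on the writer, which is one possible reading of "properly synchronized".

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a writer whose in-RAM buffer can be flushed on demand. */
interface FlushableWriter {
    void addDocument(String text);
    void flushRamSegments();
}

/**
 * Sketch: forces a flush once the estimated size of documents added since the
 * last forced flush reaches a byte budget. Adds and flushes share one lock so
 * a flush never interleaves with an add from another thread.
 */
final class SizeAwareIndexer {
    private final FlushableWriter writer;
    private final long maxBufferedBytes;
    private long bufferedBytes = 0; // guarded by the writer's monitor

    SizeAwareIndexer(FlushableWriter writer, long maxBufferedBytes) {
        if (maxBufferedBytes < 1) {
            throw new IllegalArgumentException("maxBufferedBytes must be >= 1");
        }
        this.writer = writer;
        this.maxBufferedBytes = maxBufferedBytes;
    }

    void add(String text) {
        // Rough size estimate (assumption): UTF-8 length of the raw text.
        long size = text.getBytes(StandardCharsets.UTF_8).length;
        synchronized (writer) {
            writer.addDocument(text);
            bufferedBytes += size;
            if (bufferedBytes >= maxBufferedBytes) {
                // Flush earlier than the doc-count limit alone would.
                writer.flushRamSegments();
                bufferedBytes = 0;
            }
        }
    }
}
```

Because the writer still flushes on its own every maxBufferedDocs documents, this byte counter can overestimate what is actually buffered; that only makes the forced flushes come earlier, which errs toward lower memory use.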

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Yonik Seeley
On 11/9/06, Chuck Williams wrote: Thanks Yonik! Poor wording on my part. I won't vary maxBufferedDocs, just am making flushRamSegments() public and calling it externally (properly synchronized), earlier than it would otherwise be called from ongoing addDocument-driven

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Yonik Seeley wrote on 11/09/2006 08:50 AM: For best behavior, you probably want to be using the current (svn-trunk) version of Lucene with the new merge policy. It ensures there are mergeFactor segments with size = maxBufferedDocs before triggering a merge. This makes for faster indexing in
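The idea of waiting for mergeFactor segments at a level before merging can be illustrated with a simplified model in which a segment is just its document count. This sketch is loosely modeled on the idea in the message and is not Lucene's actual merge code; `LevelMergeSketch` and `maybeMerge` are hypothetical names. Level 0 covers sizes in (0, maxBufferedDocs], level L covers (maxBufferedDocs·mergeFactor^(L-1), maxBufferedDocs·mergeFactor^L], and a level is merged only once at least mergeFactor segments have accumulated there.

```java
import java.util.ArrayList;
import java.util.List;

final class LevelMergeSketch {
    /**
     * Simplified level-based merge check, run after each flush.
     * segments holds doc counts, oldest first; it is modified in place.
     */
    static void maybeMerge(List<Long> segments, long maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        long lower = 0;               // exclusive lower bound of the current level
        long upper = maxBufferedDocs; // inclusive upper bound of the current level
        while (true) {
            int i = segments.size() - 1;
            // Skip trailing segments that belong to lower levels.
            while (i >= 0 && segments.get(i) <= lower) i--;
            int end = i; // rightmost candidate at this level (inclusive)
            // Extend left while segments do not exceed this level's bound.
            while (i >= 0 && segments.get(i) <= upper) i--;
            int start = i + 1;
            int count = end - start + 1;
            if (count < mergeFactor) {
                return; // this level is not full yet; stop escalating
            }
            // Merge segments[start..end] into a single segment.
            List<Long> window = segments.subList(start, end + 1);
            long merged = 0;
            for (long s : window) merged += s;
            window.clear();
            segments.add(start, merged);
            // Escalate: the merged segment may complete the next level.
            lower = upper;
            upper *= mergeFactor;
        }
    }
}
```

With maxBufferedDocs = 10 and mergeFactor = 3, nine full flushes of 10 docs collapse step by step into a single 90-doc segment. A list such as [30, 2, 3] is left alone, while [30, 2, 3, 10] becomes [30, 15]: segments flushed early still count as level-0 segments and are merged in the normal way once mergeFactor of them exist.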

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Chuck Williams wrote on 11/09/2006 08:55 AM: Yonik Seeley wrote on 11/09/2006 08:50 AM: For best behavior, you probably want to be using the current (svn-trunk) version of Lucene with the new merge policy. It ensures there are mergeFactor segments with size = maxBufferedDocs before

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Michael Busch
I had the same problem with large documents causing memory problems. I solved this problem by introducing a new setting in IndexWriter, setMaxBufferSize(long). Now a merge is either triggered when bufferedDocs==maxBufferedDocs *or* the size of the bufferedDocs >= maxBufferSize. I made these
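The dual trigger described here can be sketched as a buffer that flushes when either limit is reached. `DualTriggerBuffer` is a hypothetical class, not the patch itself. Documents are modeled as byte arrays, so their size is exact in this sketch, whereas a real writer would have to estimate the RAM held by its buffered documents.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a buffer that flushes when EITHER the number of buffered documents
 * reaches maxBufferedDocs OR their total size reaches maxBufferSize bytes.
 */
final class DualTriggerBuffer {
    private final int maxBufferedDocs;
    private final long maxBufferSize;
    private final List<byte[]> docs = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushCount = 0;

    DualTriggerBuffer(int maxBufferedDocs, long maxBufferSize) {
        if (maxBufferedDocs < 1 || maxBufferSize < 1) {
            throw new IllegalArgumentException("limits must be >= 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferSize = maxBufferSize;
    }

    /** Buffers a document, then flushes if either limit has been reached. */
    void addDocument(byte[] doc) {
        docs.add(doc);
        bufferedBytes += doc.length;
        if (docs.size() >= maxBufferedDocs || bufferedBytes >= maxBufferSize) {
            flush();
        }
    }

    private void flush() {
        // A real writer would write the buffered documents out as a segment here.
        docs.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    int flushCount() { return flushCount; }

    int bufferedDocCount() { return docs.size(); }
}
```

With maxBufferedDocs = 3 and maxBufferSize = 100, three 1-byte documents trigger a flush by count, while a single 150-byte document triggers one by size. A document larger than maxBufferSize is still buffered in full before the flush, so the byte limit bounds the buffer between documents, not the peak memory while one very large document is being added.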

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
This sounds good. Michael, I'd love to see your patch, Chuck Michael Busch wrote on 11/09/2006 09:13 AM: I had the same problem with large documents causing memory problems. I solved this problem by introducing a new setting in IndexWriter setMaxBufferSize(long). Now a merge is either

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Michael Busch
This sounds good. Michael, I'd love to see your patch, Chuck Ok, I'll probably need a few days before I can submit it (have to code unit tests and check if it compiles with the current head), because I'm quite busy with other stuff right now. But you will get it soon :-)

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Michael Busch wrote on 11/09/2006 09:56 AM: This sounds good. Michael, I'd love to see your patch, Chuck Ok, I'll probably need a few days before I can submit it (have to code unit tests and check if it compiles with the current head), because I'm quite busy with other stuff right now.