Re: Lucene memory usage

2009-06-12 Thread Michael McCandless
On Thu, Jun 11, 2009 at 4:30 PM, Jason Rutherglen wrote: >> Yes please post feature requests to Sun ;) > > I signed up for > http://mail.openjdk.java.net/mailman/listinfo/nio-discuss Looks like a fun list ;) >> But I think in the short term Lucene will have to drop to > native code to tell OS not

Re: Lucene memory usage

2009-06-11 Thread Jason Rutherglen
> Yes please post feature requests to Sun ;) I signed up for http://mail.openjdk.java.net/mailman/listinfo/nio-discuss > But I think in the short term Lucene will have to drop to native code to tell OS not to cache bytes read by segment merging... LUCENE-1121 uses transferTo which presumably doe

Re: Lucene memory usage

2009-06-11 Thread Michael McCandless
On Thu, Jun 11, 2009 at 3:21 PM, Jason Rutherglen wrote: > Makes sense. > > Currently MMapDirectory doesn't write using mapped byte buffers. > Would the memory management of the OS behave differently if we > were writing to the MMapped bytebuffers as opposed to writing to > an RAF (like with FSDir)

Re: Lucene memory usage

2009-06-11 Thread Jason Rutherglen
Maybe we can put together our requested IO operations and submit them for inclusion in NIO Java 7? http://openjdk.java.net/projects/nio/ On Thu, Jun 11, 2009 at 12:21 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Makes sense. > > Currently MMapDirectory doesn't write using mapped b

Re: Lucene memory usage

2009-06-11 Thread Jason Rutherglen
Makes sense. Currently MMapDirectory doesn't write using mapped byte buffers. Would the memory management of the OS behave differently if we were writing to the MMapped bytebuffers as opposed to writing to an RAF (like with FSDir)? > its blind LRU approach is often a poor policy (eg for terms di

Re: Lucene memory usage

2009-06-11 Thread Michael McCandless
On Wed, Jun 10, 2009 at 9:24 PM, Jason Rutherglen wrote: > I read over the LUCENE-1458 comments again. Interesting. I think > the most compelling argument is that the various files we're > normally loading into the heap are, after merging, in the IO > cache. If we can simply reuse the IO cache rath

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
I read over the LUCENE-1458 comments again. Interesting. I think the most compelling argument is that the various files we're normally loading into the heap are, after merging, in the IO cache. If we can simply reuse the IO cache rather than allocate a bunch of redundant arrays in heap, we could be

Re: Lucene memory usage

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 7:23 PM, Jason Rutherglen wrote: > Cool! Sounds like with LUCENE-1458 we can experiment with some > of these things. Does CSF become just another codec? I believe LUCENE-1458 currently only makes terms dict & postings pluggable... >> I'm leery of having terms dict live ent

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
Cool! Sounds like with LUCENE-1458 we can experiment with some of these things. Does CSF become just another codec? > I'm leery of having terms dict live entirely on disk, though we should certainly explore it. Yeah, it should theoretically help with reloading; it could use a skiplist (as we have

Re: Lucene memory usage

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 4:13 PM, Jason Rutherglen wrote: > Great! If I understand correctly it looks like RAM savings? Will > there be an improvement in lookup speed? (We're using binary > search here?). Yes, sizable RAM reduction for apps that have many unique terms. And, init'ing (warming) the

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
Great! If I understand correctly it looks like RAM savings? Will there be an improvement in lookup speed? (We're using binary search here?). Is there a precedent in database systems for what was mentioned about placing the term dict, delDocs, and filters onto disk and reading them from there (wit