Re: Pooling of posting objects in DocumentsWriter

2008-04-17 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Apr 13, 2008, at 2:28 AM, Michael McCandless wrote: > > > > In fact, RawPosting "objects" aren't even malloc'd individually -- they're > > > assigned memory chunks by a custom MemoryPool object (which *does* have to > > > maintain awareness of alig

Re: Pooling of posting objects in DocumentsWriter

2008-04-15 Thread Marvin Humphrey
On Apr 13, 2008, at 2:28 AM, Michael McCandless wrote: In fact, RawPosting "objects" aren't even malloc'd individually -- they're assigned memory chunks by a custom MemoryPool object (which *does* have to maintain awareness of alignment issues). You can't do this: Right, it's that alignm

Re: Pooling of posting objects in DocumentsWriter

2008-04-13 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Apr 12, 2008, at 2:29 AM, Michael McCandless wrote: > > > > > > > The total allocation for any given RawPosting can be calculated like so: > > > > > > offsetof(RawPosting, blob) + posting->content_len + posting->aux_len > > > > > > > Probably you

Re: Pooling of posting objects in DocumentsWriter

2008-04-12 Thread Marvin Humphrey
On Apr 12, 2008, at 2:29 AM, Michael McCandless wrote: The total allocation for any given RawPosting can be calculated like so: offsetof(RawPosting, blob) + posting->content_len + posting->aux_len Probably you also need to add in bytes lost to alignment requirements, when content_len +
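
The size calculation quoted above, together with the alignment caveat raised in the reply, can be sketched roughly as follows. This is only an illustration, not the actual KinoSearch code: the struct layout (including the `doc_num` field and all field types), the helper names `round_up` and `raw_posting_alloc_size`, and the choice of `sizeof(void *)` as the alignment unit are assumptions. Only `RawPosting`, `blob`, `content_len` and `aux_len` appear in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only blob, content_len and aux_len are named in the
 * thread; doc_num and all field types are assumptions for illustration. */
typedef struct RawPosting {
    uint32_t doc_num;
    uint32_t content_len;
    uint32_t aux_len;
    char     blob[1];   /* variable-length payload starts here */
} RawPosting;

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}

/* Bytes a pool would have to hand out for one posting: the unpadded total
 * from the thread, plus whatever is lost to keeping the next chunk aligned. */
static size_t raw_posting_alloc_size(size_t content_len, size_t aux_len) {
    size_t unpadded = offsetof(RawPosting, blob) + content_len + aux_len;
    return round_up(unpadded, sizeof(void *));
}
```

Under these assumptions, the bytes lost to alignment for a given posting are the difference between `raw_posting_alloc_size(content_len, aux_len)` and the unpadded sum. That difference is the extra term the reply suggests accounting for.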

Re: Pooling of posting objects in DocumentsWriter

2008-04-12 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Apr 10, 2008, at 2:37 AM, Michael McCandless wrote: > > > > > > IMO, the abstract base Posting class should not track text. It should > > > include only one datum: a document number. This keeps it in line with the > > > simplest IR definition for

Re: Pooling of posting objects in DocumentsWriter

2008-04-10 Thread Marvin Humphrey
On Apr 10, 2008, at 2:37 AM, Michael McCandless wrote: IMO, the abstract base Posting class should not track text. It should include only one datum: a document number. This keeps it in line with the simplest IR definition for a "posting": one document matching one term. But how do you t

Re: Flexible indexing design (was Re: Pooling of posting objects in DocumentsWriter)

2008-04-10 Thread Michael McCandless
Michael Busch <[EMAIL PROTECTED]> wrote: > > I agree we would have an abstract base Posting class that just tracks > > the term text. > > > > Then, DocumentsWriter manages inverting each field, maintaining the > > per-field hash of term Text -> abstract Posting instances, exposing > > the methods

Re: Pooling of posting objects in DocumentsWriter

2008-04-10 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Apr 8, 2008, at 10:25 AM, Michael McCandless wrote: > > > I've actually been working on factoring DocumentsWriter, as a first > > step towards flexible indexing. > > > > The way I handled this in KS was to turn Posting into a class akin to > TermB

Flexible indexing design (was Re: Pooling of posting objects in DocumentsWriter)

2008-04-09 Thread Michael Busch
Thanks for your quick answers. Michael McCandless wrote: Hi Michael, I've actually been working on factoring DocumentsWriter, as a first step towards flexible indexing. Cool, yeah separating the DocumentsWriter into multiple classes certainly helped understanding the complex code better.

Re: Pooling of posting objects in DocumentsWriter

2008-04-08 Thread Marvin Humphrey
On Apr 8, 2008, at 10:25 AM, Michael McCandless wrote: I've actually been working on factoring DocumentsWriter, as a first step towards flexible indexing. The way I handled this in KS was to turn Posting into a class akin to TermBuffer: the individual Posting object persists, but its values

Re: Pooling of posting objects in DocumentsWriter

2008-04-08 Thread Michael McCandless
Hi Michael, I've actually been working on factoring DocumentsWriter, as a first step towards flexible indexing. I agree we would have an abstract base Posting class that just tracks the term text. Then, DocumentsWriter manages inverting each field, maintaining the per-field hash of term Text ->

Pooling of posting objects in DocumentsWriter

2008-04-08 Thread Michael Busch
Hi, this is most likely a question for Mike. I'm trying to figure out what changes we need to make in order to support flexible indexing and LUCENE-1231. Currently I'm looking into the DocumentsWriter. If we want to support different posting lists, then we probably want to change the Posting