Re: Flexible indexing design

2008-04-29 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > Container is only aware of the single inStream, while codec can still > > think it's operating on 3 even if it's really 1 or 2. > > > > I don't understand. If you have three streams, all of them are going to > have to get skipped, right? For the "all

Re: Flexible indexing design

2008-04-28 Thread Marvin Humphrey
On Apr 27, 2008, at 3:28 AM, Michael McCandless wrote: Actually, I was picturing that the container does the seeking itself (using skip data), to get "close" to the right point, and then it uses the codec to step through single docs at a time until it's at or beyond the right one. I believe i

Re: Flexible indexing design

2008-04-27 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > > Seeking might get a little weird, I suppose. > > > > Maybe not?: if the container is only aware of the single InStream, and > > say it's "indexed" with a multi-skip index, then when you ask > > container to seek, it forwards the request to multi-ski

Re: Flexible indexing design

2008-04-24 Thread Marvin Humphrey
On Apr 24, 2008, at 4:47 AM, Michael McCandless wrote: Seeking might get a little weird, I suppose. Maybe not?: if the container is only aware of the single InStream, and say it's "indexed" with a multi-skip index, then when you ask container to seek, it forwards the request to multi-skip whic

Re: Flexible indexing design

2008-04-24 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Apr 17, 2008, at 11:57 AM, Michael McCandless wrote: > > > > If I have a pluggable indexer, > > then on the querying side I need something (I'm not sure what/how) > > that knows how to create the right demuxer (container) and codec > > (decoder) to

Re: Flexible indexing design

2008-04-18 Thread Marvin Humphrey
On Apr 17, 2008, at 11:57 AM, Michael McCandless wrote: If I have a pluggable indexer, then on the querying side I need something (I'm not sure what/how) that knows how to create the right demuxer (container) and codec (decoder) to interact with whatever my indexing plugins wrote. So I don't t

Re: Flexible indexing design

2008-04-17 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > On Apr 13, 2008, at 2:35 AM, Michael McCandless wrote: > > > > I think the major difference is locality? In a compound file, you > > have to seek "far away" to reach the prx & skip data (if they are > > separate). > > There's another item worth mentio

Re: Flexible indexing design

2008-04-15 Thread Marvin Humphrey
On Apr 13, 2008, at 2:35 AM, Michael McCandless wrote: I think the major difference is locality? In a compound file, you have to seek "far away" to reach the prx & skip data (if they are separate). There's another item worth mentioning, something that Doug, Grant and I discussed when this

Re: Flexible indexing design

2008-04-13 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Apr 10, 2008, at 3:10 AM, Michael McCandless wrote: > > > > Can't you compartmentalize while still serializing skip data into the > > single frq/prx file? > > > > Yes, that's possible. > > The way KS is set up right now, PostingList objects maintai

Re: Flexible indexing design

2008-04-12 Thread Marvin Humphrey
On Apr 10, 2008, at 3:10 AM, Michael McCandless wrote: Can't you compartmentalize while still serializing skip data into the single frq/prx file? Yes, that's possible. The way KS is set up right now, PostingList objects maintain i/o state, and Posting's Read_Record() method just deals with

Re: Flexible indexing design

2008-04-10 Thread Michael McCandless
Marvin Humphrey <[EMAIL PROTECTED]> wrote: > On Apr 9, 2008, at 6:35 AM, Michael Busch wrote: > > > > We also need to come up with a good solution for the dictionary, because a > term with frq/prx postings needs to store two (or three for skiplist) file > pointers in the dictionary, whereas e.g. a

Re: Flexible indexing design (was Re: Pooling of posting objects in DocumentsWriter)

2008-04-10 Thread Michael McCandless
Michael Busch <[EMAIL PROTECTED]> wrote: > > I agree we would have an abstract base Posting class that just tracks > > the term text. > > > > Then, DocumentsWriter manages inverting each field, maintaining the > > per-field hash of term Text -> abstract Posting instances, exposing > > the methods

Re: Flexible indexing design

2008-04-09 Thread Marvin Humphrey
On Apr 9, 2008, at 6:35 AM, Michael Busch wrote: We also need to come up with a good solution for the dictionary, because a term with frq/prx postings needs to store two (or three for skiplist) file pointers in the dictionary, whereas e.g. a "binary" posting list only needs one pointer.

Flexible indexing design (was Re: Pooling of posting objects in DocumentsWriter)

2008-04-09 Thread Michael Busch
Thanks for your quick answers. Michael McCandless wrote: Hi Michael, I've actually been working on factoring DocumentsWriter, as a first step towards flexible indexing. Cool, yeah, separating the DocumentsWriter into multiple classes certainly made the complex code easier to understand.