Re: Interleaving and new Lucene formats

2013-02-17 Thread Dawid Weiss
From the IndexReader API javadoc: There are two different types of IndexReaders: {@link AtomicReader}: These indexes do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings. {@link CompositeReader}: Instances (like {@

Re: Interleaving and new Lucene formats

2013-02-17 Thread Sebastiano Vigna
On 16 February 2013 14:35, Robert Muir wrote:
> TermsEnum termsEnum = reader.terms("body").iterator(null);
> boolean found = termsEnum.seekExact(new BytesRef("dogs"), false);
> // pass 0, to not ask for frequencies
> DocsEnum docsEnum = termsEnum.docs(reader.getLiveDocs(

Re: Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
On 16 February 2013 14:35, Robert Muir wrote:
> 2. index them, but specify you won't ask for them in the DocsEnum: and
> just use that to iterate documents.
>
> TermsEnum termsEnum = reader.terms("body").iterator(null);
> boolean found = termsEnum.seekExact(new BytesRef("dogs"), false);

Re: Interleaving and new Lucene formats

2013-02-16 Thread Robert Muir
On Sat, Feb 16, 2013 at 8:19 AM, Sebastiano Vigna wrote:
> I never asked for that. It looks like you're entirely missing my point.
> Which is to do a fair benchmark between radically different implementations
> of an index structure.
"It would also be important for me to force PForDelta everywh

Re: Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
On 16 February 2013 13:19, Robert Muir wrote:
> I think you are missing my point: this interleaving is part of the
> whole design of this postings format. You can't just turn it off and
> force it to be always FOR: or you would need a new postings format
I never asked for that. It looks like you

Re: Interleaving and new Lucene formats

2013-02-16 Thread Robert Muir
On Sat, Feb 16, 2013 at 7:05 AM, Sebastiano Vigna wrote:
> On 16 February 2013 11:45, Robert Muir wrote:
>
>> But forcing that wouldn't be testing the 4.1 index format, it would be
>> something else (something not interesting).
>
> Do you mind if I have my own share of knowledge and have my ide

Re: Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
On 16 February 2013 11:45, Robert Muir wrote:
> But forcing that wouldn't be testing the 4.1 index format, it would be
> something else (something not interesting).
Do you mind if I have my own share of knowledge and have my idea about interesting benchmarks? :) You didn't answer, but the und

Re: Interleaving and new Lucene formats

2013-02-16 Thread Robert Muir
On Sat, Feb 16, 2013 at 5:40 AM, Sebastiano Vigna wrote:
> I'd like to redo the benchmarks published on MG4J's home page with Lucene
> 4.1. However, for this I'd need to know whether when using PForDelta coding
> the counts (a.k.a. within-document frequencies) are stored interleaved with
> the
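
For readers unfamiliar with what "interleaved" means in this thread, here is a toy sketch of one possible block-interleaved layout: a block of document gaps, then the block of matching counts, repeated. It is an assumption made purely for illustration and is not Lucene 4.1's actual on-disk format. The class name InterleavedPostings, the block size of 4, and the plain int[] "encoding" (no bit packing, no PForDelta) are all hypothetical.

```java
import java.util.Arrays;

/** Toy illustration of block-interleaved postings; NOT Lucene's real format. */
public class InterleavedPostings {
    // Hypothetical toy block size; real formats use much larger blocks.
    static final int BLOCK = 4;

    /** Layout: [gaps of block 0][freqs of block 0][gaps of block 1][freqs of block 1]... */
    public static int[] encode(int[] docIds, int[] freqs) {
        int n = docIds.length;
        int[] out = new int[2 * n];
        int pos = 0;
        int prev = 0;
        for (int start = 0; start < n; start += BLOCK) {
            int len = Math.min(BLOCK, n - start);
            // Document ids are stored as gaps from the previous id.
            for (int i = 0; i < len; i++) {
                out[pos++] = docIds[start + i] - prev;
                prev = docIds[start + i];
            }
            // The counts for the same documents follow immediately.
            for (int i = 0; i < len; i++) {
                out[pos++] = freqs[start + i];
            }
        }
        return out;
    }

    /** Reads only document ids, jumping over each block of counts. */
    public static int[] decodeDocs(int[] encoded) {
        int n = encoded.length / 2;
        int[] docs = new int[n];
        int pos = 0;
        int prev = 0;
        int d = 0;
        while (d < n) {
            int len = Math.min(BLOCK, n - d);
            for (int i = 0; i < len; i++) {
                prev += encoded[pos++];
                docs[d++] = prev;
            }
            pos += len; // skip the counts of this block without decoding them
        }
        return docs;
    }

    /** Reads only counts, jumping over each block of gaps. */
    public static int[] decodeFreqs(int[] encoded) {
        int n = encoded.length / 2;
        int[] freqs = new int[n];
        int pos = 0;
        int f = 0;
        while (f < n) {
            int len = Math.min(BLOCK, n - f);
            pos += len; // skip the gaps of this block
            for (int i = 0; i < len; i++) {
                freqs[f++] = encoded[pos++];
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 20, 21, 40};
        int[] freqs = {1, 2, 1, 5, 1, 3};
        int[] encoded = encode(docs, freqs);
        System.out.println(Arrays.toString(encoded));
        System.out.println(Arrays.toString(decodeDocs(encoded)));
    }
}
```

In this toy layout a reader that only wants documents still has to step over the counts block by block, which is the kind of cost the benchmark question is about. Whether and how a real postings format pays that cost depends on its actual design.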