[jira] [Commented] (LUCENE-5236) Use broadword bit selection in EliasFanoDecoder

2014-03-13 Thread Sebastiano Vigna (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933030#comment-13933030 ] Sebastiano Vigna commented on LUCENE-5236: Sorry. http://vigna.di.unimi.it/Sux

[jira] [Commented] (LUCENE-5236) Use broadword bit selection in EliasFanoDecoder

2014-03-11 Thread Sebastiano Vigna (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930217#comment-13930217 ] Sebastiano Vigna commented on LUCENE-5236: Sorry guys, just happened to read

Re: Interleaving and new Lucene formats

2013-02-17 Thread Sebastiano Vigna
On 16 February 2013 14:35, Robert Muir rcm...@gmail.com wrote: TermsEnum termsEnum = reader.terms("body").iterator(null); boolean found = termsEnum.seekExact(new BytesRef("dogs"), false); // pass 0, to not ask for frequencies DocsEnum docsEnum =

Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
I'd like to redo the benchmarks published on MG4J's home page with Lucene 4.1. However, for this I'd need to know whether when using PForDelta coding the counts (a.k.a. within-document frequencies) are stored interleaved with the document pointers as in 3.6.2 (and, if not so, the cheapest way

Re: Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
On 16 February 2013 11:45, Robert Muir rcm...@gmail.com wrote: But forcing that wouldn't be testing the 4.1 index format, it would be something else (something not interesting). Do you mind if I have my own share of knowledge and have my idea about interesting benchmarks? :) You didn't

Re: Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
On 16 February 2013 13:19, Robert Muir rcm...@gmail.com wrote: I think you are missing my point: this interleaving is part of the whole design of this postings format. You can't just turn it off and force it to be always FOR: or you would need a new postings format I never asked for that. It

Re: Interleaving and new Lucene formats

2013-02-16 Thread Sebastiano Vigna
On 16 February 2013 14:35, Robert Muir rcm...@gmail.com wrote: 2. index them, but specify you won't ask for them in the DocsEnum: and just use that to iterate documents. TermsEnum termsEnum = reader.terms("body").iterator(null); boolean found = termsEnum.seekExact(new

Re: Benchmarking on GOV2

2006-05-30 Thread Sebastiano Vigna
On Mon, 2006-05-29 at 14:35 -1000, Chuck Williams wrote: I'm not sure what form you would like that help to take, but here are a couple high-level points imho: Help in configuring Lucene so that it uses all resources available, and so that the results returned are identical to all other

Benchmarking on GOV2

2006-05-29 Thread Sebastiano Vigna
Dear Lucene developers, I'd be interested in doing some benchmarking on (at least) Lucene, Egothor and MG4J. There is no actual data around on publicly available collections, and it would be nice to have some more objective data on efficiency for a significantly large collection. We have GOV2