[
https://issues.apache.org/jira/browse/LUCENE-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933030#comment-13933030
]
Sebastiano Vigna commented on LUCENE-5236:
--
Sorry.
http://vigna.di.unimi.it/Sux
[
https://issues.apache.org/jira/browse/LUCENE-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930217#comment-13930217
]
Sebastiano Vigna commented on LUCENE-5236:
--
Sorry guys—just happened to read
On 16 February 2013 14:35, Robert Muir rcm...@gmail.com wrote:
TermsEnum termsEnum = reader.terms("body").iterator(null);
boolean found = termsEnum.seekExact(new BytesRef("dogs"), false);
// pass 0, to not ask for frequencies
DocsEnum docsEnum =
I'd like to redo the benchmarks published on MG4J's home page with Lucene 4.1.
However, for this I'd need to know whether when using PForDelta coding the
counts (a.k.a. within-document frequencies) are stored interleaved with the
document pointers as in 3.6.2 (and, if not so, the cheapest way
On 16 February 2013 11:45, Robert Muir rcm...@gmail.com wrote:
But forcing that wouldn't be testing the 4.1 index format, it would be
something else (something not interesting).
Do you mind if I have my own share of knowledge and have my idea about
interesting benchmarks? :)
You didn't
On 16 February 2013 13:19, Robert Muir rcm...@gmail.com wrote:
I think you are missing my point: this interleaving is part of the
whole design of this postings format. You can't just turn it off and
force it to be always FOR: or you would need a new postings format
I never asked for that. It
On 16 February 2013 14:35, Robert Muir rcm...@gmail.com wrote:
2. index them, but specify you won't ask for them in the DocsEnum: and
just use that to iterate documents.
TermsEnum termsEnum = reader.terms("body").iterator(null);
boolean found = termsEnum.seekExact(new
On Mon, 2006-05-29 at 14:35 -1000, Chuck Williams wrote:
I'm not sure what form you would like that help to take, but here are a
couple high-level points imho:
Help in configuring Lucene so that it uses all resources available, and
so that the results returned are identical to all other
Dear Lucene developers,
I'd be interested in doing some benchmarking on (at least) Lucene,
Egothor and MG4J. There is no actual data around on publicly available
collections, and it would be nice to have some more objective data on
efficiency for a significantly large collection.
We have GOV2