I need to delete entries from a posting list. How can I do this in Lucene 4.0?
I need it to test different pruning algorithms.
Thanks in advance,
ZP
--
View this message in context:
http://lucene.472066.n3.nabble.com/delete-entries-from-posting-list-Lucene-4-0-tp3838649p3838649.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
That is perfect.
Thank you very much.
Best regards,
ZP
While using the pruning package, I realised that ridf is calculated in
RIDFTermPruningPolicy as follows:
Math.log(1 - Math.pow(Math.E, termPositions.freq() / maxDoc)) - df
However, according to the original paper (Blanco et al.) for residual idf,
it should be -log(df/D) + log(1 - e^(-tf/D)).
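The discrepancy is easy to see numerically. Here is a minimal sketch in plain Java (not the pruning package's API), using hypothetical counts df = 100, tf = 150, D = 10000:

```java
public class RidfCheck {
    public static void main(String[] args) {
        // hypothetical counts: document frequency, collection term frequency, collection size
        double df = 100, tf = 150, D = 10000;

        // residual idf as given in the paper: -log(df/D) + log(1 - e^(-tf/D))
        double paper = -Math.log(df / D) + Math.log(1 - Math.exp(-tf / D));

        // the expression used in RIDFTermPruningPolicy: log(1 - e^(tf/D)) - df
        // with the positive exponent, 1 - e^(tf/D) < 0, so the log is NaN
        double impl = Math.log(1 - Math.exp(tf / D)) - df;

        System.out.println("paper: " + paper); // a small positive value
        System.out.println("impl:  " + impl);  // NaN
    }
}
```

(Note also that in the original code termPositions.freq() and maxDoc are ints, so freq / maxDoc is integer division and truncates to zero for most terms.)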
wikipedia.alg in benchmark is only able to extract and index current-pages
dumps. It does not take revisions into account. Do you know any way to do
this? Or should I change EnwikiContentSource to handle the versions?
Although Wikipedia dumps are widely used, especially for research purposes,
as f
Hi,
Thanks for the fix.
I also wonder if you know of any free collections for testing pruning
approaches. Almost all the papers use TREC collections, which I don't have!
For now, I use the Reuters-21578 collection and Carmel's Kendall's tau
extension to measure similarity. But I need a collection with
Hi,
In the pruning package, pruneAllPositions throws an exception. In the code
it is commented that it should not happen:
// should not happen!
throw new IOException("termPositions.doc > docs[docsPos].doc");
Can you please explain why it happens and what I should do to fix it?
Thanks in advance
Thanks for the link. I reviewed it.
Here are more details about the exception:
I used contrib/benchmark/conf/wikipedia.alg to index wikipedia dump with
MAddDocs: 20. I wanted to index only a specific period of time, so I
added an if statement in the doLogic method of the AddDocTask class.
I tried to prune t
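The guard I mean looks roughly like this (a simplified, self-contained sketch with an invented date range; the real field name and date format depend on the dump and EnwikiContentSource):

```java
import java.time.LocalDate;

public class PeriodFilter {
    // hypothetical window; the real range and date format depend on the dump
    static final LocalDate FROM = LocalDate.of(2001, 1, 1);
    static final LocalDate TO = LocalDate.of(2001, 12, 31);

    // true if the document's date (ISO yyyy-MM-dd here) falls inside the window;
    // in doLogic, documents outside the window would simply be skipped
    static boolean inPeriod(String isoDate) {
        LocalDate d = LocalDate.parse(isoDate);
        return !d.isBefore(FROM) && !d.isAfter(TO);
    }

    public static void main(String[] args) {
        System.out.println(inPeriod("2001-06-15")); // true
        System.out.println(inPeriod("1999-06-15")); // false
    }
}
```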
Hi,
You can use Kendall's tau. An article titled "Comparing Top k Lists" by Ronald
Fagin, Ravi Kumar and D. Sivakumar explains different methods.
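Plain Kendall's tau over two rankings of the same items can be sketched like this (an O(n^2) illustration; the Fagin et al. variants extend this to top-k lists over possibly different item sets):

```java
import java.util.*;

public class KendallTau {
    // Kendall's tau for two permutations of the same item set
    static double tau(List<String> a, List<String> b) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < b.size(); i++) pos.put(b.get(i), i);
        int concordant = 0, discordant = 0;
        for (int i = 0; i < a.size(); i++)
            for (int j = i + 1; j < a.size(); j++) {
                // is the pair ordered the same way in both lists?
                if (pos.get(a.get(i)) < pos.get(a.get(j))) concordant++;
                else discordant++;
            }
        return (concordant - discordant) / (double) (concordant + discordant);
    }

    public static void main(String[] args) {
        List<String> r1 = Arrays.asList("d1", "d2", "d3", "d4");
        List<String> r2 = Arrays.asList("d1", "d3", "d2", "d4");
        // one discordant pair out of 6, so tau = (5 - 1) / 6 = 2/3
        System.out.println(tau(r1, r2));
    }
}
```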
Best Regards,
ZP
Hi,
Thanks for your fix. I used it, but I think there is something wrong with the
fix, because
I am using the LATimes collection, and with epsilon = 0.1 and k = 10 I got a
97% pruned index. That means 3% of the index is left after pruning. In the
original paper, "Static index pruning for IR systems
Hi,
In CarmelTopKTermPruningPolicy class, the threshold is calculated as
follows:
float threshold = docs[k - 1].score - scoreDelta;
docs[k - 1].score corresponds to z_t in the original paper (Carmel et al
2001) and scoreDelta = epsilon * r
Could you please explain why it is calculated
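As far as I can see, the computation amounts to the following (a self-contained sketch; the scores, k, and scoreDelta are made-up values, and scoreDelta stands in for whatever epsilon * r evaluates to):

```java
import java.util.Arrays;

public class TopKThreshold {
    // threshold below which postings are pruned:
    // the k-th highest score (z_t in Carmel et al. 2001) minus a delta
    static float threshold(float[] scores, int k, float scoreDelta) {
        float[] s = scores.clone();
        Arrays.sort(s);                 // ascending
        float zt = s[s.length - k];     // k-th highest score
        return zt - scoreDelta;
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.5f, 0.7f, 0.3f, 0.8f};
        // z_t = 0.7 for k = 3, so the threshold is about 0.65
        System.out.println(threshold(scores, 3, 0.05f));
    }
}
```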
Hi to all,
In the pruning package, for the pruneAllPositions(TermPositions termPositions,
Term t) method it is said that:
"termPositions - positioned term positions. Implementations MUST NOT advance
this by calling TermPositions methods that advance either the position
pointer (next, skipTo) or term poi
Hi to all,
I found the problem and the solution. In PruningReader,
super.getSequentialSubReaders() is used. After 28118, super.next() is false
because it is a subreader for a segment and indexreader.maxDoc() is equal to
28118 for that segment. In pruneAllPositions, instead of comparing
termpostion
Hi to all,
I used the pruning package with the LA Times collection. The initial LA Times
index is created by lucene benchmark/conf/*.alg. Luke shows 131896 documents
with 635614 terms for the initial index. I pruned with the CarmelTopKPruning
policy with epsilon = 0.1, varying k. However, my results do not cor
Hi,
Do you have any information about when the pruning package will be available
for Lucene 4.0?
Thanks in advance.
Best regards,
ZP
Hi to all,
I started to use benchmark 4.0 to create submission report files with the
following code:
BufferedReader br = new BufferedReader(fr);
QualityQuery qqs[] = qReader.readQueries(br);
QualityQueryParser qqParser = new SimpleQQParser("title", "body");
Hi,
Any news since then?
Thanks,
Best regards,
ZP
Hi,
I have the same question related to the LMJelinekMercerSimilarity class.
protected float score(BasicStats stats, float freq, float docLen) {
  return stats.getTotalBoost() *
      (float)Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda *
          ((LMStats)stats).getCollectionProbability()));
}
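To make the question concrete, here is the formula from that snippet in isolation, with made-up numbers (plain Java, not the Lucene classes):

```java
public class JMScore {
    // Jelinek-Mercer smoothed LM score as in the snippet above:
    // log(1 + ((1 - lambda) * tf / docLen) / (lambda * p(t|C)))
    static double score(double freq, double docLen, double lambda, double collProb) {
        return Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda * collProb));
    }

    public static void main(String[] args) {
        // hypothetical values: tf = 3 in a 100-term document, lambda = 0.7, p(t|C) = 0.001
        System.out.println(score(3, 100, 0.7, 0.001));
    }
}
```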
Hi,
I am having a weird experience. I made a few changes to the source code
(Lucene 3.3). I created a basic application to test them. First, I added the
Lucene 3.3 project to the basic project as a "required project on the build
path" to be able to debug. When everything was ok, I removed it from the required