delete entries from posting list Lucene 4.0

2012-03-19 Thread Zeynep P.
I need to delete entries from a posting list. How can I do this in Lucene 4.0? I need this to test different pruning algorithms. Thanks in advance, ZP -- View this message in context: http://lucene.472066.n3.nabble.com/delete-entries-from-posting-list-Lucene-4-0-tp3838649p3838649.html Sent from

Re: delete entries from posting list Lucene 4.0

2012-03-19 Thread Zeynep P.
That is perfect. Thank you very much. Best regards, ZP

Re: delete entries from posting list Lucene 4.0

2012-03-27 Thread Zeynep P.
While using the pruning package, I realised that ridf is calculated in RIDFTermPruningPolicy as follows: Math.log(1 - Math.pow(Math.E, termPositions.freq() / maxDoc)) - df. However, according to the original paper (Blanco et al.) for residual idf, it should be -log(df/D) + log(1 - e^(-tf/D)). T
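
As a rough sketch of the formula quoted from the paper above, the following standalone Java computes -log(df/D) + log(1 - e^(-tf/D)). The class and method names are hypothetical and are not part of the pruning package; tf is assumed here to be the term's total frequency in the collection and D the number of documents. The sketch divides in floating point, because an expression like freq / maxDoc truncates in Java when both operands are ints.

```java
// Hypothetical helper, not part of Lucene or the pruning package.
final class ResidualIdfSketch {

    /**
     * Residual idf as quoted in the message: -log(df/D) + log(1 - e^(-tf/D)).
     *
     * @param df      number of documents containing the term
     * @param tf      assumed: total occurrences of the term in the collection
     * @param numDocs D, the number of documents in the collection
     */
    static double ridf(long df, long tf, long numDocs) {
        if (df <= 0 || tf <= 0 || numDocs <= 0) {
            throw new IllegalArgumentException("df, tf and numDocs must be positive");
        }
        // Observed idf; the cast forces floating-point division.
        double idf = -Math.log((double) df / numDocs);
        // log of the probability that a document contains the term at least once
        // under a Poisson model with rate tf/D (note the negative exponent).
        double logPoisson = Math.log(1.0 - Math.exp(-(double) tf / numDocs));
        return idf + logPoisson;
    }

    public static void main(String[] args) {
        System.out.println(ridf(10, 20, 1000));
    }
}
```

With this form, a term whose document frequency is close to what the Poisson model predicts gets a value near zero, while a term concentrated in fewer documents than predicted gets a larger value.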

Wikipedia revision history dump + lucene benchmark

2012-04-10 Thread Zeynep P.
wikipedia.alg in benchmark can only extract and index current-page dumps. It does not take revisions into account. Do you know of any way to do this? Or should I change EnwikiContentSource to handle the versions? Although Wikipedia dumps are widely used, especially for research purposes, as f

Re: delete entries from posting list Lucene 4.0

2012-04-23 Thread Zeynep P.
Hi, Thanks for the fix. I also wonder if you know of any free collections for testing pruning approaches. Almost all the papers use TREC collections, which I don't have. For now, I use the Reuters21578 collection and Carmel's Kendall's tau extension to measure similarity. But I need a collection with

pruning package- pruneAllPositions

2012-05-02 Thread Zeynep P.
Hi, In the pruning package, pruneAllPositions throws an exception. A comment in the code says that it should not happen: // should not happen! throw new IOException("termPositions.doc > docs[docsPos].doc"); Can you please explain to me why it happens and what I should do to fix it? Thanks in a

Re: pruning package- pruneAllPositions

2012-05-07 Thread Zeynep P.
Thanks for the link. I reviewed it. Here are more details about the exception: I used contrib/benchmark/conf/wikipedia.alg to index a Wikipedia dump with MAddDocs: 20. I wanted to index only a specific period of time, so I added an if statement in doLogic of the AddDocTask class. I tried to prune t

Re: Measuring precision and recall in lucene to compare two sets of results

2012-05-07 Thread Zeynep P.
Hi, You can use Kendall's tau. The article "Comparing top k lists" by Ronald Fagin, Ravi Kumar and D. Sivakumar explains different methods. Best Regards, ZP
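
A minimal sketch of one such method, the Kendall distance with penalty parameter p for two top-k lists, written from my reading of the pair cases described in that article. The class and method names are hypothetical, items are assumed to be document ids as strings, and each list is assumed to contain no duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Lucene.
final class TopKKendallSketch {

    /** Sums a penalty over every unordered pair of items in the union of the two lists. */
    static double distance(List<String> a, List<String> b, double p) {
        Map<String, Integer> rankA = ranks(a);
        Map<String, Integer> rankB = ranks(b);
        Set<String> unionSet = new LinkedHashSet<>(a);
        unionSet.addAll(b);
        List<String> union = new ArrayList<>(unionSet);

        double total = 0.0;
        for (int i = 0; i < union.size(); i++) {
            for (int j = i + 1; j < union.size(); j++) {
                String x = union.get(i);
                String y = union.get(j);
                total += penalty(rankA.get(x), rankA.get(y), rankB.get(x), rankB.get(y), p);
            }
        }
        return total;
    }

    /** Maps each item to its 0-based position in the list. */
    private static Map<String, Integer> ranks(List<String> list) {
        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < list.size(); i++) {
            ranks.put(list.get(i), i);
        }
        return ranks;
    }

    /** A null rank means the item is absent from that list. */
    private static double penalty(Integer ax, Integer ay, Integer bx, Integer by, double p) {
        boolean bothInA = ax != null && ay != null;
        boolean bothInB = bx != null && by != null;
        if (bothInA && bothInB) {
            // Both lists rank both items: penalize opposite orderings.
            return ((ax < ay) != (bx < by)) ? 1.0 : 0.0;
        }
        if (bothInA) {
            return oneSided(ax, ay, bx, by, p);
        }
        if (bothInB) {
            return oneSided(bx, by, ax, ay, p);
        }
        // x appears only in one list and y only in the other.
        return 1.0;
    }

    /** One list ranks both items; the other list ranks at most one of them. */
    private static double oneSided(int fullX, int fullY, Integer otherX, Integer otherY, double p) {
        if (otherX == null && otherY == null) {
            return p; // the other list says nothing about this pair
        }
        // The item present in the other list is implicitly ahead of the absent one.
        boolean xAheadInOther = otherX != null;
        boolean xAheadInFull = fullX < fullY;
        return (xAheadInFull == xAheadInOther) ? 0.0 : 1.0;
    }
}
```

Calling distance with the top-k document ids returned by the two systems for the same query gives 0 for identical lists and grows as the lists diverge; p = 0 and p = 0.5 are the optimistic and neutral choices for pairs that only one list ranks.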

Re: pruning package- pruneAllPositions

2012-06-04 Thread Zeynep P.
Hi, Thanks for your fix. I used it, but I think there is something wrong with it, because I am using the LATimes collection and with epsilon = 0.1 and k = 10 I got a 97% pruned index. It means 3% of the index was left unchanged after pruning. In the original paper, "Static index pruning for IR systems

threshold calculation in CarmelTopKTermPruningPolicy

2012-06-12 Thread Zeynep P.
Hi, In the CarmelTopKTermPruningPolicy class, the threshold is calculated as follows: float threshold = docs[k - 1].score - scoreDelta; docs[k - 1].score corresponds to z_t in the original paper (Carmel et al. 2001) and scoreDelta = epsilon * r. Could you please explain to me why it is calculated
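
To make the quoted expression concrete, here is a small standalone sketch that evaluates it on plain arrays. It is not the package's code: the class and method names are hypothetical, the scores are assumed to be sorted in descending order, and r is passed in as an opaque parameter because the message does not say how it is obtained.

```java
// Hypothetical helper; mirrors only the expression quoted in the message.
final class TopKThresholdSketch {

    /**
     * Computes docs[k - 1].score - scoreDelta, with scoreDelta = epsilon * r.
     *
     * @param scoresDescending per-document scores for one term, highest first (assumed)
     * @param k                rank whose score plays the role of z_t
     * @param epsilon          pruning parameter
     * @param r                value used in scoreDelta; its definition is not given in the message
     */
    static float threshold(float[] scoresDescending, int k, float epsilon, float r) {
        if (k < 1 || k > scoresDescending.length) {
            throw new IllegalArgumentException("k must be in [1, number of scores]");
        }
        float zt = scoresDescending[k - 1]; // k-th highest score
        float scoreDelta = epsilon * r;
        return zt - scoreDelta;
    }

    public static void main(String[] args) {
        float[] scores = {5f, 4f, 3f, 2f, 1f};
        System.out.println(threshold(scores, 3, 0.1f, 2f));
    }
}
```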

pruning package- question about termpositions && skipTo

2012-08-14 Thread Zeynep P.
Hi to all, In the pruning package, the documentation for the pruneAllPositions(TermPositions termPositions, Term t) method says: "termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or term poi

Re: pruning package- question about termpositions && skipTo

2012-08-22 Thread Zeynep P.
Hi to all, I found the problem and the solution. In PruningReader, super.getSequentialSubReaders() is used. After 28118, super.next() returns false because it is a subreader for a segment and indexreader.maxDoc() is equal to 28118 for that segment. In pruneAllPositions, instead of comparing termpostion

test LA Times with pruning package

2012-09-14 Thread Zeynep P.
Hi to all, I used the pruning package with the LA Times collection. The initial LA Times index is created by lucene benchmark/conf/*.alg. Luke shows 131896 documents with 635614 terms for the initial index. I pruned with CarmelTopKPruning policy with epsilon = 0.1 by varying k. However, my results do not cor

pruning & Lucene 4.0

2012-10-12 Thread Zeynep P.
Hi, Do you have any information about when the pruning package will be available for Lucene 4.0? Best regards, and thanks in advance, ZP

Lucene 4.0 benchmark bug?

2012-10-17 Thread Zeynep P.
Hi to all, I started to use benchmark 4.0 to create submission report files with the following code: BufferedReader br = new BufferedReader(fr); QualityQuery qqs[] = qReader.readQueries(br); QualityQueryParser qqParser = new SimpleQQParser("title", "body");

Re: pruning & Lucene 4.0

2013-02-20 Thread Zeynep P.
Hi, any news since? Thanks, Best regards, ZP

Re: Scoring function in LMDirichletSimilarity Class

2013-04-02 Thread Zeynep P.
Hi, I have the same question about the LMJelinekMercerSimilarity class. protected float score(BasicStats stats, float freq, float docLen) { return stats.getTotalBoost() * (float)Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda * ((LMStats)stats).getCollectionProbability()));
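
For experimenting with the quoted expression outside Lucene, the sketch below restates it with plain parameters. The class name and the flat parameter list are hypothetical; in the quoted method these values come from stats.getTotalBoost() and ((LMStats)stats).getCollectionProbability(). Algebraically, 1 + ((1 - lambda) * freq / docLen) / (lambda * pC) equals ((1 - lambda) * freq / docLen + lambda * pC) / (lambda * pC), so the logarithm is that of the smoothed term probability divided by its collection-only part, and it is zero when freq is 0.

```java
// Hypothetical standalone restatement of the quoted score expression.
final class JelinekMercerScoreSketch {

    /**
     * @param totalBoost            stands in for stats.getTotalBoost()
     * @param lambda                smoothing parameter, assumed to lie in (0, 1]
     * @param freq                  term frequency in the document
     * @param docLen                document length
     * @param collectionProbability stands in for getCollectionProbability()
     */
    static float score(float totalBoost, float lambda, float freq,
                       float docLen, float collectionProbability) {
        return totalBoost
            * (float) Math.log(1 + ((1 - lambda) * freq / docLen)
                / (lambda * collectionProbability));
    }

    public static void main(String[] args) {
        System.out.println(score(1f, 0.5f, 2f, 10f, 0.1f));
    }
}
```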

this IndexReader is closed only with jar

2011-10-17 Thread Zeynep P.
Hi, I am having a weird experience. I made a few changes to the source code (Lucene 3.3). I created a basic application to test it. First, I added the Lucene 3.3 project to the basic project as "required projects on the build path" to be able to debug. When everything was ok, I removed it from required