Re: strange problem of PForDelta decoder

2011-02-16 Thread Li Li
Our recent experiments show that PFOR is not a good solution for AND queries. We tested it with our dataset and users' queries. In most cases, PFOR is slower than VInt. Our analysis suggests the reason may be that most queries are very likely to contain a low-frequency term. So the scoring time is the

Re: strange problem of PForDelta decoder

2011-01-06 Thread Michael McCandless
Great :) Also, realize that certain queries (MultiTermQuery) spend lots of time in rewrite and (sometimes, anyway) comparatively small amounts of time in actually running the query. So it'd be nice to have a good way to let multiple threads participate in rewrite too. Thread

Re: strange problem of PForDelta decoder

2011-01-05 Thread Li Li
We have recently become interested in this problem. If we come up with a patch, I'd like to share it with everyone. 2011/1/4 Michael McCandless luc...@mikemccandless.com: 2011/1/4 Li Li fancye...@gmail.com: I agree with you that we should not tie concurrency w/in a single search to index segments.

Re: strange problem of PForDelta decoder

2011-01-04 Thread Michael McCandless
2011/1/4 Li Li fancye...@gmail.com: I agree with you that we should not tie concurrency w/in a single search to index segments. That solution is just a hack. Will Lucene 4 support multithreaded search for a single query? I haven't found any patch about this. Well, as things stand now, Lucene

Re: strange problem of PForDelta decoder

2011-01-03 Thread Michael McCandless
Here's the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8091 I haven't read it yet... In general I don't like tying concurrency w/in a single search to index segments; I'd rather they be (relatively?) independent. EG an optimized index would then force single thread

Re: strange problem of PForDelta decoder

2011-01-03 Thread Li Li
I agree with you that we should not tie concurrency w/in a single search to index segments. That solution is just a hack. Will Lucene 4 support multithreaded search for a single query? I haven't found any patch about this. 2011/1/4 Michael McCandless luc...@mikemccandless.com: Here's the paper:

Re: strange problem of PForDelta decoder

2011-01-01 Thread Li Li
I sent a mail to the MG4J group and Sebastiano Vigna recommended the paper "Reducing query latencies in web search using fine-grained parallelism", World Wide Web, 12(4):441-460, 2009. I skimmed it, but I have some questions. It says: it first coalesces all disk reads in a single

Re: strange problem of PForDelta decoder

2010-12-31 Thread Lance Norskog
Please purchase and read Lucene In Action 2. It will explain how all of this works and how to use Lucene efficiently. http://www.lucidimagination.com/blog/2009/03/11/lia2/ http://www.lucidimagination.com/blog/2010/08/01/lucene-in-action-2d-edition-authors-round-table-podcast/

Re: strange problem of PForDelta decoder

2010-12-30 Thread Michael McCandless
On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote: I integrated pfor codec into lucene 2.9.3 and the search time comparison is as follows:

                        single term   and query   or query
VINT in lucene 2.9.3    11.2          36.5        38.6
PFor

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
I did another test using Lucene 4 trunk with the default codecs. Its file format is the same as Lucene 2.9, and the speed is almost the same as Lucene 2.9. I think it could be the fact that AND query does block reads (64 doc/freqs at once) instead of doc-at-once? Ie, because of this, the query is effectively

Re: strange problem of PForDelta decoder

2010-12-30 Thread Earwin Burrfoot
until we fix Lucene to run a single search concurrently (which we badly need to do). I am interested in this idea (I have posted about it before). Do you have some resources such as papers or tech articles about it? I have tried, but it needs to modify the index format dramatically and we use Solr

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
Searching multiple segments is an alternative solution, but it has some disadvantages. 1. idf is not global? (I am not familiar with its implementation.) Maybe it's easy to solve by sharing a global idf. 2. Each segment will have its own tii and tis files, which may make search slower (that's why

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
Also, point 2 means that searching for a term needs many seeks in the tis files (if it's not cached in tii). 2010/12/31 Li Li fancye...@gmail.com: Searching multiple segments is an alternative solution, but it has some disadvantages. 1. idf is not global? (I am not familiar with its implementation.) Maybe it's easy to solve

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
Is anyone familiar with MG4J (http://mg4j.dsi.unimi.it/)? It says: "Multithreading. Indices can be queried and scored concurrently." Maybe we can learn something from it. 2010/12/31 Li Li fancye...@gmail.com: Also, point 2 means that searching for a term needs many seeks in the tis files (if it's not cached in tii)

Re: strange problem of PForDelta decoder

2010-12-27 Thread Li Li
I integrated pfor codec into lucene 2.9.3 and the search time comparison is as follows:

                        single term   and query   or query
VINT in lucene 2.9.3    11.2          36.5        38.6
PFor in lucene 2.9.3    8.7           27.6        33.4
VINT in

Re: strange problem of PForDelta decoder

2010-12-23 Thread Michael McCandless
Well, an early patch somewhere was able to run PFor on trunk, but the performance wasn't great because the trunk bulk-read API is a bottleneck (this is why the bulk postings branch was created). Mike On Wed, Dec 22, 2010 at 9:45 PM, Li Li fancye...@gmail.com wrote: I used the bulkpostings

Re: strange problem of PForDelta decoder

2010-12-22 Thread Michael McCandless
Those are nice speedups! Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test? Mike On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote: Great improvement! I did a test on our data set. Doc count is about 2M+ and index size after optimization is about

Re: strange problem of PForDelta decoder

2010-12-22 Thread Li Li
I used the bulkpostings branch (https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene). Does trunk have a PForDelta decoder/encoder? 2010/12/23 Michael McCandless luc...@mikemccandless.com: Those are nice speedups! Did you use the 4.0 branch (ie trunk) or the bulkpostings

Re: strange problem of PForDelta decoder

2010-12-20 Thread Li Li
I think random testing is not sufficient. In normal situations, some branches are not executed. I tested http://code.google.com/p/integer-array-compress-kit/ with many random int arrays and it worked. But when I used it in real indexing, it produced corrupt output during the optimize stage. Because PForDelta

Re: strange problem of PForDelta decoder

2010-12-20 Thread Michael McCandless
On Mon, Dec 20, 2010 at 5:49 AM, Li Li fancye...@gmail.com wrote: I think random testing is not sufficient. In normal situations, some branches are not executed. I tested http://code.google.com/p/integer-array-compress-kit/ with many random int arrays and it worked. But when I used it in real

Re: strange problem of PForDelta decoder

2010-12-20 Thread Li Li
OK we should have a look at that one still. We need to converge on a good default codec for 4.0. Fortunately it's trivial to take any int block encoder (fixed or variable block) and make a Lucene codec out of it! I suggest you not use this one; I fixed dozens of bugs but it still failed

Re: strange problem of PForDelta decoder

2010-12-19 Thread Li Li
Is ForDecompressImpl generated by code or manually coded? I am frustrated by http://code.google.com/p/integer-array-compress-kit/ which contains too many bugs (I fixed more than 20 but there were still bugs). Because the decoder has too many branches and in normal situations, some branches seldom
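
To make the point about rarely executed branches concrete, here is a minimal sketch of a patched frame-of-reference style block codec in Java. It is an illustration only and is neither the Lucene PForDelta code nor the integer-array-compress-kit code: the class name PatchedBlockSketch, its Block type, and the choice to keep exceptions in separate position/high-bits arrays are all assumptions made for this example. It assumes non-negative values and 1 <= bits <= 31. The two branches that random small inputs seldom reach are the word-boundary spill and the exception patch loop.

```java
import java.util.ArrayList;
import java.util.List;

public class PatchedBlockSketch {
    /** Hypothetical encoded block: packed low bits plus patch lists for exceptions. */
    static final class Block {
        final int count;      // number of encoded values
        final int bits;       // bits kept per value in the packed area
        final long[] packed;  // low `bits` bits of every value, packed back to back
        final int[] excPos;   // positions of values that did not fit in `bits` bits
        final int[] excHigh;  // the bits above `bits` for each exception

        Block(int count, int bits, long[] packed, int[] excPos, int[] excHigh) {
            this.count = count;
            this.bits = bits;
            this.packed = packed;
            this.excPos = excPos;
            this.excHigh = excHigh;
        }
    }

    /** Encodes non-negative values; assumes 1 <= bits <= 31. */
    static Block encode(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        long mask = (1L << bits) - 1;
        List<Integer> pos = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            long low = values[i] & mask;
            int bitIndex = i * bits;
            int word = bitIndex >>> 6;
            int shift = bitIndex & 63;
            packed[word] |= low << shift;
            if (shift + bits > 64) {
                // The value straddles two 64-bit words: spill the upper part.
                packed[word + 1] |= low >>> (64 - shift);
            }
            int h = values[i] >>> bits;
            if (h != 0) {
                // Exception: remember where it is and which high bits were dropped.
                pos.add(i);
                high.add(h);
            }
        }
        int[] excPos = new int[pos.size()];
        int[] excHigh = new int[pos.size()];
        for (int k = 0; k < excPos.length; k++) {
            excPos[k] = pos.get(k);
            excHigh[k] = high.get(k);
        }
        return new Block(values.length, bits, packed, excPos, excHigh);
    }

    /** Decodes the packed low bits, then patches the exceptions back in. */
    static int[] decode(Block b) {
        int[] out = new int[b.count];
        long mask = (1L << b.bits) - 1;
        for (int i = 0; i < b.count; i++) {
            int bitIndex = i * b.bits;
            int word = bitIndex >>> 6;
            int shift = bitIndex & 63;
            long v = b.packed[word] >>> shift;
            if (shift + b.bits > 64) {
                v |= b.packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        for (int k = 0; k < b.excPos.length; k++) {
            out[b.excPos[k]] |= b.excHigh[k] << b.bits;
        }
        return out;
    }
}
```

A test suite for a decoder like this would want deterministic blocks with no exceptions, blocks where every value is an exception, exceptions at the first and last position, and a block long enough for a value to straddle a word boundary, in addition to random arrays.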

Re: strange problem of PForDelta decoder

2010-12-16 Thread Li Li
Hi Michael, Lucene 4 has so many changes that I don't know how to index and search with a specified codec. Could you please give me some code snippets that use the PFor codec so I can trace the code? In your blog http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

Re: strange problem of PForDelta decoder

2010-12-16 Thread Michael McCandless
On the bulkpostings branch you can do something like this:

    CodecProvider cp = new CodecProvider();
    cp.register(new PatchedFrameOfRefCodec());
    cp.setDefaultFieldCodec("PatchedFrameOfRef");

Then whenever you create an IW or IR, use the advanced method that accepts a CodecProvider. Then the

Re: strange problem of PForDelta decoder

2010-12-15 Thread Li Li
Hi Michael, you posted a patch here: https://issues.apache.org/jira/browse/LUCENE-2723 I am not familiar with patches. Do I need to download LUCENE-2723.patch (there are many patches with this name; do I need the latest one?) and LUCENE-2723_termscorer.patch and apply them (patch -p1

Re: strange problem of PForDelta decoder

2010-12-15 Thread Michael McCandless
Hi Li Li, That issue has such a big patch, and enough of us are now iterating on it, that we cut a dedicated branch for it. But note that this branch is off of trunk (to be 4.0). You should be able to do this: svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

Re: strange problem of PForDelta decoder

2010-12-15 Thread Li Li
Thanks. I'd like to try it and run some experiments on our dataset. 2010/12/15 Michael McCandless luc...@mikemccandless.com: Hi Li Li, That issue has such a big patch, and enough of us are now iterating on it, that we cut a dedicated branch for it. But note that this branch is off of trunk

Re: strange problem of PForDelta decoder

2010-12-14 Thread Michael McCandless
Likely you are seeing the startup cost of hotspot compiling the PFOR code? Ie, does your test first warmup the JRE and then do the real test? I've also found that running -Xbatch produces more consistent results from run to run, however, those results may not be as fast as running w/o -Xbatch.
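
The warmup advice above can be sketched as a small timing harness: run the task untimed first so HotSpot has a chance to compile the hot code, then time only the later runs. This is a rough illustration, not a rigorous benchmark; the class name WarmupTimingSketch, the method averageNanos, and the summing task in main are all made up for this example.

```java
public class WarmupTimingSketch {
    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * average wall-clock nanoseconds per run over measuredRuns timed runs.
     */
    static long averageNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // untimed: lets the JIT compile the hot path first
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / measuredRuns;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 16];
        for (int i = 0; i < data.length; i++) {
            data[i] = i & 0xFF;
        }
        long[] sink = new long[1]; // keeps the result live so the loop is not optimized away
        Runnable sumTask = () -> {
            long sum = 0;
            for (int v : data) {
                sum += v;
            }
            sink[0] += sum;
        };
        System.out.println("cold:   " + averageNanos(sumTask, 0, 1) + " ns/run");
        System.out.println("warmed: " + averageNanos(sumTask, 2000, 200) + " ns/run");
        System.out.println("checksum: " + sink[0]);
    }
}
```

To follow the -Xbatch suggestion, the same class can be started with `java -Xbatch WarmupTimingSketch` and the numbers compared against a run without the flag.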

Re: strange problem of PForDelta decoder

2010-12-14 Thread Li Li
But I let the VINT and S9 decoders run first, and their times are the same as when they are called at the end. It's stable. 2010/12/14 Michael McCandless luc...@mikemccandless.com: Likely you are seeing the startup cost of hotspot compiling the PFOR code? Ie, does your test first warmup the JRE and then
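
For readers unfamiliar with the VINT baseline compared throughout this thread, here is a minimal Java sketch of a variable-length int encoding in the usual style: seven data bits per byte, low-order bits first, with the high bit set when more bytes follow. The class VIntSketch and its array-based methods are assumptions for illustration; it is not the code path Lucene itself uses, and it assumes non-negative values.

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {
    /** Encodes non-negative ints, 7 bits per byte, low-order bits first. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // high bit set: more bytes follow
                v >>>= 7;
            }
            out.write(v); // final byte has the high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes `count` ints from the byte stream produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] result = new int[count];
        int p = 0;
        for (int i = 0; i < count; i++) {
            byte b = bytes[p++];
            int v = b & 0x7F;
            // One data-dependent branch per byte: this is the per-value
            // loop that block codecs try to avoid.
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[p++];
                v |= (b & 0x7F) << shift;
            }
            result[i] = v;
        }
        return result;
    }
}
```

Values below 128 take one byte each, so a postings list with small deltas stays compact, while decoding remains one value at a time.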