subject:"Re\: strange problem of PForDelta decoder"

Re: strange problem of PForDelta decoder

2011-02-16 Thread Li Li

   our recent experiments show that PFOR is not a good solution for and query
we tested it with our dataset and users' queries. for most case, PFOR is slower
than vint. we analyzed the reason may be that it's very likely there
is a low-frequent
term in most queries. So the scoring time is the majority while decoding is not.
   e.g in our index, term beijing's df is 2557916 and park is
2313201, both them
are hight frequent terms. but the count of documents containing both
is only 1552
   for vint, it only need decode 1552 documents, while PFOR, it may decode many
blocks.
   for most search engines, and query is used. So PFOR is only good for or query
and and query whose terms are all high frequent.
   So we have to give up this in our application.
   partial decoder for PFOR? for all high frequent terms, using normal
PFOR decoder
;for quries with low frequent  terms, using partial decoder?
   partial decoder of PFOR many need many if/else and will be slower.
   Any one has any solution for this?


2010/12/27 Li Li fancye...@gmail.com:
 I integrated pfor codec into lucene 2.9.3 and the search time
 comparsion is as follows:
                                   single term   and query   or query
 VINT in lucene 2.9.3         11.2            36.5           38.6
 PFor in lucene 2.9.3         8.7              27.6           33.4
 VINT in lucene 4 branch   10.6             26.5           35.4
 PFor in lcuene 4 branch    8.1              22.5           30.7

 My test terms are high frequncy terms because we are interested in bad case
 It seems lucene 4 branch's implementation of and query(conjuction
 query) is well optimized that even for VINT codec, it's faster than
 PFor in lucene 2.9.3. Could any one tell me what optimization is done?
 is store docIDs and freqs separately making it faster? or anything
 else?

 Another querstion, Is there anyone interested in integrating pfor
 codec into lucene 2.9.3 as me( we have to use lucene 2.9 and solr
 1.4). And how do I contribute this patch?

 2010/12/24 Michael McCandless luc...@mikemccandless.com:
 Well, an early patch somewhere was able to run PFor on trunk, but the
 performance wasn't great because the trunk bulk-read API is a
 bottleneck (this is why the bulk postings branch was created).

 Mike

 On Wed, Dec 22, 2010 at 9:45 PM, Li Li fancye...@gmail.com wrote:
 I used the bulkpostings
 branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
 does trunk have PForDelta decoder/encoder ?

 2010/12/23 Michael McCandless luc...@mikemccandless.com:
 Those are nice speedups!

 Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this 
 test?

 Mike

 On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
 great improvement!
 I did a test in our data set. doc count is about 2M+ and index size
 after optimization is about 13.3GB(including fdt)
 it seems lucene4's index format is better than lucene2.9.3. and PFor
 give good results.
 Besides BlockEncoder for frq and pos. is there any other modification
 for lucene 4?

       decoder    \ avg time     single word(ms)          and
 query(ms)     or query(ms)
  VINT in lucene 2.9                   11.2
 36.5                 38.6
  VINT in lucene 4 branch           10.6
 26.5                 35.4
  PFor in lucene 4 branch             8.1
 22.5                 30.7
 2010/12/21 Li Li fancye...@gmail.com:
 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

 I suggests you not to use this one, I fixed dozens of bugs but it
 still failed when with random tests. it's codes is hand coded rather
 than generated by program. But we may learn something from it.


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2011-01-06 Thread Michael McCandless

Great :)

Also, realize that certain queries (MultiTermQuery) spend lots of time
in rewrite and (sometimes, anyway) comparatively small amounts of
time in actually running the query.  So it'd be nice to have a good
way to let multiple threads participate in rewrite too.

Thread scheduling/prioritization is also important, so that a big slow
query doesn't starve execution for fast queries.  Ideally
(unfortunately!) I think we'd want something like how the OS schedules
processes, ie new processes run with higher priority than long running
resource consuming processes.  So a new query would get higher
priority, but, if it runs for a long time, it's priority is reduced.
Not sure how we should solve that one...

Mike

On Wed, Jan 5, 2011 at 10:09 PM, Li Li fancye...@gmail.com wrote:
 we recently are interested in this problem. if we come up with a
 patch, I'd like
 to share it with everyone.

 2011/1/4 Michael McCandless luc...@mikemccandless.com:
 2011/1/4 Li Li fancye...@gmail.com:
 I agree with you that we should not tie concurrency w/in a single search to
 index segments.
 That solution is just a hack.
 will lucene 4 support multithreads search for a single query?
 I haven't found any patch about this.

 Well, as things stand now, Lucene 4 will only support the thread per
 segment hack.  The patch on LUCENE-2837 (still needs work) merges
 ParallelMultiSearcher into IndexSearcher, carrying over that hack.

 But this discussion seems like it could lead to a nice patch?  (If
 someone has the time/energy/itch to cons one up).

 Just dividing up the docID space equally seems like a simple solution
 that'd work well...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2011-01-05 Thread Li Li

we recently are interested in this problem. if we come up with a
patch, I'd like
to share it with everyone.

2011/1/4 Michael McCandless luc...@mikemccandless.com:
 2011/1/4 Li Li fancye...@gmail.com:
 I agree with you that we should not tie concurrency w/in a single search to
 index segments.
 That solution is just a hack.
 will lucene 4 support multithreads search for a single query?
 I haven't found any patch about this.

 Well, as things stand now, Lucene 4 will only support the thread per
 segment hack.  The patch on LUCENE-2837 (still needs work) merges
 ParallelMultiSearcher into IndexSearcher, carrying over that hack.

 But this discussion seems like it could lead to a nice patch?  (If
 someone has the time/energy/itch to cons one up).

 Just dividing up the docID space equally seems like a simple solution
 that'd work well...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2011-01-04 Thread Michael McCandless

2011/1/4 Li Li fancye...@gmail.com:
 I agree with you that we should not tie concurrency w/in a single search to
 index segments.
 That solution is just a hack.
 will lucene 4 support multithreads search for a single query?
 I haven't found any patch about this.

Well, as things stand now, Lucene 4 will only support the thread per
segment hack.  The patch on LUCENE-2837 (still needs work) merges
ParallelMultiSearcher into IndexSearcher, carrying over that hack.

But this discussion seems like it could lead to a nice patch?  (If
someone has the time/energy/itch to cons one up).

Just dividing up the docID space equally seems like a simple solution
that'd work well...

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2011-01-03 Thread Michael McCandless

Here's the paper:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8091

I haven't read it yet...

In general I don't like tying concurrency w/in a single search to
index segments; I'd rather they be (relatively?) independent.  EG an
optimized index would then force single thread queries, which is
crazy...

In Lucene it should be easy to chop up the maxDoc(), ie if I have 10M
docs and I want to use 4 threads I just give each thread a chunk of
2.5M docs.  Each thread would first skip to its start doc, and then go
from there.

I'm not sure we'd want to share a single collector vs private
collector per thread and then merge in the end (this is a similar
tradeoff as we've discussed on the per-segment collectors).

Mike

2011/1/1 Li Li fancye...@gmail.com:
   I sent a mail to MG4J group and Sebastiano Vigna recommended the paper
 Reducing query latencies in web search using fine-grained parallelism.
 World Wide Web, 12(4):441-460, 2009.
   I read it roughly. But there are some questions
     it says:
    it first coalesces all disk reads in a single process, and then
 distributes the index data among
 the parallel threads. So when the index server receives a query, it
 loads from storage (disk or cache)
 the required posting lists, initializes the query execution, and then
 spawns lightweight threads, one per core.
 Each thread receives an equal-sized subset of document IDs to scan,
 together covering the entire index
 partition. All threads execute the same code on the same query, but
 with private data structures.
 The only writable shared data structure is a heap of the top-scoring
 hits, protected by a lock. At the end of the threads'
 execution, the heap contains the highest-scoring hits in the entire
 index partition, which is then transmitted to the query
 integrator as usual. Since the index contains skip-lists that permit
 near-random-access to any document ID, and since
 hits are generally distributed evenly in the index, our measurements
 show that all threads complete their work with
 little variability, within 1-2% of each other.

  I have some questions
  1.Since the index contains skip-lists that permit
 near-random-access to any document ID
   skip list can be near random access?(especially when it's in hard disk)

  2.For or query, it's easy. e.g. we search search and engine,
 we using one main thread to get postings the
 the 2 terms and divide their postings into 2 groups(e.g. even docIds
 and odd docIds) and using 2 threads to score
 them and finally merge it(or using locked priority queue)
     but for and query, we usally don't decode all the docIDs
 because we can skip many documents. especially when
 searching low frequent terms with high frequent terms. only a small
 number of docIDs of high frequent term is decoded.

   btw. I think or query is often much slower than and query. If we can
 parallel or query well, it's also very useful.

 On Dec 31, 2010, at 7:25 AM, Li Li wrote:

    Which one is used in MG4J to support multithreads searching? Are

 2010/12/31 Li Li fancye...@gmail.com:
 is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
 it says Multithreading. Indices can be queried and scored concurrently.
 maybe we can learn something from it.

 2010/12/31 Li Li fancye...@gmail.com:
 plus
 2 means search a term need seek many times for tis(if it's not cached in 
 tii)

 2010/12/31 Li Li fancye...@gmail.com:
 searching multi segments is a alternative solution but it has some
 disadvantages.
 1. idf is not global?(I am not familiar with its implementation) maybe
 it's easy to solve it by share global idf
 2. each segments will has it's own tii and tis files, which may make
 search slower(that's why optimization of
 index is neccessary)
 3. one term's  docList is distributed in many files rather than one.
 more than one frq files means
 hard disk must seek different tracks, it's time consuming. if there is
 only one segment, the are likely
 stored in a single track.


 2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so 
 finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping docIds by it's even or odd
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd 
 group
 and finally merge their results into a

Re: strange problem of PForDelta decoder

2011-01-03 Thread Li Li

I agree with you that we should not tie concurrency w/in a single search to
index segments.
That solution is just a hack.
will lucene 4 support multithreads search for a single query?
I haven't found any patch about this.

2011/1/4 Michael McCandless luc...@mikemccandless.com:
 Here's the paper:

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8091

 I haven't read it yet...

 In general I don't like tying concurrency w/in a single search to
 index segments; I'd rather they be (relatively?) independent.  EG an
 optimized index would then force single thread queries, which is
 crazy...

 In Lucene it should be easy to chop up the maxDoc(), ie if I have 10M
 docs and I want to use 4 threads I just give each thread a chunk of
 2.5M docs.  Each thread would first skip to its start doc, and then go
 from there.

 I'm not sure we'd want to share a single collector vs private
 collector per thread and then merge in the end (this is a similar
 tradeoff as we've discussed on the per-segment collectors).

 Mike

 2011/1/1 Li Li fancye...@gmail.com:
   I sent a mail to MG4J group and Sebastiano Vigna recommended the paper
 Reducing query latencies in web search using fine-grained parallelism.
 World Wide Web, 12(4):441-460, 2009.
   I read it roughly. But there are some questions
     it says:
    it first coalesces all disk reads in a single process, and then
 distributes the index data among
 the parallel threads. So when the index server receives a query, it
 loads from storage (disk or cache)
 the required posting lists, initializes the query execution, and then
 spawns lightweight threads, one per core.
 Each thread receives an equal-sized subset of document IDs to scan,
 together covering the entire index
 partition. All threads execute the same code on the same query, but
 with private data structures.
 The only writable shared data structure is a heap of the top-scoring
 hits, protected by a lock. At the end of the threads'
 execution, the heap contains the highest-scoring hits in the entire
 index partition, which is then transmitted to the query
 integrator as usual. Since the index contains skip-lists that permit
 near-random-access to any document ID, and since
 hits are generally distributed evenly in the index, our measurements
 show that all threads complete their work with
 little variability, within 1-2% of each other.

  I have some questions
  1.Since the index contains skip-lists that permit
 near-random-access to any document ID
   skip list can be near random access?(especially when it's in hard disk)

  2.For or query, it's easy. e.g. we search search and engine,
 we using one main thread to get postings the
 the 2 terms and divide their postings into 2 groups(e.g. even docIds
 and odd docIds) and using 2 threads to score
 them and finally merge it(or using locked priority queue)
     but for and query, we usally don't decode all the docIDs
 because we can skip many documents. especially when
 searching low frequent terms with high frequent terms. only a small
 number of docIDs of high frequent term is decoded.

   btw. I think or query is often much slower than and query. If we can
 parallel or query well, it's also very useful.

 On Dec 31, 2010, at 7:25 AM, Li Li wrote:

    Which one is used in MG4J to support multithreads searching? Are

 2010/12/31 Li Li fancye...@gmail.com:
 is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
 it says Multithreading. Indices can be queried and scored concurrently.
 maybe we can learn something from it.

 2010/12/31 Li Li fancye...@gmail.com:
 plus
 2 means search a term need seek many times for tis(if it's not cached in 
 tii)

 2010/12/31 Li Li fancye...@gmail.com:
 searching multi segments is a alternative solution but it has some
 disadvantages.
 1. idf is not global?(I am not familiar with its implementation) maybe
 it's easy to solve it by share global idf
 2. each segments will has it's own tii and tis files, which may make
 search slower(that's why optimization of
 index is neccessary)
 3. one term's  docList is distributed in many files rather than one.
 more than one frq files means
 hard disk must seek different tracks, it's time consuming. if there is
 only one segment, the are likely
 stored in a single track.


 2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so 
 finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping

Re: strange problem of PForDelta decoder

2011-01-01 Thread Li Li

   I sent a mail to MG4J group and Sebastiano Vigna recommended the paper
Reducing query latencies in web search using fine-grained parallelism.
World Wide Web, 12(4):441-460, 2009.
   I read it roughly. But there are some questions
 it says:
it first coalesces all disk reads in a single process, and then
distributes the index data among
the parallel threads. So when the index server receives a query, it
loads from storage (disk or cache)
the required posting lists, initializes the query execution, and then
spawns lightweight threads, one per core.
Each thread receives an equal-sized subset of document IDs to scan,
together covering the entire index
partition. All threads execute the same code on the same query, but
with private data structures.
The only writable shared data structure is a heap of the top-scoring
hits, protected by a lock. At the end of the threads'
execution, the heap contains the highest-scoring hits in the entire
index partition, which is then transmitted to the query
integrator as usual. Since the index contains skip-lists that permit
near-random-access to any document ID, and since
hits are generally distributed evenly in the index, our measurements
show that all threads complete their work with
little variability, within 1-2% of each other.

  I have some questions
  1.Since the index contains skip-lists that permit
near-random-access to any document ID
   skip list can be near random access?(especially when it's in hard disk)

  2.For or query, it's easy. e.g. we search search and engine,
we using one main thread to get postings the
the 2 terms and divide their postings into 2 groups(e.g. even docIds
and odd docIds) and using 2 threads to score
them and finally merge it(or using locked priority queue)
 but for and query, we usally don't decode all the docIDs
because we can skip many documents. especially when
searching low frequent terms with high frequent terms. only a small
number of docIDs of high frequent term is decoded.

   btw. I think or query is often much slower than and query. If we can
parallel or query well, it's also very useful.

On Dec 31, 2010, at 7:25 AM, Li Li wrote:

Which one is used in MG4J to support multithreads searching? Are

2010/12/31 Li Li fancye...@gmail.com:
 is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
 it says Multithreading. Indices can be queried and scored concurrently.
 maybe we can learn something from it.

 2010/12/31 Li Li fancye...@gmail.com:
 plus
 2 means search a term need seek many times for tis(if it's not cached in tii)

 2010/12/31 Li Li fancye...@gmail.com:
 searching multi segments is a alternative solution but it has some
 disadvantages.
 1. idf is not global?(I am not familiar with its implementation) maybe
 it's easy to solve it by share global idf
 2. each segments will has it's own tii and tis files, which may make
 search slower(that's why optimization of
 index is neccessary)
 3. one term's  docList is distributed in many files rather than one.
 more than one frq files means
 hard disk must seek different tracks, it's time consuming. if there is
 only one segment, the are likely
 stored in a single track.


 2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so 
 finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping docIds by it's even or odd
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd 
 group
 and finally merge their results into a single on just like solr
 distributed search.
 But it's better than solr distributed search.
   First, it's in a single process and data communication between
 threads is much
 faster than network.
   Second, each threads process the same number of documents.For solr
 distributed
 search, one shard may process 7 documents and another shard may 1 document
 Even if we can make each shard have the same document number. we can not
 make it uniformly for each term.
e.g. shard1 has doc1 doc2
   shard2 has doc3 doc4
but term1 may only occur in doc1 and doc2
while term2 may only occur in doc3 and doc4
we may modify it
   shard1 doc1 doc3
   shard2 doc2 doc4
it's good for term1 and term2
but term3 may occur in doc1 and doc3...
So I think it's fine-grained

Re: strange problem of PForDelta decoder

2010-12-31 Thread Lance Norskog

Please purchase and read Lucene In Action 2. It will explain how all
of this works and how to used Lucene efficiently.

http://www.lucidimagination.com/blog/2009/03/11/lia2/
http://www.lucidimagination.com/blog/2010/08/01/lucene-in-action-2d-edition-authors-round-table-podcast/
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Lucene-in-Action

It is available from Manning as a downloadable E-Book (PDF) so you
don't have to wait for it to be shipped.

2010/12/30 Li Li fancye...@gmail.com:
is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
it says Multithreading. Indices can be queried and scored concurrently.
maybe we can learn something from it.

2010/12/31 Li Li fancye...@gmail.com:
plus
2 means search a term need seek many times for tis(if it's not cached in tii)

2010/12/31 Li Li fancye...@gmail.com:
searching multi segments is a alternative solution but it has some
disadvantages.
1. idf is not global?(I am not familiar with its implementation) maybe
it's easy to solve it by share global idf
2. each segments will has it's own tii and tis files, which may make
search slower(that's why optimization of
index is neccessary)
3. one term's docList is distributed in many files rather than one.
more than one frq files means
hard disk must seek different tracks, it's time consuming. if there is
only one segment, the are likely
stored in a single track.

2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
I am interested in this idea.(I have posted it before) do you have some
resources such as papers or tech articles about it?
I have tried but it need to modify index format dramatically and we use
solr distributed search to relieve the problem of response time. so
finally
give it up.
lucene4's index format is more flexible that it supports customed codecs
and it's now on development, I think it's good time to take it into
consideration
that let it support multithread searching for a single query.
I have a naive solution. dividing docList into many groups
e.g grouping docIds by it's even or odd
term1 df1=4 docList = 0 4 8 10
term1 df2=4 docList = 1 3 9 11

term2 df1=4 docList = 0 6 8 12
term2 df2=4 docList = 3 9 11 15
then we can use 2 threads to search topN docs on even group and odd
group
and finally merge their results into a single on just like solr
distributed search.
But it's better than solr distributed search.
First, it's in a single process and data communication between
threads is much
faster than network.
Second, each threads process the same number of documents.For solr
distributed
search, one shard may process 7 documents and another shard may 1 document
Even if we can make each shard have the same document number. we can not
make it uniformly for each term.
e.g. shard1 has doc1 doc2
shard2 has doc3 doc4
but term1 may only occur in doc1 and doc2
while term2 may only occur in doc3 and doc4
we may modify it
shard1 doc1 doc3
shard2 doc2 doc4
it's good for term1 and term2
but term3 may occur in doc1 and doc3...
So I think it's fine-grained distributed in index while solr
distributed search is coarse-
grained.
This is just crazy :)

The simple way is just to search different segments in parallel.
BalancedSegmentMergePolicy makes sure you have roughly even-sized
large segments (and small ones don't count, they're small!).
If you're bound on squeezing out that extra millisecond (and making
your life miserable along the way), you can search a single segment
with multiple threads (by dividing it in even chunks, and then doing
skipTo to position your iterators to the beginning of each chunk).

First approach is really easy to implement. Second one is harder, but
still doesn't require you to cook the number of CPU cores available
into your index!

It's the law of diminishing returns at play here. You're most likely
to search in parallel over mostly memory-resident index
(RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
tend to slow down considerably on parallel sequential reads, so you
already have pretty decent speed.
Searching different segments in parallel (with BSMP) makes you several
times faster.
Searching in parallel within a segment requires some weird hacks, but
has maybe a few percent advantage over previous solution.
Sharding posting lists requires a great deal of weird hacks, makes
index machine-bound, and boosts speed by another couple of percent.
Sounds worthless.

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-30 Thread Michael McCandless

On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote:
 I integrated pfor codec into lucene 2.9.3 and the search time
 comparsion is as follows:
                                   single term   and query   or query
 VINT in lucene 2.9.3         11.2            36.5           38.6
 PFor in lucene 2.9.3         8.7              27.6           33.4
 VINT in lucene 4 branch   10.6             26.5           35.4
 PFor in lcuene 4 branch    8.1              22.5           30.7

 My test terms are high frequncy terms because we are interested in bad case

I agree it's the bad cases we should focus on in general.  If a super
fast query gets somewhat slower it's relatively harmless (just a
capacity question for high volume sites) but if the bad queries get
slower it's awful (requires faster cutover to sharded architecture),
until we fix Lucene to run a single search concurrently (which we
badly need to do).

 It seems lucene 4 branch's implementation of and query(conjuction
 query) is well optimized that even for VINT codec, it's faster than
 PFor in lucene 2.9.3. Could any one tell me what optimization is done?
 is store docIDs and freqs separately making it faster? or anything
 else?

Actually vInt on the bulkpostings branch stores freq/doc together.  Ie
the format is the same as 2.9.x's format.  I think it could be the
fact that AND query does block reads (64 doc/freqs at once) instead of
doc-at-once?  Ie, because of this, the query is efficitively scanning
the next block of 64 docs instead of skipping to them?  Our skipping
impl is unfortunately rather costly so if skip will not skip that many
docs it's better to scan.

 Another querstion, Is there anyone interested in integrating pfor
 codec into lucene 2.9.3 as me( we have to use lucene 2.9 and solr
 1.4). And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch
is purely bug fixes at this point.

Still it's possible others could make use of such a patch so if it's
not too much work you may as well post it?  It can lead to
improvements on the bulk postings branch too :)  The more patches the
merrier!

You only use PFor for the very high freq terms in 2.9.x right?  I've
wondered if we should do the same on bulkpostings... problem is for eg
range queries, that visit all docs for all terms b/w X and Y, you want
the bulk decode even for low freq terms...

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li

I did another test using lucene 4 trunk with default codecs. it's file
is the same as lucene 2.9.
the speed is almost the same as lucene 2.9

 I think it could be the
fact that AND query does block reads (64 doc/freqs at once) instead of
doc-at-once?  Ie, because of this, the query is efficitively scanning
the next block of 64 docs instead of skipping to them?  Our skipping
impl is unfortunately rather costly so if skip will not skip that many
docs it's better to scan.
I agree with this explanation. for high frequency terms, the skiplist can
not skip over many docs. it seems there are something need optimization.
e.g. for high frequent terms, we use scanning; for low frequent terms, we
use skiplist. but if we only care bad case, we can just care high frequent
terms only.

You only use PFor for the very high freq terms in 2.9.x right?
I use PFor if df is greater than 128. if not, I use VINT

until we fix Lucene to run a single search concurrently (which we
badly need to do).
I am interested in this idea.(I have posted it before) do you have some
resources such as papers or tech articles about it?
I have tried but it need to modify index format dramatically and we use
solr distributed search to relieve the problem of response time. so finally
give it up.
lucene4's index format is more flexible that it supports customed codecs
and it's now on development, I think it's good time to take it into
consideration
that let it support multithread searching for a single query.
I have a naive solution. dividing docList into many groups
e.g grouping docIds by it's even or odd
term1 df1=4  docList =  0  4  8  10
term1 df2=4  docList = 1  3  9  11

term2 df1=4  docList = 0  6  8  12
term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd group
and finally merge their results into a single on just like solr
distributed search.
But it's better than solr distributed search.
   First, it's in a single process and data communication between
threads is much
faster than network.
   Second, each threads process the same number of documents.For solr
distributed
search, one shard may process 7 documents and another shard may 1 document
Even if we can make each shard have the same document number. we can not
make it uniformly for each term.
e.g. shard1 has doc1 doc2
   shard2 has doc3 doc4
but term1 may only occur in doc1 and doc2
while term2 may only occur in doc3 and doc4
we may modify it
   shard1 doc1 doc3
   shard2 doc2 doc4
it's good for term1 and term2
but term3 may occur in doc1 and doc3...
So I think it's fine-grained distributed in index while solr
distributed search is coarse-
grained.







2010/12/30 Michael McCandless luc...@mikemccandless.com:
 On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote:
 I integrated pfor codec into lucene 2.9.3 and the search time
 comparsion is as follows:
                                   single term   and query   or query
 VINT in lucene 2.9.3         11.2            36.5           38.6
 PFor in lucene 2.9.3         8.7              27.6           33.4
 VINT in lucene 4 branch   10.6             26.5           35.4
 PFor in lcuene 4 branch    8.1              22.5           30.7

 My test terms are high frequncy terms because we are interested in bad case

 I agree it's the bad cases we should focus on in general.  If a super
 fast query gets somewhat slower it's relatively harmless (just a
 capacity question for high volume sites) but if the bad queries get
 slower it's awful (requires faster cutover to sharded architecture),
 until we fix Lucene to run a single search concurrently (which we
 badly need to do).

 It seems lucene 4 branch's implementation of and query(conjuction
 query) is well optimized that even for VINT codec, it's faster than
 PFor in lucene 2.9.3. Could any one tell me what optimization is done?
 is store docIDs and freqs separately making it faster? or anything
 else?

 Actually vInt on the bulkpostings branch stores freq/doc together.  Ie
 the format is the same as 2.9.x's format.  I think it could be the
 fact that AND query does block reads (64 doc/freqs at once) instead of
 doc-at-once?  Ie, because of this, the query is efficitively scanning
 the next block of 64 docs instead of skipping to them?  Our skipping
 impl is unfortunately rather costly so if skip will not skip that many
 docs it's better to scan.

 Another querstion, Is there anyone interested in integrating pfor
 codec into lucene 2.9.3 as me( we have to use lucene 2.9 and solr
 1.4). And how do I contribute this patch?

 Realistically I don't think we can commit this to 2.9.x -- that branch
 is purely bug fixes at this point.

 Still it's possible others could make use of such a patch so if it's
 not too much work you may as well post it?  It can lead to
 improvements on the bulk postings branch too :)  The more patches the
 merrier!

 You only use PFor for the very high freq terms

Re: strange problem of PForDelta decoder

2010-12-30 Thread Earwin Burrfoot

until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping docIds by it's even or odd
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd group
 and finally merge their results into a single on just like solr
 distributed search.
 But it's better than solr distributed search.
   First, it's in a single process and data communication between
 threads is much
 faster than network.
   Second, each threads process the same number of documents.For solr
 distributed
 search, one shard may process 7 documents and another shard may 1 document
 Even if we can make each shard have the same document number. we can not
 make it uniformly for each term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2
    while term2 may only occur in doc3 and doc4
    we may modify it
           shard1 doc1 doc3
           shard2 doc2 doc4
    it's good for term1 and term2
    but term3 may occur in doc1 and doc3...
    So I think it's fine-grained distributed in index while solr
 distributed search is coarse-
 grained.
This is just crazy :)

The simple way is just to search different segments in parallel.
BalancedSegmentMergePolicy makes sure you have roughly even-sized
large segments (and small ones don't count, they're small!).
If you're bound on squeezing out that extra millisecond (and making
your life miserable along the way), you can search a single segment
with multiple threads (by dividing it in even chunks, and then doing
skipTo to position your iterators to the beginning of each chunk).

First approach is really easy to implement. Second one is harder, but
still doesn't require you to cook the number of CPU cores available
into your index!

It's the law of diminishing returns at play here. You're most likely
to search in parallel over mostly memory-resident index
(RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
tend to slow down considerably on parallel sequential reads, so you
already have pretty decent speed.
Searching different segments in parallel (with BSMP) makes you several
times faster.
Searching in parallel within a segment requires some weird hacks, but
has maybe a few percent advantage over previous solution.
Sharding posting lists requires a great deal of weird hacks, makes
index machine-bound, and boosts speed by another couple of percent.
Sounds worthless.

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li

searching multi segments is a alternative solution but it has some
disadvantages.
1. idf is not global?(I am not familiar with its implementation) maybe
it's easy to solve it by share global idf
2. each segments will has it's own tii and tis files, which may make
search slower(that's why optimization of
index is neccessary)
3. one term's  docList is distributed in many files rather than one.
more than one frq files means
hard disk must seek different tracks, it's time consuming. if there is
only one segment, the are likely
stored in a single track.


2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping docIds by it's even or odd
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd group
 and finally merge their results into a single on just like solr
 distributed search.
 But it's better than solr distributed search.
   First, it's in a single process and data communication between
 threads is much
 faster than network.
   Second, each threads process the same number of documents.For solr
 distributed
 search, one shard may process 7 documents and another shard may 1 document
 Even if we can make each shard have the same document number. we can not
 make it uniformly for each term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2
    while term2 may only occur in doc3 and doc4
    we may modify it
           shard1 doc1 doc3
           shard2 doc2 doc4
    it's good for term1 and term2
    but term3 may occur in doc1 and doc3...
    So I think it's fine-grained distributed in index while solr
 distributed search is coarse-
 grained.
 This is just crazy :)

 The simple way is just to search different segments in parallel.
 BalancedSegmentMergePolicy makes sure you have roughly even-sized
 large segments (and small ones don't count, they're small!).
 If you're bound on squeezing out that extra millisecond (and making
 your life miserable along the way), you can search a single segment
 with multiple threads (by dividing it in even chunks, and then doing
 skipTo to position your iterators to the beginning of each chunk).

 First approach is really easy to implement. Second one is harder, but
 still doesn't require you to cook the number of CPU cores available
 into your index!

 It's the law of diminishing returns at play here. You're most likely
 to search in parallel over mostly memory-resident index
 (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
 tend to slow down considerably on parallel sequential reads, so you
 already have pretty decent speed.
 Searching different segments in parallel (with BSMP) makes you several
 times faster.
 Searching in parallel within a segment requires some weird hacks, but
 has maybe a few percent advantage over previous solution.
 Sharding posting lists requires a great deal of weird hacks, makes
 index machine-bound, and boosts speed by another couple of percent.
 Sounds worthless.

 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Phone: +7 (495) 683-567-4
 ICQ: 104465785

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li

plus
2 means search a term need seek many times for tis(if it's not cached in tii)

2010/12/31 Li Li fancye...@gmail.com:
 searching multi segments is a alternative solution but it has some
 disadvantages.
 1. idf is not global?(I am not familiar with its implementation) maybe
 it's easy to solve it by share global idf
 2. each segments will has it's own tii and tis files, which may make
 search slower(that's why optimization of
 index is neccessary)
 3. one term's  docList is distributed in many files rather than one.
 more than one frq files means
 hard disk must seek different tracks, it's time consuming. if there is
 only one segment, the are likely
 stored in a single track.


 2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping docIds by it's even or odd
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd group
 and finally merge their results into a single on just like solr
 distributed search.
 But it's better than solr distributed search.
   First, it's in a single process and data communication between
 threads is much
 faster than network.
   Second, each threads process the same number of documents.For solr
 distributed
 search, one shard may process 7 documents and another shard may 1 document
 Even if we can make each shard have the same document number. we can not
 make it uniformly for each term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2
    while term2 may only occur in doc3 and doc4
    we may modify it
           shard1 doc1 doc3
           shard2 doc2 doc4
    it's good for term1 and term2
    but term3 may occur in doc1 and doc3...
    So I think it's fine-grained distributed in index while solr
 distributed search is coarse-
 grained.
 This is just crazy :)

 The simple way is just to search different segments in parallel.
 BalancedSegmentMergePolicy makes sure you have roughly even-sized
 large segments (and small ones don't count, they're small!).
 If you're bound on squeezing out that extra millisecond (and making
 your life miserable along the way), you can search a single segment
 with multiple threads (by dividing it in even chunks, and then doing
 skipTo to position your iterators to the beginning of each chunk).

 First approach is really easy to implement. Second one is harder, but
 still doesn't require you to cook the number of CPU cores available
 into your index!

 It's the law of diminishing returns at play here. You're most likely
 to search in parallel over mostly memory-resident index
 (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
 tend to slow down considerably on parallel sequential reads, so you
 already have pretty decent speed.
 Searching different segments in parallel (with BSMP) makes you several
 times faster.
 Searching in parallel within a segment requires some weird hacks, but
 has maybe a few percent advantage over previous solution.
 Sharding posting lists requires a great deal of weird hacks, makes
 index machine-bound, and boosts speed by another couple of percent.
 Sounds worthless.

 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Phone: +7 (495) 683-567-4
 ICQ: 104465785

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li

is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
it says Multithreading. Indices can be queried and scored concurrently.
maybe we can learn something from it.

2010/12/31 Li Li fancye...@gmail.com:
 plus
 2 means search a term need seek many times for tis(if it's not cached in tii)

 2010/12/31 Li Li fancye...@gmail.com:
 searching multi segments is a alternative solution but it has some
 disadvantages.
 1. idf is not global?(I am not familiar with its implementation) maybe
 it's easy to solve it by share global idf
 2. each segments will has it's own tii and tis files, which may make
 search slower(that's why optimization of
 index is neccessary)
 3. one term's  docList is distributed in many files rather than one.
 more than one frq files means
 hard disk must seek different tracks, it's time consuming. if there is
 only one segment, the are likely
 stored in a single track.


 2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea.(I have posted it before) do you have some
 resources such as papers or tech articles about it?
 I have tried but it need to modify index format dramatically and we use
 solr distributed search to relieve the problem of response time. so finally
 give it up.
 lucene4's index format is more flexible that it supports customed codecs
 and it's now on development, I think it's good time to take it into
 consideration
 that let it support multithread searching for a single query.
 I have a naive solution. dividing docList into many groups
 e.g grouping docIds by it's even or odd
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   then we can use 2 threads to search topN docs on even group and odd group
 and finally merge their results into a single on just like solr
 distributed search.
 But it's better than solr distributed search.
   First, it's in a single process and data communication between
 threads is much
 faster than network.
   Second, each threads process the same number of documents.For solr
 distributed
 search, one shard may process 7 documents and another shard may 1 document
 Even if we can make each shard have the same document number. we can not
 make it uniformly for each term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2
    while term2 may only occur in doc3 and doc4
    we may modify it
           shard1 doc1 doc3
           shard2 doc2 doc4
    it's good for term1 and term2
    but term3 may occur in doc1 and doc3...
    So I think it's fine-grained distributed in index while solr
 distributed search is coarse-
 grained.
 This is just crazy :)

 The simple way is just to search different segments in parallel.
 BalancedSegmentMergePolicy makes sure you have roughly even-sized
 large segments (and small ones don't count, they're small!).
 If you're bound on squeezing out that extra millisecond (and making
 your life miserable along the way), you can search a single segment
 with multiple threads (by dividing it in even chunks, and then doing
 skipTo to position your iterators to the beginning of each chunk).

 First approach is really easy to implement. Second one is harder, but
 still doesn't require you to cook the number of CPU cores available
 into your index!

 It's the law of diminishing returns at play here. You're most likely
 to search in parallel over mostly memory-resident index
 (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
 tend to slow down considerably on parallel sequential reads, so you
 already have pretty decent speed.
 Searching different segments in parallel (with BSMP) makes you several
 times faster.
 Searching in parallel within a segment requires some weird hacks, but
 has maybe a few percent advantage over previous solution.
 Sharding posting lists requires a great deal of weird hacks, makes
 index machine-bound, and boosts speed by another couple of percent.
 Sounds worthless.

 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Phone: +7 (495) 683-567-4
 ICQ: 104465785

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-27 Thread Li Li

I integrated pfor codec into lucene 2.9.3 and the search time
comparsion is as follows:
   single term   and query   or query
VINT in lucene 2.9.3 11.236.5   38.6
PFor in lucene 2.9.3 8.7  27.6   33.4
VINT in lucene 4 branch   10.6 26.5   35.4
PFor in lcuene 4 branch8.1  22.5   30.7

My test terms are high frequncy terms because we are interested in bad case
It seems lucene 4 branch's implementation of and query(conjuction
query) is well optimized that even for VINT codec, it's faster than
PFor in lucene 2.9.3. Could any one tell me what optimization is done?
is store docIDs and freqs separately making it faster? or anything
else?

Another querstion, Is there anyone interested in integrating pfor
codec into lucene 2.9.3 as me( we have to use lucene 2.9 and solr
1.4). And how do I contribute this patch?

2010/12/24 Michael McCandless luc...@mikemccandless.com:
 Well, an early patch somewhere was able to run PFor on trunk, but the
 performance wasn't great because the trunk bulk-read API is a
 bottleneck (this is why the bulk postings branch was created).

 Mike

 On Wed, Dec 22, 2010 at 9:45 PM, Li Li fancye...@gmail.com wrote:
 I used the bulkpostings
 branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
 does trunk have PForDelta decoder/encoder ?

 2010/12/23 Michael McCandless luc...@mikemccandless.com:
 Those are nice speedups!

 Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this 
 test?

 Mike

 On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
 great improvement!
 I did a test in our data set. doc count is about 2M+ and index size
 after optimization is about 13.3GB(including fdt)
 it seems lucene4's index format is better than lucene2.9.3. and PFor
 give good results.
 Besides BlockEncoder for frq and pos. is there any other modification
 for lucene 4?

       decoder    \ avg time     single word(ms)          and
 query(ms)     or query(ms)
  VINT in lucene 2.9                   11.2
 36.5                 38.6
  VINT in lucene 4 branch           10.6
 26.5                 35.4
  PFor in lucene 4 branch             8.1
 22.5                 30.7
 2010/12/21 Li Li fancye...@gmail.com:
 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

 I suggests you not to use this one, I fixed dozens of bugs but it
 still failed when with random tests. it's codes is hand coded rather
 than generated by program. But we may learn something from it.


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-23 Thread Michael McCandless

Well, an early patch somewhere was able to run PFor on trunk, but the
performance wasn't great because the trunk bulk-read API is a
bottleneck (this is why the bulk postings branch was created).

Mike

On Wed, Dec 22, 2010 at 9:45 PM, Li Li fancye...@gmail.com wrote:
 I used the bulkpostings
 branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
 does trunk have PForDelta decoder/encoder ?

 2010/12/23 Michael McCandless luc...@mikemccandless.com:
 Those are nice speedups!

 Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this 
 test?

 Mike

 On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
 great improvement!
 I did a test in our data set. doc count is about 2M+ and index size
 after optimization is about 13.3GB(including fdt)
 it seems lucene4's index format is better than lucene2.9.3. and PFor
 give good results.
 Besides BlockEncoder for frq and pos. is there any other modification
 for lucene 4?

       decoder    \ avg time     single word(ms)          and
 query(ms)     or query(ms)
  VINT in lucene 2.9                   11.2
 36.5                 38.6
  VINT in lucene 4 branch           10.6
 26.5                 35.4
  PFor in lucene 4 branch             8.1
 22.5                 30.7
 2010/12/21 Li Li fancye...@gmail.com:
 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

 I suggests you not to use this one, I fixed dozens of bugs but it
 still failed when with random tests. it's codes is hand coded rather
 than generated by program. But we may learn something from it.


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-22 Thread Michael McCandless

Those are nice speedups!

Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test?

Mike

On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
 great improvement!
 I did a test in our data set. doc count is about 2M+ and index size
 after optimization is about 13.3GB(including fdt)
 it seems lucene4's index format is better than lucene2.9.3. and PFor
 give good results.
 Besides BlockEncoder for frq and pos. is there any other modification
 for lucene 4?

       decoder    \ avg time     single word(ms)          and
 query(ms)     or query(ms)
  VINT in lucene 2.9                   11.2
 36.5                 38.6
  VINT in lucene 4 branch           10.6
 26.5                 35.4
  PFor in lucene 4 branch             8.1
 22.5                 30.7
 2010/12/21 Li Li fancye...@gmail.com:
 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

 I suggests you not to use this one, I fixed dozens of bugs but it
 still failed when with random tests. it's codes is hand coded rather
 than generated by program. But we may learn something from it.


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-22 Thread Li Li

I used the bulkpostings
branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
does trunk have PForDelta decoder/encoder ?

2010/12/23 Michael McCandless luc...@mikemccandless.com:
 Those are nice speedups!

 Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this 
 test?

 Mike

 On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
 great improvement!
 I did a test in our data set. doc count is about 2M+ and index size
 after optimization is about 13.3GB(including fdt)
 it seems lucene4's index format is better than lucene2.9.3. and PFor
 give good results.
 Besides BlockEncoder for frq and pos. is there any other modification
 for lucene 4?

       decoder    \ avg time     single word(ms)          and
 query(ms)     or query(ms)
  VINT in lucene 2.9                   11.2
 36.5                 38.6
  VINT in lucene 4 branch           10.6
 26.5                 35.4
  PFor in lucene 4 branch             8.1
 22.5                 30.7
 2010/12/21 Li Li fancye...@gmail.com:
 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

 I suggests you not to use this one, I fixed dozens of bugs but it
 still failed when with random tests. it's codes is hand coded rather
 than generated by program. But we may learn something from it.


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-20 Thread Li Li

   I think random test is not sufficient.
   for normal situation, some branches are not executed. I tested
http://code.google.com/p/integer-array-compress-kit/ with many random
int arrays and it works. But when I use it in real indexing, when in
optimize stage, it corrupted.
  Because PForDelta will choose best numFrameBits and some bit such as
31 is hardly generated by random arrays. So I force the encoder to
choose all possible numFrameBits to test all the decode1 ...decode32
and find some bugs in it.
what's pfor2? using s9/s16 to encode exception and offset?
In http://code.google.com/p/integer-array-compress-kit/ it's s9
for NewPForDelta also have many bugs and also need test each branch to
ensure it works well.

2010/12/20 Michael McCandless luc...@mikemccandless.com:
 It's autogen'd from the Python script in that dir (gendecompress),
 but, we are actively experimenting with the numerous ways to feed it
 data from the file, to see what the JVM can most efficiently execute.

 For testing, we need better coverage here.  But I have an initial
 encode random ints test that I'm about to commit to the bulkpostings
 branch... the pfor1 impl passes it, but pfor2 doesn't yet (I think
 maybe because Simple16 can't handle ints = 2^28?).

 Mike

 On Sun, Dec 19, 2010 at 10:06 PM, Li Li fancye...@gmail.com wrote:
 is ForDecompressImpl generated by codes or manully coded?
 I am frustrated by
 http://code.google.com/p/integer-array-compress-kit/ which contains
 too many bugs( I fixed more than 20 but there still existed bugs)
 Because decoder has too many branches and in normal situation, some
 branches seldom  occurs .

 these decoder implemented in branch assume blockSize==128, it has less
 branches than decoder which support arbitrary blockSize.
 How do you test these decoder to ensure every branch is tested?

 2010/12/16 Michael McCandless luc...@mikemccandless.com:
 On the bulkpostings branch you can do something like this:

  CodecProvider cp = new CodecProvider();
  cp.register(new PatchedFrameOfRefCodec());
  cp.setDefaultFieldCodec(PatchedFrameOfRef);

 Then whenever you create an IW or IR, use the advanced method that
 accepts a CodecProvider.  Then the index will always use PForDelta to
 write/read.

 I suspect conjunction queries got faster because we no longer skip if
 the docID we seek is already in the current buffer (currently sized
 64).  Ie, skip is very costly when the target isn't far.  This was
 sort of an accidental byproduct of forcing even conjunction queries
 using Standard (vInt) codec to work on block buffers, but I think it's
 an important opto that we should more generally apply.

 Skipping for block codecs and Standard/vInt are done w/ the same class
 now.  It's just that the block codec must store the long filePointer
 where the block starts *and* the int offset into the block, vs
 Standard codec that just stores a filePointer.

 On how do we implement bulk read this is the core change on the
 bulkpostings branch -- we have a new API to separately bulk-read
 docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
 improving this (and getting to a clean PFor/For impl) now...

 Mike

 On Thu, Dec 16, 2010 at 4:29 AM, Li Li fancye...@gmail.com wrote:
 hi Michael,
   lucene 4 has so much changes that I don't know how to index and
 search with specified codec. could you please give me some code
 snipplets that using PFor codec so I can trace the codes.
   in you blog 
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
   you said The AND query, curiously, got faster; I think this is
 because I modified its scoring to first try seeking within the block
 of decoded ints.
   I am also curious about the result because VINT only need decode
 part of the doclist while PFor need decode the whole block. But I
 think with conjuction queries, the main time is used for searching in
 skiplist. I haven't read your codes yet. But I guess the skiplist for
 VINT and the skiplist for PFor is different.
   e.g.   lucene 2.9's default skipInterval is 16, so it like
   level 1                                               256
   level 0  16 32 48 64 80 96 112 128 ...   256
   when need skipTo(60) we need read 0 16 32 48 64 in level0
   but when use block, e.g. block size is 128, my implementation of 
 skiplist is
   level 1       256
   level 0 128 256
   when skipTo(60) we only read 2 item in level0 and decode the first
 block which contains 128 docIDs

   How do you implement bulk read?
   I did like this: I decode a block and cache it in a int array. I
 think I can buffer up to 100K docIDs and tfs for disjuction queries(it
 cost less than 1MB memory for each term)
   SegmentTermDocs.read(final int[] docs, final int[] freqs)
           ...
                        while (i  length  count  df) {
                                if (curBlockIdx = curBlockSize) { //this 
 condition is often
 false, we may optimize it. but JVM hotspots will cache hot codes.

Re: strange problem of PForDelta decoder

2010-12-20 Thread Michael McCandless

On Mon, Dec 20, 2010 at 5:49 AM, Li Li fancye...@gmail.com wrote:
   I think random test is not sufficient.
   for normal situation, some branches are not executed. I tested
 http://code.google.com/p/integer-array-compress-kit/ with many random
 int arrays and it works. But when I use it in real indexing, when in
 optimize stage, it corrupted.
  Because PForDelta will choose best numFrameBits and some bit such as
 31 is hardly generated by random arrays. So I force the encoder to
 choose all possible numFrameBits to test all the decode1 ...decode32
 and find some bugs in it.

Good point -- we need to make sure we cover all numFrameBits.  And a
series of 128 random ints in a row will heavily bias for the high num
bits cases.  Maybe if we doing a better job w/ the random source to
try to target all numBits, w/ varying numbers of exceptions, etc...
I'll put a nocommit for this.

    what's pfor2? using s9/s16 to encode exception and offset?

Yeah I just committed pfor2 this morning on the bulk branch.  You can
check it out from
https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

pfor2 came from the patch attached on
https://issues.apache.org/jira/browse/LUCENE-1410 by Hao Yan
(thanks!). It uses s16 for the exceptions (though, there's a bug
somewhere, because it fails the random test), and it takes a different
approachy for encoding exceptions.

    In http://code.google.com/p/integer-array-compress-kit/ it's s9
 for NewPForDelta also have many bugs and also need test each branch to
 ensure it works well.

OK we should have a look at that one still.  We need to converge on a
good default codec for 4.0.  Fortunately it's trivial to take any int
block encoder (fixed or variable block) and make a Lucene codec out of
it!

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-20 Thread Li Li

 OK we should have a look at that one still.  We need to converge on a
 good default codec for 4.0.  Fortunately it's trivial to take any int
 block encoder (fixed or variable block) and make a Lucene codec out of
 it!

I suggests you not to use this one, I fixed dozens of bugs but it
still failed when with random tests. it's codes is hand coded rather
than generated by program. But we may learn something from it.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-19 Thread Li Li

is ForDecompressImpl generated by codes or manully coded?
I am frustrated by
http://code.google.com/p/integer-array-compress-kit/ which contains
too many bugs( I fixed more than 20 but there still existed bugs)
Because decoder has too many branches and in normal situation, some
branches seldom  occurs .

these decoder implemented in branch assume blockSize==128, it has less
branches than decoder which support arbitrary blockSize.
How do you test these decoder to ensure every branch is tested?

2010/12/16 Michael McCandless luc...@mikemccandless.com:
 On the bulkpostings branch you can do something like this:

  CodecProvider cp = new CodecProvider();
  cp.register(new PatchedFrameOfRefCodec());
  cp.setDefaultFieldCodec(PatchedFrameOfRef);

 Then whenever you create an IW or IR, use the advanced method that
 accepts a CodecProvider.  Then the index will always use PForDelta to
 write/read.

 I suspect conjunction queries got faster because we no longer skip if
 the docID we seek is already in the current buffer (currently sized
 64).  Ie, skip is very costly when the target isn't far.  This was
 sort of an accidental byproduct of forcing even conjunction queries
 using Standard (vInt) codec to work on block buffers, but I think it's
 an important opto that we should more generally apply.

 Skipping for block codecs and Standard/vInt are done w/ the same class
 now.  It's just that the block codec must store the long filePointer
 where the block starts *and* the int offset into the block, vs
 Standard codec that just stores a filePointer.

 On how do we implement bulk read this is the core change on the
 bulkpostings branch -- we have a new API to separately bulk-read
 docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
 improving this (and getting to a clean PFor/For impl) now...

 Mike

 On Thu, Dec 16, 2010 at 4:29 AM, Li Li fancye...@gmail.com wrote:
 hi Michael,
   lucene 4 has so much changes that I don't know how to index and
 search with specified codec. could you please give me some code
 snipplets that using PFor codec so I can trace the codes.
   in you blog 
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
   you said The AND query, curiously, got faster; I think this is
 because I modified its scoring to first try seeking within the block
 of decoded ints.
   I am also curious about the result because VINT only need decode
 part of the doclist while PFor need decode the whole block. But I
 think with conjuction queries, the main time is used for searching in
 skiplist. I haven't read your codes yet. But I guess the skiplist for
 VINT and the skiplist for PFor is different.
   e.g.   lucene 2.9's default skipInterval is 16, so it like
   level 1                                               256
   level 0  16 32 48 64 80 96 112 128 ...   256
   when need skipTo(60) we need read 0 16 32 48 64 in level0
   but when use block, e.g. block size is 128, my implementation of skiplist 
 is
   level 1       256
   level 0 128 256
   when skipTo(60) we only read 2 item in level0 and decode the first
 block which contains 128 docIDs

   How do you implement bulk read?
   I did like this: I decode a block and cache it in a int array. I
 think I can buffer up to 100K docIDs and tfs for disjuction queries(it
 cost less than 1MB memory for each term)
   SegmentTermDocs.read(final int[] docs, final int[] freqs)
           ...
                        while (i  length  count  df) {
                                if (curBlockIdx = curBlockSize) { //this 
 condition is often
 false, we may optimize it. but JVM hotspots will cache hot codes. So
 ...
                                        int idBlockBytes = 
 freqStream.readVInt();
                                        curBlockIdx = 0;
                                        for (int k = 0; k  idBlockBytes; 
 k++) {
                                                buffer[k] = 
 freqStream.readInt();
                                        }

                                        blockIds = 
 code.decode(buffer,idBlockBytes);
                                        curBlockSize = blockIds.length;

                                        int tfBlockBytes = 
 freqStream.readVInt();
                                        for (int k = 0; k  tfBlockBytes; 
 k++) {
                                                buffer[k] = 
 freqStream.readInt();
                                        }
                                        blockTfs = code.decode(buffer, 
 tfBlockBytes);
                                        assert curBlockSize == decoded.length;

                                }
                                freq = blockTfs[curBlockIdx];
                                doc += blockIds[curBlockIdx++];

                                count++;

                                if (deletedDocs == null || 
 !deletedDocs.get(doc)) {
                                        docs[i] = doc;

Re: strange problem of PForDelta decoder

2010-12-16 Thread Li Li

hi Michael,
   lucene 4 has so much changes that I don't know how to index and
search with specified codec. could you please give me some code
snipplets that using PFor codec so I can trace the codes.
   in you blog 
http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
   you said The AND query, curiously, got faster; I think this is
because I modified its scoring to first try seeking within the block
of decoded ints.
   I am also curious about the result because VINT only need decode
part of the doclist while PFor need decode the whole block. But I
think with conjuction queries, the main time is used for searching in
skiplist. I haven't read your codes yet. But I guess the skiplist for
VINT and the skiplist for PFor is different.
   e.g.   lucene 2.9's default skipInterval is 16, so it like
   level 1   256
   level 0  16 32 48 64 80 96 112 128 ...   256
   when need skipTo(60) we need read 0 16 32 48 64 in level0
   but when use block, e.g. block size is 128, my implementation of skiplist is
   level 1   256
   level 0 128 256
   when skipTo(60) we only read 2 item in level0 and decode the first
block which contains 128 docIDs

   How do you implement bulk read?
   I did like this: I decode a block and cache it in a int array. I
think I can buffer up to 100K docIDs and tfs for disjuction queries(it
cost less than 1MB memory for each term)
   SegmentTermDocs.read(final int[] docs, final int[] freqs)
   ...
while (i  length  count  df) {
if (curBlockIdx = curBlockSize) { //this 
condition is often
false, we may optimize it. but JVM hotspots will cache hot codes. So
...
int idBlockBytes = 
freqStream.readVInt();
curBlockIdx = 0;
for (int k = 0; k  idBlockBytes; k++) {
buffer[k] = 
freqStream.readInt();
}

blockIds = 
code.decode(buffer,idBlockBytes);
curBlockSize = blockIds.length;

int tfBlockBytes = 
freqStream.readVInt();
for (int k = 0; k  tfBlockBytes; k++) {
buffer[k] = 
freqStream.readInt();
}
blockTfs = code.decode(buffer, 
tfBlockBytes);
assert curBlockSize == decoded.length;  


}
freq = blockTfs[curBlockIdx];
doc += blockIds[curBlockIdx++];

count++;

if (deletedDocs == null || 
!deletedDocs.get(doc)) {
docs[i] = doc;
freqs[i] = freq;
++i;
}
}



2010/12/15 Michael McCandless luc...@mikemccandless.com:
 Hi Li Li,

 That issue has such a big patch, and enough of us are now iterating on
 it, that we cut a dedicated branch for it.

 But note that this branch is off of trunk (to be 4.0).

 You should be able to do this:

  svn checkout 
 https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

 And then run things in there.  I just committed FOR/PFOR prototype
 codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
 using those codecs by running ant test
 -Dtests.codec=PatchedFrameOfRef.

 Please post patches back if you improve things!  We need all the help
 we can get :)

 Mike

 On Wed, Dec 15, 2010 at 5:54 AM, Li Li fancye...@gmail.com wrote:
 hi Michael
    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
    I am not familiar with patch. do I need download
 LUCENE-2723.patch(there are many patches after this name, do I need
 the latest one?) and LUCENE-2723_termscorer.patch and patch them
 (patch -p1 LUCENE-2723.patch)? I just check out the latest source
 code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene


 2010/12/14 Michael McCandless luc...@mikemccandless.com:
 Likely you are seeing the startup cost of hotspot compiling the PFOR code?

 Ie, does your test first warmup the JRE and then do the real test?

 I've also found that running -Xbatch produces more consistent results
 from run to run, however, those results may not be as fast as running
 w/o -Xbatch.

 Also, it's better to test on actual data (ie a Lucene index's
 postings), and in the full context of searching, because then we get a
 sense of what speedups a real app will see... micro-benching is nearly

Re: strange problem of PForDelta decoder

2010-12-16 Thread Michael McCandless

On the bulkpostings branch you can do something like this:

  CodecProvider cp = new CodecProvider();
  cp.register(new PatchedFrameOfRefCodec());
  cp.setDefaultFieldCodec(PatchedFrameOfRef);

Then whenever you create an IW or IR, use the advanced method that
accepts a CodecProvider.  Then the index will always use PForDelta to
write/read.

I suspect conjunction queries got faster because we no longer skip if
the docID we seek is already in the current buffer (currently sized
64).  Ie, skip is very costly when the target isn't far.  This was
sort of an accidental byproduct of forcing even conjunction queries
using Standard (vInt) codec to work on block buffers, but I think it's
an important opto that we should more generally apply.

Skipping for block codecs and Standard/vInt are done w/ the same class
now.  It's just that the block codec must store the long filePointer
where the block starts *and* the int offset into the block, vs
Standard codec that just stores a filePointer.

On how do we implement bulk read this is the core change on the
bulkpostings branch -- we have a new API to separately bulk-read
docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
improving this (and getting to a clean PFor/For impl) now...

Mike

On Thu, Dec 16, 2010 at 4:29 AM, Li Li fancye...@gmail.com wrote:
 hi Michael,
   lucene 4 has so much changes that I don't know how to index and
 search with specified codec. could you please give me some code
 snipplets that using PFor codec so I can trace the codes.
   in you blog 
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
   you said The AND query, curiously, got faster; I think this is
 because I modified its scoring to first try seeking within the block
 of decoded ints.
   I am also curious about the result because VINT only need decode
 part of the doclist while PFor need decode the whole block. But I
 think with conjuction queries, the main time is used for searching in
 skiplist. I haven't read your codes yet. But I guess the skiplist for
 VINT and the skiplist for PFor is different.
   e.g.   lucene 2.9's default skipInterval is 16, so it like
   level 1                                               256
   level 0  16 32 48 64 80 96 112 128 ...   256
   when need skipTo(60) we need read 0 16 32 48 64 in level0
   but when use block, e.g. block size is 128, my implementation of skiplist is
   level 1       256
   level 0 128 256
   when skipTo(60) we only read 2 item in level0 and decode the first
 block which contains 128 docIDs

   How do you implement bulk read?
   I did like this: I decode a block and cache it in a int array. I
 think I can buffer up to 100K docIDs and tfs for disjuction queries(it
 cost less than 1MB memory for each term)
   SegmentTermDocs.read(final int[] docs, final int[] freqs)
           ...
                        while (i  length  count  df) {
                                if (curBlockIdx = curBlockSize) { //this 
 condition is often
 false, we may optimize it. but JVM hotspots will cache hot codes. So
 ...
                                        int idBlockBytes = 
 freqStream.readVInt();
                                        curBlockIdx = 0;
                                        for (int k = 0; k  idBlockBytes; k++) 
 {
                                                buffer[k] = 
 freqStream.readInt();
                                        }

                                        blockIds = 
 code.decode(buffer,idBlockBytes);
                                        curBlockSize = blockIds.length;

                                        int tfBlockBytes = 
 freqStream.readVInt();
                                        for (int k = 0; k  tfBlockBytes; k++) 
 {
                                                buffer[k] = 
 freqStream.readInt();
                                        }
                                        blockTfs = code.decode(buffer, 
 tfBlockBytes);
                                        assert curBlockSize == decoded.length;

                                }
                                freq = blockTfs[curBlockIdx];
                                doc += blockIds[curBlockIdx++];

                                count++;

                                if (deletedDocs == null || 
 !deletedDocs.get(doc)) {
                                        docs[i] = doc;
                                        freqs[i] = freq;
                                        ++i;
                                }
                        }



 2010/12/15 Michael McCandless luc...@mikemccandless.com:
 Hi Li Li,

 That issue has such a big patch, and enough of us are now iterating on
 it, that we cut a dedicated branch for it.

 But note that this branch is off of trunk (to be 4.0).

 You should be able to do this:

  svn checkout 
 https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

 And then run things in there.  I just committed FOR/PFOR prototype

Re: strange problem of PForDelta decoder

2010-12-15 Thread Li Li

hi Michael
you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
I am not familiar with patch. do I need download
LUCENE-2723.patch(there are many patches after this name, do I need
the latest one?) and LUCENE-2723_termscorer.patch and patch them
(patch -p1 LUCENE-2723.patch)? I just check out the latest source
code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene


2010/12/14 Michael McCandless luc...@mikemccandless.com:
 Likely you are seeing the startup cost of hotspot compiling the PFOR code?

 Ie, does your test first warmup the JRE and then do the real test?

 I've also found that running -Xbatch produces more consistent results
 from run to run, however, those results may not be as fast as running
 w/o -Xbatch.

 Also, it's better to test on actual data (ie a Lucene index's
 postings), and in the full context of searching, because then we get a
 sense of what speedups a real app will see... micro-benching is nearly
 impossible in Java since Hotspot acts very differently vs the real
 test.

 Mike

 On Tue, Dec 14, 2010 at 2:50 AM, Li Li fancye...@gmail.com wrote:
 Hi
   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
   I use the implementation in
 http://code.google.com/p/integer-array-compress-kit/
   it implements a basic PForDelta algorithm and an improved one(which
 called NewPForDelta, but there are many bugs and I have fixed them),
   But compare it with VInt and S9, it's speed is very slow when only
 decode small number of integer arrays.
   e.g. when I decoded int[256] arrays which values are randomly
 generated between 0 and 100, if decode just one array. PFor(or
 NewPFor) is very slow. when it continuously decodes many arrays such
 as 1, it's faster than s9 and vint.
   Another strange phenomena is that when call PFor decoder twice, the
 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
 is faster. reverse the call sequcence, the 2nd called decoder is
 faster
   e.g.
                ct.testNewPFDCodes(list);
                ct.testPFor(list);
                ct.testVInt(list);
                ct.testS9(list);

 NewPFD decode: 3614705
 PForDelta decode: 17320
 VINT decode: 16483
 S9 decode: 19835
 when I call by the following sequence

                ct.testPFor(list);
                ct.testNewPFDCodes(list);
                ct.testVInt(list);
                ct.testS9(list);

 PForDelta decode: 3212140
 NewPFD decode: 19556
 VINT decode: 16762
 S9 decode: 16483

   My implementation is -- group docIDs and termDocFreqs into block
 which contains 128 integers. when SegmentTermDocs's next method
 called(or read readNoTf).it decodes a block and save it to a cache.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-15 Thread Michael McCandless

Hi Li Li,

That issue has such a big patch, and enough of us are now iterating on
it, that we cut a dedicated branch for it.

But note that this branch is off of trunk (to be 4.0).

You should be able to do this:

  svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

And then run things in there.  I just committed FOR/PFOR prototype
codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
using those codecs by running ant test
-Dtests.codec=PatchedFrameOfRef.

Please post patches back if you improve things!  We need all the help
we can get :)

Mike

On Wed, Dec 15, 2010 at 5:54 AM, Li Li fancye...@gmail.com wrote:
 hi Michael
    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
    I am not familiar with patch. do I need download
 LUCENE-2723.patch(there are many patches after this name, do I need
 the latest one?) and LUCENE-2723_termscorer.patch and patch them
 (patch -p1 LUCENE-2723.patch)? I just check out the latest source
 code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene


 2010/12/14 Michael McCandless luc...@mikemccandless.com:
 Likely you are seeing the startup cost of hotspot compiling the PFOR code?

 Ie, does your test first warmup the JRE and then do the real test?

 I've also found that running -Xbatch produces more consistent results
 from run to run, however, those results may not be as fast as running
 w/o -Xbatch.

 Also, it's better to test on actual data (ie a Lucene index's
 postings), and in the full context of searching, because then we get a
 sense of what speedups a real app will see... micro-benching is nearly
 impossible in Java since Hotspot acts very differently vs the real
 test.

 Mike

 On Tue, Dec 14, 2010 at 2:50 AM, Li Li fancye...@gmail.com wrote:
 Hi
   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
   I use the implementation in
 http://code.google.com/p/integer-array-compress-kit/
   it implements a basic PForDelta algorithm and an improved one(which
 called NewPForDelta, but there are many bugs and I have fixed them),
   But compare it with VInt and S9, it's speed is very slow when only
 decode small number of integer arrays.
   e.g. when I decoded int[256] arrays which values are randomly
 generated between 0 and 100, if decode just one array. PFor(or
 NewPFor) is very slow. when it continuously decodes many arrays such
 as 1, it's faster than s9 and vint.
   Another strange phenomena is that when call PFor decoder twice, the
 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
 is faster. reverse the call sequcence, the 2nd called decoder is
 faster
   e.g.
                ct.testNewPFDCodes(list);
                ct.testPFor(list);
                ct.testVInt(list);
                ct.testS9(list);

 NewPFD decode: 3614705
 PForDelta decode: 17320
 VINT decode: 16483
 S9 decode: 19835
 when I call by the following sequence

                ct.testPFor(list);
                ct.testNewPFDCodes(list);
                ct.testVInt(list);
                ct.testS9(list);

 PForDelta decode: 3212140
 NewPFD decode: 19556
 VINT decode: 16762
 S9 decode: 16483

   My implementation is -- group docIDs and termDocFreqs into block
 which contains 128 integers. when SegmentTermDocs's next method
 called(or read readNoTf).it decodes a block and save it to a cache.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-15 Thread Li Li

thanks.
   I'd like trying it and do some experiment on our dataset.

2010/12/15 Michael McCandless luc...@mikemccandless.com:
 Hi Li Li,

 That issue has such a big patch, and enough of us are now iterating on
 it, that we cut a dedicated branch for it.

 But note that this branch is off of trunk (to be 4.0).

 You should be able to do this:

  svn checkout 
 https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

 And then run things in there.  I just committed FOR/PFOR prototype
 codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
 using those codecs by running ant test
 -Dtests.codec=PatchedFrameOfRef.

 Please post patches back if you improve things!  We need all the help
 we can get :)

 Mike

 On Wed, Dec 15, 2010 at 5:54 AM, Li Li fancye...@gmail.com wrote:
 hi Michael
    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
    I am not familiar with patch. do I need download
 LUCENE-2723.patch(there are many patches after this name, do I need
 the latest one?) and LUCENE-2723_termscorer.patch and patch them
 (patch -p1 LUCENE-2723.patch)? I just check out the latest source
 code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene


 2010/12/14 Michael McCandless luc...@mikemccandless.com:
 Likely you are seeing the startup cost of hotspot compiling the PFOR code?

 Ie, does your test first warmup the JRE and then do the real test?

 I've also found that running -Xbatch produces more consistent results
 from run to run, however, those results may not be as fast as running
 w/o -Xbatch.

 Also, it's better to test on actual data (ie a Lucene index's
 postings), and in the full context of searching, because then we get a
 sense of what speedups a real app will see... micro-benching is nearly
 impossible in Java since Hotspot acts very differently vs the real
 test.

 Mike

 On Tue, Dec 14, 2010 at 2:50 AM, Li Li fancye...@gmail.com wrote:
 Hi
   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
   I use the implementation in
 http://code.google.com/p/integer-array-compress-kit/
   it implements a basic PForDelta algorithm and an improved one(which
 called NewPForDelta, but there are many bugs and I have fixed them),
   But compare it with VInt and S9, it's speed is very slow when only
 decode small number of integer arrays.
   e.g. when I decoded int[256] arrays which values are randomly
 generated between 0 and 100, if decode just one array. PFor(or
 NewPFor) is very slow. when it continuously decodes many arrays such
 as 1, it's faster than s9 and vint.
   Another strange phenomena is that when call PFor decoder twice, the
 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
 is faster. reverse the call sequcence, the 2nd called decoder is
 faster
   e.g.
                ct.testNewPFDCodes(list);
                ct.testPFor(list);
                ct.testVInt(list);
                ct.testS9(list);

 NewPFD decode: 3614705
 PForDelta decode: 17320
 VINT decode: 16483
 S9 decode: 19835
 when I call by the following sequence

                ct.testPFor(list);
                ct.testNewPFDCodes(list);
                ct.testVInt(list);
                ct.testS9(list);

 PForDelta decode: 3212140
 NewPFD decode: 19556
 VINT decode: 16762
 S9 decode: 16483

   My implementation is -- group docIDs and termDocFreqs into block
 which contains 128 integers. when SegmentTermDocs's next method
 called(or read readNoTf).it decodes a block and save it to a cache.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-14 Thread Michael McCandless

Likely you are seeing the startup cost of hotspot compiling the PFOR code?

Ie, does your test first warmup the JRE and then do the real test?

I've also found that running -Xbatch produces more consistent results
from run to run, however, those results may not be as fast as running
w/o -Xbatch.

Also, it's better to test on actual data (ie a Lucene index's
postings), and in the full context of searching, because then we get a
sense of what speedups a real app will see... micro-benching is nearly
impossible in Java since Hotspot acts very differently vs the real
test.

Mike

On Tue, Dec 14, 2010 at 2:50 AM, Li Li fancye...@gmail.com wrote:
 Hi
   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
   I use the implementation in
 http://code.google.com/p/integer-array-compress-kit/
   it implements a basic PForDelta algorithm and an improved one(which
 called NewPForDelta, but there are many bugs and I have fixed them),
   But compare it with VInt and S9, it's speed is very slow when only
 decode small number of integer arrays.
   e.g. when I decoded int[256] arrays which values are randomly
 generated between 0 and 100, if decode just one array. PFor(or
 NewPFor) is very slow. when it continuously decodes many arrays such
 as 1, it's faster than s9 and vint.
   Another strange phenomena is that when call PFor decoder twice, the
 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
 is faster. reverse the call sequcence, the 2nd called decoder is
 faster
   e.g.
                ct.testNewPFDCodes(list);
                ct.testPFor(list);
                ct.testVInt(list);
                ct.testS9(list);

 NewPFD decode: 3614705
 PForDelta decode: 17320
 VINT decode: 16483
 S9 decode: 19835
 when I call by the following sequence

                ct.testPFor(list);
                ct.testNewPFDCodes(list);
                ct.testVInt(list);
                ct.testS9(list);

 PForDelta decode: 3212140
 NewPFD decode: 19556
 VINT decode: 16762
 S9 decode: 16483

   My implementation is -- group docIDs and termDocFreqs into block
 which contains 128 integers. when SegmentTermDocs's next method
 called(or read readNoTf).it decodes a block and save it to a cache.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

2010-12-14 Thread Li Li

but I let VINT and S9 decoder run first, their time is the same as
when they are called in the end. it's stable

2010/12/14 Michael McCandless luc...@mikemccandless.com:
 Likely you are seeing the startup cost of hotspot compiling the PFOR code?

 Ie, does your test first warmup the JRE and then do the real test?

 I've also found that running -Xbatch produces more consistent results
 from run to run, however, those results may not be as fast as running
 w/o -Xbatch.

 Also, it's better to test on actual data (ie a Lucene index's
 postings), and in the full context of searching, because then we get a
 sense of what speedups a real app will see... micro-benching is nearly
 impossible in Java since Hotspot acts very differently vs the real
 test.

 Mike

 On Tue, Dec 14, 2010 at 2:50 AM, Li Li fancye...@gmail.com wrote:
 Hi
   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
   I use the implementation in
 http://code.google.com/p/integer-array-compress-kit/
   it implements a basic PForDelta algorithm and an improved one(which
 called NewPForDelta, but there are many bugs and I have fixed them),
   But compare it with VInt and S9, it's speed is very slow when only
 decode small number of integer arrays.
   e.g. when I decoded int[256] arrays which values are randomly
 generated between 0 and 100, if decode just one array. PFor(or
 NewPFor) is very slow. when it continuously decodes many arrays such
 as 1, it's faster than s9 and vint.
   Another strange phenomena is that when call PFor decoder twice, the
 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
 is faster. reverse the call sequcence, the 2nd called decoder is
 faster
   e.g.
                ct.testNewPFDCodes(list);
                ct.testPFor(list);
                ct.testVInt(list);
                ct.testS9(list);

 NewPFD decode: 3614705
 PForDelta decode: 17320
 VINT decode: 16483
 S9 decode: 19835
 when I call by the following sequence

                ct.testPFor(list);
                ct.testNewPFDCodes(list);
                ct.testVInt(list);
                ct.testS9(list);

 PForDelta decode: 3212140
 NewPFD decode: 19556
 VINT decode: 16762
 S9 decode: 16483

   My implementation is -- group docIDs and termDocFreqs into block
 which contains 128 integers. when SegmentTermDocs's next method
 called(or read readNoTf).it decodes a block and save it to a cache.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

Re: strange problem of PForDelta decoder

29 matches

Site Navigation

Mail list logo

Footer information