The performance results in my previous posting were based on an
implementation that performs two searches: one to get 'Hits' and another
to get the BitSet. I reimplemented this as a single search using the code in
'SolrIndexSearcher.getDocListAndSetNC', and I'm now seeing throughput of
350-375 qps.
I'm seeing query throughput of approx. 290 qps with OpenBitSet vs. 270 qps
with BitSet. I had to reduce the max. HashDocSet size to 2K-3K (from
10K-20K) to get the optimal tradeoff.
no. docs in index: 730,000
average no. results returned: 40
average response time: 50 msec (15-20 for counting facets)
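The tradeoff described above (a hash-based doc set below a size cutoff, a bitset above it) can be sketched outside of Solr with plain java.util classes. The class name, method, and the cutoff constant below are illustrative only, not Solr's actual API:

```java
import java.util.BitSet;

// Illustrative sketch: pick a doc-set representation by result size.
// For small result sets, testing each hit individually costs O(hits);
// for large ones, a word-wise BitSet AND wins.
public class DocSetSketch {
    static final int HASH_SET_CUTOFF = 3000; // cf. the 2K-3K value above

    public static int intersectionSize(int[] hits, BitSet filter) {
        if (hits.length < HASH_SET_CUTOFF) {
            // hash-set style: probe the filter once per hit
            int count = 0;
            for (int docid : hits) {
                if (filter.get(docid)) count++;
            }
            return count;
        }
        // bitset style: materialize the hits and AND the two sets
        BitSet bits = new BitSet();
        for (int docid : hits) bits.set(docid);
        bits.and(filter);
        return bits.cardinality();
    }
}
```

The cutoff is exactly the MAX_SIZE knob being tuned in this thread: where per-hit probing stops being cheaper than a full bitset intersection depends on index size and hardware.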
On 6/10/06, z shalev <[EMAIL PROTECTED]> wrote:
"the number of facets to check per request
average somewhere between 100 and 200 (the total number of unique
facets is much larger though). "
you mean 100 - 200 different categories to facet?
I was going by memory, but 100 to 200 set inte
hi yonik,
thanks for the thorough reply,
a few more quick questions...
"the number of facets to check per request
average somewhere between 100 and 200 (the total number of unique
facets is much larger though). "
you mean 100 - 200 different categories to facet?
i ran
: I can tell you a little bit about ours... on one CNET faceted browsing
: implementation using Solr, the number of facets to check per request
: average somewhere between 100 and 200 (the total number of unique
: facets is much larger though). The median request time is 3ms (and I
: don't think
On 6/9/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
However, my throughput testing shows that the Solr method is at least 50%
faster than mine. I'm seeing a big win with the use of the HashDocSet for
lower hit counts. On my 64-bit platform, a MAX_SIZE value of 10K-20K seems
to provide optimal perf
On 6/10/06, z shalev <[EMAIL PROTECTED]> wrote:
1. could you let me know what kind of response time you were getting with
solr (as well as the size of data and result sizes)
I can tell you a little bit about ours... on one CNET faceted browsing
implementation using Solr, the number of fa
hi peter,
two quick questions
1. could you let me know what kind of response time you were getting with
solr (as well as the size of data and result sizes)
2. i took a really really quick look at DocSetHitCollector and saw the
dreaded
if (bits==null) bits = new BitSe
I compared Solr's DocSetHitCollector (counting bitset intersections to
get facet counts) with a different approach that uses a custom hit collector
that tests each docid hit (bit) against each facet's bitset and increments a
count in a histogram. My assumption was that for queries with few hits, th
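A minimal, Lucene-free sketch of that histogram approach: every collected docid is tested against every facet's bitset and a per-facet counter is bumped. The class and method names are mine, standing in for a real HitCollector:

```java
import java.util.BitSet;

// Sketch of a histogram-style collector: each hit is tested against every
// facet bitset, so the cost is O(hits * facets) instead of one full bitset
// intersection per facet. This should win when hit counts are small.
public class FacetHistogram {
    private final BitSet[] facetBits; // one bitset per facet
    private final int[] counts;       // per-facet hit counts

    public FacetHistogram(BitSet[] facetBits) {
        this.facetBits = facetBits;
        this.counts = new int[facetBits.length];
    }

    // analogous to HitCollector.collect(doc, score)
    public void collect(int docid) {
        for (int f = 0; f < facetBits.length; f++) {
            if (facetBits[f].get(docid)) counts[f]++;
        }
    }

    public int[] counts() { return counts; }
}
```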
i know i'm a little late replying to this thread, but in my humble opinion the
best way to aggregate values (not necessarily terms, but whole values in
fields) is as follows:
startup stage:
for each field you would like to aggregate create a hashmap
open an index reader and run
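The startup stage described above could look roughly like this. The index-reader term walk is faked with a plain array here, and all names are illustrative; in real code the values would come from an IndexReader's TermEnum/TermDocs:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of the startup stage: for one field, build a map from each
// distinct field value to the BitSet of docids carrying that value.
public class FieldValueIndex {
    public static Map<String, BitSet> build(String[] valueByDoc) {
        Map<String, BitSet> byValue = new HashMap<>();
        for (int docid = 0; docid < valueByDoc.length; docid++) {
            byValue.computeIfAbsent(valueByDoc[docid], v -> new BitSet())
                   .set(docid);
        }
        return byValue;
    }
}
```

At query time each value's BitSet can be intersected with the query's BitSet to get a per-value count, as later messages in this thread describe.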
> Sent: Monday, May 22, 2006 2:07 AM
> To: java-user@lucene.apache.org
> Subject: Re: Aggregating category hits
>
> Hi Jelda,
> Is there any way by which I can achieve sorting of search
> results along with overriding the collect method of the
> HitCollector in this case?
(do lazy initialization of the categoryCounts holder)
//6 You are done.. :)
All the best,
Jelda
-Original Message-
From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 11:50 AM
To: java-user@lucene.apache.org
Subject: Re: Aggregating category hits
Hi Jel
Thanks, all.
The field cache and the bitsets both seem like good options until the
collection grows too large, provided that the index does not need to
be updated very frequently. Then for large collections, there's
statistical sampling. Any of those options seems preferable to
retriev
-Original Message-
From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 11:50 AM
To: java-user@lucene.apache.org
Subject: Re: Aggregating category hits
Hi Jelda,
I have not yet migrated to Lucene 1.9 and I guess FieldCache
has been introduced in this release.
Can you please
> To: java-user@lucene.apache.org
> Subject: Re: Aggregating category hits
>
> Hi Jelda,
> I have not yet migrated to Lucene 1.9 and I guess FieldCache
> has been introduced in this release.
> Can you please give me a pointer to your strategy of FieldCache?
>
> Thanks & Regards,
you have
documents in the millions and categories in the thousands.
So in my project I preferred the FieldCache strategy.
Jelda
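The FieldCache strategy preferred above amounts to caching one category ordinal per document and counting by array increment, which scales well when documents number in the millions and categories only in the thousands. A rough stand-in (the ordinal array is faked; in Lucene it would come from the FieldCache):

```java
// Sketch of FieldCache-style counting: one cached category ordinal per
// docid, counted with a plain int[] rather than per-category BitSet
// intersections. One pass over the hits covers all categories at once.
public class CategoryCounter {
    public static int[] count(int[] categoryOrdByDoc, int numCategories,
                              int[] hitDocids) {
        int[] counts = new int[numCategories];
        for (int docid : hitDocids) {
            counts[categoryOrdByDoc[docid]]++;
        }
        return counts;
    }
}
```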
-Original Message-
From: Kapil Chhabra [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 7:38 AM
To: java-user@lucene.apache.org
Subject: Re: Aggregating cate
On May 16, 2006, at 1:37 AM, Kapil Chhabra wrote:
Even I am doing the same in my application.
Once a day, all the filters [for different categories] are
initialized. Each time a query is fired, the Query BitSet is ANDed
with the BitSet of each filter. The cardinality obtained is the
des
> Sent: Tuesday, May 16, 2006 7:38 AM
> To: java-user@lucene.apache.org
> Subject: Re: Aggregating category hits
>
> Even I am doing the same in my application.
> Once a day, all the filters [for different categories] are
> initialized. Each time a query is fired, the Query BitSet is
> ANDed with
Even I am doing the same in my application.
Once a day, all the filters [for different categories] are
initialized. Each time a query is fired, the Query BitSet is ANDed with
the BitSet of each filter. The cardinality obtained is the desired output.
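The scheme above boils down to a filter bitset per category, rebuilt periodically, ANDed against each query's bitset at search time. A self-contained sketch with java.util.BitSet (class and method names are mine, not a real Lucene API):

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of per-category filter counting: filters are built once (e.g.
// daily), then each query's BitSet is ANDed with every filter and the
// cardinality of the intersection is the per-category hit count.
public class CategoryFilters {
    private final Map<String, BitSet> filters = new LinkedHashMap<>();

    public void addFilter(String category, BitSet docs) {
        filters.put(category, docs);
    }

    public Map<String, Integer> countsFor(BitSet queryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : filters.entrySet()) {
            BitSet and = (BitSet) queryBits.clone(); // keep query bits intact
            and.and(e.getValue());
            counts.put(e.getKey(), and.cardinality());
        }
        return counts;
    }
}
```

Note the clone before the AND: BitSet.and mutates in place, so the query's bitset must be copied once per filter (or the filters iterated with a non-destructive intersection count).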
@Eric: I would like to know more about the
On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote:
If you needed to know not just the total number of hits, but the
number of hits in each "category", how would you handle that?
For instance, a search for "egg" would have to produce the 20 most
relevant documents for "egg", but also a list
Marvin Humphrey wrote:
Greets,
If you needed to know not just the total number of hits, but the
number of hits in each "category", how would you handle that?
For instance, a search for "egg" would have to produce the 20 most
relevant documents for "egg", but also a list like this:
Holi
Greets,
If you needed to know not just the total number of hits, but the
number of hits in each "category", how would you handle that?
For instance, a search for "egg" would have to produce the 20 most
relevant documents for "egg", but also a list like this:
Holiday & Seasonal / Easte