I had a similar requirement for "count" and "group by" on over 130 million
records, and it's really a pain. It's currently usable but not
satisfactory.
Currently it groups at run time by iterating through the ungrouped
items. It collects matching documents into a BitSet, so subsequent
queries can use the BitSet
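
To make the run-time grouping above concrete, here is a minimal sketch using only java.util.BitSet. It is not the poster's code and does not use Lucene's own classes: the class name RunTimeGrouping, the method group, and the fieldValueByDoc array (the field value of each document, indexed by doc id) are all hypothetical names made up for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: group matching doc ids by a field value,
// keeping one BitSet per distinct value so the groups can be reused later.
class RunTimeGrouping {

    // matches: doc ids that matched the query.
    // fieldValueByDoc: assumed lookup of the grouping field's value per doc id.
    static Map<String, BitSet> group(BitSet matches, String[] fieldValueByDoc) {
        Map<String, BitSet> groups = new TreeMap<>();
        // Iterate only over the set bits, i.e. the matching documents.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            groups.computeIfAbsent(fieldValueByDoc[doc], k -> new BitSet()).set(doc);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] field3 = {"a", "b", "a", "c", "b", "a"};
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(2);
        matches.set(3);
        matches.set(4);
        // The count for each group is the cardinality of its BitSet.
        group(matches, field3).forEach(
            (value, bits) -> System.out.println(value + " " + bits.cardinality()));
    }
}
```

The count per group is just the cardinality of that group's BitSet, and the BitSets can be kept around for later queries.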
Thanks a lot for your suggestion.
I'll try it and get back if need be.
Meanwhile, I gave it some thought and concluded that the best time to do the
categorization/clustering would be when Lucene calculates the Hits, in the Scorer.
I am not sure if I am right.
In addition to the current functionality can w
: Suppose I cluster the results only on the 1st field, i.e. I do not show
: the constituent clusters. Even in this case, I'll require around 900
: Filters [I have 900 unique terms] in memory and will have to run the same
: query 900 times, once on each Filter. I am sitting at a situation where I
: get
Thanks for the response.
I understand that using Filters can do the trick, but there are other issues
involved.
Suppose I cluster the results only on the 1st field, i.e. I do not show the
constituent clusters.
Even in this case, I'll require around 900 Filters [I have 900 unique terms] in
memory
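
As a rough sketch of the Filter idea discussed here: if the bits for each unique term are kept in memory, the query itself only needs to run once, and each per-term count becomes an AND plus a cardinality check. This uses plain java.util.BitSet rather than Lucene's Filter class, and the names FilterCounting, countPerTerm and termBits are hypothetical. It still assumes all ~900 BitSets fit in memory, which is the very concern raised above.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: count query matches per term by intersecting
// the query's BitSet with one cached BitSet per unique term.
class FilterCounting {

    // queryMatches: doc ids matched by the query (computed once).
    // termBits: assumed cache, one BitSet of doc ids per unique term.
    static Map<String, Integer> countPerTerm(BitSet queryMatches,
                                             Map<String, BitSet> termBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : termBits.entrySet()) {
            // Clone so the cached per-term BitSet is not modified by and().
            BitSet overlap = (BitSet) e.getValue().clone();
            overlap.and(queryMatches);
            counts.put(e.getKey(), overlap.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, BitSet> termBits = new LinkedHashMap<>();
        BitSet red = new BitSet();
        red.set(0);
        red.set(1);
        red.set(5);
        BitSet blue = new BitSet();
        blue.set(2);
        blue.set(3);
        termBits.put("red", red);
        termBits.put("blue", blue);

        BitSet queryMatches = new BitSet();
        queryMatches.set(1);
        queryMatches.set(2);
        queryMatches.set(5);

        System.out.println(countPerTerm(queryMatches, termBits));
    }
}
```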
the approach(es) I described in this thread...
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL
PROTECTED]
...should work, but you have the added complexity of wanting the counts
not just for all unique values in a field, but all the permutations of
values from two f
Well, you're not going to like my answer to that. If what you're looking
for is a group-by result depending on the unique values of a field or a
combination of fields (field-3, field-4), something that in SQL would
look like this:
select field-3 , field-4 , count(*) from ... where . grou
Thanks for the prompt reply.
I have to bunch the results [only the count] on the basis of the value of one of
the FIELDS,
let's say FIELD-3, and within it FIELD-4.
It is very similar to using "group by".
What are the options for doing this? And which is the best way to do it?
Thanks in antic
I don't understand your requirement. What do you want to bunch your
results by?
Can you explain so I can help?
Nader Henein
kapilChhabra (sent by Nabble.com) wrote:
Hi All,
I have been using Lucene in my application to search over 4 million records,
updated daily.
I am currently using a sing