Counting search results
Hello,

I'm trying to find the number of documents containing a specific term in order to build text statistics. I'm not interested in ordering the results or even in retrieving the first result; I just need the number of hits.

Currently I'm doing this with the Lucene searcher class:

IndexSearcher searcher = new IndexSearcher(reader);
String queryString = fieldname + ":" + term;
QueryParser parser = new QueryParser(fieldname, new GermanAnalyzer());
TopDocs d = searcher.search(parser.parse(queryString), filter, 1);
int count = d.totalHits;

The problem is that the index is large (optimized, more than 8 million entries) and a single search can match more than 1 million documents. These counting searches currently take more than 15 seconds.

The question is: is there any way to get the number of search results faster? I suspect it could be sped up by not using a Weight object (the order is not interesting), but I haven't seen a way to do this.

I hope someone has already solved this problem.

Mathias
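For reference, a minimal sketch of a counting-only Collector, assuming the Lucene 2.9 Collector API and the searcher, parser, queryString and filter variables from the message above. It skips scoring and sorting entirely while still honouring the filter; this is one way to avoid the TopDocs machinery when only the hit count is needed.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Counts hits without scoring or sorting them.
public class CountingCollector extends Collector {
    private int count = 0;

    public void setScorer(Scorer scorer) throws IOException { /* scores are not needed */ }
    public void collect(int doc) throws IOException { count++; }          // just count the hit
    public void setNextReader(IndexReader reader, int docBase) throws IOException { }
    public boolean acceptsDocsOutOfOrder() { return true; }               // order is irrelevant

    public int getCount() { return count; }
}

// Usage: the filter is still applied, but no result list is built.
CountingCollector collector = new CountingCollector();
searcher.search(parser.parse(queryString), filter, collector);
int count = collector.getCount();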
Re: Counting search results
Hello,

This seems to be a solution similar to:

Term t = new Term(fieldname, term);
int count = searcher.docFreq(t);

The problem is that with this approach it is not possible to apply a filter object. If I don't want to use the filter object, I would have to use a complex search query, which is - again - very slow. So, unfortunately, your solution does not help.

Mathias

2009/9/15 Simon Willnauer:
> Did you try:
>
> int numDocs = 0;
> TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm"));
> while (termDocs.next()) { numDocs++; }
>
> simon
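For reference, a self-contained sketch of the two counting approaches discussed above, assuming the reader, fieldname and term variables from the thread. Neither variant applies a filter; the posting-list walk skips deleted documents, while docFreq does not.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// Counts documents containing the term by walking its posting list.
int numDocs = 0;
TermDocs termDocs = reader.termDocs(new Term(fieldname, term));
try {
    while (termDocs.next()) {
        numDocs++;
    }
} finally {
    termDocs.close();
}

// Even cheaper: the stored document frequency. Note that it includes
// deleted documents and also cannot be filtered.
int docFreq = reader.docFreq(new Term(fieldname, term));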
Re: Counting search results
Hello,

I have tried your method, but it doesn't work: set is null after calling

BitSet set = filter.bits(reader);

I haven't found the reason for this. Additionally, the bits() method is deprecated and the javadoc recommends using getDocIdSet() instead. But that set only provides an iterator, so random-access membership checks like set.get(doc) are not possible.

Are there any other possibilities to improve the speed?

Mathias

On 15.09.2009 17:13, Simon Willnauer wrote:
> Hmm, so if you wanna use the Filter to narrow down the search results
> you could use it in the while loop like this:
>
> BitSet set = filter.bits(reader);
> int numDocs = 0;
> TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm"));
> while (termDocs.next()) {
>     if (set.get(termDocs.doc()))
>         numDocs++;
> }
>
> would that help?
>
> simon
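A minimal sketch of how the iterator-only DocIdSet could still be combined with the TermDocs loop: instead of random-access lookups, leapfrog the two doc-id streams, which are both sorted ascending. This assumes the Lucene 2.9 DocIdSetIterator API and the reader, filter, fieldname and term variables from the thread.

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;

int count = 0;
DocIdSet docIdSet = filter.getDocIdSet(reader);
TermDocs termDocs = reader.termDocs(new Term(fieldname, term));
try {
    if (docIdSet != null) {                          // some filters return null for "no documents"
        DocIdSetIterator filterIt = docIdSet.iterator();
        int filterDoc = filterIt.nextDoc();
        while (filterDoc != DocIdSetIterator.NO_MORE_DOCS && termDocs.next()) {
            int termDoc = termDocs.doc();
            if (filterDoc < termDoc) {
                filterDoc = filterIt.advance(termDoc);   // jump the filter forward to the term's doc
                if (filterDoc == DocIdSetIterator.NO_MORE_DOCS) break;
            }
            if (filterDoc == termDoc) {
                count++;                                 // document matches both term and filter
            }
        }
    }
} finally {
    termDocs.close();
}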
Re: Creating tag clouds with Lucene
Well, it could be a facet search if tags were available, but if you just want a "tag cloud" generated from full text, I don't see how a facet search would help to generate this cloud. Unfortunately, I don't have tags in my data. What I need is the information which terms (or multi-word terms) are used most often in this data.

At first I thought of using Carrot2, which uses a specialized clustering algorithm. But I wondered whether it is possible to get the most frequently used terms out of Lucene directly. Glen mentioned that he is doing this for full-text data using the IndexReader.termDocs(Term term) method. So I think he iterates over all terms and looks at how many documents each term occurs in.

What I don't see is: how does this method work with a filter? Do you first look up all documents which are valid for the filter and then iterate over all terms, counting only documents in this filtered set? I can't imagine that this performs well, because I have more than 10 million documents (and growing fast).

Mathias

2009/11/6 Chris Lu:
> Isn't the tag cloud just another facet search? The only difference is that the tag is
> multi-valued.
>
> Basically just go through the search results and find all unique tag values.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
> Mathias Bank wrote:
>>
>> Hi,
>>
>> I want to calculate a tag cloud for search results. I have seen that
>> it is possible to extract the top 20 words from the Lucene index. Is
>> there also a possibility to extract the top 20 words from search
>> results (or filter results) in Lucene?
>>
>> Mathias
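A rough sketch of the per-term counting approach described above, restricted to a filtered document set. It assumes Lucene 2.9-era APIs; the field, filter and n parameters are placeholders, and for millions of documents and terms this brute-force loop over every posting can still be expensive.

import java.io.IOException;
import java.util.Comparator;
import java.util.PriorityQueue;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

public class FilteredTopTerms {

    public static class TermCount {
        public final String term;
        public final int count;
        TermCount(String term, int count) { this.term = term; this.count = count; }
    }

    // Returns the n most frequent terms of `field`, counting only documents
    // accepted by `filter`.
    public static PriorityQueue<TermCount> topTerms(IndexReader reader, Filter filter,
                                                    String field, int n) throws IOException {
        // Materialize the filter once into a bit set, so the per-term counting
        // loop only needs a cheap membership test per posting.
        OpenBitSet accepted = new OpenBitSet(reader.maxDoc());
        DocIdSet docIdSet = filter.getDocIdSet(reader);
        if (docIdSet != null) {
            DocIdSetIterator it = docIdSet.iterator();
            for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
                accepted.set(doc);
            }
        }

        // Smallest count at the head, so the queue always holds the current top n.
        PriorityQueue<TermCount> top = new PriorityQueue<TermCount>(n, new Comparator<TermCount>() {
            public int compare(TermCount a, TermCount b) { return a.count - b.count; }
        });

        TermEnum terms = reader.terms(new Term(field, ""));
        TermDocs termDocs = reader.termDocs();
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;   // left this field's term range
                termDocs.seek(terms);
                int count = 0;
                while (termDocs.next()) {
                    if (accepted.get(termDocs.doc())) count++;
                }
                if (count > 0) {
                    top.add(new TermCount(t.text(), count));
                    if (top.size() > n) top.poll();                  // drop the least frequent
                }
            } while (terms.next());
        } finally {
            terms.close();
            termDocs.close();
        }
        return top;
    }
}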