Counting search results
Hello,

I'm trying to find the number of documents containing a specific term in order to build text statistics. I'm not interested in ordering the results or even in retrieving the first result; I just need the number of hits.

Currently I'm doing this with the Lucene searcher class:

IndexSearcher searcher = new IndexSearcher(reader);
String queryString = fieldname + ":" + term;
QueryParser parser = new QueryParser(fieldname, new GermanAnalyzer());
TopDocs d = searcher.search(parser.parse(queryString), filter, 1);
int count = d.totalHits;

The problem is that the index is large (optimized, more than 8 million entries) and a single search can match more than 1 million documents. These counting searches currently take more than 15 seconds.

The question is: is there any way to get the number of search results faster? I suspect it could be sped up by not using a Weight object (the order is not interesting), but I haven't seen a way to do this.

I hope someone has already solved this problem.

Mathias
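For reference, a minimal sketch of a counting-only Collector, assuming the Lucene 2.9 Collector API and the searcher, parser, queryString and filter variables from the message above. It skips scoring and sorting entirely while still honouring the filter; this is one way to avoid the TopDocs machinery when only the hit count is needed.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Counts hits without scoring or sorting them.
public class CountingCollector extends Collector {
    private int count = 0;

    public void setScorer(Scorer scorer) throws IOException { /* scores are not needed */ }
    public void collect(int doc) throws IOException { count++; }          // just count the hit
    public void setNextReader(IndexReader reader, int docBase) throws IOException { }
    public boolean acceptsDocsOutOfOrder() { return true; }               // order is irrelevant

    public int getCount() { return count; }
}

// Usage: the filter is still applied, but no result list is built.
CountingCollector collector = new CountingCollector();
searcher.search(parser.parse(queryString), filter, collector);
int count = collector.getCount();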
Re: Counting search results
Hello,

This seems to be a solution similar to:

Term t = new Term(fieldname, term);
int count = searcher.docFreq(t);

The problem is that with this approach it is not possible to apply a filter object. If I don't want to use the filter object, I would have to use a complex search query, which is - again - very slow. So, unfortunately, your solution does not help.

Mathias

2009/9/15 Simon Willnauer:
> Did you try:
>
> int numDocs = 0;
> TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm"));
> while (termDocs.next()) { numDocs++; }
>
> simon
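For reference, a self-contained sketch of the two counting approaches discussed above, assuming the reader, fieldname and term variables from the thread. Neither variant applies a filter; the posting-list walk skips deleted documents, while docFreq does not.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// Counts documents containing the term by walking its posting list.
int numDocs = 0;
TermDocs termDocs = reader.termDocs(new Term(fieldname, term));
try {
    while (termDocs.next()) {
        numDocs++;
    }
} finally {
    termDocs.close();
}

// Even cheaper: the stored document frequency. Note that it includes
// deleted documents and also cannot be filtered.
int docFreq = reader.docFreq(new Term(fieldname, term));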
Re: Counting search results
Hello,

I have tried your method, but it doesn't work: set is null after calling

BitSet set = filter.bits(reader);

I haven't found the reason for this. Additionally, the bits() method is deprecated and the javadoc recommends using getDocIdSet() instead. But that set only provides an iterator, so random-access membership checks like set.get(doc) are not possible.

Are there any other possibilities to improve the speed?

Mathias

On 15.09.2009 17:13, Simon Willnauer wrote:
> Hmm, so if you wanna use the Filter to narrow down the search results
> you could use it in the while loop like this:
>
> BitSet set = filter.bits(reader);
> int numDocs = 0;
> TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm"));
> while (termDocs.next()) {
>     if (set.get(termDocs.doc()))
>         numDocs++;
> }
>
> would that help?
>
> simon
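A minimal sketch of how the iterator-only DocIdSet could still be combined with the TermDocs loop: instead of random-access lookups, leapfrog the two doc-id streams, which are both sorted ascending. This assumes the Lucene 2.9 DocIdSetIterator API and the reader, filter, fieldname and term variables from the thread.

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;

int count = 0;
DocIdSet docIdSet = filter.getDocIdSet(reader);
TermDocs termDocs = reader.termDocs(new Term(fieldname, term));
try {
    if (docIdSet != null) {                          // some filters return null for "no documents"
        DocIdSetIterator filterIt = docIdSet.iterator();
        int filterDoc = filterIt.nextDoc();
        while (filterDoc != DocIdSetIterator.NO_MORE_DOCS && termDocs.next()) {
            int termDoc = termDocs.doc();
            if (filterDoc < termDoc) {
                filterDoc = filterIt.advance(termDoc);   // jump the filter forward to the term's doc
                if (filterDoc == DocIdSetIterator.NO_MORE_DOCS) break;
            }
            if (filterDoc == termDoc) {
                count++;                                 // document matches both term and filter
            }
        }
    }
} finally {
    termDocs.close();
}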
Re: Creating tag clouds with Lucene
Well, it could be a facet search if tags were available, but if you just want a "tag cloud" generated from full text, I don't see how a facet search would help to generate this cloud. Unfortunately, I don't have tags in my data. What I need is the information which terms (or multi-word terms) are used most often in this data.

At first I thought of using Carrot2, which uses a specialized clustering algorithm. But I wondered whether it is possible to get the most frequently used terms out of Lucene directly. Glen mentioned that he is doing this for full-text data using the IndexReader.termDocs(Term term) method. So I think he iterates over all terms and looks at how many documents each term occurs in.

What I don't see is: how does this method work with a filter? Do you first look up all documents which are valid for the filter and then iterate over all terms, counting only documents in this filtered set? I can't imagine that this performs well, because I have more than 10 million documents (and growing fast).

Mathias

2009/11/6 Chris Lu:
> Isn't the tag cloud just another facet search? The only difference is that the tag is
> multi-valued.
>
> Basically just go through the search results and find all unique tag values.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
> Mathias Bank wrote:
>>
>> Hi,
>>
>> I want to calculate a tag cloud for search results. I have seen that
>> it is possible to extract the top 20 words from the Lucene index. Is
>> there also a possibility to extract the top 20 words from search
>> results (or filter results) in Lucene?
>>
>> Mathias
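A rough sketch of the per-term counting approach described above, restricted to a filtered document set. It assumes Lucene 2.9-era APIs; the field, filter and n parameters are placeholders, and for millions of documents and terms this brute-force loop over every posting can still be expensive.

import java.io.IOException;
import java.util.Comparator;
import java.util.PriorityQueue;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

public class FilteredTopTerms {

    public static class TermCount {
        public final String term;
        public final int count;
        TermCount(String term, int count) { this.term = term; this.count = count; }
    }

    // Returns the n most frequent terms of `field`, counting only documents
    // accepted by `filter`.
    public static PriorityQueue<TermCount> topTerms(IndexReader reader, Filter filter,
                                                    String field, int n) throws IOException {
        // Materialize the filter once into a bit set, so the per-term counting
        // loop only needs a cheap membership test per posting.
        OpenBitSet accepted = new OpenBitSet(reader.maxDoc());
        DocIdSet docIdSet = filter.getDocIdSet(reader);
        if (docIdSet != null) {
            DocIdSetIterator it = docIdSet.iterator();
            for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
                accepted.set(doc);
            }
        }

        // Smallest count at the head, so the queue always holds the current top n.
        PriorityQueue<TermCount> top = new PriorityQueue<TermCount>(n, new Comparator<TermCount>() {
            public int compare(TermCount a, TermCount b) { return a.count - b.count; }
        });

        TermEnum terms = reader.terms(new Term(field, ""));
        TermDocs termDocs = reader.termDocs();
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;   // left this field's term range
                termDocs.seek(terms);
                int count = 0;
                while (termDocs.next()) {
                    if (accepted.get(termDocs.doc())) count++;
                }
                if (count > 0) {
                    top.add(new TermCount(t.text(), count));
                    if (top.size() > n) top.poll();                  // drop the least frequent
                }
            } while (terms.next());
        } finally {
            terms.close();
            termDocs.close();
        }
        return top;
    }
}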