Re: Lucene sort performance roots?

2011-06-24 Thread Michael Sokolov
Because of this top-n behavior, it's generally slow in Lucene to page deeply into the result set. If you want to go to page 100 of your search results, the priority queue must have a size of at least n=docsPerPage*100. Because of this, most full text search engines (e.g. Google does this, too)
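The top-n collection described above can be illustrated with a small sketch in plain Java. It does not use Lucene's own collector classes; the class name TopNPaging and its methods are hypothetical and exist only to show why the queue has to grow with the page number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Lucene code: top-n collection with a min-heap.
public class TopNPaging {

    // Queue size needed to reach the given 1-based page: n = docsPerPage * page.
    public static int queueSizeForPage(int docsPerPage, int page) {
        if (docsPerPage < 1 || page < 1) {
            throw new IllegalArgumentException("docsPerPage and page must be >= 1");
        }
        return docsPerPage * page;
    }

    // Keeps the n best scores seen so far, then returns only the slice for the page.
    public static List<Float> page(float[] scores, int docsPerPage, int page) {
        int n = queueSizeForPage(docsPerPage, page);
        // Min-heap: the head is the weakest of the current top n.
        PriorityQueue<Float> queue = new PriorityQueue<>(n);
        for (float s : scores) {
            if (queue.size() < n) {
                queue.add(s);
            } else if (s > queue.peek()) {
                queue.poll();   // drop the weakest entry
                queue.add(s);
            }
        }
        List<Float> top = new ArrayList<>(queue);
        Collections.sort(top, Collections.reverseOrder());
        int from = Math.min((page - 1) * docsPerPage, top.size());
        int to = Math.min(from + docsPerPage, top.size());
        return new ArrayList<>(top.subList(from, to));
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.1f, 0.5f, 0.7f, 0.3f, 0.8f};
        System.out.println(page(scores, 2, 2)); // prints [0.7, 0.5]
    }
}
```

Even though only docsPerPage results are returned, all n entries have to be held and ordered, so the work grows with the page number.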

Re: Lucene sort performance roots?

2011-06-24 Thread pravesh
You might be doing search+sort together in a single query. Since Lucene beats an RDBMS at text search, it could be that you are actually benefiting from the combined text search+sort, which would otherwise be slow on an RDBMS. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/

RE: field sorted searches with unbounded hit count

2011-06-24 Thread Tim Eck
> if you use the same IndexReader / Searcher for both queries nothing > changes. How frequently do you open your index? I'm currently using the "real-time" readers from IndexWriter.getReader() and never closing my IndexWriter. I was (perhaps wrongly) assuming that those readers can observe mutat

Constructing an IDF table without indexing documents

2011-06-24 Thread Xiyang Chen
Hi, I'm developing a search application with two types of documents: 1. Documents that need to be indexed and queried against 2. Documents that will never show up in search results, but their content needs to contribute to the global term frequency table In other words, the application

RE: Does {Filter}ing is faster than {Query}ing in Lucene?

2011-06-24 Thread Uwe Schindler
Hi, If you don't cache filters, queries will be faster, as the ConjunctionScorer in Lucene has optimizations which are currently not used for Filters. Filters are fine if you cache them (e.g. if you always have the same access restrictions for a specific user that are applied to all his queries).
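The caching idea above can be sketched in plain Java. This is only an illustration under assumptions: it models a filter as a java.util.BitSet of permitted document ids and does not use Lucene's Filter classes; the class name CachedAccessFilter and its methods are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration, not Lucene code: cache one access bitset per user.
public class CachedAccessFilter {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> compute; // the expensive restriction lookup
    private int computations = 0;

    public CachedAccessFilter(Function<String, BitSet> compute) {
        this.compute = compute;
    }

    // Computes the user's allowed-document bitset once and reuses it afterwards.
    public BitSet allowedDocs(String userId) {
        BitSet bits = cache.get(userId);
        if (bits == null) {
            bits = compute.apply(userId);
            computations++;
            cache.put(userId, bits);
        }
        return bits;
    }

    // Intersects a query's hits with the cached restriction; inputs are not modified.
    public BitSet apply(String userId, BitSet queryHits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(allowedDocs(userId));
        return result;
    }

    // Number of times the expensive lookup actually ran.
    public int computations() {
        return computations;
    }
}
```

The cost of building the restriction is paid once per user, and every later query only pays for a bitset intersection, which is the situation in which filters are described as fine.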

RE: Lucene sort performance roots?

2011-06-24 Thread Uwe Schindler
> Lucene is a great toolkit for search applications, and it's very fast in most cases. I think I understand why it's faster than relational databases for information retrieval. For example, Lucene uses a very efficient index that allows it to retrieve a posting list in constant time and do inte

Re: Lucene sort performance roots?

2011-06-24 Thread Dawid Weiss
My guess is that it's an advantage of doing the sort in-memory and reusing caches. Hard to tell without the snippet of code that you're actually using. Dawid On Fri, Jun 24, 2011 at 8:04 AM, Denis Bazhenov wrote: > Yes, sorry. I should explain it. > > What we are using is sorting by field value.

Re: Does {Filter}ing is faster than {Query}ing in Lucene?

2011-06-24 Thread Ian Lea
Generalisation is risky, particularly wrt performance, but I'd say yes, particularly if you can cache and reuse the filter e.g. with CachingWrapperFilter. See http://wiki.apache.org/lucene-java/FilteringOptions. Not very up to date but I'd expect the conclusions to stand. -- Ian. On Fri, Jun

Re: spaces in the field name

2011-06-24 Thread Ian Lea
As far as I'm aware spaces are allowed in field names and a quick test appears to confirm that. Feel free to disagree (I'm often wrong) with more detail - or proof. -- Ian. On Thu, Jun 23, 2011 at 10:26 PM, Nilesh Vijaywargiay wrote: >  I have a situation where the field name consists of spac