Hi Andrew,

We are running Solr through its HTTP interface from Python. From the resources I could find, EmbeddedSolrServer is only an option if I am using Solr from a Java program. Before going down this path, it would be useful to understand whether a significant part of the performance increase comes from bypassing HTTP.
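One way to see how much of the delay sits outside the search itself is to compare the QTime that Solr reports in the response header with the wall-clock time measured on the client. The sketch below is only an illustration under stated assumptions: it uses Python 3's standard library, asks for JSON output with wt=json, and the base URL and function names are hypothetical placeholders, not anything from our setup.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, rows=200, fields="id"):
    """Build a select URL that requests only the given stored fields."""
    params = {"q": query, "rows": rows, "fl": fields, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def split_timing(response_body, elapsed_ms):
    """Return (qtime_ms, overhead_ms) for one response.

    qtime_ms is the QTime value from the response header; overhead_ms is
    whatever part of the client-side wall-clock time QTime does not cover.
    """
    header = json.loads(response_body)["responseHeader"]
    qtime_ms = header["QTime"]
    return qtime_ms, elapsed_ms - qtime_ms


def timed_query(base_url, query, rows=200):
    """Run one query over HTTP and split its timing (needs a live server)."""
    url = build_select_url(base_url, query, rows)
    start = time.perf_counter()
    with urlopen(url) as resp:
        body = resp.read()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return split_timing(body, elapsed_ms)
```

Calling timed_query("http://localhost:8983/solr", "some query") against a running instance (the URL is an assumed example) would give the two numbers side by side. A large overhead next to a small QTime would suggest the time goes into fetching stored fields, writing the response, or the transport, rather than into ranking.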
In the meantime I am trying my luck with the other suggestions. Can you share the patch that caches Solr documents instead of Lucene documents?

On a different note, I am wondering why it takes 4-5 seconds for Solr to return the IDs of the ranked documents when it can rank the results in about 20 milliseconds. Am I missing something here?

Thanks,
Raghu

On Fri, Nov 27, 2009 at 2:15 AM, Andrey Klochkov <akloch...@griddynamics.com> wrote:

> Hi
>
> We obtain ALL documents for every query, the index size is about 50k. We use
> a number of stored fields. Often the result set size is several thousands of
> docs.
>
> We performed the following things to make it faster:
>
> 1. Use EmbeddedSolrServer
> 2. Patch Solr to avoid unnecessary marshalling while using
> EmbeddedSolrServer (there's an issue in Solr JIRA)
> 3. Patch Solr to cache SolrDocument instances instead of Lucene's Document
> instances. I was going to share this patch, but then decided that our usage
> of Solr is not common and this functionality is useless in most cases
> 4. We have all documents in cache
> 5. In fact our index is stored in a data grid, not a file system. But as
> tests showed this is not important because standard FSDirectory is faster if
> you have enough RAM free for OS caches.
>
> These changes improved the performance very much, so in the end we have
> performance comparable (about 3-5 times slower) to the "proper" Solr usage
> (obtaining first 20 documents).
>
> To get more details on how different Solr components perform we injected
> perf4j statements into key points in the code. And a profiler was helpful
> too.
>
> Hope it helps somehow.
>
> On Thu, Nov 26, 2009 at 8:48 PM, Raghuveer Kancherla <
> raghuveer.kanche...@aplopio.com> wrote:
>
> > Hi,
> > I am using Solr 1.4 for searching through half a million documents. The
> > problem is, I want to retrieve nearly 200 documents for each search query.
> > The query time in Solr logs is showing 0.02 seconds and I am fairly happy
> > with that. However Solr is taking a long time (4 to 5 secs) to return the
> > results (I think it is because of the number of docs I am requesting). I
> > tried returning only the id's (unique key) without any other stored fields,
> > but it is not helping me improve the response times (time to return the
> > id's of matching documents).
> > I understand that retrieving 200 documents for each search term is
> > impractical in most scenarios but I don't have any other option. Any
> > pointers on how to improve the response times will be a great help.
> >
> > Thanks,
> > Raghu
> >
>
> --
> Andrew Klochkov
> Senior Software Engineer,
> Grid Dynamics