Re: Retrieving large num of docs

Raghuveer Kancherla Fri, 27 Nov 2009 01:03:03 -0800

Hi Andrew,
We are running Solr through its HTTP interface from Python. From the resources
I could find, EmbeddedSolrServer can be used only from a Java program. Before
going down this path, it would be useful to know whether a significant part of
the performance increase comes from bypassing HTTP.


In the meantime I am trying my luck with the other suggestions. Can you
share the patch that caches Solr documents instead of Lucene documents?


On a different note, I am wondering why it takes 4 to 5 seconds for Solr
to return the IDs of the ranked documents when it can rank the results in
about 20 milliseconds. Am I missing something here?
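
One way to narrow this down from the Python side is to compare the QTime
that Solr reports in the response header with the wall-clock time of the
whole HTTP round trip. The sketch below is only an illustration: the URL,
the field name "id" and the helper names are assumptions, not taken from
the actual setup. As far as I understand, QTime does not include the time
spent writing the response, so a large gap would point at document
retrieval, serialization or the network rather than at ranking.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the Solr select handler; adjust for the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=200, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks for only the unique key field, as JSON."""
    # "id" is assumed to be the unique key field of the schema.
    params = {"q": query, "rows": rows, "fl": "id", "wt": "json"}
    return base_url + "?" + urlencode(params)


def split_timings(body, elapsed_seconds):
    """Separate the QTime reported by Solr from the rest of the round trip."""
    qtime_ms = json.loads(body)["responseHeader"]["QTime"]
    elapsed_ms = elapsed_seconds * 1000.0
    return {
        "qtime_ms": qtime_ms,
        "elapsed_ms": elapsed_ms,
        # Everything Solr's QTime does not account for, as seen by the client.
        "overhead_ms": elapsed_ms - qtime_ms,
    }


def timed_query(query, rows=200):
    """Run one query against Solr and report where the time went."""
    url = build_query_url(query, rows)
    start = time.perf_counter()
    with urlopen(url) as response:
        body = response.read().decode("utf-8")
    return split_timings(body, time.perf_counter() - start)
```

Calling timed_query("some search terms") against a running instance and
looking at overhead_ms should show how much of the 4 to 5 seconds falls
outside the search itself.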

Thanks,
Raghu



On Fri, Nov 27, 2009 at 2:15 AM, Andrey Klochkov <akloch...@griddynamics.com> wrote:

> Hi
>
> We obtain ALL documents for every query, the index size is about 50k. We
> use a number of stored fields. Often the result set size is several
> thousands of docs.
>
> We performed the following things to make it faster:
>
> 1. Use EmbeddedSolrServer
> 2. Patch Solr to avoid unnecessary marshalling while using
> EmbeddedSolrServer (there's an issue  in Solr JIRA)
> 3. Patch Solr to cache SolrDocument instances instead of Lucene's Document
> instances. I was going to share this patch, but then decided that our usage
> of Solr is not common and this functionality is useless in most cases
> 4. We have all documents in cache
> 5. In fact our index is stored in a data grid, not a file system. But as
> tests showed, this is not important because the standard FSDirectory is
> faster if you have enough free RAM for OS caches.
>
> These changes improved the performance very much, so in the end we have
> performance comparable (about 3-5 times slower) to the "proper" Solr usage
> (obtaining first 20 documents).
>
> To get more details on how different Solr components perform we injected
> perf4j statements into key points in the code. And a profiler was helpful
> too.
>
> Hope it helps somehow.
>
> On Thu, Nov 26, 2009 at 8:48 PM, Raghuveer Kancherla <
> raghuveer.kanche...@aplopio.com> wrote:
>
> > Hi,
> > I am using Solr 1.4 to search through half a million documents. The
> > problem is that I want to retrieve nearly 200 documents for each search
> > query. The query time in the Solr logs shows 0.02 seconds and I am fairly
> > happy with that. However, Solr is taking a long time (4 to 5 secs) to
> > return the results (I think it is because of the number of docs I am
> > requesting). I tried returning only the ids (unique key) without any
> > other stored fields, but it does not improve the response times (time to
> > return the ids of matching documents).
> > I understand that retrieving 200 documents for each search term is
> > impractical in most scenarios, but I don't have any other option. Any
> > pointers on how to improve the response times will be a great help.
> >
> > Thanks,
> >  Raghu
> >
>
>
>
> --
> Andrew Klochkov
> Senior Software Engineer,
> Grid Dynamics
>
