Re: CoreAdmin STATUS performance

Stefan Matheis Sun, 13 Jan 2013 05:11:20 -0800

Shahar


Would you mind opening a JIRA issue for that and attaching your 
changes as a patch?
Perhaps we could use that for the UI, in those cases where we don't need the 
full set of information.

Stefan 


On Sunday, January 13, 2013 at 12:28 PM, Shahar Davidson wrote:

> Shawn, Per and anyone else who has participated in this thread - thank you!
> 
> I have finally resorted to applying a minor patch to the Solr code. 
> I noticed that most of the STATUS request time is spent 
> collecting index-related info (such as segmentCount, sizeInBytes, numDocs, 
> etc.).
> In the STATUS request I added support for a new parameter which, if present, 
> skips collection of the index info (so the response contains only general static 
> info, including the core name). This cuts the request time 
> by two orders of magnitude!
> In my case, it decreased the request time from around 800ms to around 1ms-4ms.
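
A minimal sketch of how a client might build such a lightweight STATUS request. The thread does not name the new parameter, so `indexInfo=false` is an assumed name here, and `StatusUrlSketch` and `statusUrl` are hypothetical helpers, not part of Solr or SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class StatusUrlSketch {
    /**
     * Builds a CoreAdmin STATUS URL.
     * ASSUMPTION: "indexInfo" is an illustrative parameter name; the thread
     * only says that a new parameter skips collection of the index info.
     */
    static String statusUrl(String solrBaseUrl, String coreName, boolean includeIndexInfo) {
        StringBuilder url = new StringBuilder(solrBaseUrl)
                .append("/admin/cores?action=STATUS&wt=json");
        if (coreName != null) {
            // Restrict the status report to a single core.
            url.append("&core=").append(URLEncoder.encode(coreName, StandardCharsets.UTF_8));
        }
        if (!includeIndexInfo) {
            // Ask the server to skip segmentCount, sizeInBytes, numDocs, etc.
            url.append("&indexInfo=false");
        }
        return url.toString();
    }
}
```

Calling `statusUrl("http://localhost:8983/solr", null, false)` would request the static info for all cores without the index details.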
> 
> Regards,
> 
> Shahar.
> 
> -----Original Message-----
> From: Shawn Heisey [mailto:s...@elyograg.org] 
> Sent: Thursday, January 10, 2013 5:14 PM
> To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
> Subject: Re: CoreAdmin STATUS performance
> 
> On 1/10/2013 2:09 AM, Shahar Davidson wrote:
> > As for your first question, the core info needs to be gathered upon every 
> > search request because cores are created dynamically.
> > When a user initiates a search request, the system must be aware of 
> > all available cores in order to execute distributed search on _all_ 
> > relevant cores (the user must get reliable, up-to-date data). The 
> > reason that 800ms seems a lot to me is that the overall execution time 
> > is about 2500ms and a large part of it is due to the STATUS request.
> > 
> > The "minimal interval" concept is a good idea and indeed we've considered 
> > it, yet it poses a slight problem when building an RT system which needs to 
> > return the most up-to-date data.
> > I am just trying to understand if there's some other way to hasten the 
> > STATUS reply (for example, by asking the STATUS request to return just 
> > certain core attributes, such as name, instead of collecting 
> > everything)
> 
> 
> 
> Are there a *huge* number of SolrJ clients in the wild, or is it something 
> like a server farm where you are in control of everything? If it's the 
> latter, what I think I would do is have an asynchronous thread that 
> periodically (every few seconds) updates the client's view of what cores 
> exist. When a query is made, it will use that information, speeding up your 
> queries by 800 milliseconds and ensuring that new cores will not have long 
> delays before they become searchable. If you have a huge number of clients in 
> the wild, it would still be possible, but ensuring that those clients get 
> updated might be hard.
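
The asynchronous refresh described above could be sketched roughly as follows. `CoreListCache` is a hypothetical class, and the `Supplier` stands in for whatever call actually fetches the core names (for example a CoreAdmin STATUS request); neither is part of SolrJ.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class CoreListCache implements AutoCloseable {
    private final Supplier<List<String>> fetchCoreNames; // assumed to wrap the STATUS call
    private final ScheduledExecutorService scheduler;
    private volatile List<String> coreNames;              // latest immutable snapshot

    CoreListCache(Supplier<List<String>> fetchCoreNames, long period, TimeUnit unit) {
        this.fetchCoreNames = fetchCoreNames;
        this.coreNames = List.copyOf(fetchCoreNames.get()); // initial synchronous load
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "core-list-refresher");
            t.setDaemon(true); // do not keep the JVM alive for the refresher
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::refresh, period, period, unit);
    }

    /** Re-fetches the core list; on failure the previous snapshot is kept. */
    void refresh() {
        try {
            coreNames = List.copyOf(fetchCoreNames.get());
        } catch (RuntimeException e) {
            // Swallow the error so the scheduled task is not cancelled.
        }
    }

    /** Returns the cached list without any network round trip. */
    List<String> currentCores() {
        return coreNames;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, each search reads `currentCores()` instead of issuing its own STATUS request, so a newly created core becomes visible after at most one refresh period.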
> 
> If you delete cores as well as add them, that complicates things. 
> You'd have to have the clients be smart enough to exclude the last core on 
> the list (by whatever sorting mechanism you require), and you'd have to wait 
> long enough (30 seconds, maybe?) before *actually* deleting the last core to 
> be sure that no clients are accessing it.
> 
> Or you could use SolrCloud, as Per suggested, but with 4.1, not the released 
> 4.0. SolrCloud manages your cores for you automatically. 
> You'd probably be using a slightly customized SolrCloud, including the custom 
> hashing capability added by SOLR-2592. I don't know what other customizations 
> you might need.
> 
> Thanks,
> Shawn
> 
