The collections are created dynamically, but not on update. We use one collection per month, and we have a timer-job running (every hour or so) which checks whether all collections that need to exist actually do exist - if not, it creates the missing collection(s). The rule is that the collection for "next month" has to exist as soon as we enter "current month", so the first time the timer-job runs on e.g. 1 July it will create the August collection. We never receive data with a timestamp in the future, so as long as the timer-job gets to run at least once every month we will always have the needed collections ready.
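
To illustrate the rule, here is a minimal Java sketch of what the timer-job checks - the collection naming scheme (e.g. "data_2013_07") is just an assumption made for the example:

import java.util.Arrays;
import java.util.Calendar;
import java.util.List;

// Minimal sketch of the "which collections must exist right now" rule.
// The naming scheme ("data_YYYY_MM") is an assumption for illustration.
public class MonthlyCollectionRule {

    // The collection for the current month and for the next month must always exist.
    static List<String> requiredCollections(Calendar now) {
        Calendar next = (Calendar) now.clone();
        next.add(Calendar.MONTH, 1);
        return Arrays.asList(nameFor(now), nameFor(next));
    }

    static String nameFor(Calendar cal) {
        return String.format("data_%04d_%02d",
                cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1);
    }
}

The timer-job simply compares this list against the collections that already exist and creates whatever is missing.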

We create collections using the new Collection API in Solr. We used to manage creation of every single Shard/Replica/Core of the collections through the Core Admin API in Solr, but since a Collection API was introduced we decided that we had better use that. In 4.0 it did not have the features we needed, which triggered SOLR-4114, SOLR-4120 and SOLR-4140, which will be available in 4.1. With those features in place we are now using the Collection API.
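
A hedged sketch of what such a creation call can look like over plain HTTP - the host, collection name and shard/replica counts below are illustrative assumptions, not a description of our actual setup:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of creating a monthly collection through the Collections API over HTTP.
// Host, shard/replica counts and the collection name are illustrative assumptions.
public class CreateMonthlyCollection {
    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=CREATE&name=data_2013_08"
                + "&numShards=4&replicationFactor=2";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        System.out.println("CREATE returned HTTP " + conn.getResponseCode());
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line); // the response body; a non-zero status means failure
        }
        in.close();
    }
}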

BTW, our timer-job also handles deletion of "old" collections. In our system you can configure how many historic month-collections to keep before it is ok to delete them. Let's say this is configured to 3: as soon as it becomes 1 July, the timer-job will delete the March collection (the historic collections to keep have just become the April, May and June collections). This way we always have at least 3 months of historic data, and late in a month close to 4 months of history. It does not matter that we keep a little too much history, as long as we never go below the lower limit on the length of historic data. We also use the new Collection API for deletion.
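
And a sketch of the deletion arithmetic, under the same assumed naming scheme as above (with 3 historic months kept, the collection from 4 months back is the one that gets deleted):

import java.util.Calendar;

// Sketch of the deletion side of the timer-job. Naming scheme as above (assumed).
public class MonthlyCollectionCleanup {

    // On 1 July with monthsToKeep = 3, this returns "data_2013_03" (March),
    // leaving April, May and June as the retained history.
    static String oldestCollectionToDelete(Calendar now, int monthsToKeep) {
        Calendar victim = (Calendar) now.clone();
        victim.add(Calendar.MONTH, -(monthsToKeep + 1));
        return String.format("data_%04d_%02d",
                victim.get(Calendar.YEAR), victim.get(Calendar.MONTH) + 1);
    }

    // The deletion itself is another Collections API call:
    // /admin/collections?action=DELETE&name=<collection>
}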

Regards, Per Steffensen

On 1/10/13 3:04 PM, Shahar Davidson wrote:
Hi Per,

Thanks for your reply!

That's a very interesting approach.

In your system, how are the collections created? In other words, are the 
collections created dynamically upon an update (for example, per new day)?
If they are created dynamically, who handles their creation (client/server)  
and how is it done?

I'd love to hear more about it!

Appreciate your help,

Shahar.

-----Original Message-----
From: Per Steffensen [mailto:st...@designware.dk]
Sent: Thursday, January 10, 2013 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

On 1/10/13 10:09 AM, Shahar Davidson wrote:
search request, the system must be aware of all available cores in
order to execute distributed search on _all_ relevant cores
For this purpose I would definitely recommend that you go with "SolrCloud".

Furthermore we do something "extra":
We have several collections, each containing data from a specific period in time
- the timestamp of incoming data decides which collection it is indexed into. One
important search criterion for our clients is the timestamp interval. Therefore
most searches can be restricted to only consider a subset of all our collections.
Instead of putting the logic that calculates the subset of collections to search
(given the timestamp search-interval) in the clients, we just let clients do "dumb"
searches by giving the timestamp interval. The subset of collections to search is
calculated on the server side from the timestamp interval in the search query. We
handle this in a Solr SearchComponent which we place "early" in the chain of
SearchComponents. Maybe you can get some inspiration from this approach, if it is
also relevant for you.
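
To make the idea concrete, here is a hypothetical sketch of such a SearchComponent - not our actual code. The request parameters "ts.start"/"ts.end" (epoch millis) and the monthly collection naming are assumptions for the example; the component rewrites the SolrCloud "collection" parameter in prepare():

import java.io.IOException;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hypothetical sketch, not the actual component. It reads an assumed pair of
// request parameters (ts.start/ts.end, epoch millis), works out which monthly
// collections the interval touches, and restricts the distributed search to them
// by rewriting the "collection" request parameter.
public class TimestampCollectionRouter extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        SolrParams params = rb.req.getParams();
        String startS = params.get("ts.start");
        String endS = params.get("ts.end");
        if (startS == null || endS == null) {
            return; // no interval given: leave the request untouched
        }
        // Walk month by month across the interval and collect the matching collections.
        List<String> collections = new ArrayList<String>();
        Calendar cal = monthStart(Long.parseLong(startS));
        Calendar endCal = monthStart(Long.parseLong(endS));
        while (!cal.after(endCal)) {
            collections.add(String.format("data_%04d_%02d",
                    cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1));
            cal.add(Calendar.MONTH, 1);
        }
        // Restrict the search to just those collections.
        ModifiableSolrParams modified = new ModifiableSolrParams(params);
        modified.set("collection", String.join(",", collections));
        rb.req.setParams(modified);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Nothing to do here; the routing happened in prepare().
    }

    @Override
    public String getDescription() {
        return "Maps a timestamp interval to the monthly collections to search";
    }

    // Abstract in the Solr 4.x SearchComponent base class.
    public String getSource() {
        return null;
    }

    private static Calendar monthStart(long millis) {
        Calendar c = Calendar.getInstance();
        c.setTimeInMillis(millis);
        c.set(c.get(Calendar.YEAR), c.get(Calendar.MONTH), 1, 0, 0, 0);
        c.set(Calendar.MILLISECOND, 0);
        return c;
    }
}

Such a component could be registered in solrconfig.xml and listed among the handler's "first-components", so that it runs before the standard query component - that is what "early" in the chain means above.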

Regards, Per Steffensen
