Thanks for sharing this, Per - it may well prove valuable for me in the future.

Shahar.

-----Original Message-----
From: Per Steffensen [mailto:st...@designware.dk] 
Sent: Thursday, January 10, 2013 6:10 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

The collections are created dynamically, though not on update. We use one 
collection per month and we have a timer-job running (every hour or so), which 
checks whether all collections that need to exist actually do exist - if not, it 
creates the missing collection(s). The rule is that the collection for "next month" 
has to exist as soon as we enter "current month", so the first time the timer-job 
runs on e.g. July 1st it will create the August collection. We never get data with 
a timestamp in the future. Therefore, as long as the timer-job gets to run at least 
once every month, we will always have the needed collections ready.
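
Just to illustrate the rule, here is a rough sketch in Java (the "data_yyyy_MM" 
naming scheme and the variable names are made up for the example, not our actual 
code):

    import java.text.SimpleDateFormat;
    import java.util.Calendar;

    // The collections that must exist right now: the one for the current month
    // and the one for the next month.
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy_MM");
    Calendar cal = Calendar.getInstance();
    String currentMonth = "data_" + fmt.format(cal.getTime());
    cal.add(Calendar.MONTH, 1);
    String nextMonth = "data_" + fmt.format(cal.getTime());
    // The timer-job checks that both exist and creates any that are missing
    // via the Collection API.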

We create collections using the new Collection API in Solr. We used to manage 
creation of every single Shard/Replica/Core of the collections through the Core 
Admin API in Solr, but since the Collection API was introduced we decided that 
we had better use that. In 4.0 it did not have the features we needed, which 
triggered SOLR-4114, SOLR-4120 and SOLR-4140, which will be available in 4.1. 
With those features we are now using the Collection API.
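
For reference, the creation itself boils down to something like this SolrJ sketch 
(host, collection name and shard/replica counts are just examples, and the exact 
parameters available depend on the Solr version):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // Collection API CREATE for the "next month" collection
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CREATE");
    params.set("name", "data_2013_08");
    params.set("numShards", "4");
    params.set("replicationFactor", "2");
    QueryRequest create = new QueryRequest(params);
    create.setPath("/admin/collections");
    server.request(create);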

BTW, our timer-job also handles deletion of "old" collections. In our system 
you can configure how many historic month-collections to keep before it is 
OK to delete them. Let's say this is configured to 3: as soon as it becomes 
July 1st, the timer-job will delete the March collection (the historic 
collections to keep will then be the April, May and June collections). 
This way we will always have at least 3 months of historic data, and late in 
a month close to 4 months of history. It does not matter that we have a little 
too much history, as long as we do not go below the lower limit on the length 
of historic data. We also use the new Collection API for deletion.
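
Deleting is similar (again just a sketch with a made-up collection name, reusing 
the "server" from the sketch above):

    // Collection API DELETE for the month that fell out of the history window
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "DELETE");
    params.set("name", "data_2013_03");
    QueryRequest delete = new QueryRequest(params);
    delete.setPath("/admin/collections");
    server.request(delete);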

Regards, Per Steffensen

On 1/10/13 3:04 PM, Shahar Davidson wrote:
> Hi Per,
>
> Thanks for your reply!
>
> That's a very interesting approach.
>
> In your system, how are the collections created? In other words, are the 
> collections created dynamically upon an update (for example, per new day)?
> If they are created dynamically, who handles their creation (client/server)  
> and how is it done?
>
> I'd love to hear more about it!
>
> Appreciate your help,
>
> Shahar.
>
> -----Original Message-----
> From: Per Steffensen [mailto:st...@designware.dk]
> Sent: Thursday, January 10, 2013 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CoreAdmin STATUS performance
>
> On 1/10/13 10:09 AM, Shahar Davidson wrote:
>> search request, the system must be aware of all available cores in 
>> order to execute distributed search on _all_ relevant cores
> For this purpose I would definitely recommend that you go "SolrCloud".
>
> Furthermore we do something "extra":
> We have several collections, each containing data from a specific 
> period in time - the timestamp of incoming data decides which collection it 
> is indexed into. One important search criterion for our clients is 
> searching on a timestamp interval. Therefore most searches can be 
> restricted to only consider a subset of all our collections. Instead 
> of putting the logic that calculates the subset of collections to search 
> (given the timestamp search-interval) in the clients, we just let clients do 
> "dumb" searches by giving the timestamp-interval. The subset of collections to 
> search is calculated on the server side from the timestamp-interval in the 
> search query. We handle this in a Solr SearchComponent which we place "early" 
> in the chain of SearchComponents. Maybe you can get some inspiration from this 
> approach, if it is also relevant for you.
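>
> Very roughly along these lines (a heavily simplified sketch - the parameter name 
> "timestamp-interval", the collection names and the hard-coded mapping are just 
> illustrations, not our actual code):
>
>     import java.io.IOException;
>     import org.apache.solr.common.params.ModifiableSolrParams;
>     import org.apache.solr.common.params.SolrParams;
>     import org.apache.solr.handler.component.ResponseBuilder;
>     import org.apache.solr.handler.component.SearchComponent;
>
>     public class IntervalRoutingComponent extends SearchComponent {
>
>       @Override
>       public void prepare(ResponseBuilder rb) throws IOException {
>         SolrParams params = rb.req.getParams();
>         String interval = params.get("timestamp-interval"); // hypothetical param
>         if (interval == null) {
>           return; // no interval given - search all collections
>         }
>         // The real code maps the interval to the month-collections that can
>         // contain matching documents; hard-coded here for brevity.
>         String collections = "data_2013_01,data_2013_02";
>         ModifiableSolrParams modified = new ModifiableSolrParams(params);
>         modified.set("collection", collections);
>         rb.req.setParams(modified);
>       }
>
>       @Override
>       public void process(ResponseBuilder rb) throws IOException {
>         // Nothing to do - the routing happens in prepare().
>       }
>
>       @Override
>       public String getDescription() {
>         return "Restricts searches to the month-collections covered by the interval";
>       }
>
>       @Override
>       public String getSource() {
>         return null;
>       }
>     }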
>
> Regards, Per Steffensen
>

