Re: Solr could replace shards

2013-12-19 Thread Michael Della Bitta
I would make one *collection* for each date range and then make a
collection alias or aliases that span the ones that you want to query.

http://wiki.apache.org/solr/SolrCloud#Collection_Aliases
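Here is a rough Python sketch of the per-date-range approach. It builds a Collections API CREATEALIAS request that spans a run of monthly collections. The monthly granularity, the `events_YYYY_MM` naming scheme, the alias name, and both helper functions are hypothetical. Only the `CREATEALIAS` action and its `name`/`collections` parameters come from the Collections API. The code only builds the URL. Sending it with curl or any HTTP client is left out.

```python
from datetime import date
from urllib.parse import urlencode


# Hypothetical naming scheme: one collection per calendar month,
# e.g. "events_2013_11". Adapt to whatever date ranges you actually use.
def month_collections(prefix: str, start: date, end: date) -> list[str]:
    """Return collection names for every month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def create_alias_url(solr_base: str, alias: str, collections: list[str]) -> str:
    """Build (but do not send) a Collections API CREATEALIAS request URL."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated member list
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    members = month_collections("events", date(2013, 10, 1), date(2013, 12, 1))
    print(create_alias_url("http://localhost:8983/solr", "events_recent", members))
```

Queries then go to `events_recent` as if it were a single collection. You can define several aliases, for example "last 3 months" and "everything", over the same underlying collections.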

I don't have a good suggestion for how to handle indexing off-cluster,
however.
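One possible way to restate a date range is to index the rebuilt data into a fresh collection and then re-point the alias at it. This is an assumption, not something confirmed in this thread. Issuing CREATEALIAS again with an existing alias name should reassign that alias, but check the reference guide for your Solr version. The helper, the `_v2` collection name, `numShards=1`, and the config name below are hypothetical. The `CREATE`, `CREATEALIAS`, and `DELETE` actions are real Collections API actions. This sketch still indexes on whichever cluster nodes host the new collection, so it does not solve the off-cluster indexing problem.

```python
from urllib.parse import urlencode


def rebuild_plan(solr_base: str, alias: str, members: list[str],
                 old: str, new: str, config_name: str) -> list[str]:
    """Return Collections API URLs, in order, for swapping `old` out of `alias`.

    Hypothetical workflow: create `new`, bulk-load it from the batch job,
    re-point the alias, then drop `old`. URLs are built here, not sent.
    """
    if old not in members:
        raise ValueError(f"{old!r} is not a member of alias {alias!r}")
    # Same alias membership, with the rebuilt collection in place of the old one.
    swapped = [new if name == old else name for name in members]

    def url(params: dict) -> str:
        return f"{solr_base}/admin/collections?{urlencode(params)}"

    return [
        # 1. Create the replacement collection (then index the restated data into it).
        url({"action": "CREATE", "name": new, "numShards": 1,
             "collection.configName": config_name}),
        # 2. Re-point the alias so queries start hitting the rebuilt collection.
        url({"action": "CREATEALIAS", "name": alias,
             "collections": ",".join(swapped)}),
        # 3. Delete the stale collection once nothing queries it directly.
        url({"action": "DELETE", "name": old}),
    ]


if __name__ == "__main__":
    for step in rebuild_plan("http://localhost:8983/solr", "events_all",
                             ["events_2013_11", "events_2013_12"],
                             "events_2013_11", "events_2013_11_v2", "events_conf"):
        print(step)
```

Keeping step 3 separate means the old collection stays available until you have checked the rebuilt one.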

Michael Della Bitta

Applications Developer

appinions inc.



On Wed, Dec 18, 2013 at 4:45 PM, Max Hansmire wrote:

> I am considering using SolrCloud, but I have a use case that I am not sure
> it covers.
>
> I would like to keep an index up to date in real time, but I would also
> like to restate the past occasionally. I would restate the past by batch
> processing historical data.
>
> My idea is to have the Solr collection sharded by date range. As I move
> forward in time, I would add more shards.
>
> To restate historical data, I would have a separate process that indexes
> a shard's worth of data. (This keeps the servers meant for production
> search from having to handle the load of historical indexing.) I would
> then move the index files to the Solr servers and register the newly
> created index with the server, replacing the existing shard.
>
> I used to be able to do something similar before SolrCloud by using the
> core admin. However, that did not give me a single search across the
> entire "collection"; I had to query each core manually to search the
> full index.
>
> Essentially, my questions are:
> 1- Is it possible to shard by date range in this way?
> 2- Is it possible to swap out the index used by a shard?
> 3- Is there a different way I should be thinking about this?
>
> Max
>