Re: SolrCloud Heterogenous Hardware setup

2018-05-01 Thread Deepak Goel
I had a similar problem some time back. It might not be the best way, but
I used cron to move data from the high-spec machines to the lower-spec ones.
It worked beautifully.



Deepak



Re: SolrCloud Heterogenous Hardware setup

2018-05-01 Thread Greenhorn Techie
Thanks Erick. This information is very helpful. I will explore the node
placement rules in the Collections API further.

Many Thanks




Re: SolrCloud Heterogenous Hardware setup

2018-05-01 Thread Erick Erickson
"Is it possible to configure a collection such that the collection
data is only stored on few nodes in the SolrCloud setup?"

Yes. There are "node placement rules", but also you can create a
collection with a createNodeSet that specifies the nodes that the
replicas are placed on.
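
A minimal sketch of the createNodeSet option, assuming a Solr base URL of
http://localhost:8983/solr and hypothetical node names (hot1:8983_solr,
hot2:8983_solr) and collection name (logs_2018_05). It only builds the
Collections API URL; sending it, and choosing a configset, are left out.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, nodes, num_shards=1, replication_factor=2):
    """Build a Collections API CREATE URL that restricts replicas to `nodes`."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # createNodeSet takes a comma-separated list of node names.
        "createNodeSet": ",".join(nodes),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical names for the higher-spec machines.
HOT_NODES = ["hot1:8983_solr", "hot2:8983_solr"]
url = create_collection_url("http://localhost:8983/solr", "logs_2018_05", HOT_NODES)
print(url)
```

Node names must match what the cluster reports for its live nodes, so
check those before relying on the values above.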

"If this is possible, at the end of each month, what is the approach
to be taken to “move” the latest collection from higher-spec hardware
machines to the lower-spec ones?"

There are a bunch of ways, listed here in order of how long they've been
around (check your version). All of these are Collections API calls.
- ADDREPLICA/DELETEREPLICA
- MOVEREPLICA
- REPLACENODE
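
The first two options above can be sketched as URL builders. This is an
illustration only: the base URL, collection, shard, replica and node names
(logs_2018_04, shard1, core_node3, cold1:8983_solr) are hypothetical, and
parameter availability depends on your Solr version. REPLACENODE is left
out because it moves every replica on a node rather than one collection.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL


def collections_api(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


def move_with_add_delete(collection, shard, old_replica, target_node):
    """Older approach: add a replica on the target node, then drop the old one."""
    return [
        collections_api("ADDREPLICA", collection=collection, shard=shard,
                        node=target_node),
        # Send the delete only after the new replica shows as active.
        collections_api("DELETEREPLICA", collection=collection, shard=shard,
                        replica=old_replica),
    ]


def move_with_movereplica(collection, replica, target_node):
    """Newer approach: a single MOVEREPLICA call."""
    return collections_api("MOVEREPLICA", collection=collection,
                           replica=replica, targetNode=target_node)


for step in move_with_add_delete("logs_2018_04", "shard1", "core_node3",
                                 "cold1:8983_solr"):
    print(step)
print(move_with_movereplica("logs_2018_04", "core_node3", "cold1:8983_solr"))
```

The add-then-delete order keeps a copy of the shard available throughout
the move, which is why the delete comes second.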

The other thing you may want to look at is that David Smiley has been
working on time-series support in Solr, but that's quite recent so may
not be available in whatever version you're using. Nor do I know
enough details about it to know how (or if) it supports the
heterogeneous setup you're talking about. Check CHANGES.txt.

Best,
Erick



SolrCloud Heterogenous Hardware setup

2018-05-01 Thread Greenhorn Techie
Hi,

We are building a SolrCloud setup that will index time-series data. Since
this is time-series data with write-once semantics, we are planning to have
multiple collections, i.e. one collection per month. As per our use case,
end users should be able to query across the last 12 months' worth of data,
which means 12 collections (with one collection per month). To achieve
this, we are planning to leverage Solr collection aliasing such that the
search_alias alias will point to the 12 collections and indexing will
always go to the latest collection.
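
The aliasing plan above might look roughly like the sketch below. The
collection naming scheme (logs_YYYY_MM) and the base URL are assumptions
made for illustration; only search_alias comes from the plan itself. The
code computes the 12 most recent monthly collection names and builds a
CREATEALIAS URL for them, without sending it.

```python
from datetime import date
from urllib.parse import urlencode


def last_n_months(today, n=12):
    """Return (year, month) pairs for the n most recent months, newest first."""
    months = []
    year, month = today.year, today.month
    for _ in range(n):
        months.append((year, month))
        month -= 1
        if month == 0:
            # Wrap from January back to December of the previous year.
            year, month = year - 1, 12
    return months


def alias_url(base_url, alias, today, prefix="logs"):
    """Build a CREATEALIAS URL covering the last 12 monthly collections."""
    names = [f"{prefix}_{y}_{m:02d}" for y, m in last_n_months(today)]
    params = {"action": "CREATEALIAS", "name": alias,
              "collections": ",".join(names)}
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(alias_url("http://localhost:8983/solr", "search_alias", date(2018, 5, 1)))
```

Running this at the start of each month, after the new collection exists,
would be one way to keep the alias on a rolling 12-month window.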

As it is write-once data, my question is whether it is
possible to have two different hardware profiles within the SolrCloud
cluster such that all the older collections (being read-only) will be
stored on the lower-spec hardware, while the latest collection (being
write-heavy) will be stored only on the higher-spec machines.

   - Is it possible to configure a collection such that the collection data
   is only stored on a few nodes in the SolrCloud setup?
   - If this is possible, at the end of each month, what is the approach to
   be taken to “move” the latest collection from higher-spec hardware machines
   to the lower-spec ones?

TIA.