Re: Adding nodes

2016-02-17 Thread Jeff Wartes
SolrCloud does not come with any autoscaling functionality. If you want such a 
thing, you’ll need to write it yourself.

https://github.com/whitepages/solrcloud_manager might be a useful head start 
though, particularly the “fill” and “cleancollection” commands. I don’t do 
*auto* scaling, but I do use this for all my cluster management, which 
certainly involves moving collections/shards around among nodes, adding 
capacity, and removing capacity.
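
(For anyone who does end up writing that script themselves, here is a minimal sketch of 
the idea against the stock Collections API: CLUSTERSTATUS to find the shard with the 
fewest active replicas, then ADDREPLICA to grow it. The base URL, collection name, and 
node name are placeholders, not taken from this thread or from solrcloud_manager.)

import requests

SOLR = "http://localhost:8983/solr/admin/collections"   # placeholder base URL
COLLECTION = "products"                                  # placeholder collection
NEW_NODE = "10.0.0.42:8983_solr"                         # node_name as listed in live_nodes

def thinnest_shard():
    # CLUSTERSTATUS returns the shard/replica layout for the collection.
    status = requests.get(SOLR, params={"action": "CLUSTERSTATUS",
                                        "collection": COLLECTION,
                                        "wt": "json"}).json()
    shards = status["cluster"]["collections"][COLLECTION]["shards"]
    def active_count(shard):
        return sum(1 for r in shard["replicas"].values() if r["state"] == "active")
    return min(shards, key=lambda name: active_count(shards[name]))

def add_replica(shard):
    # ADDREPLICA requires an explicit shard name; "node" pins the new replica
    # to the machine that was just added.
    return requests.get(SOLR, params={"action": "ADDREPLICA",
                                      "collection": COLLECTION,
                                      "shard": shard,
                                      "node": NEW_NODE,
                                      "wt": "json"}).json()

if __name__ == "__main__":
    print(add_replica(thinnest_shard()))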






On 2/14/16, 11:17 AM, "McCallick, Paul"  wrote:

>These are excellent questions and give me a good sense of why you suggest 
>using the collections api.
>
>In our case we have 8 shards of product data with an even distribution of data 
>per shard, no hot spots. We have very different load at different points in 
>the year (cyber monday), and we tend to have very little traffic at night. I'm 
>thinking of two use cases:
>
>1) we are seeing increased latency due to load and want to add 8 more replicas 
>to handle the query volume.  Once the volume subsides, we'd remove the nodes. 
>
>2) we lose a node due to some unexpected failure (ec2 tends to do this). We 
>want auto scaling to detect the failure and add a node to replace the failed 
>one. 
>
>In both cases the core api makes it easy. It adds nodes to the shards evenly. 
>Otherwise we have to write a fairly involved script that is subject to race 
>conditions to determine which shard to add nodes to. 
>
>Let me know if I'm making dangerous or uninformed assumptions, as I'm new to 
>solr. 
>
>Thanks,
>Paul
>
>> On Feb 14, 2016, at 10:35 AM, Susheel Kumar  wrote:
>> 
>> Hi Paul,
>> 
>> 
>> For auto-scaling, it depends on how you plan to design it and what/how
>> you want to scale. Which scenario do you think makes the coreadmin API
>> easy to use in a sharded SolrCloud environment?
>> 
>> Isn't it the case that in a sharded environment (assume 3 shards A, B & C),
>> if shard B has a higher load you want to add a replica for shard B to
>> distribute the load, or if a particular shard's replica goes down you want
>> to add another replica back for that shard? In either case ADDREPLICA
>> requires a shard name.
>> 
>> Can you describe your scenario / provide more detail?
>> 
>> Thanks,
>> Susheel
>> 
>> 
>> 
>> On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul <
>> paul.e.mccall...@nordstrom.com> wrote:
>> 
>>> Hi all,
>>> 
>>> 
>>> This doesn’t really answer the following question:
>>> 
>>> What is the suggested way to add a new node to a collection via the
>>> apis?  I  am specifically thinking of autoscale scenarios where a node has
>>> gone down or more nodes are needed to handle load.
>>> 
>>> 
>>> The coreadmin api makes this easy.  The collections api (ADDREPLICA),
>>> makes this very difficult.
>>> 
>>> 
 On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:
 
 Hi Paul,
 
 Shawn is referring to using the Collections API
 https://cwiki.apache.org/confluence/display/solr/Collections+API rather than the
 Core Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
 for SolrCloud.
 
 Hope that clarifies. You mentioned ADDREPLICA, which is part of the
 Collections API, so you are on the right track.
 
 Thanks,
 Susheel
 
 
 
 On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
 paul.e.mccall...@nordstrom.com> wrote:
 
> Then what is the suggested way to add a new node to a collection via the
> apis?  I  am specifically thinking of autoscale scenarios where a node
>>> has
> gone down or more nodes are needed to handle load.
> 
> Note that the ADDREPLICA endpoint requires a shard name, which puts the
> onus of how to scale out on the user. This can be challenging in an
> autoscale scenario.
> 
> Thanks,
> Paul
> 
>> On Feb 14, 2016, at 12:25 AM, Shawn Heisey 
>>> wrote:
>> 
>>> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>>> - When creating a new collection, SOLRCloud will use all available
> nodes for the collection, adding cores to each.  This assumes that you
>>> do
> not specify a replicationFactor.
>> 
>> The number of nodes that will be used is numShards multiplied by
>> replicationFactor.  The default value for replicationFactor is 1.  If
>> you do not specify numShards, there is no default -- the CREATE call
>> will fail.  The value of maxShardsPerNode can also affect the overall
>> result.
>> 
>>> - When adding new nodes to the cluster AFTER the collection is
>>> created,
> one must use the core admin api to add the node to the collection.
>> 
>> Using the CoreAdmin API is strongly discouraged when running
>>> SolrCloud.
>> It works, but it is an expert API when in cloud mode, and can cause
>> serious problems if not used correctly.  Instead, use the Collections
>> API.  It can handle all normal maintenance needs.

Re: Adding nodes

2016-02-15 Thread Susheel Kumar
Hi Paul,  Thanks for the detail, but I am still not able to understand how
the Core API would make it easier for you to create replicas.  I understand
that using the Core API you can add more cores, but would that also populate
the data so that the new core can serve queries and act like a replica?

Second, as Shawn mentioned in the link above, adding replicas for
auto-scaling or in near real time is not a good idea, since it puts
more load on the system and causes delay.

The exception is if you have a copy of the indexes (assuming the index is
static) and can create more cores dynamically, in which case the Core API may
work for your case.

Thanks,
Susheel



On Sun, Feb 14, 2016 at 2:17 PM, McCallick, Paul <
paul.e.mccall...@nordstrom.com> wrote:

> These are excellent questions and give me a good sense of why you suggest
> using the collections api.
>
> In our case we have 8 shards of product data with an even distribution of
> data per shard, no hot spots. We have very different load at different
> points in the year (cyber monday), and we tend to have very little traffic
> at night. I'm thinking of two use cases:
>
> 1) we are seeing increased latency due to load and want to add 8 more
> replicas to handle the query volume.  Once the volume subsides, we'd remove
> the nodes.
>
> 2) we lose a node due to some unexpected failure (ec2 tends to do this).
> We want auto scaling to detect the failure and add a node to replace the
> failed one.
>
> In both cases the core api makes it easy. It adds nodes to the shards
> evenly. Otherwise we have to write a fairly involved script that is subject
> to race conditions to determine which shard to add nodes to.
>
> Let me know if I'm making dangerous or uninformed assumptions, as I'm new
> to solr.
>
> Thanks,
> Paul
>
> > On Feb 14, 2016, at 10:35 AM, Susheel Kumar 
> wrote:
> >
> > Hi Paul,
> >
> >
> > For auto-scaling, it depends on how you plan to design it and what/how
> > you want to scale. Which scenario do you think makes the coreadmin API
> > easy to use in a sharded SolrCloud environment?
> >
> > Isn't it the case that in a sharded environment (assume 3 shards A, B & C),
> > if shard B has a higher load you want to add a replica for shard B to
> > distribute the load, or if a particular shard's replica goes down you want
> > to add another replica back for that shard? In either case ADDREPLICA
> > requires a shard name.
> >
> > Can you describe your scenario / provide more detail?
> >
> > Thanks,
> > Susheel
> >
> >
> >
> > On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul <
> > paul.e.mccall...@nordstrom.com> wrote:
> >
> >> Hi all,
> >>
> >>
> >> This doesn’t really answer the following question:
> >>
> >> What is the suggested way to add a new node to a collection via the
> >> apis?  I  am specifically thinking of autoscale scenarios where a node
> has
> >> gone down or more nodes are needed to handle load.
> >>
> >>
> >> The coreadmin api makes this easy.  The collections api (ADDREPLICA),
> >> makes this very difficult.
> >>
> >>
> >>> On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:
> >>>
> >>> Hi Paul,
> >>>
> >>> Shawn is referring to using the Collections API
> >>> https://cwiki.apache.org/confluence/display/solr/Collections+API rather than
> >>> the Core Admin API
> >>> https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
> >>> for SolrCloud.
> >>>
> >>> Hope that clarifies. You mentioned ADDREPLICA, which is part of the
> >>> Collections API, so you are on the right track.
> >>>
> >>> Thanks,
> >>> Susheel
> >>>
> >>>
> >>>
> >>> On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
> >>> paul.e.mccall...@nordstrom.com> wrote:
> >>>
>  Then what is the suggested way to add a new node to a collection via
> the
>  apis?  I  am specifically thinking of autoscale scenarios where a node
> >> has
>  gone down or more nodes are needed to handle load.
> 
>  Note that the ADDREPLICA endpoint requires a shard name, which puts
> the
>  onus of how to scale out on the user. This can be challenging in an
>  autoscale scenario.
> 
>  Thanks,
>  Paul
> 
> > On Feb 14, 2016, at 12:25 AM, Shawn Heisey 
> >> wrote:
> >
> >> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
> >> - When creating a new collection, SOLRCloud will use all available
>  nodes for the collection, adding cores to each.  This assumes that you
> >> do
>  not specify a replicationFactor.
> >
> > The number of nodes that will be used is numShards multiplied by
> > replicationFactor.  The default value for replicationFactor is 1.  If
> > you do not specify numShards, there is no default -- the CREATE call
> > will fail.  The value of maxShardsPerNode can also affect the overall
> > result.
> >
> >> - When adding new nodes to the cluster AFTER the collection is
> >> created,
>  one must use the core admin api to add the node to the collection.
> >
> > Using the CoreAdmin API is strongly discouraged when running SolrCloud.
> > It works, but it is an expert API when in cloud mode, and can cause
> > serious problems if not used correctly.  Instead, use the Collections
> > API.  It can handle all normal maintenance needs.

Re: Adding nodes

2016-02-14 Thread McCallick, Paul
These are excellent questions and give me a good sense of why you suggest using 
the collections api.

In our case we have 8 shards of product data with an even distribution of data 
per shard, no hot spots. We have very different load at different points in the 
year (cyber monday), and we tend to have very little traffic at night. I'm 
thinking of two use cases:

1) we are seeing increased latency due to load and want to add 8 more replicas 
to handle the query volume.  Once the volume subsides, we'd remove the nodes. 

2) we lose a node due to some unexpected failure (ec2 tends to do this). We 
want auto scaling to detect the failure and add a node to replace the failed 
one. 

In both cases the core api makes it easy. It adds nodes to the shards evenly. 
Otherwise we have to write a fairly involved script that is subject to race 
conditions to determine which shard to add nodes to. 

Let me know if I'm making dangerous or uninformed assumptions, as I'm new to 
solr. 

Thanks,
Paul
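
(A minimal sketch of what those two operations look like against the stock Collections 
API, for comparison with the core-API route described above. The collection name, shard 
names, and URL are placeholders, and error handling is omitted.)

import requests

SOLR = "http://localhost:8983/solr/admin/collections"  # placeholder
COLLECTION = "products"                                 # placeholder
SHARDS = ["shard%d" % i for i in range(1, 9)]           # the 8 shards

def scale_up(new_node):
    # Use case 1: one extra replica per shard, all placed on the node that
    # was just brought up.  Run once for each new node.
    for shard in SHARDS:
        requests.get(SOLR, params={"action": "ADDREPLICA",
                                   "collection": COLLECTION,
                                   "shard": shard,
                                   "node": new_node})

def scale_down(shard, replica):
    # When the volume subsides: DELETEREPLICA wants the shard plus the
    # replica's core-node name (e.g. "core_node12"), which CLUSTERSTATUS
    # reports -- so a scale-down script still has to look that up first.
    requests.get(SOLR, params={"action": "DELETEREPLICA",
                               "collection": COLLECTION,
                               "shard": shard,
                               "replica": replica})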

> On Feb 14, 2016, at 10:35 AM, Susheel Kumar  wrote:
> 
> Hi Paul,
> 
> 
> For auto-scaling, it depends on how you plan to design it and what/how
> you want to scale. Which scenario do you think makes the coreadmin API
> easy to use in a sharded SolrCloud environment?
> 
> Isn't it the case that in a sharded environment (assume 3 shards A, B & C),
> if shard B has a higher load you want to add a replica for shard B to
> distribute the load, or if a particular shard's replica goes down you want
> to add another replica back for that shard? In either case ADDREPLICA
> requires a shard name.
> 
> Can you describe your scenario / provide more detail?
> 
> Thanks,
> Susheel
> 
> 
> 
> On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul <
> paul.e.mccall...@nordstrom.com> wrote:
> 
>> Hi all,
>> 
>> 
>> This doesn’t really answer the following question:
>> 
>> What is the suggested way to add a new node to a collection via the
>> apis?  I  am specifically thinking of autoscale scenarios where a node has
>> gone down or more nodes are needed to handle load.
>> 
>> 
>> The coreadmin api makes this easy.  The collections api (ADDREPLICA),
>> makes this very difficult.
>> 
>> 
>>> On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:
>>> 
>>> Hi Paul,
>>> 
>>> Shawn is referring to using the Collections API
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API rather than
>>> the Core Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
>>> for SolrCloud.
>>> 
>>> Hope that clarifies. You mentioned ADDREPLICA, which is part of the
>>> Collections API, so you are on the right track.
>>> 
>>> Thanks,
>>> Susheel
>>> 
>>> 
>>> 
>>> On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
>>> paul.e.mccall...@nordstrom.com> wrote:
>>> 
 Then what is the suggested way to add a new node to a collection via the
 apis?  I  am specifically thinking of autoscale scenarios where a node
>> has
 gone down or more nodes are needed to handle load.
 
 Note that the ADDREPLICA endpoint requires a shard name, which puts the
 onus of how to scale out on the user. This can be challenging in an
 autoscale scenario.
 
 Thanks,
 Paul
 
> On Feb 14, 2016, at 12:25 AM, Shawn Heisey 
>> wrote:
> 
>> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>> - When creating a new collection, SOLRCloud will use all available
 nodes for the collection, adding cores to each.  This assumes that you
>> do
 not specify a replicationFactor.
> 
> The number of nodes that will be used is numShards multiplied by
> replicationFactor.  The default value for replicationFactor is 1.  If
> you do not specify numShards, there is no default -- the CREATE call
> will fail.  The value of maxShardsPerNode can also affect the overall
> result.
> 
>> - When adding new nodes to the cluster AFTER the collection is
>> created,
 one must use the core admin api to add the node to the collection.
> 
> Using the CoreAdmin API is strongly discouraged when running
>> SolrCloud.
> It works, but it is an expert API when in cloud mode, and can cause
> serious problems if not used correctly.  Instead, use the Collections
> API.  It can handle all normal maintenance needs.
> 
>> I would really like to see the second case behave more like the
>> first.
 If I add a node to the cluster, it is automatically used as a replica
>> for
 existing clusters without my having to do so.  This would really
>> simplify
 things.
> 
> I've added a FAQ entry to address why this is a bad idea.
>> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
> 
> Thanks,
> Shawn
>> 


Re: Adding nodes

2016-02-14 Thread Susheel Kumar
Hi Paul,


For auto-scaling, it depends on how you plan to design it and what/how
you want to scale. Which scenario do you think makes the coreadmin API
easy to use in a sharded SolrCloud environment?

Isn't it the case that in a sharded environment (assume 3 shards A, B & C),
if shard B has a higher load you want to add a replica for shard B to
distribute the load, or if a particular shard's replica goes down you want
to add another replica back for that shard? In either case ADDREPLICA
requires a shard name.

Can you describe your scenario / provide more detail?

Thanks,
Susheel



On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul <
paul.e.mccall...@nordstrom.com> wrote:

> Hi all,
>
>
> This doesn’t really answer the following question:
>
> What is the suggested way to add a new node to a collection via the
> apis?  I  am specifically thinking of autoscale scenarios where a node has
> gone down or more nodes are needed to handle load.
>
>
> The coreadmin api makes this easy.  The collections api (ADDREPLICA),
> makes this very difficult.
>
>
> On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:
>
> >Hi Paul,
> >
> >Shawn is referring to using the Collections API
> >https://cwiki.apache.org/confluence/display/solr/Collections+API rather than
> >the Core Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
> >for SolrCloud.
> >
> >Hope that clarifies. You mentioned ADDREPLICA, which is part of the
> >Collections API, so you are on the right track.
> >
> >Thanks,
> >Susheel
> >
> >
> >
> >On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
> >paul.e.mccall...@nordstrom.com> wrote:
> >
> >> Then what is the suggested way to add a new node to a collection via the
> >> apis?  I  am specifically thinking of autoscale scenarios where a node
> has
> >> gone down or more nodes are needed to handle load.
> >>
> >> Note that the ADDREPLICA endpoint requires a shard name, which puts the
> >> onus of how to scale out on the user. This can be challenging in an
> >> autoscale scenario.
> >>
> >> Thanks,
> >> Paul
> >>
> >> > On Feb 14, 2016, at 12:25 AM, Shawn Heisey 
> wrote:
> >> >
> >> >> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
> >> >> - When creating a new collection, SOLRCloud will use all available
> >> nodes for the collection, adding cores to each.  This assumes that you
> do
> >> not specify a replicationFactor.
> >> >
> >> > The number of nodes that will be used is numShards multiplied by
> >> > replicationFactor.  The default value for replicationFactor is 1.  If
> >> > you do not specify numShards, there is no default -- the CREATE call
> >> > will fail.  The value of maxShardsPerNode can also affect the overall
> >> > result.
> >> >
> >> >> - When adding new nodes to the cluster AFTER the collection is
> created,
> >> one must use the core admin api to add the node to the collection.
> >> >
> >> > Using the CoreAdmin API is strongly discouraged when running
> SolrCloud.
> >> > It works, but it is an expert API when in cloud mode, and can cause
> >> > serious problems if not used correctly.  Instead, use the Collections
> >> > API.  It can handle all normal maintenance needs.
> >> >
> >> >> I would really like to see the second case behave more like the
> first.
> >> If I add a node to the cluster, it is automatically used as a replica
> for
> >> existing clusters without my having to do so.  This would really
> simplify
> >> things.
> >> >
> >> > I've added a FAQ entry to address why this is a bad idea.
> >> >
> >> >
> >>
> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
> >> >
> >> > Thanks,
> >> > Shawn
> >> >
> >>
>


Re: Adding nodes

2016-02-14 Thread McCallick, Paul
Hi all,


This doesn’t really answer the following question:

What is the suggested way to add a new node to a collection via the
apis?  I  am specifically thinking of autoscale scenarios where a node has
gone down or more nodes are needed to handle load.


The coreadmin api makes this easy.  The collections api (ADDREPLICA) makes 
this very difficult.


On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:

>Hi Paul,
>
>Shawn is referring to using the Collections API
>https://cwiki.apache.org/confluence/display/solr/Collections+API rather than the Core
>Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
>for SolrCloud.
>
>Hope that clarifies. You mentioned ADDREPLICA, which is part of the
>Collections API, so you are on the right track.
>
>Thanks,
>Susheel
>
>
>
>On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
>paul.e.mccall...@nordstrom.com> wrote:
>
>> Then what is the suggested way to add a new node to a collection via the
>> apis?  I  am specifically thinking of autoscale scenarios where a node has
>> gone down or more nodes are needed to handle load.
>>
>> Note that the ADDREPLICA endpoint requires a shard name, which puts the
>> onus of how to scale out on the user. This can be challenging in an
>> autoscale scenario.
>>
>> Thanks,
>> Paul
>>
>> > On Feb 14, 2016, at 12:25 AM, Shawn Heisey  wrote:
>> >
>> >> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>> >> - When creating a new collection, SOLRCloud will use all available
>> nodes for the collection, adding cores to each.  This assumes that you do
>> not specify a replicationFactor.
>> >
>> > The number of nodes that will be used is numShards multiplied by
>> > replicationFactor.  The default value for replicationFactor is 1.  If
>> > you do not specify numShards, there is no default -- the CREATE call
>> > will fail.  The value of maxShardsPerNode can also affect the overall
>> > result.
>> >
>> >> - When adding new nodes to the cluster AFTER the collection is created,
>> one must use the core admin api to add the node to the collection.
>> >
>> > Using the CoreAdmin API is strongly discouraged when running SolrCloud.
>> > It works, but it is an expert API when in cloud mode, and can cause
>> > serious problems if not used correctly.  Instead, use the Collections
>> > API.  It can handle all normal maintenance needs.
>> >
>> >> I would really like to see the second case behave more like the first.
>> If I add a node to the cluster, it is automatically used as a replica for
>> existing clusters without my having to do so.  This would really simplify
>> things.
>> >
>> > I've added a FAQ entry to address why this is a bad idea.
>> >
>> >
>> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
>> >
>> > Thanks,
>> > Shawn
>> >
>>


Re: Adding nodes

2016-02-14 Thread McCallick, Paul
Then what is the suggested way to add a new node to a collection via the apis?  
I  am specifically thinking of autoscale scenarios where a node has gone down 
or more nodes are needed to handle load. 

Note that the ADDREPLICA endpoint requires a shard name, which puts the onus of 
how to scale out on the user. This can be challenging in an autoscale scenario. 

Thanks,
Paul
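
(A minimal sketch of how a replacement script might shoulder that onus: compare 
CLUSTERSTATUS against live_nodes to find shards that have lost a replica, then 
ADDREPLICA each of them onto the replacement node. Names and URLs are placeholders, 
not from this thread.)

import requests

SOLR = "http://localhost:8983/solr/admin/collections"  # placeholder
COLLECTION = "products"                                 # placeholder

def shards_missing_replicas():
    status = requests.get(SOLR, params={"action": "CLUSTERSTATUS",
                                        "collection": COLLECTION,
                                        "wt": "json"}).json()
    live = set(status["cluster"]["live_nodes"])
    shards = status["cluster"]["collections"][COLLECTION]["shards"]
    # A shard needs help if any of its replicas sits on a node that is no
    # longer in live_nodes.
    return [name for name, shard in shards.items()
            if any(r["node_name"] not in live for r in shard["replicas"].values())]

def replace_lost_replicas(new_node):
    # ADDREPLICA still needs the shard spelled out, so the script supplies it.
    for shard in shards_missing_replicas():
        requests.get(SOLR, params={"action": "ADDREPLICA",
                                   "collection": COLLECTION,
                                   "shard": shard,
                                   "node": new_node})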

> On Feb 14, 2016, at 12:25 AM, Shawn Heisey  wrote:
> 
>> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>> - When creating a new collection, SOLRCloud will use all available nodes for 
>> the collection, adding cores to each.  This assumes that you do not specify 
>> a replicationFactor.
> 
> The number of nodes that will be used is numShards multiplied by
> replicationFactor.  The default value for replicationFactor is 1.  If
> you do not specify numShards, there is no default -- the CREATE call
> will fail.  The value of maxShardsPerNode can also affect the overall
> result.
> 
>> - When adding new nodes to the cluster AFTER the collection is created, one 
>> must use the core admin api to add the node to the collection.
> 
> Using the CoreAdmin API is strongly discouraged when running SolrCloud. 
> It works, but it is an expert API when in cloud mode, and can cause
> serious problems if not used correctly.  Instead, use the Collections
> API.  It can handle all normal maintenance needs.
> 
>> I would really like to see the second case behave more like the first.  If I 
>> add a node to the cluster, it is automatically used as a replica for 
>> existing clusters without my having to do so.  This would really simplify 
>> things.
> 
> I've added a FAQ entry to address why this is a bad idea.
> 
> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
> 
> Thanks,
> Shawn
> 


Re: Adding nodes

2016-02-14 Thread Susheel Kumar
Hi Paul,

Shawn is referring to using the Collections API
https://cwiki.apache.org/confluence/display/solr/Collections+API rather than the Core
Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
for SolrCloud.

Hope that clarifies. You mentioned ADDREPLICA, which is part of the
Collections API, so you are on the right track.

Thanks,
Susheel



On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
paul.e.mccall...@nordstrom.com> wrote:

> Then what is the suggested way to add a new node to a collection via the
> apis?  I  am specifically thinking of autoscale scenarios where a node has
> gone down or more nodes are needed to handle load.
>
> Note that the ADDREPLICA endpoint requires a shard name, which puts the
> onus of how to scale out on the user. This can be challenging in an
> autoscale scenario.
>
> Thanks,
> Paul
>
> > On Feb 14, 2016, at 12:25 AM, Shawn Heisey  wrote:
> >
> >> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
> >> - When creating a new collection, SOLRCloud will use all available
> nodes for the collection, adding cores to each.  This assumes that you do
> not specify a replicationFactor.
> >
> > The number of nodes that will be used is numShards multiplied by
> > replicationFactor.  The default value for replicationFactor is 1.  If
> > you do not specify numShards, there is no default -- the CREATE call
> > will fail.  The value of maxShardsPerNode can also affect the overall
> > result.
> >
> >> - When adding new nodes to the cluster AFTER the collection is created,
> one must use the core admin api to add the node to the collection.
> >
> > Using the CoreAdmin API is strongly discouraged when running SolrCloud.
> > It works, but it is an expert API when in cloud mode, and can cause
> > serious problems if not used correctly.  Instead, use the Collections
> > API.  It can handle all normal maintenance needs.
> >
> >> I would really like to see the second case behave more like the first.
> If I add a node to the cluster, it is automatically used as a replica for
> existing clusters without my having to do so.  This would really simplify
> things.
> >
> > I've added a FAQ entry to address why this is a bad idea.
> >
> >
> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
> >
> > Thanks,
> > Shawn
> >
>


Re: Adding nodes

2016-02-14 Thread Shawn Heisey
On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>  - When creating a new collection, SOLRCloud will use all available nodes for 
> the collection, adding cores to each.  This assumes that you do not specify a 
> replicationFactor.

The number of nodes that will be used is numShards multiplied by
replicationFactor.  The default value for replicationFactor is 1.  If
you do not specify numShards, there is no default -- the CREATE call
will fail.  The value of maxShardsPerNode can also affect the overall
result.

>  - When adding new nodes to the cluster AFTER the collection is created, one 
> must use the core admin api to add the node to the collection.

Using the CoreAdmin API is strongly discouraged when running SolrCloud. 
It works, but it is an expert API when in cloud mode, and can cause
serious problems if not used correctly.  Instead, use the Collections
API.  It can handle all normal maintenance needs.

> I would really like to see the second case behave more like the first.  If I 
> add a node to the cluster, it is automatically used as a replica for existing 
> clusters without my having to do so.  This would really simplify things.

I've added a FAQ entry to address why this is a bad idea.

https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F

Thanks,
Shawn
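
(To make the arithmetic concrete with hypothetical numbers: 8 shards as in Paul's setup, 
with a replicationFactor of 2 and maxShardsPerNode of 2 chosen only for illustration.)

# Worked example of the formula above (all numbers hypothetical).
num_shards = 8           # required on CREATE; there is no default
replication_factor = 2   # defaults to 1 if omitted
max_shards_per_node = 2  # how many cores of this collection one node may host

total_cores = num_shards * replication_factor          # 16 cores to place
min_nodes = -(-total_cores // max_shards_per_node)     # ceil(16 / 2) = 8 nodes
print(total_cores, min_nodes)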



Adding nodes

2016-02-13 Thread McCallick, Paul
I’d like to verify the following:

 - When creating a new collection, SOLRCloud will use all available nodes for 
the collection, adding cores to each.  This assumes that you do not specify a 
replicationFactor.

 - When adding new nodes to the cluster AFTER the collection is created, one 
must use the core admin api to add the node to the collection.

I would really like to see the second case behave more like the first.  If I 
add a node to the cluster, it is automatically used as a replica for existing 
clusters without my having to do so.  This would really simplify things.
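
(A minimal sketch of the CREATE call being discussed, with all values hypothetical. As 
Shawn's reply explains, numShards is mandatory, replicationFactor defaults to 1, and 
maxShardsPerNode caps how many of the resulting cores land on one node.)

import requests

# All names and numbers below are placeholders.
requests.get("http://localhost:8983/solr/admin/collections",
             params={"action": "CREATE",
                     "name": "products",
                     "numShards": 8,
                     "replicationFactor": 2,
                     "maxShardsPerNode": 2,
                     "collection.configName": "products_conf"})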




Paul McCallick
Sr Manager Information Technology
eCommerce Foundation



[RE-BALANCE of Collection] Re-balancing of collection after adding nodes to clustered node

2014-03-27 Thread Debasis Jana
Hi,

I found the email addresses from a SlideShare deck at
http://www.slideshare.net/thelabdude/tjp-solr-webinar. It's very useful. We
are developing Solr search using CDH4 Cloudera and the embedded Solr
4.4.0-search-1.1.0.

We created a collection when the cluster had 2 slave nodes. Then two extra
nodes were added. The Solr service runs on those extra nodes, but the
ZooKeeper service does not; ZooKeeper runs only on the original nodes. When
the cluster had 2 nodes the indexing tool ran successfully, but after adding
the two nodes, when the indexing tool runs again it throws the error *no
active slice servicing hashcode*.
It seems that re-balancing of the collection didn't happen after the extra
Solr nodes were added, so when the indexing tool runs it tries to distribute
the indexing information to the extra node(s), which are not aware of that
collection, and this throws an error. The number of shards is 2. The
composite routing policy is used.

My question is: is it possible to re-balance the collection information
after adding new Solr nodes?
In your slides it's written that re-balancing is available in
SOLR-5025; what is SOLR-5025?

Thanks & Regards
Debasis


Re: Replication after re adding nodes to cluster (sleeping replicas)

2013-11-04 Thread Erick Erickson
The whole point of SolrCloud is to automatically take care of all
the ugly details of synching etc. You should be able to add a node
and, assuming it has been assigned to a shard, do nothing.
The node will start up, synch with the leader, get registered and
start handling queries without you having to do anything.

If you shut the node down, SolrCloud will figure that out and stop
sending requests to it.

If you then bring the node back up, SolrCloud will figure out how
to synch it with the leader and just make it happen. When it's
synched, it'll start serving requests.

Watch the Solr admin page and you'll see the status change as
these operations happen. You'll have to refresh the screen.

And finally, watch the Solr log on the new node, that'll give you
a good sense of what the steps are.

Best,
Erick
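
(Rather than refreshing the admin screen, a small poll of CLUSTERSTATUS prints the same 
state transitions Erick describes, e.g. a replica moving from "recovering" to "active". 
This assumes a Solr release that has the CLUSTERSTATUS action; the URL and collection 
name are placeholders.)

import time
import requests

SOLR = "http://localhost:8983/solr/admin/collections"  # placeholder
COLLECTION = "collection1"                              # placeholder

while True:
    status = requests.get(SOLR, params={"action": "CLUSTERSTATUS",
                                        "collection": COLLECTION,
                                        "wt": "json"}).json()
    shards = status["cluster"]["collections"][COLLECTION]["shards"]
    for name, shard in shards.items():
        # One line per shard: which node hosts each replica and its state.
        print(name, {r["node_name"]: r["state"] for r in shard["replicas"].values()})
    time.sleep(5)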


On Fri, Nov 1, 2013 at 4:13 AM, michael.boom my_sky...@yahoo.com wrote:

 I have a SolrCloud cluster holding 4 collections, each with 3 shards and
 replication factor = 2.
 They all live on 2 machines, and I am currently using this setup for
 testing.

 However, I would like to connect this test setup to our live application,
 just for benchmarking and evaluating whether it can handle the big qpm number.
 I am also planning to set up a new machine and add new nodes manually, one
 more replica for each shard on the new machine, in case the first two have
 problems handling the big qpm.
 But what I would like to do is, after I set up the new nodes, shut down
 the new machine and only put it back in the cluster if it's needed.

 Thus, getting to the title of this mail:
 After re-adding the 3rd machine to the cluster, will the replicas be
 automatically synced with the leader, or do I need to manually trigger this
 somehow?

 Is there a better idea for having these sleeping replicas? I bet lots of
 people have faced this problem, so a best practice must be out there.



 -
 Thanks,
 Michael
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Replication-after-re-adding-nodes-to-cluster-sleeping-replicas-tp4098764.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Replication after re adding nodes to cluster (sleeping replicas)

2013-11-01 Thread michael.boom
I have a SolrCloud cluster holding 4 collections, each with 3 shards and
replication factor = 2.
They all live on 2 machines, and I am currently using this setup for
testing.

However, I would like to connect this test setup to our live application,
just for benchmarking and evaluating whether it can handle the big qpm number.
I am also planning to set up a new machine and add new nodes manually, one
more replica for each shard on the new machine, in case the first two have
problems handling the big qpm.
But what I would like to do is, after I set up the new nodes, shut down
the new machine and only put it back in the cluster if it's needed.

Thus, getting to the title of this mail:
After re-adding the 3rd machine to the cluster, will the replicas be
automatically synced with the leader, or do I need to manually trigger this
somehow?

Is there a better idea for having these sleeping replicas? I bet lots of
people have faced this problem, so a best practice must be out there.



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-after-re-adding-nodes-to-cluster-sleeping-replicas-tp4098764.html
Sent from the Solr - User mailing list archive at Nabble.com.