[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-07-04 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533050#comment-16533050
 ] 

Gus Heck commented on SOLR-12356:
-

I don't like the idea of things being hidden (in general). And I think it
should be possible to easily replace whatever .system collection exists with
one configured to the user's needs (probably when first building or when
upgrading the cluster). Also WRT sizing: the use case that motivated
SOLR-8349, which also relies on the blob store, was a postal code geo lookup
table for UK+US that was around 1GB (and at the time, the problem being solved
was that it was getting replicated in memory 40 times for 40 cores).

> Always auto-create ".system" collection when in SolrCloud mode
> --
>
> Key: SOLR-12356
> URL: https://issues.apache.org/jira/browse/SOLR-12356
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki 
>Priority: Major
>
> The {{.system}} collection is currently used for blobs, and in SolrCloud mode 
> it's also used for autoscaling history and as a metrics history store 
> (SOLR-11779). It should be automatically created on Overseer start if it's 
> missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-06-22 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520946#comment-16520946
 ] 

Noble Paul commented on SOLR-12356:
---

{quote}Why? We don't care about the search performance of this collection that
much; we only care about fault tolerance. Having a replica on every node seems
like overkill - if your cluster is likely to lose N-1 nodes, you're in deep
trouble anyway
{quote}
Imagine you have an RF of 3 and 20 nodes. It's not uncommon to lose 3 nodes
out of 20, and if those happen to be the 3 nodes hosting the replicas, the
{{.system}} collection is unavailable.
{quote}I disagree - actively hiding this from the users complicates the code 
and prevents them from understanding how it works. 
{quote}
The problem with system-generated config coexisting with user-created config
is that it leads to:
 * config bloat, which leads to poor readability;
 * legacy configuration living in the cluster that the user doesn't know how
to upgrade when something changes in the framework.

OTOH, if we keep that configuration hidden from users, we eliminate this
problem altogether. We already apply this principle elsewhere: response
writers, request handlers, functions, etc. are registered implicitly. We could
have left them in {{solrconfig.xml}}, but that would have caused the same
problems I mentioned above. In short, I'm not very happy to see
autoAddReplicas creating a huge blob of config in {{autoscaling.json}} that
the user is left to manage.
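A minimal sketch of the principle described above, assuming a simple key/value model: built-in defaults live in code, user overrides are stored separately, and only the overrides are ever written out. The class and key names are hypothetical. This is not how Solr actually loads {{solrconfig.xml}} or {{autoscaling.json}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: implicit (built-in) defaults overlaid by explicit user settings.
 * Only the user's own settings are ever persisted.
 */
final class ImplicitConfigOverlay {
    private final Map<String, String> implicitDefaults;
    private final Map<String, String> userSettings = new LinkedHashMap<>();

    ImplicitConfigOverlay(Map<String, String> implicitDefaults) {
        // Defaults ship with the code, so an upgrade can change them without touching user files.
        this.implicitDefaults = Map.copyOf(implicitDefaults);
    }

    /** Records an explicit user override. */
    void set(String key, String value) {
        userSettings.put(key, value);
    }

    /** Effective configuration: defaults first, user settings win on conflicts. */
    Map<String, String> effective() {
        Map<String, String> merged = new LinkedHashMap<>(implicitDefaults);
        merged.putAll(userSettings);
        return merged;
    }

    /** What would be written to a user-visible file such as autoscaling.json. */
    Map<String, String> persisted() {
        return Map.copyOf(userSettings);
    }
}
```

Because the defaults are never persisted, a release that changes them reaches every cluster that has not overridden them. This avoids the legacy-config upgrade problem listed above.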




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-06-20 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517827#comment-16517827
 ] 

Andrzej Bialecki  commented on SOLR-12356:
--

bq. The default RF should be very high.
Why? We don't care about the search performance of this collection that much;
we only care about fault tolerance. Having a replica on every node seems like
overkill - if your cluster is likely to lose N-1 nodes, you're in deep trouble
anyway ;)

bq. I don't think a user should know anything about the autoscaling thing to 
use .system collection.
They wouldn't have to - the config API can set up or reconfigure triggers as
necessary.

bq. If replicas are to be created, they should be created automatically.
Yes - and the existing mechanism of autoAddReplicas is already implemented and 
it works.

bq. When I say we should not expose the user to the autoscaling framework, it's 
desirable to not even have an entry in the autoscaling.json
I disagree - actively hiding this from the users complicates the code and 
prevents them from understanding how it works. We can still provide all 
necessary simple setup and config options via the config API so that users are 
not exposed to the autoscaling API if they don't want to.




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-06-19 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516999#comment-16516999
 ] 

Noble Paul commented on SOLR-12356:
---

bq. use a configurable default RF, e.g. 3, unless there are fewer nodes in the
cluster (then the limit is the number of nodes)
The default RF should be very high. By default, in a small cluster, there
should be a replica of the {{.system}} collection on every node. For a user of
a small cluster, it's too much of a learning curve if their {{.system}}
collection is not available.
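Here is one hypothetical way to read the "replica on every node in a small cluster" default. The comment gives no numbers, so both the cutoff for a "small" cluster and the RF used above it are assumed parameters, not Solr settings.

```java
/**
 * Hypothetical sketch of an "every node in a small cluster" default RF for .system.
 * The small-cluster cutoff and the large-cluster RF are assumed parameters.
 */
final class SmallClusterRfPolicy {
    static int targetRf(int liveNodes, int smallClusterMaxNodes, int largeClusterRf) {
        if (liveNodes < 1) {
            throw new IllegalArgumentException("at least one live node is required");
        }
        if (liveNodes <= smallClusterMaxNodes) {
            return liveNodes; // small cluster: one replica on every node
        }
        // Larger cluster: use a high but bounded RF, never more than the node count.
        return Math.min(largeClusterRf, liveNodes);
    }
}
```

For example, with a cutoff of 5 nodes and a large-cluster RF of 7, a 3-node cluster gets RF 3 and a 20-node cluster gets RF 7.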

bq. use AutoAddReplicasPlanAction to automatically increase the number of
replicas to the desired RF as more nodes are added

I don't think a user should know anything about the autoscaling thing to use
the {{.system}} collection. We should define the default behavior and it
should all happen behind the scenes. If replicas are to be created, they
should be created automatically.

bq. we could use the .scheduled_maintenance trigger with an action that
periodically prunes the collection based either on index size or time-to-live
criteria.

We should have a sensible default for the maximum index size of the
{{.system}} collection. Something like 100MB sounds OK. Once it crosses this
threshold, the system should automatically do the pruning. The user should
not be exposed to the autoscaling framework at all for this pruning to happen.
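The size-based pruning idea can be sketched as "delete the oldest documents until the total fits the budget". This sketch makes two assumptions. First, the sum of per-document sizes stands in for index size; real on-disk size also depends on segments and deletes that have not yet been merged away. Second, the {{Doc}} type is a hypothetical stand-in for {{.system}} documents. In a real cluster the chosen range would presumably be removed with a delete-by-query on a timestamp field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: pick the oldest documents to delete until the total size fits a budget. */
final class SizeBasedPruner {
    /** Stand-in for a .system document: id, write time, and approximate size in bytes. */
    record Doc(String id, long timestampMillis, long sizeBytes) {}

    static List<String> idsToPrune(List<Doc> docs, long maxBytes) {
        long total = 0;
        for (Doc d : docs) {
            total += d.sizeBytes();
        }
        List<Doc> oldestFirst = new ArrayList<>(docs);
        oldestFirst.sort(Comparator.comparingLong(Doc::timestampMillis));
        List<String> toDelete = new ArrayList<>();
        for (Doc d : oldestFirst) {
            if (total <= maxBytes) {
                break; // under budget: stop pruning
            }
            toDelete.add(d.id());
            total -= d.sizeBytes();
        }
        return toDelete;
    }
}
```

With the 100MB default suggested above, the call would be {{idsToPrune(docs, 100L * 1024 * 1024)}}. A time-to-live variant would instead select every document older than a cutoff timestamp.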

When I say we should not expose the user to the autoscaling framework, it's
desirable to not even have an entry in {{autoscaling.json}}. These can be
implicit triggers that are automatically registered/unregistered based on the
presence/absence of certain cluster properties.
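Property-driven implicit triggers can be sketched as follows. The desired trigger set is computed from cluster properties each time and reconciled with what is currently registered, so nothing needs to be stored in {{autoscaling.json}}. The property names and trigger names are hypothetical; Solr defines no such properties in this thread.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: implicit triggers derived from cluster properties on the fly.
 * Property and trigger names are made up for illustration.
 */
final class ImplicitTriggers {
    static final String PROP_MAX_INDEX_SIZE_MB = "systemCollection.maxIndexSizeMB";
    static final String PROP_TARGET_RF = "systemCollection.targetRF";

    /** Triggers to register and unregister to move from the current to the desired state. */
    record Diff(Set<String> toRegister, Set<String> toUnregister) {}

    /** The implicit triggers that should exist given the current cluster properties. */
    static Set<String> desiredTriggers(Map<String, String> clusterProps) {
        Set<String> triggers = new TreeSet<>();
        if (clusterProps.containsKey(PROP_MAX_INDEX_SIZE_MB)) {
            triggers.add(".system_prune");
        }
        if (clusterProps.containsKey(PROP_TARGET_RF)) {
            triggers.add(".system_add_replicas");
        }
        return triggers;
    }

    /** Compares what is registered now with what the properties ask for. */
    static Diff reconcile(Set<String> registered, Set<String> desired) {
        Set<String> toRegister = new TreeSet<>(desired);
        toRegister.removeAll(registered);
        Set<String> toUnregister = new TreeSet<>(registered);
        toUnregister.removeAll(desired);
        return new Diff(toRegister, toUnregister);
    }
}
```

Calling {{reconcile}} whenever cluster properties change would register or unregister the implicit triggers without any user-visible config entry.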




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-06-19 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516817#comment-16516817
 ] 

Andrzej Bialecki  commented on SOLR-12356:
--

In other words, these are the main concerns and the answers:
* when do we create the collection?
** it's too late to do this on first use because the first updates will always
fail
** CollectionsHandler can automatically create this collection when the first
collection is created.
** users can opt out of having this collection (by setting a property using the
cluster properties API), and components should handle the absence of this
collection gracefully. In any case, the cost of this collection should be
negligible.
* how many replicas to create and what will be their placement?
** use a configurable default RF, e.g. 3, unless there are fewer nodes in the
cluster (then the limit is the number of nodes)
** use {{AutoAddReplicasPlanAction}} to automatically increase the number of 
replicas to the desired RF as more nodes are added
** use autoscaling preferences to automatically place replicas on different 
physical nodes
* how to control the size of the collection as new documents are being added?
** we could use the {{.scheduled_maintenance}} trigger with an action that 
periodically prunes the collection based either on index size or time-to-live 
criteria.
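The replication-factor rule in the second group of bullets is a configurable default such as 3, capped by the number of live nodes. That rule, together with how many replicas are still missing as nodes join, can be sketched as plain arithmetic. This is not the actual {{AutoAddReplicasPlanAction}} logic, and the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: a configurable default RF capped by the number of live nodes,
 * plus the replicas still missing as nodes join.
 */
final class SystemReplicaPlanner {
    static int targetRf(int configuredRf, int liveNodes) {
        if (configuredRf < 1 || liveNodes < 1) {
            throw new IllegalArgumentException("RF and live node count must both be >= 1");
        }
        return Math.min(configuredRf, liveNodes);
    }

    /** Replicas to add after the cluster changes; zero once the target is reached. */
    static int replicasToAdd(int configuredRf, int liveNodes, int currentReplicas) {
        return Math.max(0, targetRf(configuredRf, liveNodes) - currentReplicas);
    }
}
```

Starting from a single node gives RF 1. Once the cluster grows to 5 nodes with a default of 3, {{replicasToAdd(3, 5, 1)}} returns 2.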




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-06-19 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516767#comment-16516767
 ] 

Andrzej Bialecki  commented on SOLR-12356:
--

This collection is used not only for metrics, but also for autoscaling history 
and for blobs. IMHO the benefits of always having this collection outweigh the 
costs:
* I agree that components should fail gracefully if the collection is not 
present - this is simple to implement and it's a good practice to do this 
anyway.
* initially, when only 1 node is present, the replicationFactor is 1. However,
we can use already existing mechanisms, such as {{AutoAddReplicasPlanAction}},
to automatically increase RF to a "safe" default, e.g. 3, or a percentage of
live nodes.
* the autoscaling framework can automatically handle placing these replicas on
different physical machines, e.g. using IP-based rules.
* we already have an API to manage cluster properties, including collection
defaults - we can allow users to opt out of having this collection.
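The "fail gracefully if the collection is not present" point can be sketched as a guard. A component checks for {{.system}} first and gets an empty result instead of an exception. This is hypothetical: the set of existing collections is passed in directly, where a real component would read it from cluster state.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch: run a .system-dependent operation only when the collection exists. */
final class SystemCollectionGuard {
    static final String SYSTEM_COLLECTION = ".system";

    /**
     * Returns the operation's result, or Optional.empty() without calling it when .system is absent,
     * so the caller can degrade gracefully (e.g. skip recording history) instead of failing.
     */
    static <T> Optional<T> ifSystemPresent(Set<String> existingCollections, Supplier<T> operation) {
        if (!existingCollections.contains(SYSTEM_COLLECTION)) {
            return Optional.empty();
        }
        return Optional.ofNullable(operation.get());
    }
}
```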




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-06-04 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500193#comment-16500193
 ] 

Noble Paul commented on SOLR-12356:
---

If you use SolrJ to read from or write to the {{.system}} collection, the
request fails because SolrJ checks that the collection exists before even
sending the request. So auto-create on first use can fail.




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-05-16 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477233#comment-16477233
 ] 

Jan Høydahl commented on SOLR-12356:


Doesn't the autoscaling framework allow us to create the collection with a
desired replicationFactor of 2, with some fancy rules making sure that the 2nd
replica is created whenever a new node is added? If we can similarly create a
rule preventing that replica from landing on a node that shares the same IP
address as the leader, then all is good :) I don't see any problems with
Overseer making sure this collection exists, even if it is empty.




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-05-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476603#comment-16476603
 ] 

Noble Paul commented on SOLR-12356:
---

I would say we should create it only on first use. If a POST request is made
to the {{.system}} collection, the collection is created if not present.
Similarly, even if a GET request is performed on the {{.system}} collection,
we can auto-create it.
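Create-on-first-use can be sketched as an idempotent {{ensureExists}} that every GET or POST calls before doing its work. The {{creator}} hook is a hypothetical stand-in for a real CREATE call, and the lock only covers a single JVM; concurrent first requests on different nodes would need cluster-level coordination. This only helps if the request reaches the server at all. A client that validates the collection against its own view of cluster state first, as described for SolrJ in this thread, would fail before {{ensureExists}} runs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of create-on-first-use: each read or write first calls ensureExists(),
 * which creates the collection at most once.
 */
final class LazyCollectionCreator {
    private final Set<String> existing = new HashSet<>();
    private final Consumer<String> creator; // stand-in for issuing a real CREATE

    LazyCollectionCreator(Set<String> initiallyExisting, Consumer<String> creator) {
        this.existing.addAll(initiallyExisting);
        this.creator = creator;
    }

    /** Synchronized so two concurrent first requests in this JVM cannot both trigger a CREATE. */
    synchronized void ensureExists(String collection) {
        if (existing.contains(collection)) {
            return;
        }
        creator.accept(collection); // if this throws, the collection is not marked as existing
        existing.add(collection);
    }
}
```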




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-05-15 Thread Tomás Fernández Löbbe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476575#comment-16476575
 ] 

Tomás Fernández Löbbe commented on SOLR-12356:
--

Even if this is the default, can there be an opt-out (e.g. a solr.xml config
option)? I imagine people upgrading from older versions of Solr who have built
their own metrics/scaling may not want this collection around. Features
requiring the .system collection should then fail gracefully if the collection
is not there.
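An opt-out can be sketched as a default-on flag. The property name is hypothetical, since no name is fixed in this thread. The precedence rule is also an assumption: a cluster-wide property (as in the cluster-properties approach suggested elsewhere in this thread) overrides a node-level setting such as one read from solr.xml.

```java
import java.util.Map;

/**
 * Hypothetical sketch of an opt-out: auto-creation stays on unless explicitly disabled.
 * A cluster-wide property, if set, takes precedence over a node-level (e.g. solr.xml) setting.
 */
final class SystemCollectionOptOut {
    static final String PROP = "autoCreateSystemCollection"; // made-up name

    static boolean autoCreateEnabled(Map<String, String> clusterProps, Map<String, String> nodeConfig) {
        String value = clusterProps.getOrDefault(PROP, nodeConfig.get(PROP));
        // Absent means "use the default", which is enabled; only an explicit "false" opts out.
        return value == null || !"false".equalsIgnoreCase(value.trim());
    }
}
```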




[jira] [Commented] (SOLR-12356) Always auto-create ".system" collection when in SolrCloud mode

2018-05-15 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475926#comment-16475926
 ] 

Shawn Heisey commented on SOLR-12356:
-

The only issues I can see related to doing this automatically are how
replicationFactor is decided, and how to prevent multiple replicas from ending
up on the same node. When somebody decides to run multiple nodes per host,
ensuring proper replica placement is particularly important.

The first time an overseer starts in a cloud, there's probably only going to be 
one Solr node, so it won't be possible to create the collection with a 
replicationFactor higher than 1.  How do we handle that?  When nodes are added, 
how do we decide whether to automatically add a replica? My preference would be 
to do the add, but users may disagree, especially if they add a node in a 
location with limited bandwidth.
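The "should a new node get a replica?" question can be sketched as a small decision function. A replica is added only while the collection is below its target RF, only on a host that does not already hold one, and never on a node the user has excluded. The exclusion set is a hypothetical user-maintained list, for example nodes in a location with limited bandwidth; it is not an existing Solr setting.

```java
import java.util.Set;

/** Hypothetical sketch: decide whether a newly added node should receive a .system replica. */
final class NodeAddedPolicy {
    static boolean shouldAddReplica(String newNode, String newNodeHost, Set<String> hostsWithReplicas,
                                    int currentReplicas, int targetRf, Set<String> excludedNodes) {
        if (currentReplicas >= targetRf) {
            return false; // already at the desired RF
        }
        if (excludedNodes.contains(newNode)) {
            return false; // user opted this node out (e.g. limited bandwidth)
        }
        return !hostsWithReplicas.contains(newNodeHost); // never stack replicas on one host
    }
}
```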

