[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2017-01-22 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833804#comment-15833804
 ] 

Damien Kamerman commented on SOLR-7191:
---

Regarding the extra threads, I'm thinking the issue is that 
CoreContainer.load() now calls ZkContainer.registerInZk() with background 
'true'. Check how many 'coreZkRegister' threads there are.
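
A quick way to act on this from inside the Solr JVM is sketched below. It is a minimal, hypothetical helper (the class and method names are invented for illustration and are not part of Solr) that counts live threads whose name contains 'coreZkRegister'. From outside the JVM, a thread dump taken with jstack and filtered on the same name should give the same number.

```java
// Hypothetical helper, not part of Solr: counts live threads by name fragment.
class CoreZkRegisterThreadCount {

    // Returns the number of live threads in this JVM whose name contains the fragment.
    static long countThreadsNamed(String fragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().contains(fragment))
                .count();
    }

    public static void main(String[] args) {
        // Only meaningful when run inside the JVM that hosts Solr.
        System.out.println("coreZkRegister threads: " + countThreadsNamed("coreZkRegister"));
    }
}
```

A count that grows with the number of cores on the node would be consistent with registration running on an unbounded background pool.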

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Noble Paul
>  Labels: performance, scalability
> Fix For: 6.3
>
> Attachments: lots-of-zkstatereader-updates-branch_5x.log, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3, 
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user setup, but I ran into many 
> problems myself even before I was able to get 4000 collections created on a 
> 5.0 example cloud setup.  Restarting Solr takes a very long time, and it is 
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance 
> and scalability.  It doesn't help that I'm running both Solr nodes on one 
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8297) Allow join query over 2 sharded collections: enhance functionality and exception handling

2016-11-29 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706971#comment-15706971
 ] 

Damien Kamerman commented on SOLR-8297:
---

The use case is not narrow in my view. I have a sharded collection holding 
multiple data-sets that I wish to join on. I'm joining on the same collection.

Will this patch be applied to 6.x?

> Allow join query over 2 sharded collections: enhance functionality and 
> exception handling
> -
>
> Key: SOLR-8297
> URL: https://issues.apache.org/jira/browse/SOLR-8297
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
> Affects Versions: 5.3
> Reporter: Paul Blanchaert
> Attachments: SOLR-8297.patch
>
>
> Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
> Khludnev.
> A) exception handling:
> The exception "SolrCloud join: multiple shards not yet supported" thrown in 
> the function findLocalReplicaForFromIndex of JoinQParserPlugin is not 
> triggered correctly: In my use-case, I've a join on a facet.query and when my 
> results are only found in 1 shard and the facet.query with the join is 
> querying the last replica of the last slice, then the exception is not thrown.
> I believe it's better to check the number of slices when deciding whether to
> throw the "multiple shards not yet supported" exception (so the exception is
> thrown when zkController.getClusterState().getSlices(fromIndex).size()>1).
> B) functional enhancement:
> I would expect that there is no problem to perform a cross-core join over 
> sharded collections when the following conditions are met:
> 1) both collections are sharded with the same replicationFactor and numShards
> 2) router.field of the collections is set to the same "key-field" (collection 
> of "fromindex" has router.field = "from" field and collection joined to has 
> router.field = "to" field)
> The router.field setup ensures that documents with the same "key-field" are 
> routed to the same node. 
> So the combination based on the "key-field" should always be available within 
> the same node.
> From a user perspective, I believe these assumptions seem to be a "normal" 
> use-case in the cross-core join in SolrCloud.
> Hope this helps
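
The two conditions under (B) can be written down as a simple precondition check. The sketch below is purely illustrative: CollectionLayout and meetsJoinConditions are hypothetical names, not Solr classes or methods, and the check only encodes the conditions as stated in the description. Whether they are sufficient to guarantee co-location on the same node is the description's assumption, not something the code verifies.

```java
// Hypothetical sketch of the two preconditions listed above; none of these types exist in Solr.
class ShardedJoinPrecheck {

    // Simplified view of how a collection is laid out.
    static final class CollectionLayout {
        final int numShards;
        final int replicationFactor;
        final String routerField;

        CollectionLayout(int numShards, int replicationFactor, String routerField) {
            this.numShards = numShards;
            this.replicationFactor = replicationFactor;
            this.routerField = routerField;
        }
    }

    // True when both collections share numShards and replicationFactor (condition 1)
    // and each collection is routed on its side of the join key (condition 2).
    static boolean meetsJoinConditions(CollectionLayout from, String fromField,
                                       CollectionLayout to, String toField) {
        boolean sameShape = from.numShards == to.numShards
                && from.replicationFactor == to.replicationFactor;
        boolean routedOnKey = fromField != null && fromField.equals(from.routerField)
                && toField != null && toField.equals(to.routerField);
        return sameShape && routedOnKey;
    }
}
```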






[jira] [Commented] (SOLR-7280) Load cores in sorted order and tweak coreLoadThread counts to improve cluster stability on restarts

2016-11-28 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704322#comment-15704322
 ] 

Damien Kamerman commented on SOLR-7280:
---

I've just been testing this and found a couple of issues:

All the core registrations are done in background threads. This can flood the
overseer queue. See CoreContainer.load(), which calls zkSys.registerInZk(core,
true, false).

I've increased leaderConflictResolveWait to 30min but every 15s I can see:
org.apache.solr.handler.admin.PrepRecoveryOp; After 15 seconds, core 
ip_1224_shard1_replica1 (shard1 of ip_1224) still does not have state: 
recovering; forcing ClusterState update from ZooKeeper
Again, I think this can flood the overseer queue.

> Load cores in sorted order and tweak coreLoadThread counts to improve cluster 
> stability on restarts
> ---
>
> Key: SOLR-7280
> URL: https://issues.apache.org/jira/browse/SOLR-7280
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Noble Paul
> Fix For: 5.5.3, 6.2
>
> Attachments: SOLR-7280-5x.patch, SOLR-7280-5x.patch, 
> SOLR-7280-5x.patch, SOLR-7280-test.patch, SOLR-7280.patch, SOLR-7280.patch
>
>
> In SOLR-7191, Damien mentioned that by loading solr cores in a sorted order 
> and tweaking some of the coreLoadThread counts, he was able to improve the 
> stability of a cluster with thousands of collections. We should explore some 
> of these changes and fold them into Solr.






[jira] [Commented] (SOLR-7280) Load cores in sorted order and tweak coreLoadThread counts to improve cluster stability on restarts

2016-07-12 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374195#comment-15374195
 ] 

damien kamerman commented on SOLR-7280:
---

Or ensure that coreLoadThreads is >= the maximum number of replicas that any
one collection has on a single node?
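
To make the suggested bound concrete, here is a minimal sketch of how it could be computed. CoreInfo and both methods are hypothetical names introduced for illustration; they are not Solr APIs, and how Solr would actually obtain the core-to-collection mapping is left open.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code: derive a lower bound for coreLoadThreads.
class CoreLoadThreadSizing {

    // Pairs a core with the collection it belongs to.
    static final class CoreInfo {
        final String coreName;
        final String collection;

        CoreInfo(String coreName, String collection) {
            this.coreName = coreName;
            this.collection = collection;
        }
    }

    // Largest number of cores that any single collection has on this node.
    static int maxReplicasPerCollection(List<CoreInfo> coresOnNode) {
        Map<String, Integer> perCollection = new HashMap<>();
        int max = 0;
        for (CoreInfo core : coresOnNode) {
            int count = perCollection.merge(core.collection, 1, Integer::sum);
            max = Math.max(max, count);
        }
        return max;
    }

    // Raises the configured thread count if it is below that maximum.
    static int effectiveCoreLoadThreads(int configured, List<CoreInfo> coresOnNode) {
        return Math.max(configured, maxReplicasPerCollection(coresOnNode));
    }
}
```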

> Load cores in sorted order and tweak coreLoadThread counts to improve cluster 
> stability on restarts
> ---
>
> Key: SOLR-7280
> URL: https://issues.apache.org/jira/browse/SOLR-7280
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Noble Paul
> Fix For: 5.2, 6.0
>
> Attachments: SOLR-7280.patch
>
>






[jira] [Commented] (SOLR-7282) Cache config or index schema objects by configset and share them across cores

2016-07-12 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374179#comment-15374179
 ] 

damien kamerman commented on SOLR-7282:
---

What about caching the IndexSchema object only when the schemaFactory is the 
(immutable) ClassicIndexSchemaFactory?
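
A rough sketch of that idea follows, assuming the caller can tell whether the schema factory is the immutable one. ParsedSchema stands in for the real IndexSchema, and SchemaCacheSketch and getSchema are hypothetical names; this is not how Solr's ConfigSetService is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch, not Solr code: share parsed schemas only when they cannot change.
class SchemaCacheSketch {

    // Stand-in for a parsed schema; the real IndexSchema is far richer.
    static final class ParsedSchema {
        final String configSet;

        ParsedSchema(String configSet) {
            this.configSet = configSet;
        }
    }

    private final Map<String, ParsedSchema> cache = new ConcurrentHashMap<>();

    // Returns a shared instance for immutable factories, a fresh parse otherwise.
    ParsedSchema getSchema(String configSet, boolean immutableFactory,
                           Supplier<ParsedSchema> parser) {
        if (!immutableFactory) {
            return parser.get(); // a mutable (managed) schema must not be shared
        }
        return cache.computeIfAbsent(configSet, name -> parser.get());
    }
}
```

Keying only on the configset name assumes the configset does not change while the node is up; a real implementation would presumably also need to account for the configset's version in ZooKeeper.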

> Cache config or index schema objects by configset and share them across cores
> -
>
> Key: SOLR-7282
> URL: https://issues.apache.org/jira/browse/SOLR-7282
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Noble Paul
> Fix For: 5.2, 6.0
>
> Attachments: SOLR-7282.patch
>
>
> Sharing schema and config objects has been known to improve startup 
> performance when a large number of cores are on the same box (See 
> http://wiki.apache.org/solr/LotsOfCores). Damien also saw improvements to
> cluster startup speed upon caching the index schema in SOLR-7191.
> Now that SolrCloud configuration is based on config sets in ZK, we should 
> explore how we can minimize config/schema parsing for each core in a way that 
> is compatible with the recent/planned changes in the config and schema APIs.






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2016-06-28 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352431#comment-15352431
 ] 

damien kamerman commented on SOLR-7191:
---

The concern is based on general principles. For example, if a shard has all 4
of its replicas on one JVM and there are only 3 load threads, then registration
will be based on the first three cores only.
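
The 4-replicas, 3-threads case can be reproduced with a toy model. The sketch below assumes, purely for illustration, that each registration blocks until every replica of the shard has shown up; that is a simplification and not a description of Solr's actual registration logic. All names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model, not Solr code: a bounded pool with fewer threads than replicas.
class LoadThreadStarvationDemo {

    // Returns true if all replicas finish "registering" within the timeout.
    static boolean allReplicasRegister(int loadThreads, int replicas, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(loadThreads);
        CountDownLatch allShownUp = new CountDownLatch(replicas);
        CountDownLatch registered = new CountDownLatch(replicas);
        try {
            for (int i = 0; i < replicas; i++) {
                pool.execute(() -> {
                    allShownUp.countDown();      // this replica has been loaded
                    try {
                        allShownUp.await();      // assumed: wait for all sibling replicas
                        registered.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return registered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();                  // release any threads still waiting
        }
    }
}
```

With 3 threads and 4 replicas the fourth task never leaves the queue, so the three running tasks only ever see each other; with 4 threads all replicas proceed.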

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2016-06-27 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352228#comment-15352228
 ] 

damien kamerman commented on SOLR-7191:
---

Only coreLoadThreadCount cores are registering at a time on each JVM, so
the concern is that if a collection has more than coreLoadThreadCount
replicas on a JVM, registration could fail.





-- 
Damien Kamerman


> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2016-06-20 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340698#comment-15340698
 ] 

damien kamerman commented on SOLR-7191:
---

The patch from March 2015 was against Solr 4.10.
The patch from Oct 2015 was against Solr trunk (1708905)





-- 
Damien Kamerman


> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2016-06-10 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325626#comment-15325626
 ] 

damien kamerman commented on SOLR-7191:
---

This fits with what I've seen on Solr 4/5. The cores register on an
unlimited thread pool. My patch limits the thread pool and registers the
cores in order.
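
The general shape of such a change might look like the sketch below. It is not the attached patch: OrderedRegistrationSketch and registerInOrder are hypothetical names, and the register callback stands in for whatever work the real registration does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not the SOLR-7191 patch: bounded, ordered registration.
class OrderedRegistrationSketch {

    // Submits one registration per core, in core-name order, to a fixed-size pool.
    // Returns the order in which the registrations were submitted.
    static List<String> registerInOrder(List<String> coreNames, int poolSize,
                                        Consumer<String> register) {
        List<String> sorted = new ArrayList<>(coreNames);
        Collections.sort(sorted);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String name : sorted) {
            pool.execute(() -> register.accept(name));
        }
        pool.shutdown(); // accept no new work, let queued registrations finish
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sorted;
    }
}
```

The fixed pool caps how many registrations are in flight at once, and sorting by core name means the replicas of one collection tend to be submitted close together.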



> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-10-22 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970255#comment-14970255
 ] 

Damien Kamerman commented on SOLR-7191:
---

After 2 min, around 100 collections were all green. This is with a 3-node
ensemble. Ten minutes would be great, and I guess with 3K collections I would
be close to that mark.

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-10-22 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970253#comment-14970253
 ] 

Damien Kamerman commented on SOLR-7191:
---

1. Hmm, cancel that. Initially I noticed a very slow shutdown (around 60 min in
total) in JmxMonitoredMap.clear(). I went back to test it and was unable to
reproduce it. I did update the trunk. A partial stack trace is all I've saved:
at org.apache.solr.core.JmxMonitoredMap.clear(JmxMonitoredMap.java:144)
at org.apache.solr.core.SolrCore.close(SolrCore.java:1263)
at org.apache.solr.core.SolrCores.close(SolrCores.java:124)
at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:564)
at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:172)

2. OK, will look into that.

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Updated] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-10-20 Thread Damien Kamerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Kamerman updated SOLR-7191:
--
Attachment: SOLR-7191.patch

I've had a first look at porting the patch I did for Solr 4.10 to the Solr
trunk (1708905). I created 6,000 collections (3 nodes; 2 x replicas) and
restarted the 3 nodes; all were green in 24 min.

The bottleneck was ZkController.register() calling 
zkStateReader.updateClusterState() for each collection. I moved this call up to 
the end of CoreContainer.load(). Comments appreciated.
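
The effect of moving the call can be illustrated with a small counting sketch. Everything here is a stand-in: refreshClusterState() represents the expensive cluster-state update, register() is an empty placeholder, and the class is hypothetical rather than taken from the patch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the patch: per-core refresh versus one refresh after loading.
class ClusterStateRefreshSketch {

    // Counts how often the (expensive) cluster-state refresh is invoked.
    final AtomicInteger refreshCalls = new AtomicInteger();

    // Stand-in for a full cluster-state read from ZooKeeper.
    void refreshClusterState() {
        refreshCalls.incrementAndGet();
    }

    // Before: every registration triggers its own refresh.
    void loadRefreshingPerCore(List<String> cores) {
        for (String core : cores) {
            register(core);
            refreshClusterState();
        }
    }

    // After: register everything, then refresh once at the end of loading.
    void loadRefreshingOnce(List<String> cores) {
        for (String core : cores) {
            register(core);
        }
        refreshClusterState();
    }

    private void register(String core) {
        // registration details omitted in this sketch
    }
}
```

For N cores the first variant performs N refreshes and the second performs one, which is where the saving on a node with thousands of cores would come from.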

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> 
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
>  Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, lots-of-zkstatereader-updates-branch_5x.log
>
>






[jira] [Updated] (SOLR-7282) Cache config or index schema objects by configset and share them across cores

2015-08-23 Thread Damien Kamerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Kamerman updated SOLR-7282:
--
Attachment: SOLR-7282.patch

Cache cloud schemas by name

> Cache config or index schema objects by configset and share them across cores
> -
>
> Key: SOLR-7282
> URL: https://issues.apache.org/jira/browse/SOLR-7282
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-7282.patch








[jira] [Commented] (SOLR-5750) Backup/Restore API for SolrCloud

2015-04-30 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522594#comment-14522594
 ] 

Damien Kamerman commented on SOLR-5750:
---

Only snapshot if the index version has changed.
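
As a minimal sketch of that rule, assuming the backup code can read a per-core index version and remembers the version it last snapshotted: SnapshotIfChanged and maybeSnapshot are hypothetical names, and the Runnable stands in for the real snapshot operation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Solr code: skip snapshots when the index is unchanged.
class SnapshotIfChanged {

    // Index version recorded at the last snapshot of each core.
    private final Map<String, Long> lastSnapshotVersion = new HashMap<>();

    // Returns true if a snapshot was taken, false if it was skipped.
    boolean maybeSnapshot(String core, long currentIndexVersion, Runnable takeSnapshot) {
        Long previous = lastSnapshotVersion.get(core);
        if (previous != null && previous == currentIndexVersion) {
            return false; // nothing changed since the last snapshot
        }
        takeSnapshot.run();
        lastSnapshotVersion.put(core, currentIndexVersion);
        return true;
    }
}
```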

> Backup/Restore API for SolrCloud
>
> Key: SOLR-5750
> URL: https://issues.apache.org/jira/browse/SOLR-5750
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Varun Thacker
> Fix For: Trunk, 5.2
>
>
> We should have an easy way to do backups and restores in SolrCloud. The
> ReplicationHandler supports a backup command which can create snapshots of
> the index but that is too little.
> The command should be able to backup:
> # Snapshots of all indexes or indexes from the leader or the shards
> # Config set
> # Cluster state
> # Cluster properties
> # Aliases
> # Overseer work queue?
> A restore should be able to completely restore the cloud i.e. no manual steps
> required other than bringing nodes back up or setting up a new cloud cluster.
> SOLR-5340 will be a part of this issue.






[jira] [Commented] (SOLR-7361) Main Jetty thread blocked by core loading delays HTTP listener from binding if core loading is slow

2015-04-10 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490623#comment-14490623
 ] 

Damien Kamerman commented on SOLR-7361:
---

Regarding loading the cores in the background, I've made a few tweaks to work
with thousands of cores (see SOLR-7191):
1. Load cores in a fixed-size thread pool. Cores are registered in background
threads, so there is no reason to load all cores at once!
2. Register cores in core-name order with a fixed thread pool of 128. This is
to avoid flooding the DistributedQueue.
3. Publish an entire node as 'down' (4.10 branch).
4. Cache ConfigSetService.createIndexSchema() in cloud mode.

So, my questions are:
What thread pool size will be used to load the cores?
What order will cores be loaded in?
Will cores be registered as they are loaded, or offloaded to another thread pool?

My initial thought was to register as a live node after cores are loaded.

> Main Jetty thread blocked by core loading delays HTTP listener from binding
> if core loading is slow
> ---
>
> Key: SOLR-7361
> URL: https://issues.apache.org/jira/browse/SOLR-7361
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Timothy Potter
> Assignee: Timothy Potter
>
> During server startup, the CoreContainer uses an ExecutorService to load
> cores in multiple back-ground threads but then blocks until cores are loaded,
> see: CoreContainer#load around line 290 on trunk (invokeAll). From the
> JavaDoc on that method, we have:
> {quote}
> Executes the given tasks, returning a list of Futures holding their status
> and results when all complete. Future.isDone() is true for each element of
> the returned list.
> {quote}
> In other words, this is a blocking call.
> This delays the Jetty HTTP listener from binding and accepting requests until
> all cores are loaded. Do we need to block the main thread?
> Also, prior to this happening, the node is registered as a live node in ZK,
> which makes it a candidate for receiving requests from the Overseer, such as
> to service a create collection request. The problem of course is that the
> node listed in /live_nodes isn't accepting requests yet. So we either need to
> unblock the main thread during server loading or maybe wait longer before we
> register as a live node ... not sure which is the better way forward?
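
The difference between the blocking invokeAll call described above and a non-blocking submit can be seen in a small standalone sketch. It uses only java.util.concurrent; the class and method names are hypothetical, the tasks stand in for core loads, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Standalone illustration, not Solr code: invokeAll blocks, submit does not.
class InvokeAllBlockingDemo {

    // Number of futures that are done at the moment invokeAll returns.
    static int doneCountAfterInvokeAll(int cores) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < cores; i++) {
                final String name = "core" + i;
                tasks.add(() -> { Thread.sleep(20); return name; });
            }
            List<Future<String>> futures = pool.invokeAll(tasks); // blocks until all complete
            int done = 0;
            for (Future<String> f : futures) {
                if (f.isDone()) {
                    done++;
                }
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdownNow();
        }
    }

    // Number of futures that are done right after submitting, while loads are still held back.
    static int doneCountAfterSubmit(int cores) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch gate = new CountDownLatch(1); // kept closed so no load can finish yet
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < cores; i++) {
                final String name = "core" + i;
                Callable<String> task = () -> { gate.await(); return name; };
                futures.add(pool.submit(task)); // returns immediately
            }
            int done = 0;
            for (Future<String> f : futures) {
                if (f.isDone()) {
                    done++;
                }
            }
            return done;
        } finally {
            gate.countDown();
            pool.shutdownNow();
        }
    }
}
```

In the second variant the calling thread gets control back while the loads are still pending, which is the property the main thread would need in order to let the HTTP listener bind earlier.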






[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-18 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368557#comment-14368557
 ] 

Damien Kamerman commented on SOLR-7191:
---

Shalin, Another change I made was to cache ConfigSetService.createIndexSchema() 
in cloud mode. BTW, have tested OK up to 24K cores.

> Improve stability and startup performance of SolrCloud with thousands of
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
> Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch,
> lots-of-zkstatereader-updates-branch_5x.log








[jira] [Updated] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-14 Thread Damien Kamerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Kamerman updated SOLR-7191:
--
Attachment: SOLR-7191.patch

Thanks Shawn,
Here's a patch for branch_5x that does the following:
1. Load cores in a fixed-size thread pool. Cores are registered in background
threads, so there is no reason to load all cores at once!
2. Register cores in core-name order with a fixed thread pool of 128. This is
to avoid flooding the DistributedQueue.

> Improve stability and startup performance of SolrCloud with thousands of
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch,
> lots-of-zkstatereader-updates-branch_5x.log








[jira] [Updated] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-14 Thread Damien Kamerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Kamerman updated SOLR-7191:
--
Attachment: SOLR-7191.patch

This patch takes some load off the DistributedQueue.
On shutdown, the node publishes a 'down' state for the entire node and the 
Overseer does the rest.
Registration is done in core-name order in a fixed-size thread pool (128 threads).
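
To illustrate why a node-level 'down' state takes load off the queue, here is a minimal Java sketch. It is not the patch and does not use Solr's DistributedQueue or Overseer APIs: the DownMessage type, the method names and the plain in-memory queue are all assumptions made for illustration. Per-core publishing enqueues one message per core on the node, while node-level publishing enqueues one message that the consumer expands.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

// Hypothetical illustration only; the message format is not Solr's.
class NodeDownSketch {

    /** Names either one core or a whole node. */
    static final class DownMessage {
        final String node; // set for node-level messages, else null
        final String core; // set for core-level messages, else null
        DownMessage(String node, String core) { this.node = node; this.core = core; }
    }

    /** Per-core shutdown: one queue entry for every core hosted on the node. */
    static void publishPerCore(Queue<DownMessage> queue, Map<String, String> coreToNode, String node) {
        for (Map.Entry<String, String> e : coreToNode.entrySet()) {
            if (e.getValue().equals(node)) {
                queue.add(new DownMessage(null, e.getKey()));
            }
        }
    }

    /** Node-level shutdown: a single queue entry regardless of core count. */
    static void publishNodeDown(Queue<DownMessage> queue, String node) {
        queue.add(new DownMessage(node, null));
    }

    /** Consumer side: applies each message, expanding node-level ones to all cores on that node. */
    static Map<String, String> drain(Queue<DownMessage> queue, Map<String, String> coreToNode) {
        Map<String, String> state = new TreeMap<>();
        for (String core : coreToNode.keySet()) {
            state.put(core, "active");
        }
        DownMessage m;
        while ((m = queue.poll()) != null) {
            for (Map.Entry<String, String> e : coreToNode.entrySet()) {
                boolean hit = m.core != null ? m.core.equals(e.getKey()) : m.node.equals(e.getValue());
                if (hit) {
                    state.put(e.getKey(), "down");
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, String> coreToNode = new TreeMap<>();
        coreToNode.put("c1_shard1_replica1", "node1");
        coreToNode.put("c2_shard1_replica1", "node1");
        coreToNode.put("c1_shard1_replica2", "node2");
        Queue<DownMessage> queue = new ArrayDeque<>();
        publishNodeDown(queue, "node1");
        System.out.println(queue.size() + " message(s) queued");
        System.out.println(drain(queue, coreToNode));
    }
}
```

Both publishing styles end in the same final state; the node-level one just puts far fewer entries on the queue when a node hosts thousands of cores.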

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, 
> lots-of-zkstatereader-updates-branch_5x.log


[jira] [Updated] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-12 Thread Damien Kamerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Kamerman updated SOLR-7191:
--
Attachment: SOLR-7191.patch

Patch against lucene_solr_4_10_4.

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
> Attachments: SOLR-7191.patch, 
> lots-of-zkstatereader-updates-branch_5x.log


[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-12 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358149#comment-14358149
 ] 

Damien Kamerman commented on SOLR-7191:
---

With 10K cores I was still seeing many cores stuck in recovery.

I reduced the CoreContainer.load() coreLoadExecutor from an unbounded pool 
(max int) to 24 threads (cfg.coreLoadThreads); I guess I'm now assuming 
replicas are on other nodes.
I also reduced ZkContainer.coreZkRegister from max int to 24 threads and added 
a sort to SolrCore.getCores(). The sort ensures replicas are available.
I tested with Solr 4.10.4: 2 nodes, 5K collections, 10K cores. All green in 
19 min.

Please review the patch.

I would still like to see SOLR-6399 and a SolrCloud LotsOfCores as per Erick.
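
One possible reading of why the sort helps, sketched in Java below: if every node registers its cores in the same name order, replicas of the same collection reach registration at roughly the same point on each node. This is an interpretation, not something the comment states. The class and method names are hypothetical, and the core-name pattern "<collection>_shard1_replicaN" is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Solr code.
class SortedRegistrationSketch {

    /** Collection part of a core name such as "c0002_shard1_replica1" (naming is assumed). */
    static String collectionOf(String coreName) {
        int i = coreName.indexOf("_shard");
        return i < 0 ? coreName : coreName.substring(0, i);
    }

    /** Collections in the order a node would register them after sorting its core names. */
    static List<String> registrationOrder(List<String> coreNames) {
        List<String> sorted = new ArrayList<>(coreNames);
        Collections.sort(sorted);
        List<String> order = new ArrayList<>();
        for (String core : sorted) {
            order.add(collectionOf(core));
        }
        return order;
    }

    public static void main(String[] args) {
        // Each node discovers its cores in a different arbitrary order.
        List<String> node1 = Arrays.asList(
                "c0003_shard1_replica1", "c0001_shard1_replica1", "c0002_shard1_replica1");
        List<String> node2 = Arrays.asList(
                "c0002_shard1_replica2", "c0003_shard1_replica2", "c0001_shard1_replica2");
        System.out.println(registrationOrder(node1));
        System.out.println(registrationOrder(node2));
    }
}
```

After sorting, both nodes walk the collections in the same sequence, so with a small registration pool the peers of a given shard are being registered at about the same time instead of thousands of cores apart.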

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
> Attachments: lots-of-zkstatereader-updates-branch_5x.log


[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-09 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354379#comment-14354379
 ] 

Damien Kamerman commented on SOLR-7191:
---

I tested 4,000 cores on branch_5x and got better results. My setup:
3 nodes (32GB RAM each; jdk1.8.0_40) running on a single server (256GB RAM).
2,000 collections (1 shard x 2 replicas each)
1 x ZooKeeper 3.4.6

Full restart (stop all nodes, then start all nodes staggered by 1 min): many 
cores on node1 are active and the other cores are recovering. There are lots of 
'org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed 
updates' warnings on node2 and node3, but the cloud is slowly recovering.

BTW: collection creation slows down as the number of collections in the cloud 
grows. QTimes start at ~3s and end at ~6s. Solr 4.x was always steady at ~3s.

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
> Attachments: lots-of-zkstatereader-updates-branch_5x.log


[jira] [Commented] (SOLR-7191) Improve stability and startup performance of SolrCloud with thousands of collections

2015-03-04 Thread Damien Kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348132#comment-14348132
 ] 

Damien Kamerman commented on SOLR-7191:
---

I'm more concerned with stability than performance. I've created up to 10,000 
cores and the cloud is fine and well within memory limits. However, the cloud 
will never restart. I see lots of warnings like 
'org.apache.solr.cloud.ZkController; Still seeing conflicting information about 
the leader of shard'.

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
