[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610468#comment-13610468 ] Commit Tag Bot commented on SOLR-4114: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1417070 SOLR-4114: Allow creating more than one shard per instance with the Collection API. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540390#comment-13540390 ] Per Steffensen commented on SOLR-4114: -- Guess the following will do {code} && (server.getAttribute(mbean, "coreName") != null) {code} > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540383#comment-13540383 ] Per Steffensen commented on SOLR-4114: -- Maybe you ought to fix the real problem instaed - making Solr behave equally whether it is checked out from GitHub or SVN. Had a look at your fix - it ought to be {code} && ((value = server.getAttribute(mbean, "coreName")) != null) {code} instead of {code} && ((indexDir = server.getAttribute(mbean, "coreName")) != null) {code} Regards, Per Steffensen > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540160#comment-13540160 ] Commit Tag Bot commented on SOLR-4114: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1426330 SOLR-4114: tests: make happy with git - source attrib counts on svn substitution > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540154#comment-13540154 ] Commit Tag Bot commented on SOLR-4114: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1426329 SOLR-4114: tests: make happy with git - source attrib counts on svn substitution > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540122#comment-13540122 ] Mark Miller commented on SOLR-4114: --- Ran into some troubles with the 'no two shards use the same index dir' check with a git checkout - finally traced it down to checking the source attribute from the the solrcore mbean - and when not using svn, this can be $URL. I'll look at an alternate check to do instead. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530650#comment-13530650 ] Mark Miller commented on SOLR-4114: --- Thanks for the contribution Per! > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529976#comment-13529976 ] Mark Miller commented on SOLR-4114: --- I don't often run precommit - takes too long - but I generally do when adding new dependencies because i know there will be license stuff. Jenkins will catch it even if we dont before commit though. You have to extend LuceneTestCase because thats what hooks you into our "test framework" I believe - eg it enforces things like setup and teardown calling super and a ton of other nice stuff. If you don't do it, jenkins will complain and fail. I'll look at SOLR-4120. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529797#comment-13529797 ] Per Steffensen commented on SOLR-4114: -- bq. Feel free to give feedback in that issue - this was my first experience looking at easymock. I added a comment on SOLR-4136, but basically you did a fine job! Now that SOLR-4114 is closed would you mind continuing with SOLR-4120, Mark? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529791#comment-13529791 ] Per Steffensen commented on SOLR-4114: -- bq. Great patch! I just had to make a couple small tweaks to make the policeman happy - added license files for the test jars, changed the new test to extend LuceneTestCase. Good stuff found by the long but useful precommit ant target. Yes, I do not do precommit test before sending a patch. Of course I will do before committing myself if/when I become committer. Just out of curiosity, why do tests HAVE to extend LuceneTestCase? bq. I think that covers all of your points Per - let me know Checking only on branch_4x, hopeing that you did the "exact" same changes on trunk. Updated my local checkout of branch_4x, made a check to see if everything seemed to be there. It was. I am ready to close this ticket. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529298#comment-13529298 ] Mark Miller commented on SOLR-4114: --- note: hossman has run into trouble with the easymock test in SOLR-4136 - I've attached a small patch there that extends the mocking to cover solrzkclient and a call to get the base url. I suppose it's not ideal in that it simply returns _ replaced with /, so the test won't work with a solr context using an underscore - but it's a start, works currently, and can be improved. Feel free to give feedback in that issue - this was my first experience looking at easymock. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529209#comment-13529209 ] Mark Miller commented on SOLR-4114: --- I think that covers all of your points Per - let me know. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529207#comment-13529207 ] Commit Tag Bot commented on SOLR-4114: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1420329 SOLR-4114: add back commented out test with 10 second wait, add Per's new test, add test to ensure no two cores use the same index directory > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529198#comment-13529198 ] Commit Tag Bot commented on SOLR-4114: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1420327 SOLR-4114: add back commented out test with 10 second wait, add Per's new test, add test to ensure no two cores use the same index directory > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529184#comment-13529184 ] Mark Miller commented on SOLR-4114: --- bq. Add the checkNoTwoShardsUseTheSameIndexDir thing if you (also) find it usefull (see above) Missed that comment - I'll add this in to the mix. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529180#comment-13529180 ] Mark Miller commented on SOLR-4114: --- Great patch! I just had to make a couple small tweaks to make the policeman happy - added license files for the test jars, changed the new test to extend LuceneTestCase. Good stuff found by the long but useful precommit ant target. I also added in the 10 second wait for now. I'll commit very shortly. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Mark Miller > Labels: collection-api, multicore, shard, shard-allocation > Fix For: 4.1, 5.0 > > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529093#comment-13529093 ] Per Steffensen commented on SOLR-4114: -- Thanks Mark Guess what is left is: * Decide if you want to enable the call to "checkCollectionIsNotCreated" and set the wait in it to 10 sec (or so), or if you are prepared to drop this part of the test, now that you got the OverseerCollectionProcessorTest * Maybe make a comment on SOLR-4043 that you ought to modify the BasicDistributedZkTest.testCollectionsAPI test as a part of solving that issue. * Add the checkNoTwoShardsUseTheSameIndexDir thing if you (also) find it usefull (see above) * Add SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch But you probably want to read through the (last of the) comments above to make sure we didnt miss something we promissed (eash other and/or Robert Muir) :-) Regards, Per Steffensen > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529069#comment-13529069 ] Mark Miller commented on SOLR-4114: --- Hey Per - thanks! I'll try and get back to helping finish up this issue - at a minimum I'll get a commit or two in on it today. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: > SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, > SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511348#comment-13511348 ] Per Steffensen commented on SOLR-4114: -- I verified that the removal of my "controlled" instance-dir and data-dir OverseerCollectionProcessor.createCollection is ok. I needed to do some investigations on how instance-dir and data-dir works. Now I know and can see that the "controlled" instance-dir and data-dir was a bad idea. Thanks for being so thorough, Mark. During my investigations of instance-dir and data-dir I came up with an additional test for BasicDistributedZkTest.testCollectionAPI, namely to do a test making sure that when you have created a lot of collections you will not end up with any two (or more) shards using the same index-dir - that was actually what I was affraid would happen when you (Mark) removed the "controlled" instance- and data-dir. This additional test-part will run very fast (200 ms on my local machine), so it will not extend the run-time of the test noticeably to include it. Instead of sending a patch I will just explain what to do to get this additional testing into BasicDistributedZkTest (this description works on 4.0, but I couldnt imagine that it wouldnt on 5.x or 4.x): * Add this method somewhere in BasicDistributedZkTest {code} private void checkNoTwoShardsUseTheSameIndexDir() throws Exception { Map> indexDirToShardNamesMap = new HashMap>(); List servers = new LinkedList(); servers.add(ManagementFactory.getPlatformMBeanServer()); servers.addAll(MBeanServerFactory.findMBeanServer(null)); for (final MBeanServer server : servers) { Set mbeans = new HashSet(); mbeans.addAll(server.queryNames(null, null)); for (final ObjectName mbean : mbeans) { Object value; Object indexDir; Object name; try { if (((value = server.getAttribute(mbean, "category")) != null && value.toString().equals(Category.CORE.toString())) && ((value = server.getAttribute(mbean, "source")) != null && value.toString().contains(SolrCore.class.getSimpleName())) && ((indexDir = server.getAttribute(mbean, "indexDir")) != null) && ((name = server.getAttribute(mbean, "name")) != null)) { if (!indexDirToShardNamesMap.containsKey(indexDir.toString())) { indexDirToShardNamesMap.put(indexDir.toString(), new HashSet()); } indexDirToShardNamesMap.get(indexDir.toString()).add(name.toString()); } } catch (Exception e) { // ignore, just continue - probably a "category" or "source" attribute not found } } } assertTrue("Something is broken in the assert for no shards using the same indexDir - probably something was changed in the attributes published in the MBean of " + SolrCore.class.getSimpleName(), indexDirToShardNamesMap.size() > 0); for (Entry> entry : indexDirToShardNamesMap.entrySet()) { if (entry.getValue().size() > 1) { fail("We have shards using the same indexDir. E.g. shards " + entry.getValue().toString() + " all use indexDir " + entry.getKey()); } } } {code} * Add a call to this method (checkNoTwoShardsUseTheSameIndexDir();) at the end of BasicDistributedZkTest.testCollectionsAPI * Add the line "lst.add("indexDir", getIndexDir());" to SolrCore.getStatistics() so that index-dir will also be part of the information exposed in the MBean of SolrCore Please consider including the additional test. It scans all SolrCores in the system to see if any of them share index-dir. I do the scanning by accessing MBean info from SolrCores - the simplest way I could come up with. It means that SolrCore will now also expose index-dir through its MBean, but I guess no one would have anything against that. Regards, Per Steffensen > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511336#comment-13511336 ] Per Steffensen commented on SOLR-4114: -- bq. I'm personally okay with adding like a 10 second wait until that gets in. So please do it. Enable the call to "checkCollectionIsNotCreated" and set the wait in it to 10 sec. Maybe also make a comment on SOLR-4043 that you ought to modify the BasicDistributedZkTest.testCollectionsAPI test as a part of solving this issue. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510691#comment-13510691 ] Mark Miller commented on SOLR-4114: --- SOLR-4043 (collection api responses) should make the missing test much easier it seems to me. I'm personally okay with adding like a 10 second wait until that gets in. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510610#comment-13510610 ] Mark Miller commented on SOLR-4114: --- bq. vetoing those commits before they can be added at all, Vetos are not made for that and IMO would not apply, but okay... > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510607#comment-13510607 ] Mark Miller commented on SOLR-4114: --- A veto requires a formal announcement accompanied with a valid technical argument. I know you are used to throwing them around rather casually, but generally, -1, +1 are used to indicate votes of direction at Apache. This test in particular is actually many tests - compiled together to save the time of setting up and tearing down lots of jetties. If you were to @Nightly it in the name of so called "progress", I would break it up into many tests, each under a minute, and increase the total test run time. I'd prefer it was all faster this way, but I'm okay with that way too. @Slow was introduced the last time these discussion came up and if you don't want to run these tests, I suggest you take advantage of it. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510590#comment-13510590 ] Robert Muir commented on SOLR-4114: --- thats ok if you want to veto progress mark. I really think we can be reasonable with solr tests and do: * reasonably good unit tests that are simpler and easy to debug * nightly-annotated slow tests that are more like the solr tests of today. This would probably remove a lot of the frustration lucene developers have with solr tests. In all cases, its fine if people want to veto the \@Nightly annotations of existing tests. But these will be the very last slow solr tests we have in our build, because I already mentioned I'm going to raise issues about new tests added with huge sleeps, vetoing those commits before they can be added at all, so the test situation does not continue to get even worse. I'm trying to take a stand against the always-failing/super-slow test situation of today. Its been like this for far too long. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510562#comment-13510562 ] Mark Miller commented on SOLR-4114: --- bq. I think if a test takes double-digit seconds then it should be @Nightly. -1 to running all these solrcloud tests only as nightly. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510556#comment-13510556 ] Robert Muir commented on SOLR-4114: --- Unfortunately the \@Slow is enabled by default. I think if a test takes double-digit seconds then it should be \@Nightly. I looked and its strange that no solr tests are never marked this way, lets start here. We do it with lots of lucene tests, keeping the regular runs fast. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510549#comment-13510549 ] Per Steffensen commented on SOLR-4114: -- bq. Add the unit tests for "good" checks during normal test runs Ok. I like a good discussion, that that will be your prize for participating so eagerly :-) Will try to find time to do it soon. bq. Also add the slower test but mark it @Nightly. So it runs at night in jenkins tests runs. So we agree to commit the test with a 10-20 sec wait? Yeah I would rather not add the @Nightly, since I do not really know your test-target setup (nightly, pre-commit, etc). I certainly agree that such a slow test needs to be run at night, but actually I guessed that the @Slow annotation was for that. The test is already marked as @Slow. I certainly can add the @Nightly annotation if people agree that it should be added to BasicDistributedZkTest? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510546#comment-13510546 ] Robert Muir commented on SOLR-4114: --- But i feel like there is a compromise: Add the unit tests for "good" checks during normal test runs. Also add the slower test but mark it \@Nightly. So it runs at night in jenkins tests runs. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510534#comment-13510534 ] Per Steffensen commented on SOLR-4114: -- Agree with everything you just said! bq. But when I see arguments to add an e.g. 60 second test like this, I felt the need to speak up I understand, and it is not that I this kind of test! I just want the feature tested and protected from someone else ruining it tomorrow. And I cant come up with another way of testing that nothing happens, than wait for a while, and assert that it did not. And IMHO unit-tests directly on OverseerCollectionProcessor is not enough to ensure that the functionality seen as a whole from the outside will not be broken - and it IS the main concern (if we had a real customer :-) ) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510526#comment-13510526 ] Robert Muir commented on SOLR-4114: --- Per I don't want to drown out your issue either. But when I see arguments to add an e.g. 60 second test like this, I felt the need to speak up. Solr has a lot of these tests today (many fail on a daily basis), but I'm not sure how much use they really are. If tests are failing constantly and nobody is fixing them: then there is a problem :) Look: I'm totally fine with such tests being annotated \@Nightly and running on jenkins at night (as long as they are reliable and debuggable). But slowness itself presents another barrier for someone else trying to debug the test. So I think its important to have quicker, simpler unit tests to encourage refactoring and good APIs. Solr is really missing this. And of course for the record I agree with you about the whole issue of refactoring in general. I feel like refactoring in solr is not really encouraged, because there is no faith in the test suite. So its safer to just stick with the status quo instead. This is how projects die. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510522#comment-13510522 ] Per Steffensen commented on SOLR-4114: -- bq. but it should be used as an adjunct, not a replacement for hard-core unit testing Guess I kinda agree, just dont overdue the hard-core unit testing, and certainly not forget to do the behavioural/integration tests. bq. Solr is screaming out for a "mocking" capability so that more "integration" testing can be done at the unit test level Uhhh yes. Mocking is a nice way of having your "integration" tests not include the whole world. E.g. an a test included the entire request handling, seen from the outside down to the OverseerCollectionProcessor, but where we mock the calls to the Core Admin API and just assert that the requests forwarded to the Core Admin API is as expected. Mocking capability in Solr will take tests to the next level, but why not just start using it? - e.g. mockito is just go-use. bq. its hard to tell if its just an unrelated sporatic fail, because the test suite is unreliable in general Yes, my impression is also that the tests are unreliable - sometimes fail and sometimes not, and it is really hard to know if you did something wrong. But I will still claim that it is not because of integration tests - it might be because they, as so many other things in Solr, are done in a undisciplined way. Well everyone have their experience and believes and I am always glad to discuss with you guys. Glad that you decided to really join Robert, even though you thought that there where nothing to discuss. But I guess I have stated my opinion now, and I dont want to do any more debating here on this issue/ticket. If you want to continue on the mailing-list I will probably join, but this issue/ticket should be about this issue/ticket from now on. We still have unhandled matters (e.g. controlling instance-dir and/or data-dir), and I dont want the discussion about those matters to drown in "partly" unrelated discussions about test-strategies. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510515#comment-13510515 ] Per Steffensen commented on SOLR-4114: -- bq. Its not working out: and its exactly due to this attitude (which must be changed) Its also my impression that is does not work out, we just do not share opinion about reason. bq. you contradict yourself. you say only the "seen from the outside" matters, yet discourage the unit tests that make it easier to develop new APIs, and the unit tests that "test" actual usage of the API. This is critical to refactoring. No I do not contradict myself - you misunderstand me. If an API can be used "from the outside" it certainly needs test-coverage - whether or not you see it as unit or behavioural tests. The collection-creation/reload/removal featurs of OverseerCollectionProcesser (the one i question here) is not accessable from "the outside". The Collection API is, which is triggerede by sending a request getting handled by CollectionHandler etc. eventaully submitting a "job" for the OverseerCollectionProcessor. This should (also) be tested by sending such request "from the outside" to a complete Solr cluster, and assert that the expected collection/shards ends up being created/reloaded/removed, and in case of create that you can index data into the new collection and search the data up again, and i case of remove that the collections/shards disappear from the system (ZK and that data-dirs are delete if that is what you asked for) etc. If you want to supplement with tests directly on OverseerCollectionProcessor that is fine. But such tests are mainly usefull during the development process, and not to ensure that no one in the future breaks the feature you introduced. The feature seen from "the outside" is typically unchanged during refactoring, and the feature seen from "the outside" is what matters. Say we some day decides that collection creation shouldnt really be handled asynchronously by the Overseer, but that we want to handle it synchronously before responding to the one that sent the collection creation request. In that case OverseerCollectionProcessor will probably be deleted (yes most of the code will still remain, but will probably be moved/restructured to other classes/units somewhere else), and there will be tests in a OverseerCollectionProcessorTest that needs to be moved and changed, and it is not certain that the one doing the refactor "gets" (the reason or point) of all the tests of OverseerCollectionProcessor or that he is able to tweak them to "simular" tests of the new components handling the same aspects. The behavioral test does not need to change, as it ensures that the feature seen "from the outside" did not change, and that is very important doing refactoring. As I said, I am certainly not against "unit"-tests, but is is mainly a "working"-tool for the developers. Behavioural tests are the ones that ensure, that your system as a whole works as it is supposed to - and whether you want it or not, it IS what creating a system is all about. Guess you would realize a thing or two by working on a project for a real custumor setting up real demands. Those demands are all requirements to the system as a whole seen from "the outside". He doesnt care about the internal working of the system. Our testers work a lot with the custumor (or the PO representing him) to work out behavioural tests to make sure we fulfill his needs and requirements. We do lots of unit-tests in our project also, but have NO problem what-so-ever refactoring all the time. So I can tell you that behavioural tests and daring to do constant refactoring can go hand in hand. It is a little harder when you have only unit-tests. Start thinking about Solr as a product you need to deliver to some "artificial" customer, and try to think the way he would think - only system-level behaviour matters to him. "Unit"-tests are "working"-tools for the developers. bq. I'm also against more tests with sleeps As I said "me too". But rather test a feature the way it can be tested, than not testing it. bq. you can expect to see me vote on commits that have huge sleeps Uhh I hope not, you just said you where against sleeps. But I guess mean "you cannot..." :-) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, s
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510508#comment-13510508 ] Robert Muir commented on SOLR-4114: --- I agree with Jack about the mocking, its really needed. I feel like with solr cloud it could help a lot, e.g. simulate this guy going down and that guy and not deal with timing issues and so on. Just like lucene doesn't actually write continuously to your disk until its really full to TestIndexWriterOnDiskFull, it mocks the disk full. Sure these mock techniques aren't perfect, but they are much easier to debug. for "real integration tests" maybe junit isnt even the best tool for the job anyway, so i would prefer if these were separate from the unit tests. These integration tests are especially frustrating for lucene developers, who *seriously* dont want to break solr when they change things underneath it. but the test suite doesn't really allow this you know, and when something does break its hard to tell if its just an unrelated sporatic fail, because the test suite is unreliable in general. There is just no replacement for good unit tests. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510504#comment-13510504 ] Jack Krupansky commented on SOLR-4114: -- Definition of integration testing: A process where you spend 75% to 95% of your time (and the time of people tracking down test failures) GUESSING what numbers to use for sleeps. I am a fan of integration testing, but it should be used as an adjunct, not a replacement for hard-core unit testing. Solr is screaming out for a "mocking" capability so that more "integration" testing can be done at the unit test level. Mocking can also improve testing by varying the characteristics of dependencies in a controlled manner rather than have integration tests that test only a narrow range of characteristics that vary between environments and over time in an unpredictable and non-repeatable manner. I mean, it would be nice if we could develop components that were less sensitive to changes in the performance of components that they depend on. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510497#comment-13510497 ] Robert Muir commented on SOLR-4114: --- I'm also against more tests with sleeps. you can expect to see me vote on commits that have huge sleeps. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510492#comment-13510492 ] Robert Muir commented on SOLR-4114: --- Per: you contradict yourself. you say only the "seen from the outside" matters, yet discourage the unit tests that make it easier to develop new APIs, and the unit tests that "test" actual usage of the API. This is critical to refactoring. Its a bonus that you have to change unit tests when you refactor APIs, its like having little example apps that use your apis to "test" if the api change has some quirks and so on. you are forced to actually use it. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510490#comment-13510490 ] Robert Muir commented on SOLR-4114: --- Its not working out: and its exactly due to this attitude (which must be changed). (release manager who had to release solr 4.0, where its own test suite didnt pass because no one really gives a shit about this stuff) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510487#comment-13510487 ] Per Steffensen commented on SOLR-4114: -- bq. I don't even think we need to argue about it really. Well if that is your opinion you will have a problem working on big projects. You should find a private project to work on in you basement. I think it is worth arguing about. Maybe not on this issue/ticket, but in general in the community on the dev mailing list. I am surrounded by professional testers in my everyday work, and what I hear from them is more and more pulling towards behavioural testing. bq. Currently solr has a lot of integration tests, how is that working out?! I dont know how it works out, but if you see a lot of problems, I wouldnt blame the integration-test over unit-test strategy (is such a strategy exists). I have a gut fealing about then main problem of Solr is that no one ever dare to do refactoring - the code is a mess. And that you really do not "trust your test-suite". If you want to to be able to "trust you test-suite" enough to dare doing big refactorings, integration/behavioural tests are by far the best. Typically when you do major refactoring you do not change the system-behaviour seen from the outside. You basically re-organize the code internally in order to be able to keep adding features, fixing bugs etc. without getting too confused. Therefore "integration/behavioural" tests do not have to be changed during a big refactor. Unit-tests usually do, because a big refactor often includes getting rid of existing "units", splitting up existing "units", adding new "units" etc. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510478#comment-13510478 ] Robert Muir commented on SOLR-4114: --- I don't even think we need to argue about it really. Currently solr has a lot of integration tests, how is that working out?! Look at solr's queryparser tests. they index documents and run actual queries, instead of simply assertEquals(expectedQuery, actual). This provides no benefit, and only makes it harder to debug. instead of seeing what is different about the query structure, you typically get "xpath failed" and have to start digging. This is just the simplest possible Captain Obvious example. The problem runs much deeper. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510471#comment-13510471 ] Per Steffensen commented on SOLR-4114: -- Guess there are two "movements" in the industry at the moment * TDD: Mostly focused at testing "units" (single components), hoping that if all "units" work as they are supposed to, the entire system where all those "units" are used in combination will also work as it is supposed to * BDD: Mostly focused at testing "behaviour" of a system seen from the outside, because that is basically all you care about. No one cares how the system works internally. Everyone cares about how the system works as a whole, when used to the extend you can use it "from the outside". This is especially true for the end users of the system, and at the end of the day a system is created to fullfil the users needs. bq. Integration-level tests aren't nearly as useful I completely disagree with that. I, personally, are not that much into "unit"-tests because thay basically do not show that the system as a whole "behaves" as it is supposed to. Behavioural/integration-tests does, where you test against an entire system using it the way it can be used "from the outside", and asserting that "result" is as expected seen "from the outside". bq. and much more difficult to debug You are right about that. It like "unit"-tests to supplement behavioural/integration-tests whenever it is found to be necessary. We can add a OverseerCollectionProcessor "unit"-test, testing some of this new functionality, but in my mind (as I said) "it will not ensure the correct system-level functionality to nearly the same degree", and basically thats all we are interrested in. If the community would like we can supplement with "unit"-tests for this new feature, but you will have to fight me (FWIW) to get rid of the integration-ish test of the feature. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510450#comment-13510450 ] Robert Muir commented on SOLR-4114: --- {quote} If we go do a more unit-test-ish test directly on OverseerCollectionProcessor we might be able to do something faster, but it will not ensure the correct system-level functionality to nearly the same degree. {quote} But these tests are really necessary. I think solr has enough "integration" tests and not enough unit tests already. This makes it difficult to change solr code: because there is no direct correlation with the functionality and tests. Integration-level tests aren't nearly as useful, and much more difficult to debug. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510351#comment-13510351 ] Per Steffensen commented on SOLR-4114: -- bq. I suppose a 10 second sleep is more agreeable, but these things add up and and I'd rather come up with a better test. I would rather too, but believe it is hard to make sure something does not happen, without giving it a chance to happen, and check that it did not. So there is no better way of testing it, as I see it. At least if you want this kind of integration-ish test, where you start a full system and do something to it as if you where a real client acting against a real system. And I like this kind of tests. If we go do a more unit-test-ish test directly on OverseerCollectionProcessor we might be able to do something faster, but it will not ensure the correct system-level functionality to nearly the same degree. I think you should commit with a 10-20 sec wait, and then if you (or someone else) can come up with a faster way for testing it properly, it is fine for me to make the change. I do not believe I will be able to come up with a proper test of this that is faster. But protect the feature "the slow way" until someone comes up with a faster way of testing. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510038#comment-13510038 ] Mark Miller commented on SOLR-4114: --- I'm happy to commit a test for this, but lets come up with something that doesn't have a long sleep like this. I suppose a 10 second sleep is more agreeable, but these things add up and and I'd rather come up with a better test. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510020#comment-13510020 ] Per Steffensen commented on SOLR-4114: -- Ok, will you please consider enabling the 60 sec test (maybe reduce it to 10 or 30 sec) so that the feature is protected until someone comes up with a better/faster test. Please!!! > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509967#comment-13509967 ] Mark Miller commented on SOLR-4114: --- bq. Still easier to just take it in For me it's easier to not take it in :) I have to vet what I take in. I think you will find it easier to get stuff in if your refactoring is related to the issue. Otherwise make a refactoring issue. bq. Trust your test-suite It's not my test-suite, it's a huge shard test-suite and I don't blindly trust it. It certainly doesn't test everything. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509960#comment-13509960 ] Per Steffensen commented on SOLR-4114: -- bq. We poll so that we wait up to 120 seconds for a slow comp, but a fast comp won't need to wait nearly that long. The 60 wait hits everyone no matter what. We generally try avoid any hard waits like that. I understand you can't really poll in this case, so I'm not sure the best way to test that - I made it a TODO for now. Ok with a TODO, but it should be about making a more clever solution than the 60 sec wait. You commented out the assert-method "checkCollectionIsNotCreated", which means that we now have an unprotected feature in the code-base. Anyone can go ruin this feature tomorrow and no one will notice. Yes, I believe the main value of tests is their ability to "protect" other people from accidently ruining existing fuctionality. All the comments below are very unimportant compared to this - I am really serious about this. Get "checkCollectionIsNotCreated" running now so that the code is protected, then think about a more clever solution later (if you think such a thing exists). TODO's have a tendency to be forgotten. bq. Yeah, I minimized the patch down to just dealing with this issue. I'm away from home and just looking to get this issue completed with minimum fuss. Then the easiest thing would probably have been, to take everything in, except for things you thought would actually ruin existing stuff. Instead of using time to find every little detail that could be left out. Do not misunderstand me, I am glad you used your time to get it committed, but I also want to influence the way you committers work, whenever I have the chance. Only thinking about our common interrest - a good Solr. I have a bad gut feeling that the code-base is so messy because no one ever refactors. Refactoring is something you usually do while solving some specific (potentially) unrelated task. No one goes refactoring just to do refactoring, but it is extremely important that refactoring has an easy way into the code-base. bq. 'nodeUrlWithoutProtocolPart' is just so long and I didn't see it helping much in terms of code readability - if you have a better fitting name that is a little less verbose, I think I'd be more into it. Well, first of all a long saying name is much better than a short misleading name, and second of all that name really isnt very long :-) bq. Yeah, I don't think I agree with changing the users params on him - I'd rather warn and let the user do what he wants to do rather than trying to outthink him ahead of time. If he decides he wants more than one repica on an instance for some reason, that's his deal - we can warn him though. Ok, cool bq. Right - it should match collection1 - eg newcollection/data should be the data dir just like collection1/data and rather than something like newcollection_data. In my experience and testing, data ends up in the cores own instance dir - not some common dir. Didnt learn much from this explanation, but I will have to do a little studying on instance-dir and data-dir to understand how your solution will ever work. I will get back with an official complaint (just a comment here or a mail on the mailing list :-) ) if I still do not think it will work after I have done my studying. bq. Yup - it seemed unrelated and I'm busy so I didn't want to think about it... Still easier to just take it in, unless you saw it harming more than it helped. I am worried about refactoring in this project! Trust your test-suite :-) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See d
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509910#comment-13509910 ] Commit Tag Bot commented on SOLR-4114: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1417070 SOLR-4114: Allow creating more than one shard per instance with the Collection API. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509882#comment-13509882 ] Commit Tag Bot commented on SOLR-4114: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1417045 SOLR-4114: Allow creating more than one shard per instance with the Collection API. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509864#comment-13509864 ] Mark Miller commented on SOLR-4114: --- bq. In checkForCollection we are willing to wait up to 120 secs (used to be 60 secs) for the collection to be created (correctly). So it is kinda obvious to also wait at least 60 secs to verify that nothing related to a collection was actually created in cases where you want to be sure that nothing is created. We poll so that we wait up to 120 seconds for a slow comp, but a fast comp won't need to wait nearly that long. The 60 wait hits everyone no matter what. We generally try avoid any hard waits like that. I understand you can't really poll in this case, so I'm not sure the best way to test that - I made it a TODO for now. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509861#comment-13509861 ] Mark Miller commented on SOLR-4114: --- Yeah, I minimized the patch down to just dealing with this issue. I'm away from home and just looking to get this issue completed with minimum fuss. 'nodeUrlWithoutProtocolPart' is just so long and I didn't see it helping much in terms of code readability - if you have a better fitting name that is a little less verbose, I think I'd be more into it. bq. I see that you removed the "auto-reduce replica-per-shard t Yeah, I don't think I agree with changing the users params on him - I'd rather warn and let the user do what he wants to do rather than trying to outthink him ahead of time. If he decides he wants more than one repica on an instance for some reason, that's his deal - we can warn him though. bq. You removed the following lines (because you just want default-values for instance-dir and data-dir) Right - it should match collection1 - eg newcollection/data should be the data dir just like collection1/data and rather than something like newcollection_data. In my experience and testing, data ends up in the cores own instance dir - not some common dir. bq. making "shardToUrls"-map (renamed) concurrency protected Yup - it seemed unrelated and I'm busy so I didn't want to think about it. My goal is to get the essence of this thing committed - it's a lot easier to then fight over smaller pieces on top of that. Progress not perfection. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114.patch, SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509687#comment-13509687 ] Per Steffensen commented on SOLR-4114: -- bq. We have to do something else other than a one minute sleep in the test. In checkForCollection we are willing to wait up to 120 secs (used to be 60 secs) for the collection to be created (correctly). So it is kinda obvious to also wait at least 60 secs to verify that nothing related to a collection was actually created in cases where you want to be sure that nothing is created. In the case of waiting for a collection being created correctly you can skip the waiting as soon as you see that is has been created correctly. In the case where you want to see that nothing gets created you have no chance to do that. I think the 60 sec wait fits well into the way the test already works. bq. I'd like the default data dir to remain /data. If you are not doing anything special that works fine. Thats also fine, but as soon at you create more than one shard on the same node, dont you need different data-dirs? You cant have two shards using the same folder for lucene-index-files, right? Did I get the instance-dir and data-dir concepts wrong? Or is it the instance-dir you want to vary between different shards? I would say that running only one shard on a node IS "doing something special". bq. I'd also like the ability to override the data dir on per shard basis, but I'm not sure what the best API for that is. Yes, me too, but that has to be another ticket/issue. For now, until the API allowes to control the data-dir-naming, it should be ok for the algorithm to decide for something reasonable by itself. bq. So I'd like to support what you want, but I don't want to change the default behavior. I agree, but if I got the concepts correct you cannot use "/data" as data-dir for more than one shard on each node. So default behaviour about using "/data" as data-dir will not work as soon as you run more than one shard on a node. I probably got something wrong - please try to explain. bq. My latest patch - I'll commit this soon and we can iterate from there. Well I would prefer you committed my patch, and we can iterate from there :-) It will also make it much easier to get SOLR-4120 in, which I hope you will also consider. Had a quick peek at your patch and have the following comments * I see that you removed the "auto-reduce replica-per-shard to never have more than one replica of the same shard on the same node"-feature and just issue a warning instead in OverseerCollectionProcessor (the "if (numShardsPerSlice > nodeList.size())"-thingy). It is ok for me, eventhough I believe it is pointless to replicate data to the same machine and under the same Solr instance. But then you probably need to change the BasicDistrbutedZkTest also - in checkCollectionExceptations I believe you'll need to change from {code} int expectedShardsPerSlice = (Math.min(clusterState.getLiveNodes().size(), numShardsNumReplicaList.get(1) + 1)); {code} to {code} int expectedShardsPerSlice = numShardsNumReplicaList.get(1) + 1; {code} * You removed the following lines (because you just want default-values for instance-dir and data-dir) {code} params.set(CoreAdminParams.INSTANCE_DIR, "."); params.set(CoreAdminParams.DATA_DIR, shardName + "_data"); {code} I still do not understand how that will work, but hopefully you will explain * You didnt like my rename of variable "replica" to "nodeUrlWithoutProtocolPart" in OverseerCollectionProcessor.createCollection. As I said on mailing-list I dont like the term "replica" as a replacement for what we used to call "shards", because I think it will cause misunderstandings, as "replica" is also (in my mind) a role played at runtime. But getting the terms right and reflect them correctly in API, variable-names etc. across the code must be another issue/ticket. But here in this specific example "replica" is just a very bad name, because the variable is not even containing a "replica-url", which would require the shard/replica-name to be postfixed to the string. So this "replica"-variable is closest to being an node-url (without the protocol part) - NOT a shard/replica-url. I would accept my name-change if I where you, but I have a motto of "carefully choosing my fights" and this is a fight I will not fight for very long :-) * I see that you did not include my changes to HttpShardHandler, making "shardToUrls"-map (renamed) concurrency protected through getURLs-method (renamed to getFullURLs), so that you do not have to use the map so carefully outside. I understand that it has very little to do with this issue SOLR-4114, but it is a nice modification. Please consider committing it - maybe related to another issue/ticket. It is little bit of a problem that good refactoring does not easy get in as part of issues/tickets
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509330#comment-13509330 ] Mark Miller commented on SOLR-4114: --- Thanks for the updated patch. Thanks for the corrected patch! Two things I'd like to address before committing: 1. +// nocommit +Thread.sleep(6); We have to do something else other than a one minute sleep in the test. 2. Setting the shard as part of the data dir. I'd like the default data dir to remain /data. If you are not doing anything special that works fine. I'd also like the ability to override the data dir on per shard basis, but I'm not sure what the best API for that is. So I'd like to support what you want, but I don't want to change the default behavior. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509328#comment-13509328 ] Mark Miller commented on SOLR-4114: --- bq. consider backporting to 4.x Currently, all of 'my' work is targeted towards 4.x - with the caveat that some trickier stuff might back in 5x before being back ported. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508382#comment-13508382 ] Per Steffensen commented on SOLR-4114: -- Hope you will commit, and consider backporting to 4.x, since we expect to upgrade to 4.1 when it is released, and we would really like this feature to be included. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508288#comment-13508288 ] Mark Miller commented on SOLR-4114: --- Should be against 5x - I'm going to US west coast for a week - so not sure when I'll get back to this - I may try and get it going while I'm out there and I may not have time till I get back. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508287#comment-13508287 ] Per Steffensen commented on SOLR-4114: -- Where does your patch fit, Mark? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507932#comment-13507932 ] Per Steffensen commented on SOLR-4114: -- Thanks, Mark. Let me know if I can help in any way. It is not that big a patch, but I did take the opportunity to do some refactoring - you know the bell in my head preventing me from just doing ctrl-c/ctrl-v :-) This makes the patch a little bigger. Let me know if you want me to make a patch fitting on top of 4.x or 5.x. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507759#comment-13507759 ] Mark Miller commented on SOLR-4114: --- I started working on patching this into recent stuff, and it's more of a pain than I thought. I must have missed some piece as I tried to merge it up and the test is failing. Giving up for tonight. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505808#comment-13505808 ] Per Steffensen commented on SOLR-4114: -- Created another ticket for the spread-shards-according-to-provided-list thingy. SOLR-4120 > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505612#comment-13505612 ] Mark Miller commented on SOLR-4114: --- Well, it makes things a little more painful in that I have to merge it to 4x/5x, but I can do that. It's probably not too difficult. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505597#comment-13505597 ] Per Steffensen commented on SOLR-4114: -- bq. so no, generally nothing is being back ported to the 4.0 branch Well, I guess the essence of my question is, if it is ok that I keep providing patches relative to lucene_solr_4_0? At least for this issue and the spread-shards-across-provided-list-of-solrs one? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505590#comment-13505590 ] Mark Miller commented on SOLR-4114: --- bq. But is it also going to be backported to lucene_solr_4_0 Given past discussion, it's very unlikely that we will release a 4.0.1 (I was for it FWIW) and will just do a 4.1 - so no, generally nothing is being back ported to the 4.0 branch. If we did end up deciding to do a 4.0.1, then we would select which issues should go in and then do those back ports later. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505578#comment-13505578 ] Per Steffensen commented on SOLR-4114: -- bq. 5x and then merged to 4x - just that small fix though - have not had a chance to review this patch fully yet. But is it also going to be backported to lucene_solr_4_0, which is actually the branch I am working on top of? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505575#comment-13505575 ] Per Steffensen commented on SOLR-4114: -- Well Im off for today. Will probably (if my POs head does not turn green) be making the spread-shards-according-to-provided-list feature tomorrow. If you commit the entire patch for SOLR-4114 it will be easier for me to provide a new patch for this new feature and attach it to a new issue :-) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505571#comment-13505571 ] Mark Miller commented on SOLR-4114: --- bq. On which branch are you committing, Mark? 5x and then merged to 4x - just that small fix though - have not had a chance to review this patch fully yet. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505570#comment-13505570 ] Per Steffensen commented on SOLR-4114: -- bq. I would too, but we would still disagree on what that meant since I would interpret the "number of times the data is replicated"... I actually agree with you. I just dont like "replica" to part of the name for it then. If we rename "replication-factor" to "number-of-copies" or something I would be much happier changing the semantics of it :-) But really, this is another issue. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505565#comment-13505565 ] Per Steffensen commented on SOLR-4114: -- bq. I've committed the shared params issue under SOLR-4055 and added Per to the Changes entry. On which branch are you committing, Mark? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505564#comment-13505564 ] Yonik Seeley commented on SOLR-4114: bq. I would expect "replication-factor" to say something about how many times the data is REPLICATED. I would too, but we would still disagree on what that meant since I would interpret the "number of times the data is replicated" to mean the total number of copies that exist after a write operation to the cluster. That seems to be the much more common interpretation in this context since there is no "original"... everyone has stored/indexed a copy. $ echo hello > file1.txt $ cp file1.txt file2.txt How many copies of the file are there? If you look at the state (and not the mechanism by which you arrived there) most would say there are 2 copies. In one interpretation, there is only one "copy", but that's too literal and assignes some special category to the original. http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html "The number of copies of a file is called the replication factor of that file." http://www.datastax.com/docs/1.0/cluster_architecture/replication "The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node." Oracle NoSQL store: http://docs.oracle.com/cd/NOSQL/html/AdminGuide/introduction.html#replicationfactor http://docs.oracle.com/cd/NOSQL/html/AdminGuide/store-config.html "A Replication Factor of 3 gives you shards with one master plus two replicas." Riak: http://wiki.basho.com/What-is-Riak%3F.html "An n value of 3 (default) means that each object is replicated 3 times. When an object’s key is mapped onto a given partition, Riak won’t stop there – it automatically replicates the data onto the next two partitions as well." Splunk: http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Thereplicationfactor "The number of data/bucket copies is called the cluster's replication factor." "The cluster can tolerate a failure of (replication factor - 1) peer nodes. So, for example, to ensure that your system can tolerate a failure of two peers, you must configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of your data available on the remaining peer(s)." It's clear that "3 copies" means 3 total instances of the same data, not 4 (an "original" plus 3 more copies of it.) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505562#comment-13505562 ] Per Steffensen commented on SOLR-4114: -- bq. When grabbing the params fix I noticed you set the data dir to something like shardname+_data - that's not strictly necessary right? Since each core should have it's own instance dir Well I use the same instance-dir for all shards, but a different data-dir - this is just how we used to do it in my project, but it can be changed. As long as the code uses same instance-dir different data-dirs are necessary though. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505558#comment-13505558 ] Mark Miller commented on SOLR-4114: --- I've committed the shared params issue under SOLR-4055 and added Per to the Changes entry. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505556#comment-13505556 ] Per Steffensen commented on SOLR-4114: -- bq. Personally, I'm not really sold on this auto re balancing idea. I'd prefer the user had to explicitly make these moves. Me neither - and I can say that ES sometimes f it up. At least when I was working with it, but that was mainly because of bad re-balancing algoritms. But I like moving shards "manually" from a admin-console! > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505547#comment-13505547 ] Per Steffensen commented on SOLR-4114: -- bq. could not you do same thing as Elastic Search. Build index with number of shards (initial number is 5). If there is 1 machine in cluster, then all shards are on this machine. If you add more machines, they will move to other machines. It is way simple for administration. This moving shards around as more Solr servers join the cluster is the easiest way to provide elasticity (as I mentioned above somewhere). That is one of the reasons, that I want to be able to run multiple shards for a collection on the same Solr server. In that way you will have shards already to move to other Solrs that might join the cluster later. In Solr, right now, we dont have the abillity to move shards from one server to another (ES has it), but in order to be able to bennefit from such a future feature, you will need to be able have multiple shards on one Solr server. Alternatively you have to go split a shard, but that is much harder, and should only be used if you did not forsee, when you created your collection, that you would add more servers later, and therefore created your collection with multiple shards per server. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505541#comment-13505541 ] Jack Krupansky commented on SOLR-4114: -- I certainly think of "replica" as a "copy of the ORIGINAL", which makes perfect sense in a master with n-slaves configuration, but in a fully distributed environment such as SolrCloud where the leader of a shard can vary over time and updates are distributed to all nodes all of the time, there is no longer the concept of an "original copy" of the data. If anything, the original data is the source data on the wire before it gets instantiated on each node. No node is truly the original. The terminology has this difficulty that it is only partially shared between the worlds of master/slave and the cloud. In master/slave, only the slaves are replicas and the master is the original, while in cloud ALL nodes are replicas since there are no originals. The "leader" is not a "master" copy of the data in the sense of master/slave. So, I guess I am semi-comfortable with replica referring to all instances of the data, but we do need to be careful to highlight the distinction between how the term replica is used in the world of master/slave vs. SolrCloud, especially since many Cloud users will be migrating from the world of master/slave. We also need to be careful not to refer to "leader and replicas" which implies that a leader is not a replica! > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505539#comment-13505539 ] Mark Miller commented on SOLR-4114: --- bq. you add more machines, they will move to other machines. Personally, I'm not really sold on this auto re balancing idea. I'd prefer the user had to explicitly make these moves. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505535#comment-13505535 ] Mark Miller commented on SOLR-4114: --- bq. Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? My preference would be a new issue. If it has to be done as one piece, I would wait for this to go in before supplying the patch for that issue. Or supply a patch for that issue and note that it requires applying this patch first. Combining multiple issues into one patch just makes it more difficult to get it in generally. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505533#comment-13505533 ] Radim Kolar commented on SOLR-4114: --- could not you do same thing as Elastic Search. Build index with number of shards (initial number is 5). If there is 1 machine in cluster, then all shards are on this machine. If you add more machines, they will move to other machines. It is way simple for administration. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505531#comment-13505531 ] Mark Miller commented on SOLR-4114: --- When grabbing the params fix I noticed you set the data dir to something like shardname+_data - that's not strictly necessary right? Since each core should have it's own instance dir? I've been thinking about how to set custom datadirs with this api - it would be nice to be able to specify the data dir - and in some cases perhaps base it on something like the core name rather than just some static string. But have you found it 'necessary' with your work? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505506#comment-13505506 ] Per Steffensen commented on SOLR-4114: -- Another more urgent problem (for me) is that I need to do another change to the Solr Collection API, before we can use it as a replacement for what we already do in our project (where we create each shard one by one in OUR code). We split our set of Solr servers into two subsets - Data-Solrs and Search-Solrs. The Search-Solrs are not supposed to carry any data and therefore to be occupied by indexing. Search-Solr instead play the role of receiving queries from the outside, sub-quering the Data-Solrs and combining the final total response to the "outside". Data-Solrs are where we create the data-carrying collections. Data-Solrs need more CPU and IO-capabilities while Search-Solrs need more RAM - hence the splitup. Therefore I need to be able to provide a list of Solrs to the create operation of the Solr Collection API. The shards are then only allowed to be spread shards for the collection over the Solrs in this list - default list could be "all Solrs". As this list we, in our Solr-based projbect, will give our list of Data-Solrs. Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? I can create another issue but provide a combined patch. Are you interrested in such a feature at all? That is, a feature where the create operation takes a list of Solrs to spread the created shards over. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505496#comment-13505496 ] Per Steffensen commented on SOLR-4114: -- bq. Per: this is unrelated to your patch of course - it just happened to come up here. No problem. I could make it as part of this patch if you want, but Im not sure I agree with your way of interpreting the term "replication-factor". I would expect "replication-factor" to say something about how many times the data is REPLICATED. If I run with only one copy of the data for each slice, I would logically say that my data is not replicated, and that matches the replication-factor of 0. I have used HDFS and HBase a little a year or so ago, but Im not sure what meaning they put into the term "replica". I've also worked a lot with ElasticSearch (which I believe is more of a pendant to Solr) and in ElasticSearch I believe they use the term "replica" as the number of ADDITIONAL copies of the data - equal to your/our current implementation in Solr. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505472#comment-13505472 ] Per Steffensen commented on SOLR-4114: -- bq. This fix belongs with the issue that fixed delete and reload - I'm going to fix it there. Yes of course, it is just hard for me to split up the patch, because it is all needed for the tests to be green. But commit-wise it belongs to the other issue. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505471#comment-13505471 ] Yonik Seeley commented on SOLR-4114: bq. Ok, its just than the replicationFactor you specify in your request is the other thing. Hmmm, you're right: "Note: replicationFactor defines the maximum number of replicas created in addition to the leader from amongst the nodes currently running" That's not consistent with the original definition (http://wiki.apache.org/solr/NewSolrCloudDesign), the way the state is represented in clusterstate, or the way others use the term such as in hbase/HDFS, cassandra, oracle, etc. The important part is how many times the data is stored (the replication factor), and things like leaders are more of an implementation detail. Luckily we don't yet store this in the cluster, so there's no back compat issue with existing clusters. There's only a change when creating a new cluster, but that seems relatively minor. Given that, I'd lean toward changing this parameter to be in line with common usage. Per: this is unrelated to your patch of course - it just happened to come up here. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505468#comment-13505468 ] Mark Miller commented on SOLR-4114: --- bq. fixed in collectionCmd (used for delete and reload) but not in createCollection This fix belongs with the issue that fixed delete and reload - I'm going to fix it there. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505397#comment-13505397 ] Per Steffensen commented on SOLR-4114: -- Patch including the maxShardsPerNode feature comming up. And (much) better testing of the create operation of the Collections API. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368 ] Per Steffensen commented on SOLR-4114: -- bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all "replicas" for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader). I understand that you can view all shards under a slice as a "replica", but in my mind "replica" is also a "role" that a shard plays at runtime - all shards except one under a slice plays the "replica role" at runtime, the remaining shard play the "leader role". To not create to much confusion I suggest you use the term shards for all the instances under a slice, and that you use the term "replica" only for a role that a shard plays at runtime. But that of course would require changes e.g. to Slice-class where e.g. getReplicas, getReplicasCopy and getReplicasMap needs to me renamed to getShardsXXX. It probably shouldnt be done now, but as a part of a cross-code cleaning up in term-usage. Suggested terms: * collection: A big logical bucket to fill data into * slice: A logical part of a collection. A part of the data going into a collection goes into a particular slice. Slices for a particular collection are non-overlapping * shard: A physical instance of a slice. Running without replica there is one shard per slice. Running with replication-factor X there are X+1 shards per slice. * replica and leader: Roles played by shards at runtime. As soon as the system is not running there are no replica/leader - there are just shards * node-base-url: The prefix/base (up to and including the webapp-context) of the URL for a specific Solr server * node-name: A logical name for the Solr server - the same as node-base-url except /'s are replaced by _'s and the protocol part (http(s)://) is removed > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505296#comment-13505296 ] Per Steffensen commented on SOLR-4114: -- bq. Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to Solr 4.1 back compat. Of course, I agree! But anyway... > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505295#comment-13505295 ] Per Steffensen commented on SOLR-4114: -- bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all "replicas" for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader). Ok, its just than the replicationFactor you specify in your request is the other thing. You get replicationFactor + 1 shards per slice, if we define replicationFactor as the one you give in your request. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505001#comment-13505001 ] Yonik Seeley commented on SOLR-4114: bq. > So that we don't lose functionality we currently have? bq. So now you care about backwards compatibility? I was speaking specifically about functionality, not back compatibility. bq. With your current solution there will be no "waiting until that machine comes back up". You will just end up with 8 slices, where 7 of them have 2 replica, and the last only have 1 replica. Correct. When I said "it's entirely reasonable for a user to want to wait", I meant wait to create the additional replica for one shard, not wait to create the whole collection. Although I guess it might be useful to be able to fail collection creation if certain specified constraints aren't met (including a min replication factor). As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all "replicas" for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader). > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504973#comment-13504973 ] Mark Miller commented on SOLR-4114: --- Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to Solr 4.1 back compat. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504954#comment-13504954 ] Per Steffensen commented on SOLR-4114: -- bq. So that we don't lose functionality we currently have? So now you care about backwards compatibility? :-) You didnt care about backwards compatibility from 3.6 to 4.0 when you introduced optimistic locking (including error in case of updating an existing document without providing correct version), which is forced upon you in 4.0 if you choose to run with version-field and update-log. There are perfectly valid reasons for wanting to use version-field and update-log, without wanting to have fullblown optimistic locking. My solution to SOLR-3178 support this kind of backwards compatibility by letting you explicitly choose among update-semantics modes "classic", "consistency" and "classic-consistency-hybrid". So if you come from 3.6 and want backwards compatibile update-semantics, but also want version-field and update-log, you just choose update-semantics "classic" :-) See http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics. Im just teasing you a little :-) But anyway, I like backwards compatibility so you are right, we probably do not want to do something that change default behaviour in 4.0.0. Will have a look at a solution tomorrow. It is kinda late in europe now. bq. Example: you have 24 servers and create a collection with 8 shards and a target replication factor of 3... but one of the servers goes down in the meantime so one shard has only 2 replicas. It's entirely reasonable for a user to want to wait until that machine comes back up rather than doubling up on a different node. Assume you mean replication-factor of 2? With a replication-factor of 2 you will get 3 shards per slice. With your current solution there will be no "waiting until that machine comes back up". You will just end up with 8 slices, where 7 of them have 2 replica, and the last only have 1 replica. With the patch I provided today you will end up with 8 slices, where all of them have 2 replica - but one of the servers will be running two shards and the solr down will not be running any (when it comes back up). I probably would prefer my current solution - at least you acheive the property that any two servers can crash (including disk crash) without you loosing data - which is basically what you want to acheive when you request replication-factor of 2. But waiting for the machine to come back up before creating the collection would certainly be the best solution. It is just extremly hard to know if a machine is down or not - or if you intented to run one server more than what is currently running. In general there is no information in solr/ZK about that - and there shouldnt. In this case a maxShardsPerNode could be a nice way to tell the system that you just want to wait. But then it would have to be implemented correctly, and that is really hard. In OverseerCollectionProcessor you can check if you can meet the maxShardsPerNode requirement with the current set of live solrs, and if you cant just dont initiate the creation process. But a server can go down between the time where the OverseerCollectionProcessor checks and the time where it is supposed to create a shard. Therefore it is impossible to guarantee that the OverseerCollectionProcessor does not create some shards of a new collection without being able to create them all while still living up to the maxShardsPerNode requirement. In such case, if you really want to live up to the maxShardsPerNode requiremnt the OverseerCollectionProcessor would have to try to delete the shards of the collection that was successfully created. But this deletion process can also fail. Ahhh there is no guaranteeed way. Therefore my idea about the whole thing, is more aming at just having all the shards created, and then move them around later. I know this is not possible for now, but I do expect that we (at least my project) will make support for (manually and/or automatic) migration of shards from one server to another. This feature is needed to acheive nice elasiticty (moving shards/load onto new servers as they join the cluster), but also to do re-balancing after e.g. a solr was down (and a shard that should have been placed on this server was temporarily created to run on another server). Well as I said I will consider the best (small patch :-) ) solution tomorrow. But if I cant come up with a better small-patch-solution we can certainly do the maxShardsPerNode thing - no problemo. It just isnt going to be 100% guaranteed. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: http
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504889#comment-13504889 ] Yonik Seeley commented on SOLR-4114: bq. Well I see no reason to introduce (in the first step at least) a maxShardsPerNode. So that we don't lose functionality we currently have? And I agree that it should be up to the user, hence the proposed parameter to control it. bq. Only potential problem is if his create request is run when not all Solr servers are running, and in such case a maxShardsPerNode could help to stop the creation process. Exactly... there's the main use case. Example: you have 24 servers and create a collection with 8 shards and a target replication factor of 3... but one of the servers goes down in the meantime so one shard has only 2 replicas. It's entirely reasonable for a user to want to wait until that machine comes back up rather than doubling up on a different node. The other use case is the examples on http://wiki.apache.org/solr/SolrCloud > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504881#comment-13504881 ] Per Steffensen commented on SOLR-4114: -- Learned from Steve today, that you usually develop for 5.x on trunk, and then back port to 4.x.y branches. Let me know if you would like a trunk-based patch instead > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504698#comment-13504698 ] Per Steffensen commented on SOLR-4114: -- Well I see no reason to introduce (in the first step at least) a maxShardsPerNode. Your requested numShards and the number of live servers at that point in time will decide the number of shards of each node/server - basically no limit, but the clever user of Solr will probably not want to request to have 1000 shards for a collection if he only have 2 Solr servers, but it should actually be up to the user of Solr. We have been using Solr-cloud for a long time now and we have a very high focus on performance, because we need to end up with a Solr cluster supporting "live-searches" among 50-100 billion records. During numerous performance test we have a.o. played with the number of shards per Solr server per collection. We have run one-month+ tests just pumping data into the cluster to se how loading-time, search-time etc. develops as collections are filled with data. We have run such tests with 1, 4, 8 and 12 shards per Solr server per collection, and each of them have both good and bad properties wrt performance, so until we know (and we should be very carefull taking "good decissions on behalf of every Solr user") that there is a always-true best number for maxShardsPerNode we should be careful putting any limit on the user. I know you just want to give a maxShardsPerNode in the create request, but the user of Solr really should be able to calculate the number of shards going on each Solr when he controls the numShards and when he knows how many Solr servers he is running. Only potential problem is if his create request is run when not all Solr servers are running, and in such case a maxShardsPerNode could help to stop the creation process. But a Solr user probably want to make sure all Solr servers that are supposed to run, are actually running, before he issues a collection creation request, so that he gets shards distributed across all the Solr servers he intend to run. We do that in our project BTW, but outside Solr code. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504659#comment-13504659 ] Yonik Seeley commented on SOLR-4114: What's the proposed API? Perhaps a maxShardsPerNode parameter during the create? Seems like it should default to 1 (the current behavior)? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > > We should support running multiple shards from one collection on the same > Solr server - the run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side has shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shards from one Solr server to another one that just joined > the cluter than it is to split an exsiting shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org