[jira] [Commented] (SOLR-12050) UTILIZENODE does not enforce policy rules
[ https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383222#comment-16383222 ]

ASF subversion and git services commented on SOLR-12050:
--------------------------------------------------------

Commit 23aee00213a2c48bd578bcf01a5ed435b0bdc881 in lucene-solr's branch refs/heads/master from noble
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=23aee00 ]

SOLR-12031: Refactor Policy framework to make simulated changes affect more than a single node
SOLR-12050: UTILIZENODE does not enforce policy rules

> UTILIZENODE does not enforce policy rules
> -----------------------------------------
>
> Key: SOLR-12050
> URL: https://issues.apache.org/jira/browse/SOLR-12050
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
> Attachments: SOLR-12050.log.txt
>
>
> I've been poking around TestUtilizeNode and some of its recent jenkins
> failures -- AFAICT the {{UTILIZENODE}} is not behaving correctly per its
> current documentation...
> bq. It tries to fix any policy violations first and then it tries to move
> some load off of the most loaded nodes according to the preferences
> ...based on my testing w/a slightly modified testcase that does additional
> logging/asserts, it will frequently choose a "random" replica to
> move, even when there are existing replicas that violate the policy.
> I will be committing my current improvements to the test while citing this
> issue, and marking the test @AwaitsFix. Then I'll attach some logs/comments
> showing what I mean.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
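For readers skimming the thread, the ordering that the quoted documentation describes (resolve policy violations first, then take load off the busiest node) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Solr's implementation: the class UtilizeNodeSketch, the Replica type, and chooseReplicaToMove are hypothetical names, nodes are identified by made-up port numbers, the policy is reduced to a set of ports that must hold zero replicas, and a plain replica count stands in for the configured preferences.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these types or methods are part of Solr.
class UtilizeNodeSketch {

    /** A replica and the port of the node hosting it (the port stands in for the node). */
    static final class Replica {
        final String core;
        final int port;

        Replica(String core, int port) {
            this.core = core;
            this.port = port;
        }
    }

    /**
     * Chooses which replica to move onto the node at targetPort.
     * forbiddenPorts models a policy saying those ports must hold zero replicas.
     */
    static Optional<Replica> chooseReplicaToMove(
            List<Replica> replicas, Set<Integer> forbiddenPorts, int targetPort) {
        // Moving a replica onto a forbidden node would only create a new violation.
        if (forbiddenPorts.contains(targetPort)) {
            return Optional.empty();
        }
        // Step 1: a replica that currently violates the policy is the first candidate.
        for (Replica r : replicas) {
            if (forbiddenPorts.contains(r.port)) {
                return Optional.of(r);
            }
        }
        // Step 2: no violations, so count replicas per node (excluding the target)
        // and take one from the node that hosts the most.
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Replica r : replicas) {
            if (r.port != targetPort) {
                counts.merge(r.port, 1, Integer::sum);
            }
        }
        int busiestPort = -1;
        int max = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                busiestPort = e.getKey();
            }
        }
        for (Replica r : replicas) {
            if (r.port == busiestPort) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up layout: two replicas on port 1001, one on the forbidden port 1002.
        List<Replica> replicas = Arrays.asList(
                new Replica("shard1_replica_n1", 1001),
                new Replica("shard2_replica_n5", 1001),
                new Replica("shard2_replica_n9", 1002));
        Set<Integer> forbidden = new HashSet<>(Arrays.asList(1002));
        // The target is a freshly added node on port 1003.
        chooseReplicaToMove(replicas, forbidden, 1003)
                .ifPresent(r -> System.out.println("move " + r.core + " from port " + r.port));
    }
}
```

With this made-up layout the sketch picks the replica on the forbidden port 1002 even though port 1001 hosts more replicas. That is the behavior the issue says the documentation promises and the test does not observe.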
[jira] [Commented] (SOLR-12050) UTILIZENODE does not enforce policy rules
[ https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383217#comment-16383217 ]

ASF subversion and git services commented on SOLR-12050:
--------------------------------------------------------

Commit 888c6260f122d03beec03615469dbed444ab62e7 in lucene-solr's branch refs/heads/branch_7x from noble
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=888c626 ]

SOLR-12031: Refactor Policy framework to make simulated changes affect more than a single node
SOLR-12050: UTILIZENODE does not enforce policy rules
[jira] [Commented] (SOLR-12050) UTILIZENODE does not enforce policy rules
[ https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382930#comment-16382930 ]

Hoss Man commented on SOLR-12050:
---------------------------------

I've attached a sample log file from running this test after my assert/logging updates. If you look for the new logging messages, it's pretty easy to see that while the 2nd UTILIZE command is causing a replica to be moved onto the new node (jettyY), it seems to be completely ignoring the fact that there is a core hosted on a "blacklisted" (per the policy) port (jettyX) that should be the first candidate for being moved...

{noformat}
// in this particular run, the first UTILIZENODE command works,
// it moves a replica off a random node to jettyX/3
//
// (although see TODO in test -- based on how the docs are worded,
// it's not clear if there's any requirement that it do so)
9201 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [] o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyX (127.0.0.1:3_solr)
9204 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr] o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params node=127.0.0.1:3_solr=UTILIZENODE=javabin=2 and sendToOCPQueue=true
...
9355 INFO (OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved to node 127.0.0.1:3_solr: core_node8:{"core":"utilizenodecoll_shard2_replica_n7","base_url":"http://127.0.0.1:33567/solr","node_name":"127.0.0.1:33567_solr","state":"active","type":"NRT"}
9361 INFO (OverseerThreadFactory-20-thread-3-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr] o.a.s.c.a.c.AddReplicaCmd Node Identified 127.0.0.1:3_solr for creating new replica
...
10078 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={node=127.0.0.1:3_solr=UTILIZENODE=javabin=2} status=0 QTime=874

// next up, sanity check which replicas jettyX/3 now has,
// then set a new policy saying that port 3 should have 0 replicas...
10079 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [] o.a.s.c.TestUtilizeNode jettyX replicas prior to being blacklisted: [core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:3/solr","node_name":"127.0.0.1:3_solr","state":"recovering","type":"NRT"}]
10079 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [] o.a.s.c.TestUtilizeNode Setting new policy to blacklist jettyX (127.0.0.1:3_solr) port=3
...
10143 INFO (qtp1498399719-27) [n:127.0.0.1:33567_solr] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/autoscaling params={wt=javabin=2} status=0 QTime=59

// now spin up another new node: jettyY/55619,
// redundantly sanity check the replicas on jettyX again,
10144 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [] o.a.s.c.TestUtilizeNode Spinning up additional jettyY...
...
10361 INFO (zkConnectionManagerCallback-78-thread-1) [] o.a.s.c.c.ConnectionManager zkClient has connected
10365 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [] o.a.s.c.TestUtilizeNode jettyX replicas prior to utilizing jettyY: [core_node10:{"core":"utilizenodecoll_shard2_replica_n9","base_url":"http://127.0.0.1:3/solr","node_name":"127.0.0.1:3_solr","state":"recovering","type":"NRT"}]

// Now send a UTILIZENODE command for jettyY/55619,
// this *should* move the replica from jettyX->jettyY
// (in order to resolve the existing policy violation)
10365 INFO (TEST-TestUtilizeNode.test-seed#[78A4DE08FC5237FE]) [] o.a.s.c.TestUtilizeNode Sending UTILIZE command for jettyY (127.0.0.1:55619_solr)
10366 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr] o.a.s.h.a.CollectionsHandler Invoked Collection Action :utilizenode with params node=127.0.0.1:55619_solr=UTILIZENODE=javabin=2 and sendToOCPQueue=true
...
10448 INFO (OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr] o.a.s.c.a.c.MoveReplicaCmd Replica will be moved to node 127.0.0.1:55619_solr: core_node6:{"core":"utilizenodecoll_shard2_replica_n5","base_url":"http://127.0.0.1:46180/solr","node_name":"127.0.0.1:46180_solr","state":"active","type":"NRT","leader":"true"}
10450 INFO (OverseerThreadFactory-20-thread-4-processing-n:127.0.0.1:46180_solr) [n:127.0.0.1:46180_solr] o.a.s.c.a.c.AddReplicaCmd Node Identified 127.0.0.1:55619_solr for creating new replica
...
12710 INFO (qtp1498399719-45) [n:127.0.0.1:33567_solr] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={node=127.0.0.1:55619_solr=UTILIZENODE=javabin=2} status=0 QTime=2343

// but as you can see above, the replica that's added to jettyY/55619
// comes from a completely different node on port
[jira] [Commented] (SOLR-12050) UTILIZENODE does not enforce policy rules
[ https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382922#comment-16382922 ]

ASF subversion and git services commented on SOLR-12050:
--------------------------------------------------------

Commit e2b3a97587a4387ab138252354d819ce253b625f in lucene-solr's branch refs/heads/branch_7x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e2b3a97 ]

SOLR-12050: mark TestUtilizeNode as AwaitsFix as well as adding additional logging/assertions to help see what the bug is

(cherry picked from commit 0424d9c06ba52037024ce5f0f678b2aca8c34fb7)
[jira] [Commented] (SOLR-12050) UTILIZENODE does not enforce policy rules
[ https://issues.apache.org/jira/browse/SOLR-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382923#comment-16382923 ]

ASF subversion and git services commented on SOLR-12050:
--------------------------------------------------------

Commit 0424d9c06ba52037024ce5f0f678b2aca8c34fb7 in lucene-solr's branch refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0424d9c ]

SOLR-12050: mark TestUtilizeNode as AwaitsFix as well as adding additional logging/assertions to help see what the bug is