[GitHub] [lucene-solr] beettlle commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits

2020-03-06 Thread GitBox
beettlle commented on a change in pull request #1297: SOLR-14253 Replace 
various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r389208349
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
 ##
 @@ -1684,58 +1685,39 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) {
   }
 
   private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map<String,Slice> slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor()
-                  .setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
 
 Review comment:
   Agreed about having too many settings; we're already drowning in them.
   
   Looking back, the number appears to have been added as part of SOLR-9140, and there's no comment explaining where the "320" came from. There's also another retry number [here](https://github.com/apache/lucene-solr/pull/1297/files#diff-d5e1be02f6f0c397e18380598aa62b3dR476) of "30", again with no explanation. So we already have two different retry counts.
   
   If the numbers came from empirical experiments, then I agree they should be constants, but because they seem arbitrary they look like good candidates for per-application tuning.
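   
   As a rough illustration of that middle ground, here is a minimal sketch that keeps the two numbers as named defaults while still allowing an override. It is not code from this PR: the class name `WaitLimits` and both property names are hypothetical, and no such settings exist in Solr as far as this thread shows.

```java
/**
 * Hypothetical holder for the two hardcoded limits mentioned above.
 * The class and property names are invented for illustration only.
 */
final class WaitLimits {
  /** Default mirrors the hardcoded 320 in waitForCoreNodeName. */
  static final int DEFAULT_CORE_NODE_NAME_WAIT_SECONDS = 320;

  /** Default mirrors the second hardcoded retry number, 30. */
  static final int DEFAULT_SECOND_RETRY_COUNT = 30;

  private WaitLimits() {}

  static int coreNodeNameWaitSeconds() {
    return positiveOrDefault(
        "example.coreNodeNameWaitSeconds", DEFAULT_CORE_NODE_NAME_WAIT_SECONDS);
  }

  static int secondRetryCount() {
    return positiveOrDefault("example.secondRetryCount", DEFAULT_SECOND_RETRY_COUNT);
  }

  private static int positiveOrDefault(String property, int fallback) {
    // Integer.getInteger returns the fallback when the system property
    // is unset or cannot be parsed as an integer.
    int value = Integer.getInteger(property, fallback);
    // Reject zero and negative overrides so a typo cannot disable the wait.
    return value > 0 ? value : fallback;
  }
}
```

   With this shape the defaults stay documented in one place, and an override is only needed by deployments that actually hit the limit.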


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] beettlle commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits

2020-03-06 Thread GitBox
beettlle commented on a change in pull request #1297: SOLR-14253 Replace 
various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r389132693
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
 ##
 @@ -1684,58 +1685,39 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) {
   }
 
   private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map<String,Slice> slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor()
-                  .setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
 
 Review comment:
   If this change is being made, should the number of retries be configurable?  
This hardcoded value seems to be used a lot in the code.  
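   
   To make the question concrete, here is a minimal, self-contained sketch of the pattern the diff moves to: block until a predicate on the state holds, with the timeout passed in by the caller instead of being baked into a retry loop. `WatchedState` is a hypothetical stand-in written for this illustration; it is not Solr's `ZkStateReader` and says nothing about how `waitForState` is implemented.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

/** Hypothetical state holder used only to illustrate a predicate wait. */
final class WatchedState<T> {
  private T current;

  WatchedState(T initial) {
    this.current = initial;
  }

  /** Publishes a new state and wakes every thread blocked in waitFor. */
  synchronized void update(T next) {
    current = next;
    notifyAll();
  }

  /**
   * Blocks until the predicate accepts the current state, or throws
   * TimeoutException once the caller-supplied timeout has elapsed.
   */
  synchronized T waitFor(long timeout, TimeUnit unit, Predicate<T> predicate)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    // Loop because wakeups can be spurious or caused by unrelated updates.
    while (!predicate.test(current)) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        throw new TimeoutException("state did not match within " + timeout + " " + unit);
      }
      // Releases the monitor while waiting; update() reacquires it to notify.
      TimeUnit.NANOSECONDS.timedWait(this, remaining);
    }
    return current;
  }
}
```

   Because the timeout is an ordinary argument here, a caller could pass either a constant or a configured value without the waiting code changing.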

