[GitHub] [geode] boglesby commented on pull request #7378: GEODE-10056: Improve gateway-receiver load balance

2022-03-24 Thread GitBox


boglesby commented on pull request #7378:
URL: https://github.com/apache/geode/pull/7378#issuecomment-1077908228


   That's a pretty cool idea. I'm not sure whether the CacheServerMXBean has 
that behavior, but I guess it could be added. In any event, I think this change 
is good. I'm approving this change, but you need to address the 
ParallelGatewaySenderConnectionLoadBalanceDistributedTest failure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] boglesby commented on pull request #7378: GEODE-10056: Improve gateway-receiver load balance

2022-03-18 Thread GitBox


boglesby commented on pull request #7378:
URL: https://github.com/apache/geode/pull/7378#issuecomment-1071176231


   I ran a few tests with some extra logging on these changes. They look good.
   
    The receiver exchanges profiles with the locator:
   ```
   [warn 2022/03/16 14:16:12.440 PDT locator-ln  tid=0x50] XXX LocatorLoadSnapshot.updateConnectionLoadMap 
location=192.168.1.5:5370; load=0.0
   
   [warn 2022/03/16 14:16:12.441 PDT locator-ln  tid=0x50] XXX LocatorLoadSnapshot.updateConnectionLoadMap current 
load for location=192.168.1.5:5370; group=__recv__group; inputLoad=0.0; 
currentLoad=0.0
   
   [warn 2022/03/16 14:16:12.441 PDT locator-ln  tid=0x50] XXX LocatorLoadSnapshot.updateConnectionLoadMap updated 
load for location=192.168.1.5:5370; group=__recv__group; inputLoad=0.0; 
newLoad=0.0
   ```
   The connectionLoadMap shows 2 groups, namely the null group (default) and 
the __recv__group group (gateway receiver), each with load=0.0:
   ```
   [warn 2022/03/16 14:16:13.777 PDT locator-ln  tid=0x43] XXX 
LocatorLoadSnapshot.logConnectionLoadMap
   The connectionLoadMap contains the following 2 entries:
group=null
location=192.168.1.5:56224; load=0.0
group=__recv__group
location=192.168.1.5:5370; load=0.0
   ```
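
   For orientation, the map logged above can be pictured as a two-level map 
keyed first by group and then by location. The sketch below is a hypothetical 
model only: the names ConnectionLoadMapSketch, updateLoad, getLoad and 
groupCount are made up for illustration and are not the LocatorLoadSnapshot API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the logged structure: group -> (location -> load).
// This is an illustration, not the LocatorLoadSnapshot implementation.
class ConnectionLoadMapSketch {
  // HashMap permits a single null key, used here for the default (null) group.
  private final Map<String, Map<String, Double>> loadsByGroup = new HashMap<>();

  // Records (or overwrites) the load for one location within a group.
  void updateLoad(String group, String location, double load) {
    loadsByGroup.computeIfAbsent(group, g -> new HashMap<>()).put(location, load);
  }

  // Returns the recorded load, or null if the group or location is unknown.
  Double getLoad(String group, String location) {
    Map<String, Double> loads = loadsByGroup.get(group);
    return loads == null ? null : loads.get(location);
  }

  int groupCount() {
    return loadsByGroup.size();
  }

  public static void main(String[] args) {
    ConnectionLoadMapSketch map = new ConnectionLoadMapSketch();
    map.updateLoad(null, "192.168.1.5:56224", 0.0);           // default group
    map.updateLoad("__recv__group", "192.168.1.5:5370", 0.0); // gateway receiver
    System.out.println("groups=" + map.groupCount());
  }
}
```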
    Sender connects to the receiver:
   
   With the default of 5 dispatcher threads, 5 connections are made to the 
receiver. The load goes from 0.0 to 0.006246:
   ```
   [warn 2022/03/16 14:16:53.836 PDT locator-ln  
tid=0x47] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadBeforeUpdate=0.0
   
   [warn 2022/03/16 14:16:53.836 PDT locator-ln  
tid=0x47] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadAfterUpdate=0.00125
   
   
   [warn 2022/03/16 14:16:53.836 PDT locator-ln  
tid=0x5c] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadBeforeUpdate=0.00125
   
   [warn 2022/03/16 14:16:53.836 PDT locator-ln  
tid=0x5c] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadAfterUpdate=0.0025
   
   
   [warn 2022/03/16 14:16:53.837 PDT locator-ln  
tid=0x5b] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadBeforeUpdate=0.0025
   
   [warn 2022/03/16 14:16:53.837 PDT locator-ln  
tid=0x5b] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadAfterUpdate=0.00375
   
   
   [warn 2022/03/16 14:16:53.837 PDT locator-ln  
tid=0x5a] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadBeforeUpdate=0.00375
   
   [warn 2022/03/16 14:16:53.837 PDT locator-ln  
tid=0x5a] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadAfterUpdate=0.005
   
   
   [warn 2022/03/16 14:16:53.838 PDT locator-ln  
tid=0x59] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadBeforeUpdate=0.005
   
   [warn 2022/03/16 14:16:53.838 PDT locator-ln  
tid=0x59] XXX LocatorLoadSnapshot.getServerForConnection group=__recv__group; 
server=192.168.1.5:5370; loadAfterUpdate=0.006246
   ```
   The connectionLoadMap shows the same 2 groups but now the __recv__group 
group load is 0.006246 for the gateway receiver:
   ```
   [warn 2022/03/16 14:16:55.831 PDT locator-ln  tid=0x43] XXX 
LocatorLoadSnapshot.logConnectionLoadMap
   The connectionLoadMap contains the following 2 entries:
group=null
location=192.168.1.5:56224; load=0.0
group=__recv__group
location=192.168.1.5:5370; load=0.006246
   ```
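
   The arithmetic behind those log lines can be sketched as follows, assuming 
each handed-out connection adds a fixed 0.00125 (the step visible between 
loadBeforeUpdate and loadAfterUpdate above). The class and method names are 
hypothetical. The exact product for 5 connections is 0.00625; the logged 
estimate of 0.006246 is slightly lower, and the logs alone don't show why.

```java
// Hypothetical sketch of the locator's running estimate for one receiver.
class LoadEstimateSketch {
  // Step size taken from the log lines above (0.0 -> 0.00125 per connection).
  static final double LOAD_PER_CONNECTION = 0.00125;

  // Estimated load after handing out `connections` connections,
  // starting from `initialLoad`.
  static double estimateAfter(double initialLoad, int connections) {
    double load = initialLoad;
    for (int i = 0; i < connections; i++) {
      load += LOAD_PER_CONNECTION; // one increment per connection request
    }
    return load;
  }

  public static void main(String[] args) {
    // 5 dispatcher threads -> 5 connections to the receiver.
    System.out.println(estimateAfter(0.0, 5));
  }
}
```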
    Update the load:
   
   Periodically, the server sends an updated load to the locator.
   ```
   [warn 2022/03/16 14:16:57.464 PDT locator-ln :41002 unshared ordered sender uid=5 dom #1 local 
port=45635 remote port=56270> tid=0x5e] XXX 
LocatorLoadSnapshot.updateConnectionLoadMap current load for 
location=192.168.1.5:5370; group=__recv__group; inputLoad=0.00625; 
currentLoad=0.006246
   
   [warn 2022/03/16 14:16:57.464 PDT locator-ln :41002 unshared ordered sender uid=5 dom #1 local 
port=45635 remote port=56270> tid=0x5e] XXX 
LocatorLoadSnapshot.updateConnectionLoadMap updated load for 
location=192.168.1.5:5370; group=__recv__group; inputLoad=0.00625; 
newLoad=0.00625
   
   [warn 2022/03/16 14:16:57.832 PDT locator-ln  tid=0x43] XXX 
LocatorLoadSnapshot.logConnectionLoadMap
   The connectionLoadMap contains the following 2 entries:
group=null
location=192.168.1.5:56224; load=0.0
group=__recv__group
location=192.168.1.5:5370; load=0.00625
   ```
    Update the load after the ping connection has been made:
   
   After another connection is made, the load is updated again.
   ```
   [warn 2022/03/16 14:17:02.466 PDT locator-ln 

[GitHub] [geode] boglesby commented on pull request #7378: GEODE-10056: Improve gateway-receiver load balance

2022-03-15 Thread GitBox


boglesby commented on pull request #7378:
URL: https://github.com/apache/geode/pull/7378#issuecomment-1068616384


   I'm not sure how to resolve the race condition you mention, but I see 
similar behavior with client/server connections.
   
   If a burst of connections is requested and none of those are made before the 
next load is received from the server, then the locator's load for that server 
gets reset back to zero.
   
   A burst of connections (10 in this case) causes the load to go from 0.0 to 
0.01248:
   ```
   [warn 2022/03/15 14:38:37.905 PDT locator  
tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection 
potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200):41001=LoadHolder[0.0,
 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
   
   [warn 2022/03/15 14:38:37.906 PDT locator  
tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection 
selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.0
   
   [warn 2022/03/15 14:38:37.907 PDT locator  
tid=0x24] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.00125
   
   [warn 2022/03/15 14:38:37.907 PDT locator  
tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection 
selectedServer=192.168.1.5:51249; loadAfterUpdate=0.00125
   
   ...
   
   [warn 2022/03/15 14:38:38.005 PDT locator  
tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection 
potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200):41001=LoadHolder[0.01124,
 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
   
   [warn 2022/03/15 14:38:38.005 PDT locator  
tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection 
selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.01124
   
   [warn 2022/03/15 14:38:38.005 PDT locator  
tid=0x24] XXX LoadHolder.incConnections location=192.168.1.5:51249; 
load=0.01248
   
   [warn 2022/03/15 14:38:38.005 PDT locator  
tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection 
selectedServer=192.168.1.5:51249; loadAfterUpdate=0.01248
   ```
   If none of those connections are made before the next load is sent by that 
server, its load goes from 0.01248 to 0.0:
   ```
   [warn 2022/03/15 14:39:25.140 PDT locator :41001 unshared ordered sender uid=5 dom #1 local 
port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateLoad 
about to update connectionLoadMap location=192.168.1.5:51249; load=0.0; 
loadPerConnection=0.00125
   
   [warn 2022/03/15 14:39:25.140 PDT locator :41001 unshared ordered sender uid=5 dom #1 local 
port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateMap 
location=192.168.1.5:51249; loadBeforeUpdate=0.01248
   
   [warn 2022/03/15 14:39:25.141 PDT locator :41001 unshared ordered sender uid=5 dom #1 local 
port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateMap 
location=192.168.1.5:51249; loadAfterUpdate=0.0
   
   [warn 2022/03/15 14:39:25.141 PDT locator :41001 unshared ordered sender uid=5 dom #1 local 
port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateLoad done 
update connectionLoadMap location=192.168.1.5:51249
   ```
   The load for the next request starts at 0.0 again:
   ```
   [warn 2022/03/15 14:39:33.475 PDT locator  
tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection 
potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200):41001=LoadHolder[0.0,
 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
   
   [warn 2022/03/15 14:39:33.475 PDT locator  
tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection 
selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.0
   
   [warn 2022/03/15 14:39:33.475 PDT locator  
tid=0x54] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.00125
   
   [warn 2022/03/15 14:39:33.475 PDT locator  
tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection 
selectedServer=192.168.1.5:51249; loadAfterUpdate=0.00125
   
   ...
   ```
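
   The sequence in the three log excerpts above (burst, reset, next request) 
can be reproduced with a small model. This is a sketch under the assumption 
that a reported load simply replaces the locator's estimate, which is what the 
loadBeforeUpdate/loadAfterUpdate lines suggest; LocatorEstimateSketch and its 
methods are hypothetical names, not Geode's LoadHolder.

```java
// Hypothetical model of the reset described above; not Geode's LoadHolder.
class LocatorEstimateSketch {
  private final double loadPerConnection;
  private double load;

  LocatorEstimateSketch(double initialLoad, double loadPerConnection) {
    this.load = initialLoad;
    this.loadPerConnection = loadPerConnection;
  }

  // Locator side: a connection was handed out, so bump the estimate.
  void connectionRequested() {
    load += loadPerConnection;
  }

  // Server side: the reported load replaces the estimate outright, even if
  // the requested connections have not been established yet.
  void loadReported(double reportedLoad) {
    load = reportedLoad;
  }

  double load() {
    return load;
  }

  public static void main(String[] args) {
    LocatorEstimateSketch estimate = new LocatorEstimateSketch(0.0, 0.00125);
    for (int i = 0; i < 10; i++) {
      estimate.connectionRequested(); // burst of 10 requests
    }
    System.out.println("after burst:  " + estimate.load());
    estimate.loadReported(0.0);       // server has not seen the connections yet
    System.out.println("after report: " + estimate.load());
  }
}
```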
   One thing to note is that the load is only sent every load-poll-interval 
(default=5 seconds) if it has changed. If it hasn't changed, it only gets sent 
once every update frequency (10 * 5 seconds by default).
   
   There is an integer property to control that frequency too:
   ```
   private static final int FORCE_LOAD_UPDATE_FREQUENCY = getInteger(
       GeodeGlossary.GEMFIRE_PREFIX + "BridgeServer.FORCE_LOAD_UPDATE_FREQUENCY", 10);
   ```
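
   The send-if-changed rule described above, combined with the forced update, 
can be sketched like this. It is a minimal model, not the server's actual 
code: LoadReportPolicySketch and shouldSend are hypothetical names, and the 
exact poll on which the forced update fires (here, every 10th unchanged poll) 
is an assumption.

```java
// Hypothetical sketch of "send only if changed, but force a send every N polls".
class LoadReportPolicySketch {
  private final int forceUpdateFrequency; // e.g. FORCE_LOAD_UPDATE_FREQUENCY = 10
  private double lastSentLoad;
  private int skippedPolls;

  LoadReportPolicySketch(double initialLoad, int forceUpdateFrequency) {
    this.lastSentLoad = initialLoad;
    this.forceUpdateFrequency = forceUpdateFrequency;
  }

  // Called once per load-poll-interval; returns true when the load should be sent.
  boolean shouldSend(double currentLoad) {
    boolean changed = currentLoad != lastSentLoad;
    boolean forced = skippedPolls >= forceUpdateFrequency - 1; // assumed boundary
    if (changed || forced) {
      lastSentLoad = currentLoad;
      skippedPolls = 0;
      return true;
    }
    skippedPolls++;
    return false;
  }

  public static void main(String[] args) {
    LoadReportPolicySketch policy = new LoadReportPolicySketch(0.0, 10);
    for (int poll = 1; poll <= 12; poll++) {
      System.out.println("poll " + poll + " send=" + policy.shouldSend(0.0));
    }
  }
}
```

   With a 5 second poll interval and a frequency of 10, an unchanged load in 
this model is re-sent once every 50 seconds, which matches the 10 * 5 seconds 
mentioned above.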
   The load-poll-interval is configurable, but currently only for the cache 
server, not the gateway receiver. It probably wouldn't be too hard to add this 
support to the gateway receiver.
   
   Also, there is a gfsh load-balance gateway-sender command that could help 
alleviate this condition.
   
   I'm still reviewing the PR.

