Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin.cheuvront
I agree that the behavior with the different backoff time values is strange. It 
seems that values of 1000ms or less for the min backoff time cause the RPC calls 
not to reconnect to the server. I just did a quick test with 1100ms and that 
worked the 10 times I shut the server down and restarted it. I will see how 
robust it is with some testing. At this point we are over a year out of 
date on our version of grpc, so we should probably update to a newer version 
before doing much more delving; a lot of work was done over that time.
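
As a rough sketch, this is how I'd expect it to look once we're on a newer gRPC 
that has both backoff channel args (assuming the GRPC_ARG_MIN_RECONNECT_BACKOFF_MS 
and GRPC_ARG_MAX_RECONNECT_BACKOFF_MS arguments from later releases; the target 
address and the exact values are just placeholders):

#include <memory>
#include <grpc++/grpc++.h>   // <grpcpp/grpcpp.h> on newer releases

std::shared_ptr<grpc::Channel> MakeChannel() {
  grpc::ChannelArguments args;
  // Keep the reconnect backoff window small so a restarted server is
  // picked up quickly; 1100 ms was the smallest min value that worked here.
  args.SetInt(GRPC_ARG_MIN_RECONNECT_BACKOFF_MS, 1100);
  args.SetInt(GRPC_ARG_MAX_RECONNECT_BACKOFF_MS, 2000);
  return grpc::CreateCustomChannel(
      "localhost:50051", grpc::InsecureChannelCredentials(), args);
}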

Thanks!



Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
To be clear, if a server is down there are only two possible reasons: it is 
off/failed, or some networking condition does not allow traffic. In either case a 
client cannot reasonably determine this quickly; by specification it can take a 
long time, especially if a recent connection attempt was good. 

What it can determine quickly is that there is no process listening on the 
requested port on the remote machine, or that there is no route to the 
requested machine. 

If you have lots of outages of the former kind, you are going to have issues. 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
I’m glad you got it working. Something doesn’t seem right, though, that going 
from 1 sec to 2 sec causes things to work... I would think that production 
anomalies could easily cause similar degradation, so this solution may be 
fragile. Still, I’m not sure I 100% get the problem you’re having, but if you 
understand exactly why it works, that’s good enough for me :)

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin.cheuvront
I found a fix for my problem.

Looking at the latest source, there is a test argument, 
"grpc.testing.fixed_reconnect_backoff_ms", that sets both the min and max 
backoff times to the same value. I found the source for 1.3.9 on a machine and it 
is used there as well. Setting that argument to 2000ms does what I 
wanted. 1000ms seemed to be too low a value; the RPC calls continued to 
fail. 2000 seems to reconnect just fine. Once we update to a newer version 
of grpc I can change that to set the min and max backoff times separately.
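
For reference, a minimal sketch of how that argument gets set when the channel is 
created (the target address is a placeholder, and since this is a test-only string 
taken from the gRPC source it could change between releases):

#include <memory>
#include <grpc++/grpc++.h>

std::shared_ptr<grpc::Channel> MakeChannelWithFixedBackoff() {
  grpc::ChannelArguments args;
  // Test-only argument: pins both the min and max reconnect backoff
  // to the same value, 2000 ms here.
  args.SetInt("grpc.testing.fixed_reconnect_backoff_ms", 2000);
  return grpc::CreateCustomChannel(
      "localhost:50051", grpc::InsecureChannelCredentials(), args);
}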

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
Sorry. Still, if you forcibly remove the cable or hard-stop the server, the 
client can’t tell that the server is down without some sort of ping-pong 
protocol with a timeout. The TCP timeout is on the order of hours, and minutes 
if keepalive is set. 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin.cheuvront
That must have been a different person :). I'm actually taking down the 
server and restarting it, not simulating it.

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
I thought your original message said you were simulating the server going down 
using iptables and causing packet loss?

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin.cheuvront
I'm not sure I follow you on that one. I am taking the server up and down 
myself. Everything works fine if I just make RPC calls on the client and 
check the error codes. The problem was the 20 seconds of blocking on secondary 
RPC calls during the reconnect, which seems to be due to the backoff 
algorithm. I was hoping to shrink that wait, if possible, to something 
smaller. After setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000, an RPC call 
still seemed to take the full 20 seconds.

Using GetState on the channel looked like it was going to get rid of the 
blocking on a broken connection, but the state of the channel doesn't 
seem to change from transient failure once the server comes back up. I tried 
using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS, 
but those didn't seem to trigger a state change on the channel.

It seems like the only way to trigger a state change on the channel is to make 
an actual RPC call.

I think the answer might just be to update to a newer version of grpc, look 
at using the MIN_RECONNECT_BACKOFF channel arg, and probably 
download the source and look at how those variables are used :). 
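
For context, this is roughly the keepalive configuration I was experimenting with 
(a sketch only; I'm assuming the standard GRPC_ARG_KEEPALIVE_* channel args, and 
the target address and values are arbitrary placeholders):

#include <grpc++/grpc++.h>

grpc::ChannelArguments args;
// Ping the server every 10 s, treat the ping as failed after 2 s without a
// reply, and allow pings even when no RPCs are in flight.
args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);
args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 2000);
args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
auto channel = grpc::CreateCustomChannel(
    "localhost:50051", grpc::InsecureChannelCredentials(), args);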


Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
The other thing to keep in mind is that the way you are “forcing failure” is 
error-prone - the connection is valid, since packets are making it through; it is 
just that it will be very slow due to extreme packet loss. I am not sure this 
is considered a failure by gRPC. I think you would need to detect slow network 
connections and abort that server yourself. 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin.cheuvront
I do check the error code after each update and skip the rest of the 
current iteration's updates if a failure occurred.

I could skip all updates for 20 seconds after a failed update, but that seems 
less than ideal.

By "server available" I mean calling GetState on the channel. The problem I 
was running into was that if I only call GetState on the channel to see if 
the server is around, it "forever" stays in the transient failure state 
(at least for 60 seconds). I was expecting to see a state change back to 
idle/ready after a bit.
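
For reference, this is roughly the kind of check I was attempting (a sketch only; 
ServerAvailable is a hypothetical helper and the 100 ms wait is arbitrary):

#include <chrono>
#include <memory>
#include <grpc++/grpc++.h>

// Returns true only if the channel is READY, or becomes READY within a
// short deadline. GetState(true) kicks an IDLE channel into connecting.
bool ServerAvailable(const std::shared_ptr<grpc::Channel>& channel) {
  grpc_connectivity_state state = channel->GetState(/*try_to_connect=*/true);
  if (state == GRPC_CHANNEL_READY) return true;
  auto deadline = std::chrono::system_clock::now() +
                  std::chrono::milliseconds(100);
  // Blocks until the state leaves 'state' or the deadline expires.
  channel->WaitForStateChange(state, deadline);
  return channel->GetState(false) == GRPC_CHANNEL_READY;
}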

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-20 Thread robert engels
You should track the err after each update and, if non-nil, just 
return… why keep trying the further updates in that loop?

It is also trivial to not even attempt the next loop iteration if it has been 
less than N ms since the last error.

According to your pseudocode, you already have the ‘server available’ status.
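
Something like this, as a rough sketch of the cool-down idea (all names and the 
2000 ms value are hypothetical; N is whatever skip window you want):

#include <chrono>

using Clock = std::chrono::steady_clock;
const auto kCooldown = std::chrono::milliseconds(2000);   // "N ms"
Clock::time_point last_error = Clock::now() - kCooldown;  // no recent error yet

// Skip the whole push phase if the last error happened less than N ms ago.
bool ShouldAttemptUpdate() { return Clock::now() - last_error >= kCooldown; }
void NoteError()           { last_error = Clock::now(); }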

[grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-20 Thread justin.cheuvront
GRPC Version: 1.3.9
Platform: Windows

I'm working on a prototype application that periodically calculates data 
and then, in a multi-step process, pushes the data to a server. The design is 
that the server doesn't need to be up and can go down mid-process. The 
client should not block (or should block as little as possible) between updates 
if there is a problem pushing data.

A simple model for the client would be:
Loop Until Done
{
 Calculate Data
 If Server Available and No Error Begin Update
 If Server Available and No Error UpdateX (Optional)
 If Server Available and No Error UpdateY (Optional)
 If Server Available and No Error UpdateZ (Optional)
 If Server Available and No Error End Update
}

The client doesn't care whether the server is available, but if it is, it should 
push data; if any call errors, skip everything else until the next update.

The problem is that if I make a call on the client (and the server isn't 
available), the first call fails very quickly (~1 sec) and the rest take a "long" 
time, ~20 sec. It looks like this is due to the reconnect backoff time. I 
tried setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args to 
a lower value (2000), but that didn't have any positive effect.

I tried using GetState(true) on the channel to determine whether we need to skip 
an update. This call returns very quickly, but the channel never seems to get out 
of the transient failure state after the server was started (I waited for over 60 
seconds). From the documentation it looked like the parameter to GetState only 
controls whether a channel in the idle state attempts a reconnect.

What is the best way to achieve the functionality we'd like?

I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added in 
a later version of grpc; would that cause the grpc call to "fail fast" if I 
upgraded and set it to a low value (~1 sec)?

Is there a better way to handle this situation in general?
