[grpc-io] Re: gRPC C++ stream throughput performance significantly slower on windows vs. linux

2018-11-21 Thread sujit2405
Hi Justin,

We are new to gRPC (we just started looking into it yesterday) and are looking 
for something similar on Linux. It's great to see that you achieved 
1.3 GB/s throughput. Could you please let us know everything that was involved 
in getting there? When you say 1.3 GB/s, do you mean 1.3 GB of data being sent 
and received over the network by splitting the large payload into smaller 
messages (chunks) and sending them in parallel? Is there any sample code 
related to this kind of data transfer that we could start with? (A rough sketch 
of the chunked-streaming pattern follows the quoted message below.) Any help 
would be greatly appreciated. 

Thanks & Regards
Sujit

On Thursday, 23 August 2018 07:25:05 UTC+5:30, justin.c...@gmail.com wrote:
>
> Background:
>
> Machine: ~3.0Ghz, 8 cores (4 logical), 32.0 GB RAM
>
> I was looking into grpc performance on large amounts of data to see if it 
> was viable for our use, data size could be over 10GB. The basic payload 
> would just be an array of floats. Using a synchronous server/client and 
> streams on linux I was able to get around 1.3GB/s throughput on a message. 
> This was by streaming the data in ~200-300KB chunks. When the chunks go 
> above 1MB the throughput starts to slow down, < 100KB chunks start to 
> greatly slow down as well, < 1MB seemed to be a good sweet spot. Sending 
> large non-streamed messages was much lower < 500MB/s, so streams seemed the 
> way to go.
>
> Tried the same tests that yielded ~1.3GB/s (on linux) on windows (win10). 
> Those same tests achieved ~300MB/s on windows.
>
> Question:
>
> Is there a good way to increase performance on windows (or just in 
> general) for large streamed messages? We like some of the benefits of 
> grpc/protobufs, especially the ability to just send a client a proto file 
> so they can write their own client in their choice of language. I was 
> expecting a decrease in performance on windows but not by that magnitude. 
> We aren't looking at changing the underpinnings of gRPC for this project. 
> Mainly looking at if there are some good ways to increase performance of 
> streams on windows (particularly on the server side).
>
> We have plenty of other options to get optimal data rate transfers but 
> were hoping we could use gRPC out of the box so we could hand a client a 
> proto file and they could handle the "rest".
>
> Very new to gRPC/protobufs so I could be missing a lot of things so I 
> might be missing some crucial information.
>
> Thanks!
>
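
A rough C++ sketch of the chunked-streaming pattern described in the quoted 
message (the proto, service, and helper names are assumptions for illustration, 
not the poster's actual code; assume something like 
  message FloatChunk { repeated float values = 1; }
  rpc GetData(DataRequest) returns (stream FloatChunk);
plus #include <grpcpp/grpcpp.h>, <vector>, <algorithm>, and the generated headers):

// Server side: implementation of the generated DataService::Service method,
// streaming the float array in ~256 KB chunks (64K floats per message).
grpc::Status DataServiceImpl::GetData(grpc::ServerContext* context,
                                      const DataRequest* request,
                                      grpc::ServerWriter<FloatChunk>* writer) {
  const std::vector<float>& data = LookupData(*request);   // hypothetical helper
  const size_t kFloatsPerChunk = 64 * 1024;                // ~256 KB payload per message
  for (size_t i = 0; i < data.size(); i += kFloatsPerChunk) {
    FloatChunk chunk;
    const size_t n = std::min(kFloatsPerChunk, data.size() - i);
    for (size_t j = 0; j < n; ++j) chunk.add_values(data[i + j]);
    if (!writer->Write(chunk)) break;                      // client cancelled or disconnected
  }
  return grpc::Status::OK;
}

// Client side: read chunks off the stream until it ends, then check the status.
void ReadAll(DataService::Stub* stub, const DataRequest& request, std::vector<float>* out) {
  grpc::ClientContext context;
  std::unique_ptr<grpc::ClientReader<FloatChunk>> reader = stub->GetData(&context, request);
  FloatChunk chunk;
  while (reader->Read(&chunk)) {
    out->insert(out->end(), chunk.values().begin(), chunk.values().end());
  }
  grpc::Status status = reader->Finish();
  if (!status.ok()) { /* the transfer failed part way through */ }
}

Note that the throughput numbers quoted above come from a single synchronous 
stream sending chunks one after another, not from sending chunks in parallel.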

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/5ae53b98-6b0d-4e8f-91f6-55a4aa970002%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin . cheuvront
I agree that the behaviour with the different backoff time values is strange. It 
seems that values of 1000 ms or less for the min backoff time cause the RPC calls 
not to reconnect to the server. I just did a quick test with 1100 ms and that 
worked all 10 times I shut the server down and restarted it; we'll see how 
robust it is with further testing. At this point we are over a year out of 
date on our version of gRPC, so we should probably update to a newer version 
before digging much deeper; a lot of work was done over that time.
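
For reference, once on a newer gRPC release, setting both backoff bounds on the 
channel looks roughly like this in C++ (a sketch only; the target address and 
credentials are placeholders, and which release first exposes 
GRPC_ARG_MIN_RECONNECT_BACKOFF_MS should be checked against its grpc_types.h):

#include <grpcpp/grpcpp.h>   // <grpc++/grpc++.h> on older releases

std::shared_ptr<grpc::Channel> MakeChannel() {
  grpc::ChannelArguments args;
  // Lower bound on the reconnect backoff; values of ~1000 ms or less did not
  // reconnect reliably in the tests above, so 2000 ms is used here.
  args.SetInt(GRPC_ARG_MIN_RECONNECT_BACKOFF_MS, 2000);
  // Upper bound on the reconnect backoff.
  args.SetInt(GRPC_ARG_MAX_RECONNECT_BACKOFF_MS, 5000);
  return grpc::CreateCustomChannel("localhost:50051",                   // placeholder target
                                   grpc::InsecureChannelCredentials(), args);
}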

Thanks!



On Wednesday, November 21, 2018 at 12:26:07 PM UTC-5, Robert Engels wrote:
>
> To be clear, if a server is down, there are only two reasons, it is 
> off/failed, or some networking condition does not allow traffic. In either 
> case a client cannot reasonably determine this in a fast manner, by 
> specification this can take a long time, especially if a recent connection 
> attempt was good. 
>
> What it can determine in a fast manner is if there is no process available 
> listening on the requested port on the remote machine, or that there is no 
> route available to the requested machine. 
>
> If you have lots of outages like the former, you are going to have issues. 
>
> On Nov 21, 2018, at 11:09 AM, justin.c...@ansys.com  wrote:
>
> I found a fix for my problem.
>
> Looked at the latest source and there was a test argument being used 
> "grpc.testing.fixed_reconnect_backoff_ms" that sets both the min and max 
> backoff times to that value. Found the source for 1.3.9 on a machine and it 
> was being used then as well. Setting that argument to 2000ms does what I 
> wanted. 1000ms seemed to be to low of a value, the rpc calls continued to 
> fail. 2000 seems to reconnect just fine. Once we update to a newer version 
> of grpc I can change that to set the min and max backoff times.
>
> On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>>
>> Sorry. Still if you forcibly remove the cable or hard shut the server, 
>> the client can’t tell if the server is down with out some sort of ping pong 
>> protocol with timeout. The tcp ip timeout is on the order of hours, and 
>> minutes if keep alive is set. 
>>
>> On Nov 21, 2018, at 10:20 AM, justin.c...@ansys.com wrote:
>>
>> That must have been a different person :). I'm actually taking down the 
>> server and restarting it, no simulation of it.
>>
>> On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>>>
>>> I thought your original message said you were simulating the server 
>>> going down using iptables and causing packet loss?
>>>
>>> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com wrote:
>>>
>>> I'm not sure I follow you on that one. I am taking the server up and 
>>> down myself. Everything works fine if I just make rpc calls on the client 
>>> and check the error codes. The problem was the 20 seconds blocking on 
>>> secondary rpc calls for the reconnect, which seems to be due to the backoff 
>>> algorithm. I was hoping to shrink that wait if possible to something 
>>> smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 seemed to 
>>> still take the full 20 seconds when making an RPC call.
>>>
>>> Using GetState on the channel looked like it was going to get rid of the 
>>> blocking nature on a broken connection but the state of the channel doesn't 
>>> seem to change from transient failure once the server comes back up. Tried 
>>> using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS 
>>> but those didn't seem to trigger a state change on the channel.
>>>
>>> Seems like the only way to trigger a state change on the channel is to 
>>> make an actual rpc call.
>>>
>>> I think the answer might just be update to a newer version of rpc and 
>>> look at using the MIN_RECONNECT_BACKOFF channel arg setting and probably 
>>> downloading the source and looking at how those variables are used :). 
>>>
>>>
>>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels 
>>> wrote:

 The other thing to keep in mind is that the way you are “forcing 
 failure” is error prone - the connection is valid as packets are making it 
 through. It is just that is will be very slow due to extreme packet loss. 
 I 
 am not sure this is considered a failure by gRPC. I think you would need 
 to 
 detect slow network connections and abort that server yourself. 

 On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:

 I do check the error code after each update and skip the rest of the 
 current iterations updates if a failure occurred.

 I could skip all updates for 20 seconds after an update but that seems 
 less than ideal.

 By server available I was using the GetState on the channel. The 
 problem I was running into was that if I only call GetState on the channel 
 to see if the server is around it "forever" stays in the state of 
 transient 
 failure (at least for 60 seconds). I 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
To be clear, if a server is down there are only two possible reasons: it is 
off/failed, or some networking condition does not allow traffic. In either case a 
client cannot reasonably determine this quickly; by specification it can take a 
long time, especially if a recent connection attempt was good. 

What it can determine quickly is whether there is no process listening on the 
requested port on the remote machine, or whether there is no route available to 
the requested machine. 

If you have lots of outages like the former, you are going to have issues. 

> On Nov 21, 2018, at 11:09 AM, justin.cheuvr...@ansys.com wrote:
> 
> I found a fix for my problem.
> 
> Looked at the latest source and there was a test argument being used 
> "grpc.testing.fixed_reconnect_backoff_ms" that sets both the min and max 
> backoff times to that value. Found the source for 1.3.9 on a machine and it 
> was being used then as well. Setting that argument to 2000ms does what I 
> wanted. 1000ms seemed to be to low of a value, the rpc calls continued to 
> fail. 2000 seems to reconnect just fine. Once we update to a newer version of 
> grpc I can change that to set the min and max backoff times.
> 
>> On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>> Sorry. Still if you forcibly remove the cable or hard shut the server, the 
>> client can’t tell if the server is down with out some sort of ping pong 
>> protocol with timeout. The tcp ip timeout is on the order of hours, and 
>> minutes if keep alive is set. 
>> 
>>> On Nov 21, 2018, at 10:20 AM, justin.c...@ansys.com wrote:
>>> 
>>> That must have been a different person :). I'm actually taking down the 
>>> server and restarting it, no simulation of it.
>>> 
 On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
 I thought your original message said you were simulating the server going 
 down using iptables and causing packet loss?
 
> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com wrote:
> 
> I'm not sure I follow you on that one. I am taking the server up and down 
> myself. Everything works fine if I just make rpc calls on the client and 
> check the error codes. The problem was the 20 seconds blocking on 
> secondary rpc calls for the reconnect, which seems to be due to the 
> backoff algorithm. I was hoping to shrink that wait if possible to 
> something smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 
> seemed to still take the full 20 seconds when making an RPC call.
> 
> Using GetState on the channel looked like it was going to get rid of the 
> blocking nature on a broken connection but the state of the channel 
> doesn't seem to change from transient failure once the server comes back 
> up. Tried using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and 
> KEEPALIVE_PERMIT_WITHOUT_CALLS but those didn't seem to trigger a state 
> change on the channel.
> 
> Seems like the only way to trigger a state change on the channel is to 
> make an actual rpc call.
> 
> I think the answer might just be update to a newer version of rpc and 
> look at using the MIN_RECONNECT_BACKOFF channel arg setting and probably 
> downloading the source and looking at how those variables are used :). 
> 
> 
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels 
>> wrote:
>> The other thing to keep in mind is that the way you are “forcing 
>> failure” is error prone - the connection is valid as packets are making 
>> it through. It is just that is will be very slow due to extreme packet 
>> loss. I am not sure this is considered a failure by gRPC. I think you 
>> would need to detect slow network connections and abort that server 
>> yourself. 
>> 
>>> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
>>> 
>>> I do check the error code after each update and skip the rest of the 
>>> current iterations updates if a failure occurred.
>>> 
>>> I could skip all updates for 20 seconds after an update but that seems 
>>> less than ideal.
>>> 
>>> By server available I was using the GetState on the channel. The 
>>> problem I was running into was that if I only call GetState on the 
>>> channel to see if the server is around it "forever" stays in the state 
>>> of transient failure (at least for 60 seconds). I was expecting to see 
>>> a state change back to idle/ready after a bit.
>>> 
 On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels 
 wrote:
 You should track the err after each update, and if non-nil, just 
 return… why keep trying the further updates in that loop.
 
 It is also trivial too - to not even attempt the next loop if it has 
 been less than N ms since the last error.
 
 According to your pseudo 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
I’m glad you got it working. Something doesn’t seem right, though, about going 
from 1 sec to 2 sec causing things to work... I would think that production 
anomalies could easily cause similar degradation, so this solution may be 
fragile. Still, I’m not sure I 100% get the problem you’re having, but if you 
understand exactly why it works, that’s good enough for me :)

> On Nov 21, 2018, at 11:09 AM, justin.cheuvr...@ansys.com wrote:
> 
> I found a fix for my problem.
> 
> Looked at the latest source and there was a test argument being used 
> "grpc.testing.fixed_reconnect_backoff_ms" that sets both the min and max 
> backoff times to that value. Found the source for 1.3.9 on a machine and it 
> was being used then as well. Setting that argument to 2000ms does what I 
> wanted. 1000ms seemed to be to low of a value, the rpc calls continued to 
> fail. 2000 seems to reconnect just fine. Once we update to a newer version of 
> grpc I can change that to set the min and max backoff times.
> 
>> On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>> Sorry. Still if you forcibly remove the cable or hard shut the server, the 
>> client can’t tell if the server is down with out some sort of ping pong 
>> protocol with timeout. The tcp ip timeout is on the order of hours, and 
>> minutes if keep alive is set. 
>> 
>>> On Nov 21, 2018, at 10:20 AM, justin.c...@ansys.com wrote:
>>> 
>>> That must have been a different person :). I'm actually taking down the 
>>> server and restarting it, no simulation of it.
>>> 
 On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
 I thought your original message said you were simulating the server going 
 down using iptables and causing packet loss?
 
> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com wrote:
> 
> I'm not sure I follow you on that one. I am taking the server up and down 
> myself. Everything works fine if I just make rpc calls on the client and 
> check the error codes. The problem was the 20 seconds blocking on 
> secondary rpc calls for the reconnect, which seems to be due to the 
> backoff algorithm. I was hoping to shrink that wait if possible to 
> something smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 
> seemed to still take the full 20 seconds when making an RPC call.
> 
> Using GetState on the channel looked like it was going to get rid of the 
> blocking nature on a broken connection but the state of the channel 
> doesn't seem to change from transient failure once the server comes back 
> up. Tried using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and 
> KEEPALIVE_PERMIT_WITHOUT_CALLS but those didn't seem to trigger a state 
> change on the channel.
> 
> Seems like the only way to trigger a state change on the channel is to 
> make an actual rpc call.
> 
> I think the answer might just be update to a newer version of rpc and 
> look at using the MIN_RECONNECT_BACKOFF channel arg setting and probably 
> downloading the source and looking at how those variables are used :). 
> 
> 
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels 
>> wrote:
>> The other thing to keep in mind is that the way you are “forcing 
>> failure” is error prone - the connection is valid as packets are making 
>> it through. It is just that is will be very slow due to extreme packet 
>> loss. I am not sure this is considered a failure by gRPC. I think you 
>> would need to detect slow network connections and abort that server 
>> yourself. 
>> 
>>> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
>>> 
>>> I do check the error code after each update and skip the rest of the 
>>> current iterations updates if a failure occurred.
>>> 
>>> I could skip all updates for 20 seconds after an update but that seems 
>>> less than ideal.
>>> 
>>> By server available I was using the GetState on the channel. The 
>>> problem I was running into was that if I only call GetState on the 
>>> channel to see if the server is around it "forever" stays in the state 
>>> of transient failure (at least for 60 seconds). I was expecting to see 
>>> a state change back to idle/ready after a bit.
>>> 
 On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels 
 wrote:
 You should track the err after each update, and if non-nil, just 
 return… why keep trying the further updates in that loop.
 
 It is also trivial too - to not even attempt the next loop if it has 
 been less than N ms since the last error.
 
 According to your pseudo code, you already have the ‘server available’ 
 status.
 
> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
> 
> GRPC Version: 1.3.9
> Platform: 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin . cheuvront
I found a fix for my problem.

I looked at the latest source and there was a test argument, 
"grpc.testing.fixed_reconnect_backoff_ms", that sets both the min and max 
backoff times to the given value. I found the source for 1.3.9 on a machine and 
it was being used there as well. Setting that argument to 2000 ms does what I 
wanted; 1000 ms seemed to be too low a value and the RPC calls continued to 
fail, while 2000 ms reconnects just fine. Once we update to a newer version 
of gRPC I can change this to set the min and max backoff times directly.
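
In C++ that amounts to setting the channel argument quoted above when the 
channel is created (a sketch; the target, credentials, and service name are 
placeholders, and since this is a testing-only argument it is best treated as a 
stopgap until the min/max backoff arguments are available):

grpc::ChannelArguments args;
// Testing-only argument named above: pins both min and max reconnect backoff.
args.SetInt("grpc.testing.fixed_reconnect_backoff_ms", 2000);
auto channel = grpc::CreateCustomChannel("localhost:50051",                    // placeholder target
                                         grpc::InsecureChannelCredentials(), args);
auto stub = MyService::NewStub(channel);                                       // hypothetical service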

On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>
> Sorry. Still if you forcibly remove the cable or hard shut the server, the 
> client can’t tell if the server is down with out some sort of ping pong 
> protocol with timeout. The tcp ip timeout is on the order of hours, and 
> minutes if keep alive is set. 
>
> On Nov 21, 2018, at 10:20 AM, justin.c...@ansys.com  wrote:
>
> That must have been a different person :). I'm actually taking down the 
> server and restarting it, no simulation of it.
>
> On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>>
>> I thought your original message said you were simulating the server going 
>> down using iptables and causing packet loss?
>>
>> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com wrote:
>>
>> I'm not sure I follow you on that one. I am taking the server up and down 
>> myself. Everything works fine if I just make rpc calls on the client and 
>> check the error codes. The problem was the 20 seconds blocking on secondary 
>> rpc calls for the reconnect, which seems to be due to the backoff 
>> algorithm. I was hoping to shrink that wait if possible to something 
>> smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 seemed to 
>> still take the full 20 seconds when making an RPC call.
>>
>> Using GetState on the channel looked like it was going to get rid of the 
>> blocking nature on a broken connection but the state of the channel doesn't 
>> seem to change from transient failure once the server comes back up. Tried 
>> using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS 
>> but those didn't seem to trigger a state change on the channel.
>>
>> Seems like the only way to trigger a state change on the channel is to 
>> make an actual rpc call.
>>
>> I think the answer might just be update to a newer version of rpc and 
>> look at using the MIN_RECONNECT_BACKOFF channel arg setting and probably 
>> downloading the source and looking at how those variables are used :). 
>>
>>
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>>>
>>> The other thing to keep in mind is that the way you are “forcing 
>>> failure” is error prone - the connection is valid as packets are making it 
>>> through. It is just that is will be very slow due to extreme packet loss. I 
>>> am not sure this is considered a failure by gRPC. I think you would need to 
>>> detect slow network connections and abort that server yourself. 
>>>
>>> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
>>>
>>> I do check the error code after each update and skip the rest of the 
>>> current iterations updates if a failure occurred.
>>>
>>> I could skip all updates for 20 seconds after an update but that seems 
>>> less than ideal.
>>>
>>> By server available I was using the GetState on the channel. The problem 
>>> I was running into was that if I only call GetState on the channel to see 
>>> if the server is around it "forever" stays in the state of transient 
>>> failure (at least for 60 seconds). I was expecting to see a state change 
>>> back to idle/ready after a bit.
>>>
>>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:

 You should track the err after each update, and if non-nil, just 
 return… why keep trying the further updates in that loop.

 It is also trivial too - to not even attempt the next loop if it has 
 been less than N ms since the last error.

 According to your pseudo code, you already have the ‘server available’ 
 status.

 On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:

 GRPC Version: 1.3.9
 Platform: Windows

 I'm working on a prototype application that periodically calculates 
 data and then in a multi-step process pushes the data to a server. The 
 design is that the server doesn't need to be up or can go down mid 
 process. 
 The client will not block (or block as little as possible) between updates 
 if there is problem pushing data.

 A simple model for the client would be:
 Loop Until Done
 {
  Calculate Data
  If Server Available and No Error Begin Update
  If Server Available and No Error UpdateX (Optional)
  If Server Available and No Error UpdateY (Optional)
  If Server Available and No Error UpdateZ (Optional)
  If Server Available and No Error End Update

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
Sorry. Still, if you forcibly remove the cable or hard-stop the server, the 
client can’t tell that the server is down without some sort of ping/pong 
protocol with a timeout. The TCP/IP timeout is on the order of hours, or minutes 
if keepalive is set. 
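
gRPC's HTTP/2 keepalive pings are essentially that ping/pong with a timeout. A 
C++ sketch of enabling them on the client channel follows (values are 
illustrative, the target and credentials are placeholders, and note that 
elsewhere in this thread the original poster reports the KEEPALIVE_* settings 
did not trigger a channel state change for him; the server may also need to be 
configured to accept pings when no call is active):

grpc::ChannelArguments args;
args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);           // send a keepalive ping every 10 s
args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 5000);         // treat the connection as dead if no ack within 5 s
args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);  // ping even when no RPC is in flight
auto channel = grpc::CreateCustomChannel("localhost:50051",                    // placeholder target
                                         grpc::InsecureChannelCredentials(), args);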

> On Nov 21, 2018, at 10:20 AM, justin.cheuvr...@ansys.com wrote:
> 
> That must have been a different person :). I'm actually taking down the 
> server and restarting it, no simulation of it.
> 
>> On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>> I thought your original message said you were simulating the server going 
>> down using iptables and causing packet loss?
>> 
>>> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com wrote:
>>> 
>>> I'm not sure I follow you on that one. I am taking the server up and down 
>>> myself. Everything works fine if I just make rpc calls on the client and 
>>> check the error codes. The problem was the 20 seconds blocking on secondary 
>>> rpc calls for the reconnect, which seems to be due to the backoff 
>>> algorithm. I was hoping to shrink that wait if possible to something 
>>> smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 seemed to 
>>> still take the full 20 seconds when making an RPC call.
>>> 
>>> Using GetState on the channel looked like it was going to get rid of the 
>>> blocking nature on a broken connection but the state of the channel doesn't 
>>> seem to change from transient failure once the server comes back up. Tried 
>>> using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS 
>>> but those didn't seem to trigger a state change on the channel.
>>> 
>>> Seems like the only way to trigger a state change on the channel is to make 
>>> an actual rpc call.
>>> 
>>> I think the answer might just be update to a newer version of rpc and look 
>>> at using the MIN_RECONNECT_BACKOFF channel arg setting and probably 
>>> downloading the source and looking at how those variables are used :). 
>>> 
>>> 
 On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
 The other thing to keep in mind is that the way you are “forcing failure” 
 is error prone - the connection is valid as packets are making it through. 
 It is just that is will be very slow due to extreme packet loss. I am not 
 sure this is considered a failure by gRPC. I think you would need to 
 detect slow network connections and abort that server yourself. 
 
> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
> 
> I do check the error code after each update and skip the rest of the 
> current iterations updates if a failure occurred.
> 
> I could skip all updates for 20 seconds after an update but that seems 
> less than ideal.
> 
> By server available I was using the GetState on the channel. The problem 
> I was running into was that if I only call GetState on the channel to see 
> if the server is around it "forever" stays in the state of transient 
> failure (at least for 60 seconds). I was expecting to see a state change 
> back to idle/ready after a bit.
> 
>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>> You should track the err after each update, and if non-nil, just return… 
>> why keep trying the further updates in that loop.
>> 
>> It is also trivial too - to not even attempt the next loop if it has 
>> been less than N ms since the last error.
>> 
>> According to your pseudo code, you already have the ‘server available’ 
>> status.
>> 
>>> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
>>> 
>>> GRPC Version: 1.3.9
>>> Platform: Windows
>>> 
>>> I'm working on a prototype application that periodically calculates 
>>> data and then in a multi-step process pushes the data to a server. The 
>>> design is that the server doesn't need to be up or can go down mid 
>>> process. The client will not block (or block as little as possible) 
>>> between updates if there is problem pushing data.
>>> 
>>> A simple model for the client would be:
>>> Loop Until Done
>>> {
>>>  Calculate Data
>>>  If Server Available and No Error Begin Update
>>>  If Server Available and No Error UpdateX (Optional)
>>>  If Server Available and No Error UpdateY (Optional)
>>>  If Server Available and No Error UpdateZ (Optional)
>>>  If Server Available and No Error End Update
>>> }
>>> 
>>> The client doesn't care if the server is available but if it is should 
>>> push data, if any errors skip everything else until next update.
>>> 
>>> The problem is that if I make an call on the client (and the server 
>>> isn't available) the first fails very quickly (~1sec) and the rest take 
>>> a "long" time, ~20sec. It looks like this is due to the reconnect 
>>> backoff time. I tried setting the 

[grpc-io] Re: C++ API, unidirectional streaming: How to tell if receiver closed the connection?

2018-11-21 Thread mi
I did, sort of. I made the streaming bidirectional (basically rpc MyMethod 
(stream ClientMsg) returns (stream google.protobuf.Empty)), and attempting 
client-side recvs now gives a way to reliably check for errors. I also 
wrapped the whole gRPC interface in a select-style interface (as I didn't 
want to use threads and thought the async interface was overly complicated, 
especially considering I didn't *really* need that full-duplex 
functionality). I can share it here if you're interested (I need to check with 
my employer, but it shouldn't be an issue).
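
Roughly, the client side of that workaround looks like this in C++ (a sketch 
only; ClientMsg, MyMethod, and the stub come from the hypothetical proto above, 
ProduceNext is a placeholder, and the blocking Read would normally live on its 
own thread or behind the select-style/async wrapper mentioned):

grpc::ClientContext context;
// Generated for: rpc MyMethod(stream ClientMsg) returns (stream google.protobuf.Empty)
std::unique_ptr<grpc::ClientReaderWriter<ClientMsg, google::protobuf::Empty>> stream =
    stub->MyMethod(&context);

// Sending side: a failed Write() already tells us the call is dead.
ClientMsg msg;
while (ProduceNext(&msg)) {                  // hypothetical producer
  if (!stream->Write(msg)) break;
}
stream->WritesDone();

// Receiving side: the server never sends anything, so Read() only returns
// false once the server half-closes or aborts the call.
google::protobuf::Empty unused;
while (stream->Read(&unused)) {}
grpc::Status status = stream->Finish();      // carries the actual termination status
if (!status.ok()) { /* the receiver closed the connection with an error */ }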

Hope that helps,
Malte

On Wednesday, November 21, 2018 at 8:17:04 AM UTC-8, ple...@swissonline.ch 
wrote:
>
> Hi,
>
> I face the same problem, did you find a solution ?
>



Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin . cheuvront
That must have been a different person :). I'm actually taking down the 
server and restarting it, no simulation of it.

On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>
> I thought your original message said you were simulating the server going 
> down using iptables and causing packet loss?
>
> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com  wrote:
>
> I'm not sure I follow you on that one. I am taking the server up and down 
> myself. Everything works fine if I just make rpc calls on the client and 
> check the error codes. The problem was the 20 seconds blocking on secondary 
> rpc calls for the reconnect, which seems to be due to the backoff 
> algorithm. I was hoping to shrink that wait if possible to something 
> smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 seemed to 
> still take the full 20 seconds when making an RPC call.
>
> Using GetState on the channel looked like it was going to get rid of the 
> blocking nature on a broken connection but the state of the channel doesn't 
> seem to change from transient failure once the server comes back up. Tried 
> using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS 
> but those didn't seem to trigger a state change on the channel.
>
> Seems like the only way to trigger a state change on the channel is to 
> make an actual rpc call.
>
> I think the answer might just be update to a newer version of rpc and look 
> at using the MIN_RECONNECT_BACKOFF channel arg setting and probably 
> downloading the source and looking at how those variables are used :). 
>
>
> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>>
>> The other thing to keep in mind is that the way you are “forcing failure” 
>> is error prone - the connection is valid as packets are making it through. 
>> It is just that is will be very slow due to extreme packet loss. I am not 
>> sure this is considered a failure by gRPC. I think you would need to detect 
>> slow network connections and abort that server yourself. 
>>
>> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
>>
>> I do check the error code after each update and skip the rest of the 
>> current iterations updates if a failure occurred.
>>
>> I could skip all updates for 20 seconds after an update but that seems 
>> less than ideal.
>>
>> By server available I was using the GetState on the channel. The problem 
>> I was running into was that if I only call GetState on the channel to see 
>> if the server is around it "forever" stays in the state of transient 
>> failure (at least for 60 seconds). I was expecting to see a state change 
>> back to idle/ready after a bit.
>>
>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>>
>>> You should track the err after each update, and if non-nil, just return… 
>>> why keep trying the further updates in that loop.
>>>
>>> It is also trivial too - to not even attempt the next loop if it has 
>>> been less than N ms since the last error.
>>>
>>> According to your pseudo code, you already have the ‘server available’ 
>>> status.
>>>
>>> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
>>>
>>> GRPC Version: 1.3.9
>>> Platform: Windows
>>>
>>> I'm working on a prototype application that periodically calculates data 
>>> and then in a multi-step process pushes the data to a server. The design is 
>>> that the server doesn't need to be up or can go down mid process. The 
>>> client will not block (or block as little as possible) between updates if 
>>> there is problem pushing data.
>>>
>>> A simple model for the client would be:
>>> Loop Until Done
>>> {
>>>  Calculate Data
>>>  If Server Available and No Error Begin Update
>>>  If Server Available and No Error UpdateX (Optional)
>>>  If Server Available and No Error UpdateY (Optional)
>>>  If Server Available and No Error UpdateZ (Optional)
>>>  If Server Available and No Error End Update
>>> }
>>>
>>> The client doesn't care if the server is available but if it is should 
>>> push data, if any errors skip everything else until next update.
>>>
>>> The problem is that if I make an call on the client (and the server 
>>> isn't available) the first fails very quickly (~1sec) and the rest take a 
>>> "long" time, ~20sec. It looks like this is due to the reconnect backoff 
>>> time. I tried setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel 
>>> args to a lower value (2000) but that didn't have any positive affect.
>>>
>>> I tried using GetState(true) on the channel to determine if we need to 
>>> skip an update. This call fails very quickly but never seems to get out of 
>>> the transient failure state after the server was started (waited for over 
>>> 60 seconds). On the documentation it looked like the param for GetState 
>>> only affects if the channel was in the idle state to attempt a reconnect.
>>>
>>> What is the best way to achieve the functionality we'd like?
>>>
>>> I noticed there was a new 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
I thought your original message said you were simulating the server going down 
using iptables and causing packet loss?

> On Nov 21, 2018, at 9:52 AM, justin.cheuvr...@ansys.com wrote:
> 
> I'm not sure I follow you on that one. I am taking the server up and down 
> myself. Everything works fine if I just make rpc calls on the client and 
> check the error codes. The problem was the 20 seconds blocking on secondary 
> rpc calls for the reconnect, which seems to be due to the backoff algorithm. 
> I was hoping to shrink that wait if possible to something smaller. Setting 
> the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 seemed to still take the full 
> 20 seconds when making an RPC call.
> 
> Using GetState on the channel looked like it was going to get rid of the 
> blocking nature on a broken connection but the state of the channel doesn't 
> seem to change from transient failure once the server comes back up. Tried 
> using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS 
> but those didn't seem to trigger a state change on the channel.
> 
> Seems like the only way to trigger a state change on the channel is to make 
> an actual rpc call.
> 
> I think the answer might just be update to a newer version of rpc and look at 
> using the MIN_RECONNECT_BACKOFF channel arg setting and probably downloading 
> the source and looking at how those variables are used :). 
> 
> 
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>> The other thing to keep in mind is that the way you are “forcing failure” is 
>> error prone - the connection is valid as packets are making it through. It 
>> is just that is will be very slow due to extreme packet loss. I am not sure 
>> this is considered a failure by gRPC. I think you would need to detect slow 
>> network connections and abort that server yourself. 
>> 
>>> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
>>> 
>>> I do check the error code after each update and skip the rest of the 
>>> current iterations updates if a failure occurred.
>>> 
>>> I could skip all updates for 20 seconds after an update but that seems less 
>>> than ideal.
>>> 
>>> By server available I was using the GetState on the channel. The problem I 
>>> was running into was that if I only call GetState on the channel to see if 
>>> the server is around it "forever" stays in the state of transient failure 
>>> (at least for 60 seconds). I was expecting to see a state change back to 
>>> idle/ready after a bit.
>>> 
 On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
 You should track the err after each update, and if non-nil, just return… 
 why keep trying the further updates in that loop.
 
 It is also trivial too - to not even attempt the next loop if it has been 
 less than N ms since the last error.
 
 According to your pseudo code, you already have the ‘server available’ 
 status.
 
> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
> 
> GRPC Version: 1.3.9
> Platform: Windows
> 
> I'm working on a prototype application that periodically calculates data 
> and then in a multi-step process pushes the data to a server. The design 
> is that the server doesn't need to be up or can go down mid process. The 
> client will not block (or block as little as possible) between updates if 
> there is problem pushing data.
> 
> A simple model for the client would be:
> Loop Until Done
> {
>  Calculate Data
>  If Server Available and No Error Begin Update
>  If Server Available and No Error UpdateX (Optional)
>  If Server Available and No Error UpdateY (Optional)
>  If Server Available and No Error UpdateZ (Optional)
>  If Server Available and No Error End Update
> }
> 
> The client doesn't care if the server is available but if it is should 
> push data, if any errors skip everything else until next update.
> 
> The problem is that if I make an call on the client (and the server isn't 
> available) the first fails very quickly (~1sec) and the rest take a 
> "long" time, ~20sec. It looks like this is due to the reconnect backoff 
> time. I tried setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the 
> channel args to a lower value (2000) but that didn't have any positive 
> affect.
> 
> I tried using GetState(true) on the channel to determine if we need to 
> skip an update. This call fails very quickly but never seems to get out 
> of the transient failure state after the server was started (waited for 
> over 60 seconds). On the documentation it looked like the param for 
> GetState only affects if the channel was in the idle state to attempt a 
> reconnect.
> 
> What is the best way to achieve the functionality we'd like?
> 
> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added 
> in a 

[grpc-io] Re: C++ API, unidirectional streaming: How to tell if receiver closed the connection?

2018-11-21 Thread pleuba
Hi,

I face the same problem. Did you find a solution?



[grpc-io] Re: C++ client streaming RPC: How to detect sever-side stream termination on the client?

2018-11-21 Thread pleuba

Hi,

I face the same issue and have read the whole discussion on GitHub. 

What is the status? Does the behaviour change depending on which gRPC 
version we are using?

Thanks

Philippe Leuba



Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin . cheuvront
I'm not sure I follow you on that one. I am taking the server up and down 
myself. Everything works fine if I just make rpc calls on the client and 
check the error codes. The problem was the 20 seconds blocking on secondary 
rpc calls for the reconnect, which seems to be due to the backoff 
algorithm. I was hoping to shrink that wait if possible to something 
smaller. Setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 seemed to 
still take the full 20 seconds when making an RPC call.

Using GetState on the channel looked like it was going to get rid of the 
blocking nature on a broken connection but the state of the channel doesn't 
seem to change from transient failure once the server comes back up. Tried 
using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS 
but those didn't seem to trigger a state change on the channel.

Seems like the only way to trigger a state change on the channel is to make 
an actual rpc call.

I think the answer might just be to update to a newer version of gRPC, look 
at using the MIN_RECONNECT_BACKOFF channel arg, and probably download the 
source and look at how those variables are used :). 
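
For what it's worth, polling the channel without issuing an RPC looks roughly 
like this in C++ (a sketch; whether the channel actually leaves 
TRANSIENT_FAILURE without a real call is exactly the behaviour in question 
above, so this is not a confirmed fix):

// Needs <grpcpp/grpcpp.h> and <chrono>.
// GetState(true) asks the channel to start connecting if it is currently idle;
// WaitForStateChange() blocks only until the state changes or the short deadline passes.
bool ServerAvailable(const std::shared_ptr<grpc::Channel>& channel) {
  grpc_connectivity_state state = channel->GetState(/*try_to_connect=*/true);
  if (state == GRPC_CHANNEL_READY) return true;
  channel->WaitForStateChange(
      state, std::chrono::system_clock::now() + std::chrono::milliseconds(200));
  return channel->GetState(/*try_to_connect=*/false) == GRPC_CHANNEL_READY;
}

A per-call deadline (grpc::ClientContext::set_deadline) is another way to bound 
how long an individual update can block, independent of the channel state.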


On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>
> The other thing to keep in mind is that the way you are “forcing failure” 
> is error prone - the connection is valid as packets are making it through. 
> It is just that is will be very slow due to extreme packet loss. I am not 
> sure this is considered a failure by gRPC. I think you would need to detect 
> slow network connections and abort that server yourself. 
>
> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com  wrote:
>
> I do check the error code after each update and skip the rest of the 
> current iterations updates if a failure occurred.
>
> I could skip all updates for 20 seconds after an update but that seems 
> less than ideal.
>
> By server available I was using the GetState on the channel. The problem I 
> was running into was that if I only call GetState on the channel to see if 
> the server is around it "forever" stays in the state of transient failure 
> (at least for 60 seconds). I was expecting to see a state change back to 
> idle/ready after a bit.
>
> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>
>> You should track the err after each update, and if non-nil, just return… 
>> why keep trying the further updates in that loop.
>>
>> It is also trivial too - to not even attempt the next loop if it has been 
>> less than N ms since the last error.
>>
>> According to your pseudo code, you already have the ‘server available’ 
>> status.
>>
>> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
>>
>> GRPC Version: 1.3.9
>> Platform: Windows
>>
>> I'm working on a prototype application that periodically calculates data 
>> and then in a multi-step process pushes the data to a server. The design is 
>> that the server doesn't need to be up or can go down mid process. The 
>> client will not block (or block as little as possible) between updates if 
>> there is problem pushing data.
>>
>> A simple model for the client would be:
>> Loop Until Done
>> {
>>  Calculate Data
>>  If Server Available and No Error Begin Update
>>  If Server Available and No Error UpdateX (Optional)
>>  If Server Available and No Error UpdateY (Optional)
>>  If Server Available and No Error UpdateZ (Optional)
>>  If Server Available and No Error End Update
>> }
>>
>> The client doesn't care if the server is available but if it is should 
>> push data, if any errors skip everything else until next update.
>>
>> The problem is that if I make an call on the client (and the server isn't 
>> available) the first fails very quickly (~1sec) and the rest take a "long" 
>> time, ~20sec. It looks like this is due to the reconnect backoff time. I 
>> tried setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args to 
>> a lower value (2000) but that didn't have any positive affect.
>>
>> I tried using GetState(true) on the channel to determine if we need to 
>> skip an update. This call fails very quickly but never seems to get out of 
>> the transient failure state after the server was started (waited for over 
>> 60 seconds). On the documentation it looked like the param for GetState 
>> only affects if the channel was in the idle state to attempt a reconnect.
>>
>> What is the best way to achieve the functionality we'd like?
>>
>> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added 
>> in a later version of grpc, would that cause the grpc call to "fail fast" 
>> if I upgraded and set that to a low value ~1sec?
>>
>> Is there a better way to handle this situation in general?
>>

Re: [grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread Robert Engels
Are the client and server on different machines over a WAN?

Are you certain the server side isn’t blocking? I would jstack the server as 
soon as a delay is detected during the test. I would also run a constant ping 
from client to server during the test to make sure there are no network 
failures. 

Other than that, I think you have to enable the gRPC tracing on both sides to 
isolate where the hang is happening. 

> On Nov 21, 2018, at 9:18 AM, namless...@gmail.com wrote:
> 
> 
>final NettyChannelBuilder channelBuilder = 
> NettyChannelBuilder.forAddress(getHost(), getPort())
> .usePlaintext(grpcProperties.isUsePlainText())
> 
> .loadBalancerFactory(RoundRobinLoadBalancerFactory.getInstance())
> .intercept(getClientInterceptors());
> 
> addConnectionPersistenceConfig(channelBuilder);
> 
> 
> if (grpcProperties.isEnableClientFixedConcurrency()) {
> 
> channelBuilder.executor(Executors.newFixedThreadPool(grpcProperties.getClientThreadNumber()));
> }
> 
> 
> this.channel = channelBuilder.build();
> ...
> 
> private void addConnectionPersistenceConfig(final NettyChannelBuilder 
> channelBuilder) {
> if (grpcProperties.getClientKeepAlive() != 0) {
> channelBuilder
> .keepAliveTime(grpcProperties.getClientKeepAlive(), 
> SECONDS) // 5 
> 
> .keepAliveWithoutCalls(grpcProperties.isClientKeepAliveWithoutCalls()) //true 
> 
> .keepAliveTimeout(grpcProperties.getClientKeepAliveTimeout(), SECONDS); //60
> }
> 
> if (grpcProperties.getClientIdle() != 0) {
> channelBuilder.idleTimeout(grpcProperties.getClientIdle(), 
> SECONDS); //60
> }
> }
> 
> 
> I have added the relevant bits of code that build the client, i think it 
> should reuse the connection.
> 
> 
> 
> 
> Another log from a client: 
> 
> 112ms:event=Started call
> 113ms:event=Message sent
> 113ms:event=Finished sending messages
> 5.16s:Response headers 
> received=Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip)
> 5.16s:event=Response received
> 5.16s:event=Call closed
> 
> 
> On the server side, there is no request that took more than 50ms. 
> 
> 
> Regarding file descriptors, both the client and server have about 100 open 
> file descriptors. 
> 
> 
>> On Wednesday, November 21, 2018 at 4:19:59 PM UTC+2, Robert Engels wrote:
>> There are also ways to abort the connection to avoid the close delay. 
>> 
>>> On Nov 21, 2018, at 8:18 AM, Robert Engels  wrote:
>>> 
>>> It could be a wait for tcp connection.  If you are continually creating new 
>>> connections, the server will run out of file descriptors since some 
>>> connections will remain in a close wait state - so it has to wait for these 
>>> to finally close in order to make a new connection. 
>>> 
>>> You might want to make sure your test is reusing connections. 
>>> 
 On Nov 21, 2018, at 8:15 AM, Alexandru Keszeg  
 wrote:
 
 That was my first thought as well, but monitoring doesn't show any long GC 
 pauses. 
 
 What seems odd is that I have not seen a "pause" between two query 
 traces(check the attached image in the first post), only at the "start" of 
 a request. 
 
> On Wed, Nov 21, 2018 at 3:34 PM Robert Engels  
> wrote:
> Maybe a full GC is occurring on the server ? That’s what I would look 
> for. 
> 
>> On Nov 21, 2018, at 2:50 AM, ake...@pitechnologies.ro wrote:
>> 
>> Randomly, some gRPC calls which usually complete in 20 milliseconds take 
>> a few seconds.
>> We have Jaeger in place, traces show a few seconds where the call does 
>> nothing and then processing begins, which seems to imply queuing of some 
>> sorts? 
>> 
>> On the server side, we have fixed concurrency, a thread dump shows most 
>> of them idle.
>> Our environment: Kubernetes 1.9 on Google Cloud, services are exposed 
>> using ClusterIP: None, clients connect using DNS load balancing
>> 
>> - Is there some built-in queuing on the server side?
>> - Is there any way to track the queue depth?
>> - Any other tips on debugging this? 
>> 
>> 
>> 

Re: [grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread namlessone

    final NettyChannelBuilder channelBuilder = NettyChannelBuilder.forAddress(getHost(), getPort())
            .usePlaintext(grpcProperties.isUsePlainText())
            .loadBalancerFactory(RoundRobinLoadBalancerFactory.getInstance())
            .intercept(getClientInterceptors());

    addConnectionPersistenceConfig(channelBuilder);

    if (grpcProperties.isEnableClientFixedConcurrency()) {
        channelBuilder.executor(Executors.newFixedThreadPool(grpcProperties.getClientThreadNumber()));
    }

    this.channel = channelBuilder.build();
    ...

    private void addConnectionPersistenceConfig(final NettyChannelBuilder channelBuilder) {
        if (grpcProperties.getClientKeepAlive() != 0) {
            channelBuilder
                    .keepAliveTime(grpcProperties.getClientKeepAlive(), SECONDS)               // 5
                    .keepAliveWithoutCalls(grpcProperties.isClientKeepAliveWithoutCalls())     // true
                    .keepAliveTimeout(grpcProperties.getClientKeepAliveTimeout(), SECONDS);    // 60
        }

        if (grpcProperties.getClientIdle() != 0) {
            channelBuilder.idleTimeout(grpcProperties.getClientIdle(), SECONDS);               // 60
        }
    }

I have added the relevant bits of code that build the client; I think it 
should reuse the connection.




Another log from a client: 

112ms: event=Started call
113ms: event=Message sent
113ms: event=Finished sending messages
5.16s: Response headers received=Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip)
5.16s: event=Response received
5.16s: event=Call closed



On the server side, there is no request that took more than 50ms. 


Regarding file descriptors, both the client and server have about 100 open 
file descriptors. 


On Wednesday, November 21, 2018 at 4:19:59 PM UTC+2, Robert Engels wrote:
>
> There are also ways to abort the connection to avoid the close delay. 
>
> On Nov 21, 2018, at 8:18 AM, Robert Engels  > wrote:
>
> It could be a wait for tcp connection.  If you are continually creating 
> new connections, the server will run out of file descriptors since some 
> connections will remain in a close wait state - so it has to wait for these 
> to finally close in order to make a new connection. 
>
> You might want to make sure your test is reusing connections. 
>
> On Nov 21, 2018, at 8:15 AM, Alexandru Keszeg  > wrote:
>
> That was my first thought as well, but monitoring doesn't show any long GC 
> pauses. 
>
> What seems odd is that I have not seen a "pause" between two query 
> traces(check the attached image in the first post), only at the "start" of 
> a request. 
>
> On Wed, Nov 21, 2018 at 3:34 PM Robert Engels  > wrote:
>
>> Maybe a full GC is occurring on the server ? That’s what I would look 
>> for. 
>>
>> On Nov 21, 2018, at 2:50 AM, ake...@pitechnologies.ro  
>> wrote:
>>
>> Randomly, some gRPC calls which usually complete in 20 milliseconds take 
>> a few seconds.
>> We have Jaeger in place, traces show a few seconds where the call does 
>> nothing and then processing begins, which seems to imply queuing of some 
>> sorts? 
>>
>> On the server side, we have fixed concurrency, a thread dump shows most 
>> of them idle.
>> Our environment: Kubernetes 1.9 on Google Cloud, services are exposed 
>> using ClusterIP: None, clients connect using DNS load balancing
>>
>> - Is there some built-in queuing on the server side?
>> - Is there any way to track the queue depth?
>> - Any other tips on debugging this? 
>>
>> 
>>
>>
>
> -- 
> 
>
>
> 
>
> Alexandru Keszeg
>
> Developer
>
> +40 747124216
>
> Coratim Business Center 
>
> Campul Painii nr.3-5 
>
> Cluj Napoca ROMANIA
>
> This message (including any attachment(s)) may be copyright-protected 
> and/or contain privileged and confidential information intended for use by 
> the above-mentioned recipient only.  If you are not the intended recipient 
> of this message, then please inform the sender immediately via the 
> telephone number, fax number or e-mail address indicated above and promptly 
> delete this message from your system.  Any unauthorized 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread Robert Engels
The other thing to keep in mind is that the way you are “forcing failure” is 
error-prone: the connection is still valid, as packets are making it through; it 
is just that it will be very slow due to extreme packet loss. I am not sure this 
is considered a failure by gRPC. I think you would need to detect slow network 
connections and abort that server yourself. 

> On Nov 21, 2018, at 9:12 AM, justin.cheuvr...@ansys.com wrote:
> 
> I do check the error code after each update and skip the rest of the current 
> iterations updates if a failure occurred.
> 
> I could skip all updates for 20 seconds after an update but that seems less 
> than ideal.
> 
> By server available I was using the GetState on the channel. The problem I 
> was running into was that if I only call GetState on the channel to see if 
> the server is around it "forever" stays in the state of transient failure (at 
> least for 60 seconds). I was expecting to see a state change back to 
> idle/ready after a bit.
> 
>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>> You should track the err after each update, and if non-nil, just return… why 
>> keep trying the further updates in that loop.
>> 
>> It is also trivial too - to not even attempt the next loop if it has been 
>> less than N ms since the last error.
>> 
>> According to your pseudo code, you already have the ‘server available’ 
>> status.
>> 
>>> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
>>> 
>>> GRPC Version: 1.3.9
>>> Platform: Windows
>>> 
>>> I'm working on a prototype application that periodically calculates data 
>>> and then, in a multi-step process, pushes the data to a server. The design is 
>>> that the server doesn't need to be up, or it can go down mid-process. The 
>>> client will not block (or will block as little as possible) between updates if 
>>> there is a problem pushing data.
>>> 
>>> A simple model for the client would be:
>>> Loop Until Done
>>> {
>>>  Calculate Data
>>>  If Server Available and No Error Begin Update
>>>  If Server Available and No Error UpdateX (Optional)
>>>  If Server Available and No Error UpdateY (Optional)
>>>  If Server Available and No Error UpdateZ (Optional)
>>>  If Server Available and No Error End Update
>>> }
>>> 
>>> The client doesn't care whether the server is available, but if it is, it 
>>> should push data; on any error, skip everything else until the next update.
>>> 
>>> The problem is that if I make a call on the client (and the server isn't 
>>> available), the first call fails very quickly (~1 sec) and the rest take a 
>>> "long" time, ~20 sec. It looks like this is due to the reconnect backoff 
>>> time. I tried setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args 
>>> to a lower value (2000), but that didn't have any positive effect.
>>> 
>>> I tried using GetState(true) on the channel to determine if we need to skip 
>>> an update. This call fails very quickly but never seems to get out of the 
>>> transient-failure state after the server was started (I waited for over 60 
>>> seconds). From the documentation it looked like the parameter to GetState 
>>> only controls whether a channel in the idle state attempts a reconnect.
>>> 
>>> What is the best way to achieve the functionality we'd like?
>>> 
>>> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added in 
>>> a later version of gRPC; would that cause the gRPC call to "fail fast" if I 
>>> upgraded and set it to a low value, ~1 sec?
>>> 
>>> Is there a better way to handle this situation in general?

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at 

Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

2018-11-21 Thread justin . cheuvront
I do check the error code after each update and skip the rest of the 
current iteration's updates if a failure occurred.

I could skip all updates for 20 seconds after a failed update, but that seems 
less than ideal.

By "server available" I mean using GetState on the channel. The problem I 
was running into was that if I only call GetState on the channel to see if 
the server is around, it "forever" stays in the transient-failure state 
(at least for 60 seconds). I was expecting to see a state change back to 
idle/ready after a bit.
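A minimal sketch of that kind of check in C++ (illustrative only; the target 
and the 500 ms window are made up, not the project's actual code): call 
GetState(true) to kick an idle channel into connecting, then wait briefly for 
the state to change instead of treating a momentary TRANSIENT_FAILURE as 
permanent:

#include <chrono>
#include <memory>
#include <grpcpp/grpcpp.h>  // <grpc++/grpc++.h> on older releases such as 1.3.x

// Sketch: returns true only if the channel is READY, after giving it a short
// window to reconnect.
bool ServerAvailable(const std::shared_ptr<grpc::Channel>& channel) {
  grpc_connectivity_state state = channel->GetState(/*try_to_connect=*/true);
  if (state == GRPC_CHANNEL_READY) return true;
  auto deadline =
      std::chrono::system_clock::now() + std::chrono::milliseconds(500);
  channel->WaitForStateChange(state, deadline);  // returns false on timeout
  return channel->GetState(/*try_to_connect=*/false) == GRPC_CHANNEL_READY;
}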

On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>
> You should track the err after each update, and if non-nil, just return… 
> why keep trying the further updates in that loop?
>
> It is also trivial to not even attempt the next loop iteration if it has been 
> less than N ms since the last error.
>
> According to your pseudo code, you already have the ‘server available’ 
> status.
>
> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com  wrote:
>
> GRPC Version: 1.3.9
> Platform: Windows
>
> I'm working on a prototype application that periodically calculates data 
> and then, in a multi-step process, pushes the data to a server. The design is 
> that the server doesn't need to be up, or it can go down mid-process. The 
> client will not block (or will block as little as possible) between updates if 
> there is a problem pushing data.
>
> A simple model for the client would be:
> Loop Until Done
> {
>  Calculate Data
>  If Server Available and No Error Begin Update
>  If Server Available and No Error UpdateX (Optional)
>  If Server Available and No Error UpdateY (Optional)
>  If Server Available and No Error UpdateZ (Optional)
>  If Server Available and No Error End Update
> }
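A concrete C++ sketch of that loop, with a status check after every step and a 
short cooldown after a failure (Done, Data, CalculateData and the Update* 
helpers are placeholders standing in for the pseudo code above; each helper is 
assumed to wrap one RPC and return its grpc::Status, and the headers are the 
same as in the earlier sketches):

void RunClientLoop() {
  const auto kCooldown = std::chrono::seconds(2);  // the "N ms" suggested above
  auto last_failure = std::chrono::steady_clock::now() - kCooldown;
  while (!Done()) {
    Data d = CalculateData();
    // Skip all of this iteration's pushes if the last failure was too recent.
    if (std::chrono::steady_clock::now() - last_failure < kCooldown) continue;
    grpc::Status s = BeginUpdate(d);
    if (s.ok()) s = UpdateX(d);  // optional updates, skipped after any error
    if (s.ok()) s = UpdateY(d);
    if (s.ok()) s = UpdateZ(d);
    if (s.ok()) s = EndUpdate(d);
    if (!s.ok()) last_failure = std::chrono::steady_clock::now();
  }
}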
>
> The client doesn't care whether the server is available, but if it is, it 
> should push data; on any error, skip everything else until the next update.
>
> The problem is that if I make a call on the client (and the server isn't 
> available), the first call fails very quickly (~1 sec) and the rest take a 
> "long" time, ~20 sec. It looks like this is due to the reconnect backoff 
> time. I tried setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args 
> to a lower value (2000), but that didn't have any positive effect.
>
> I tried using GetState(true) on the channel to determine if we need to 
> skip an update. This call fails very quickly but never seems to get out of 
> the transient-failure state after the server was started (I waited for over 
> 60 seconds). From the documentation it looked like the parameter to GetState 
> only controls whether a channel in the idle state attempts a reconnect.
>
> What is the best way to achieve the functionality we'd like?
>
> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added 
> in a later version of gRPC; would that cause the gRPC call to "fail fast" 
> if I upgraded and set it to a low value, ~1 sec?
>
> Is there a better way to handle this situation in general?
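For reference, the reconnect backoff knobs mentioned above are channel 
arguments, set when the channel is created. A minimal C++ sketch (the target 
address and values are made up, and GRPC_ARG_MIN_RECONNECT_BACKOFF_MS in 
particular needs a gRPC release that defines it):

#include <memory>
#include <grpcpp/grpcpp.h>

std::shared_ptr<grpc::Channel> MakeChannel() {
  grpc::ChannelArguments args;
  // How long the channel backs off between reconnect attempts, in ms.
  args.SetInt(GRPC_ARG_INITIAL_RECONNECT_BACKOFF_MS, 500);
  args.SetInt(GRPC_ARG_MIN_RECONNECT_BACKOFF_MS, 1000);
  args.SetInt(GRPC_ARG_MAX_RECONNECT_BACKOFF_MS, 2000);
  return grpc::CreateCustomChannel(
      "localhost:50051", grpc::InsecureChannelCredentials(), args);
}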

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/c8c655a5-75d0-44f0-8103-d47217adf251%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread Robert Engels
There are also ways to abort the connection to avoid the close delay. 
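(For anyone unfamiliar with that: an abortive close is normally done by 
enabling SO_LINGER with a zero timeout before closing, so the socket sends a 
RST and is torn down immediately instead of lingering while it drains. As far 
as I know gRPC does not expose this directly, so the raw-socket sketch below 
only shows the mechanism; it is not something you would drop into a gRPC 
program as-is:

#include <sys/socket.h>
#include <unistd.h>

// Sketch (POSIX sockets): force an abortive close.  close() then sends a RST
// and discards any unsent data instead of doing the normal FIN handshake.
void AbortiveClose(int fd) {
  struct linger lo;
  lo.l_onoff = 1;   // enable SO_LINGER
  lo.l_linger = 0;  // zero timeout => RST on close
  setsockopt(fd, SOL_SOCKET, SO_LINGER, &lo, sizeof(lo));
  close(fd);
}
)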

> On Nov 21, 2018, at 8:18 AM, Robert Engels  wrote:
> 
> It could be a wait for a TCP connection.  If you are continually creating new 
> connections, the server will run out of file descriptors, since some 
> connections will remain in a CLOSE_WAIT state - so it has to wait for these 
> to finally close in order to make a new connection. 
> 
> You might want to make sure your test is reusing connections. 
> 
>> On Nov 21, 2018, at 8:15 AM, Alexandru Keszeg  wrote:
>> 
>> That was my first thought as well, but monitoring doesn't show any long GC 
>> pauses. 
>> 
>> What seems odd is that I have not seen a "pause" between two query 
>> traces(check the attached image in the first post), only at the "start" of a 
>> request. 
>> 
>>> On Wed, Nov 21, 2018 at 3:34 PM Robert Engels  wrote:
>>> Maybe a full GC is occurring on the server ? That’s what I would look for. 
>>> 
 On Nov 21, 2018, at 2:50 AM, akes...@pitechnologies.ro wrote:
 
 Randomly, some gRPC calls which usually complete in 20 milliseconds take a 
 few seconds.
 We have Jaeger in place; traces show a few seconds where the call does 
 nothing and then processing begins, which seems to imply queuing of some 
 sort? 
 
 On the server side, we have fixed concurrency, and a thread dump shows most of 
 them idle.
 Our environment: Kubernetes 1.9 on Google Cloud, services are exposed 
 using ClusterIP: None, clients connect using DNS load balancing.
 
 - Is there some built-in queuing on the server side?
 - Is there any way to track the queue depth?
 - Any other tips on debugging this? 
 
 
 

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this 

Re: [grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread Robert Engels
It could be a wait for a TCP connection.  If you are continually creating new 
connections, the server will run out of file descriptors, since some connections 
will remain in a CLOSE_WAIT state - so it has to wait for these to finally 
close in order to make a new connection. 

You might want to make sure your test is reusing connections. 
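The thread is gRPC Java, but the principle is the same in any of the 
implementations: build one channel (and stub) up front and reuse it for every 
call, rather than creating a new one per request. A C++ sketch with a made-up 
target address:

#include <memory>
#include <grpcpp/grpcpp.h>

// Sketch: one long-lived channel shared by all calls, instead of a new
// channel (and therefore a new TCP connection) for every request.
std::shared_ptr<grpc::Channel> GetSharedChannel() {
  static std::shared_ptr<grpc::Channel> channel = grpc::CreateChannel(
      "server.example.com:50051", grpc::InsecureChannelCredentials());
  return channel;
}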

> On Nov 21, 2018, at 8:15 AM, Alexandru Keszeg  wrote:
> 
> That was my first thought as well, but monitoring doesn't show any long GC 
> pauses. 
> 
> What seems odd is that I have not seen a "pause" between two query 
> traces(check the attached image in the first post), only at the "start" of a 
> request. 
> 
>> On Wed, Nov 21, 2018 at 3:34 PM Robert Engels  wrote:
>> Maybe a full GC is occurring on the server ? That’s what I would look for. 
>> 
>>> On Nov 21, 2018, at 2:50 AM, akes...@pitechnologies.ro wrote:
>>> 
>>> Randomly, some gRPC calls which usually complete in 20 milliseconds take a 
>>> few seconds.
>>> We have Jaeger in place; traces show a few seconds where the call does 
>>> nothing and then processing begins, which seems to imply queuing of some 
>>> sort? 
>>> 
>>> On the server side, we have fixed concurrency, and a thread dump shows most 
>>> of them idle.
>>> Our environment: Kubernetes 1.9 on Google Cloud, services are exposed using 
>>> ClusterIP: None, clients connect using DNS load balancing.
>>> 
>>> - Is there some built-in queuing on the server side?
>>> - Is there any way to track the queue depth?
>>> - Any other tips on debugging this? 

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/FFD79299-249F-47F3-88D8-3251B378B023%40earthlink.net.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread Alexandru Keszeg
That was my first thought as well, but monitoring doesn't show any long GC
pauses.

What seems odd is that I have not seen a "pause" between two query
traces (check the attached image in the first post), only at the "start" of
a request.

On Wed, Nov 21, 2018 at 3:34 PM Robert Engels  wrote:

> Maybe a full GC is occurring on the server ? That’s what I would look for.
>
> On Nov 21, 2018, at 2:50 AM, akes...@pitechnologies.ro wrote:
>
> Randomly, some gRPC calls which usually complete in 20 milliseconds take a
> few seconds.
> We have Jaeger in place; traces show a few seconds where the call does
> nothing and then processing begins, which seems to imply queuing of some
> sort?
>
> On the server side, we have fixed concurrency, and a thread dump shows most of
> them idle.
> Our environment: Kubernetes 1.9 on Google Cloud, services are exposed
> using ClusterIP: None, clients connect using DNS load balancing.
>
> - Is there some built-in queuing on the server side?
> - Is there any way to track the queue depth?
> - Any other tips on debugging this?

-- 





Alexandru Keszeg

Developer

+40 747124216

Coratim Business Center

Campul Painii nr.3-5

Cluj Napoca ROMANIA

This message (including any attachment(s)) may be copyright-protected
and/or contain privileged and confidential information intended for use by
the above-mentioned recipient only.  If you are not the intended recipient
of this message, then please inform the sender immediately via the
telephone number, fax number or e-mail address indicated above and promptly
delete this message from your system.  Any unauthorized copying, disclosure
to third parties or use of this message (including any attachment(s)) is
strictly prohibited.  It is generally accepted that the security of
electronic communications is not failsafe. Despite our best efforts, we
cannot guarantee that electronic communications received were in fact sent
by the purported sender and we shall not be liable for the improper or
incomplete transmission of the information contained in this communication,
nor for any delay in its receipt or damage to your system.

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/CACekFaEPNG6ZhXEZgyY-YwyNSE2V-aAgcsnCGcvN_nb13ovipg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] Re: gRPC server executor pool slowing process execution

2018-11-21 Thread Robert Engels
1L? Still, you might want to look at /proc/schedstat. 600% utilization is pretty 
high, especially since there may be only 4 real cores. 

Also, what is the user CPU % vs. system? You might just be thrashing, scheduling 
all of those threads on 8 cores. 

> On Nov 21, 2018, at 7:45 AM, qplc  wrote:
> 
> A single publisher is sending 1L messages, distributed over 2 minutes.
> 
> Observed:
> load average: 1.24, 0.48, 0.17
> %CPU: 600
> %MEM: 65
> 
> 
> 
>> On Thursday, October 25, 2018 at 3:49:07 PM UTC+5:30, qplc wrote:
>> Hi,
>> 
>> I've implemented zookeeper balanced grpc server and client.
>> Following are the execution configuration details:
>> Grpc Client:
>> Channel Count: 1
>> Boss/Acceptor Thread: 1
>> Nio Threads: 100
>> Executor/App Threads: 100
>> 
>> Grpc Server:
>> Nio Threads: 100
>> Executor/App Threads: 400
>> Max concurrent calls per connection: 100
>> 
>> Here, I'm using a ForkJoinPool as the executor. I'm sending messages over 
>> RabbitMQ and forwarding them to the gRPC client. The publisher rate is 10k 
>> messages per 10 sec. 
>> 
>> As I've observed, each request, once it reaches the server, executes for up 
>> to 10 seconds. And, as configured, at most 400 tasks get executed 
>> concurrently at a time. Because of this, the other 9600 requests pile up 
>> waiting for application threads to become available.
>> 
>> This slows the overall process down, since requests pile up and I can't 
>> delegate the tasks to another thread pool because the executor is already 
>> dedicated to them.
>> 
>> Also, I've given the application 16 GB. Would increasing the thread count 
>> help here, given that I have already provided a fairly generous configuration?
>> How do I make the execution faster?
>> 
>> 
>> Thanks,
>> qplc

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/1A899BB6-2FA0-49D2-AE09-D7156565D663%40earthlink.net.
For more options, visit https://groups.google.com/d/optout.


[grpc-io] Re: gRPC server executor pool slowing process execution

2018-11-21 Thread qplc
A single publisher is sending 1L messages, distributed over 2 minutes.

*Observed:*
load average: 1.24, 0.48, 0.17
%CPU: 600
%MEM: 65



On Thursday, October 25, 2018 at 3:49:07 PM UTC+5:30, qplc wrote:
>
> Hi,
>
> I've implemented zookeeper balanced grpc server and client.
> Following are the execution configuration details:
> Grpc Client:
> Channel Count: 1
> Boss/Acceptor Thread: 1
> Nio Threads: 100
> Executor/App Threads: 100
>
> Grpc Server:
> Nio Threads: 100
> Executor/App Threads: 400
> Max concurrent calls per connection: 100
>
> Here, I'm using a ForkJoinPool as the executor. I'm sending messages over 
> RabbitMQ and forwarding them to the gRPC client. The publisher rate is 10k 
> messages per 10 sec. 
>
> As I've observed, each request, once it reaches the server, executes for up 
> to 10 seconds. And, as configured, at most 400 tasks get executed 
> concurrently at a time. Because of this, the other 9600 requests pile up 
> waiting for application threads to become available.
>
> This slows the overall process down, since requests pile up and I can't 
> delegate the tasks to another thread pool because the executor is already 
> dedicated to them.
>
> Also, I've given the application 16 GB. Would increasing the thread count 
> help here, given that I have already provided a fairly generous configuration?
> How do I make the execution faster?
>
>
> Thanks,
> qplc
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/7bc25124-5456-4fcd-9ad3-aff2917337dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread Robert Engels
Maybe a full GC is occurring on the server? That’s what I would look for. 

> On Nov 21, 2018, at 2:50 AM, akes...@pitechnologies.ro wrote:
> 
> Randomly, some gRPC calls which usually complete in 20 milliseconds take a 
> few seconds.
> We have Jaeger in place; traces show a few seconds where the call does nothing 
> and then processing begins, which seems to imply queuing of some sort? 
> 
> On the server side, we have fixed concurrency, and a thread dump shows most of 
> them idle.
> Our environment: Kubernetes 1.9 on Google Cloud, services are exposed using 
> ClusterIP: None, clients connect using DNS load balancing.
> 
> - Is there some built-in queuing on the server side?
> - Is there any way to track the queue depth?
> - Any other tips on debugging this? 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/FA956101-4237-4B03-A19F-8B5B9A3EC825%40earthlink.net.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] Re: gRPC server executor pool slowing process execution

2018-11-21 Thread Robert Engels
What is the CPU utilization and load factor during the slowdown? With a single 
publisher, if the rate is 10k per 10 sec then every task must complete in 1 ms 
or they will back up, unless you allow out-of-order handling. 
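(Back-of-the-envelope with the numbers from the original post: 10k messages per 
10 sec is 1,000/sec coming in, while 400 executor threads each holding a request 
for ~10 sec can only complete about 40/sec, so roughly 960 of every 1,000 
incoming requests per second have to queue somewhere.)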

> On Nov 21, 2018, at 1:53 AM, qplc  wrote:
> 
> Thanks Kun for your reply.
> 
> As you mentioned, I have implemented a non-streaming async client and server 
> with a limited number of threads configured.
> 
> Here are the details:
> Machine Cores: 8
> Channel Count: 1
> Grpc Client NIO Threads: (Cores-2) * 2 = 12
> Grpc Client App Threads: 30
> Grpc Server NIO Threads: (Cores-2) * 2 = 12
> Grpc Server App Threads: 30
> Grpc Server Max. Concurrent Calls Per Connection = Grpc Server NIO Threads 
> i.e. 12
> 
> RabbitMQ Max. Pool Size = 150
> Grpc Client Threadpool size for Submitting RPC Call = 200
> Grpc Server Threadpool size to execute business functionality = 200 (Added 
> this threadpool to free Grpc Server App Threads) 
> 
> I have tried my best to fine-tune the gRPC client and server, but I still see 
> slowness when submitting tasks for business-functionality execution.
> 
> I have seen many messages piled up in the RabbitMQ queue. This shows the 
> threads are busy submitting RPC tasks.
> 
> Am I configuring the gRPC server/client incorrectly here?
> 
> On Thursday, October 25, 2018 at 3:49:07 PM UTC+5:30, qplc wrote:
>> 
>> Hi,
>> 
>> I've implemented zookeeper balanced grpc server and client.
>> Following are the execution configuration details:
>> Grpc Client:
>> Channel Count: 1
>> Boss/Acceptor Thread: 1
>> Nio Threads: 100
>> Executor/App Threads: 100
>> 
>> Grpc Server:
>> Nio Threads: 100
>> Executor/App Threads: 400
>> Max concurrent calls per connection: 100
>> 
>> Here, I'm using a ForkJoinPool as the executor. I'm sending messages over 
>> RabbitMQ and forwarding them to the gRPC client. The publisher rate is 10k 
>> messages per 10 sec. 
>> 
>> As I've observed, each request, once it reaches the server, executes for up 
>> to 10 seconds. And, as configured, at most 400 tasks get executed 
>> concurrently at a time. Because of this, the other 9600 requests pile up 
>> waiting for application threads to become available.
>> 
>> This slows the overall process down, since requests pile up and I can't 
>> delegate the tasks to another thread pool because the executor is already 
>> dedicated to them.
>> 
>> Also, I've given the application 16 GB. Would increasing the thread count 
>> help here, given that I have already provided a fairly generous configuration?
>> How do I make the execution faster?
>> 
>> 
>> Thanks,
>> qplc

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/9F52D676-6A37-40D5-84D2-2F882557E8A1%40earthlink.net.
For more options, visit https://groups.google.com/d/optout.


Re: [grpc-io] how to create a server to consume messages from multiple clients in Nodejs?

2018-11-21 Thread Arunkumar P
Hi Nicolas,
Currently I am facing this issue when multiple clients try to connect to a
single server within the network. Could you please advise?

Grpc.Core.RpcException: Status(StatusCode=Unavailable, Detail="Connect
Failed")

Thanks & Regards,
Arun Kumar.P

On Thu, Nov 1, 2018 at 11:56 AM Arunkumar P 
wrote:

> Thanks a lot
>
> On Thu, 1 Nov 2018, 21:52 Nicolas Noble 
>> There's nothing special to do. Any gRPC server from any language should
>> be able to natively handle many different clients from many different
>> origins without any special handling.
>>
>> On Thu, Nov 1, 2018 at 5:51 AM  wrote:
>>
>>> How do I create a server to consume messages from multiple clients in
>>> Node.js? i.e.
>>>
>>> I have one listener in Node.js and 500 clients pushing different
>>> messages. My server should be able to consume all client messages.
>>>
>>> Thanks,
>>> Arun

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/CAN-SuZDiV0%3DH5d7q%3D%3DZ6UcQ3ps%2B93npsrNbtNSHb9ETVtvWzMA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


[grpc-io] [gRPC Java] need help to diagnose weird stall in grpc call

2018-11-21 Thread akeszeg
Randomly, some gRPC calls which usually complete in 20 milliseconds take a 
few seconds.
We have Jaeger in place; traces show a few seconds where the call does 
nothing and then processing begins, which seems to imply queuing of some 
sort? 

On the server side, we have fixed concurrency, and a thread dump shows most of 
them idle.
Our environment: Kubernetes 1.9 on Google Cloud, services are exposed using 
ClusterIP: None, clients connect using DNS load balancing.

- Is there some built-in queuing on the server side?
- Is there any way to track the queue depth?
- Any other tips on debugging this? 

[image: Selection_120.png]

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/3f8075e1-e261-45c9-865f-23285b98cca9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.