[GitHub] [incubator-brpc] guocanwen commented on issue #962: brpc block timeout

2020-04-12 Thread GitBox
guocanwen commented on issue #962: brpc block timeout
URL: https://github.com/apache/incubator-brpc/issues/962#issuecomment-612738870
 
 
   Mixing bthread and pthread causes conflicts. Either switch everything to bthread, or change to pthread mode.
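
   One common way such mixing goes wrong can be sketched in plain C++. This is an analogy only, not brpc code and not a confirmed diagnosis of this issue: `FixedPool` and `CountStarvedHandlers` are hypothetical names, and the pool merely stands in for a fixed set of worker threads. A handler running on a worker blocks the whole thread on a `std::future` while the reply it waits for needs a worker from the same pool. With no free worker, the wait can only end by timing out. "pthread mode" above presumably refers to brpc's `usercode_in_pthread` flag; that reading is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed set of worker threads.
// This is NOT brpc's scheduler; it only models "N workers, one task queue".
class FixedPool {
public:
    explicit FixedPool(int workers) {
        assert(workers > 0);
        for (int i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { Run(); });
        }
    }

    // Lets the workers drain whatever is still queued, then joins them.
    ~FixedPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    void Submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Runs `handlers` request handlers on a pool of `workers` threads. Each handler
// queues a "reply" task on the same pool and then blocks its worker thread on a
// future. Returns how many handlers gave up after `wait` without a reply.
int CountStarvedHandlers(int workers, int handlers,
                         std::chrono::milliseconds wait) {
    FixedPool pool(workers);
    std::vector<std::future<bool>> results;
    for (int i = 0; i < handlers; ++i) {
        auto starved = std::make_shared<std::promise<bool>>();
        results.push_back(starved->get_future());
        pool.Submit([&pool, starved, wait] {
            auto reply = std::make_shared<std::promise<void>>();
            std::future<void> got = reply->get_future();
            pool.Submit([reply] { reply->set_value(); });
            // Thread-level blocking wait: this worker runs nothing else meanwhile.
            starved->set_value(got.wait_for(wait) == std::future_status::timeout);
        });
    }
    int count = 0;
    for (std::future<bool>& r : results) count += r.get() ? 1 : 0;
    return count;  // the pool is destroyed only after every handler has finished
}
```

   With one worker and one handler, the reply task cannot run until the handler gives up, so the handler is counted as starved. With a second, idle worker the reply is delivered and nothing starves. Without the timeout on the wait, the first case would block forever.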
   
   
   
   
   -- Original message --
   From: "youcheng huang"

[GitHub] [incubator-brpc] guocanwen commented on issue #962: brpc block timeout

2019-11-09 Thread GitBox
guocanwen commented on issue #962: brpc block timeout
URL: https://github.com/apache/incubator-brpc/issues/962#issuecomment-552082459
 
 
   Most likely the timeout never took effect and the process deadlocked.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@brpc.apache.org
For additional commands, e-mail: dev-h...@brpc.apache.org



[GitHub] [incubator-brpc] guocanwen commented on issue #962: brpc block timeout

2019-11-09 Thread GitBox
guocanwen commented on issue #962: brpc block timeout
URL: https://github.com/apache/incubator-brpc/issues/962#issuecomment-552081747
 
 
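   Reproduced offline; it looks like a deadlock. In the test, process X is a brpc server (other services call it). After receiving a request, it uses a brpc client to call several services it depends on.


   Across several captures of the live state, I confirmed that every request sent to process X was refused, and that the remote services X calls were no longer receiving any requests from it. Judging from the stacks, however, those requests had not returned.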
   
   
   
   
   
   -- Original message --
   From: "Ge Jun"

[GitHub] [incubator-brpc] guocanwen commented on issue #962: brpc block timeout

2019-11-07 Thread GitBox
guocanwen commented on issue #962: brpc block timeout
URL: https://github.com/apache/incubator-brpc/issues/962#issuecomment-551381616
 
 
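   bvar is not enabled; the only metric we monitor is the timeout rate. We will first run an offline test that repeatedly simulates service-node failures, to see whether we can reproduce the problem and capture the live state. Thanks.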
   
   
   -- Original message --
   From: "Ge Jun"

[GitHub] [incubator-brpc] guocanwen commented on issue #962: brpc block timeout

2019-11-07 Thread GitBox
guocanwen commented on issue #962: brpc block timeout
URL: https://github.com/apache/incubator-brpc/issues/962#issuecomment-551369956
 
 
   
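   Yes. It is still unclear why a problem on one server node affects access to the other server nodes, which is puzzling. In addition, this service also calls other types of services, some of them through brpc interfaces as well. Calls to those services timed out too, and only a restart restores them.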
   
   
   
   
   -- Original message --
   From: "Ge Jun"

[GitHub] [incubator-brpc] guocanwen commented on issue #962: brpc block timeout

2019-11-07 Thread GitBox
guocanwen commented on issue #962: brpc block timeout
URL: https://github.com/apache/incubator-brpc/issues/962#issuecomment-551366875
 
 
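   Thanks for the reply. The details are as follows:


   The sequence of events:
   2019-11-07 20:17:56     The remote service 10.12.145.207:12306 (a grpc server) core dumped.
   A large number of timeouts began to appear.
   2019-11-07 20:18:05     The naming service (our own implementation) detected a heartbeat timeout for 10.12.145.207:12306 and closed the connection locally (in our code, the stub and channel are both released by calling their destructors; there is no close-style interface).
   The errors kept occurring.


   On the server side, requests look normal (other services are using those servers as well). The affected machines do not fail every request either; the failure rate is just high, around 20%.


   Log from one of the affected machines (half of the machines are affected) at the time of the problem: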
   
   
   [2019-11-07 20:17:00.865998][17804][connection_pool.cc:54] Sync connection pool size: 1
   [2019-11-07 20:17:01.633376][ 1152][forward_table_data_manager.h:389] Current consume msg_count=525
   [2019-11-07 20:17:14.881232][20310][connection_pool.cc:54] Sync connection pool size: 1
   [2019-11-07 20:17:19.816668][ 1234][forward_table_data_manager.h:229] Message is HEART_BEAT, timestamp=1573129039
   [2019-11-07 20:17:24.024580][ 1260][http2_rpc_protocol.cpp:206] Unknown setting, id=65027 value=1
   [2019-11-07 20:17:25.304358][ 1152][forward_table_data_manager.h:389] Current consume msg_count=526
   [2019-11-07 20:17:26.629755][ 1288][http2_rpc_protocol.cpp:206] Unknown setting, id=65027 value=1
   [2019-11-07 20:17:28.404240][22809][connection_pool.cc:54] Sync connection pool size: 1
   [2019-11-07 20:17:41.649389][25284][connection_pool.cc:54] Sync connection pool size: 1
   [2019-11-07 20:17:49.494298][ 1152][forward_table_data_manager.h:389] Current consume msg_count=527
   [2019-11-07 20:17:55.607590][27801][connection_pool.cc:54] Sync connection pool size: 1
   [2019-11-07 20:17:57.247980][  876][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.145.207:12306
   [2019-11-07 20:17:57.468590][  401][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.145.207:12306
   [2019-11-07 20:17:58.866565][ 1273][creative_service.cc:190] wait future in FetchCreative timeout
   [2019-11-07 20:17:58.851737][  957][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.170.229:12306
   [2019-11-07 20:17:58.851717][  524][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.144.29:12306
   [2019-11-07 20:17:58.618955][  458][remote_call.cc:107] Call failed. error code=[E111]Fail to connect Socket{id=8590041217 addr=10.12.145.207:12306} (0x0x7fabd6180840): Connection refused [R1][E112]Not connected to 10.12.145.207:12306 yet, server_id=8589934890 [R2][E112]Not connected to 10.12.145.207:12306 yet, server_id=8589934890 [R3][E112]Not connected to 10.12.145.207:12306 yet, server_id=8589934890
   [2019-11-07 20:17:58.851699][  453][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.144.183:12306
   [2019-11-07 20:17:58.719017][ 1260][socket.cpp:2260] Checking Socket{id=8589934890 addr=10.12.145.207:12306} (0x7fb17e6f5fc0)
   [2019-11-07 20:17:58.501103][ 1277][input_messenger.cpp:212] Fail to read from Socket{id=8589934884 fd=351 addr=10.12.145.207:12306:42632} (0x7fb17e6f53c0): Connection reset by peer [104]
   [2019-11-07 20:17:58.982826][  878][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.145.113:12306
   [2019-11-07 20:17:58.851709][  914][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.171.13:12306
   [2019-11-07 20:17:58.984495][  953][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.203.127:12306
   [2019-11-07 20:17:58.859853][  710][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.171.219:12306
   [2019-11-07 20:17:58.984462][ 1110][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.247.59:12306
   [2019-11-07 20:17:58.982813][  623][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.145.97:12306
   [2019-11-07 20:17:58.325278][ 1023][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.145.207:12306
   [2019-11-07 20:17:58.993119][  534][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.220.121:12306
   [2019-11-07 20:17:58.853020][  647][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.171.1:12306
   [2019-11-07 20:17:58.860050][  715][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.170.105:12306
   [2019-11-07 20:17:58.011910][  330][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.145.207:12306
   [2019-11-07 20:17:58.851717][  723][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.247.49:12306
   [2019-11-07 20:17:58.984458][ 1024][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.224.99:12306
   [2019-11-07 20:17:59.258399][ 1025][remote_call.cc:107] Call failed. error code=[E1008]Reached timeout=120ms @10.12.144.29:12306