We changed the configuration to rpc_service_queue_length=1024, rpc_num_service_threads=128, consensus_rpc_timeout_ms=30000, and the Kudu cluster appears to be working well now.
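
As a rough sketch, the three settings above could be written as gflags flagfile lines. That they belong in each tablet server's flagfile (rather than the master's) and are picked up at startup are assumptions; the thread does not say how the flags were applied.

```
# Hypothetical tablet server flagfile fragment; values are the ones reported above.
--rpc_service_queue_length=1024
--rpc_num_service_threads=128
--consensus_rpc_timeout_ms=30000
```
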

2017-11-04 5:15 GMT+00:00 Todd Lipcon <t...@cloudera.com>:

> One thing you might try is to update the consensus RPC timeout to 30
> seconds instead of 1. We changed the default in later versions.
>
> I'd also recommend upgrading to 1.4 or 1.5 for other related fixes to
> consensus stability. I think I recall you were on 1.3 still?
>
> Todd
>
>
> On Nov 3, 2017 7:47 PM, "Lee King" <yuyunliu...@gmail.com> wrote:
>
> Hi,
> Our Kudu cluster ran well for a long time, but writes have become slow
> recently, and clients are also getting RPC timeouts. I checked the
> warnings and found a large number of errors like this:
> W1104 10:25:16.833736 10271 consensus_peers.cc:365] T
> 149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 ->
> Peer 1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't
> send request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet
> 149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC
> to 10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next
> heartbeat period. Already tried 5 times.
> I changed the configuration to rpc_service_queue_length=400,
> rpc_num_service_threads=40, but it had no effect.
> Our cluster has 5 masters and 10 tablet servers, with 3800G of data and
> 800 tablets per tablet server. I checked one of the tablet server
> machines: memory 14G free (128 in all), threads 4739 (max 32000), open
> files 28000 (max 65536), CPU utilization about 30% (32 cores), disk
> utilization less than 30%.
> Any suggestions? Thanks!