Re: The service queue is full; it has 400 items.. Retrying in the next heartbeat period.

Lee King Tue, 07 Nov 2017 18:29:43 -0800

We changed the configuration to rpc_service_queue_length=1024,
rpc_num_service_threads=128, and consensus_rpc_timeout_ms=30000, and the
Kudu cluster seems to be working well now.
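
As a rough sketch, these settings could be written in a tablet server flag
file like the one below. This assumes the standard gflags flag-file syntax
that Kudu daemons read via --flagfile. The three flag names and values are
the ones from this thread; the comment line is only illustrative.

```
# Tablet server flags (sketch; merge into your existing flag file)
--rpc_service_queue_length=1024
--rpc_num_service_threads=128
--consensus_rpc_timeout_ms=30000
```

The tablet servers would need a restart for a flag file change to take
effect.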


2017-11-04 5:15 GMT+00:00 Todd Lipcon <[email protected]>:

> One thing you might try is to update the consensus rpc timeout to 30
> seconds instead of 1. We changed the default in later versions.
>
> I'd also recommend upgrading to 1.4 or 1.5 for other related fixes to
> consensus stability. I think I recall you were on 1.3 still?
>
> Todd
>
>
> On Nov 3, 2017 7:47 PM, "Lee King" <[email protected]> wrote:
>
> Hi,
>     Our Kudu cluster ran well for a long time, but writes have become slow
> recently, and clients are also getting RPC timeouts. I checked the warnings
> and found a large number of errors like this:
> W1104 10:25:16.833736 10271 consensus_peers.cc:365] T
> 149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 ->
> Peer 1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't
> send request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet
> 149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC
> to 10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next
> heartbeat period. Already tried 5 times.
>     I changed the configuration to rpc_service_queue_length=400 and
> rpc_num_service_threads=40, but it had no effect.
>     Our cluster includes 5 masters and 10 tablet servers, with 3800G of data
> and 800 tablets per tablet server. I checked one of the tablet server
> machines: 14G of memory free (128G in total), 4739 threads (max 32000),
> 28000 open files (max 65536), CPU utilization about 30% (32 cores), and
> disk utilization less than 30%.
>     Any suggestions? Thanks!
>
>
>
