A bit more information. Using jmxterm to inspect the state of a node while
it's "slowly" playing hints, I see the following on the node that has
hints to play:

$>get MaxHintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
MaxHintsInProgress = 2048;

$>get HintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
HintsInProgress = 0;

$>get TotalHints
#mbean = org.apache.cassandra.db:type=StorageProxy:
TotalHints = 129687;
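
To track these values across repeated checks while hints are being replayed, the `get` output can be turned into numbers with a small parser. This is a minimal sketch. `parse_jmxterm_get` is a hypothetical helper, and it assumes each attribute line has exactly the `Name = <integer>;` shape shown in the session above.

```python
import re

# Hypothetical helper: matches jmxterm "get" result lines such as
# "TotalHints = 129687;". Prompt lines ("$>get ...") and "#mbean = ..."
# header lines do not match and are skipped.
ATTR_LINE = re.compile(r"^\s*(\w+)\s*=\s*(-?\d+);\s*$")


def parse_jmxterm_get(output: str) -> dict[str, int]:
    """Return {attribute name: integer value} for every matching line."""
    values: dict[str, int] = {}
    for line in output.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


if __name__ == "__main__":
    session = """\
$>get MaxHintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
MaxHintsInProgress = 2048;

$>get HintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
HintsInProgress = 0;

$>get TotalHints
#mbean = org.apache.cassandra.db:type=StorageProxy:
TotalHints = 129687;
"""
    print(parse_jmxterm_get(session))
    # {'MaxHintsInProgress': 2048, 'HintsInProgress': 0, 'TotalHints': 129687}
```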

Is there some throttling that would cause hints not to be played at all if,
for instance, the cluster is under enough load, or because of some timeout
setting?

On Fri, Oct 27, 2017 at 1:49 AM, Andrew Bialecki <andrew.biale...@klaviyo.com> wrote:

> We have a 96-node cluster running 3.11 with 256 vnodes each. We're running
> a rolling restart. As we restart nodes, we notice that each node takes a
> while before all other nodes are marked as up, and this corresponds to
> nodes that haven't finished playing hints.
>
> We looked at the hinted handoff throttling, noticed it was still the
> default of 1024, so we tried to turn it off by setting it to zero. Reading
> the source, it looks like that rate limiting won't take effect until the
> current set of hints have finished. So we made that change cluster wide and
> then restarted the next node. However, we still saw the same issue.
>
> Looking at iftop, network throughput is very low (~10kB/s), so the few
> hundred thousand hints that accumulate while the node is restarting end up
> taking several minutes to send.
>
> Any other knobs we should be tuning to increase hinted handoff throughput?
> Or other reasons why hinted handoff runs so slowly?
>
> --
> Andrew Bialecki
>
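
Two rough calculations may help frame the numbers in this thread.

The first calculation assumes the behaviour described in the `cassandra.yaml` comment for `hinted_handoff_throttle_in_kb`. Under that comment, the value is a per-delivery-thread maximum, reduced in proportion to the number of other nodes because they are expected to deliver hints at the same time. A value of 0 disables throttling. With those assumptions, the default of 1024 on 96 nodes gives about 1024 / 95 ≈ 10.8 KB/s per thread. That is close to the ~10kB/s seen in iftop, but it is a hypothesis to check against the 3.11 source, not a confirmed cause.

The second calculation is plain backlog-over-throughput arithmetic. The average hint size is not given in the thread, so the 50-byte figure and the 200,000-hint backlog in the usage lines are hypothetical placeholders. Both function names are illustrative, not Cassandra APIs.

```python
import math


def per_thread_hint_throttle_kb(throttle_in_kb: int, cluster_size: int) -> float:
    """Approximate per-delivery-thread hint throttle in KB/s.

    Assumption (from the cassandra.yaml comment, not verified against the
    3.11 source): the configured limit is divided by the number of *other*
    nodes, and 0 means "unthrottled".
    """
    if throttle_in_kb == 0:
        return math.inf
    other_nodes = max(1, cluster_size - 1)
    return throttle_in_kb / other_nodes


def hint_drain_seconds(hint_count: int, avg_hint_bytes: float,
                       throughput_bytes_per_s: float) -> float:
    """Time to send a backlog of hints at a constant throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return hint_count * avg_hint_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    # Default throttle (1024 KB/s) on a 96-node cluster.
    per_thread = per_thread_hint_throttle_kb(1024, 96)
    print(round(per_thread, 1))  # 10.8

    # Hypothetical backlog: 200,000 hints of ~50 bytes each at 10 kB/s.
    print(hint_drain_seconds(200_000, 50, 10_000))  # 1000.0
```

The drain estimate is linear in each input, so doubling the effective throughput halves the time. This is why a per-thread limit near 10 KB/s would dominate the replay time for a backlog of this size.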



-- 
Andrew Bialecki

<https://www.klaviyo.com/>
