We do small inserts. For a modest-size environment we do about 90,000
inserts every 30 seconds. For a larger environment, we could be doing
300,000 or more inserts every 30 seconds. In earlier versions of the
project, each insert was a separate request, since each insert targets a
different partition. In more recent versions, though, we introduced
micro-batching: we batch up to about 25 inserts, grouping them by token
range. Even though batches are used, I assume they do not reduce the
overall number of inserts or mutations. Inserts are always async,
prepared statements. The client code is written with RxJava, which makes
doing async, concurrent writes a lot easier.
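
To make the grouping step concrete, here is a minimal sketch of splitting inserts into micro-batches of at most 25 per token range. All names here (MicroBatcher, tokenRangeOf) are hypothetical stand-ins, and the hash-based bucketing is only a rough approximation of the driver's token metadata lookup; the real client uses the Cassandra driver and RxJava for the async execution.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch only: group inserts by a coarse token-range bucket,
// then split each group into batches of at most MAX_BATCH_SIZE.
class MicroBatcher {
    static final int MAX_BATCH_SIZE = 25;

    // Stand-in for the driver's token metadata: map a partition key to a
    // token-range bucket via a simple hash (illustrative, not the real thing).
    static int tokenRangeOf(String partitionKey, int numRanges) {
        return Math.floorMod(partitionKey.hashCode(), numRanges);
    }

    // Group keys by token range, then chunk each group into micro-batches.
    static List<List<String>> batch(List<String> keys, int numRanges) {
        Map<Integer, List<String>> byRange = keys.stream()
            .collect(Collectors.groupingBy(k -> tokenRangeOf(k, numRanges)));
        List<List<String>> batches = new ArrayList<>();
        for (List<String> group : byRange.values()) {
            for (int i = 0; i < group.size(); i += MAX_BATCH_SIZE) {
                batches.add(group.subList(i, Math.min(i + MAX_BATCH_SIZE, group.size())));
            }
        }
        return batches;
    }
}
```

Each resulting batch would then be executed as one async request; since every batch stays within a single token range, it can be sent unlogged to the replica set that owns that range.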

On Fri, Mar 23, 2018 at 1:29 PM, Chris Lohfink <clohf...@apple.com> wrote:

> Increasing the queue would increase the number of requests waiting. It
> could make GCs worse if the requests are large INSERTs, but for a lot of
> super tiny queries it helps to increase the queue size (to a point). You
> might want to look into what queries are being made and how, since there
> are possibly options to help with that (i.e. prepared queries, what the
> queries are, limiting the number of async in-flight queries).
>
> Chris
>
>
> On Mar 23, 2018, at 11:42 AM, John Sanda <john.sa...@gmail.com> wrote:
>
> Thanks for the explanation. In the past when I have run into problems
> related to CASSANDRA-11363, I have increased the queue size via the
> cassandra.max_queued_native_transport_requests system property. If I find
> that the queue is frequently at capacity, would that be an indicator that
> the node is having trouble keeping up with the load? And if so, will
> increasing the queue size just exacerbate the problem?
>
> On Fri, Mar 23, 2018 at 11:51 AM, Chris Lohfink <clohf...@apple.com>
> wrote:
>
>> It blocks the caller attempting to add the task until there's room in the
>> queue, applying back pressure. It does not reject it. This mimics the
>> behavior of the pre-SEP DebuggableThreadPoolExecutor's
>> RejectedExecutionHandler that the other thread pools use (except
>> sampling/trace, which just throw tasks away on rejection).
>>
>> Worth noting that, last I checked, this is only really possible in the
>> native transport pool (the SEP pool). That has been the case since at
>> least 2.1; before that there were a few others, and it changes from
>> version to version. For (basically) all other thread pools the queue is
>> limited only by memory.
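
The caller-blocking behavior described above can be sketched with a plain ThreadPoolExecutor and a custom RejectedExecutionHandler that put()s the rejected task back onto the queue, blocking the producer until space frees up instead of throwing RejectedExecutionException. This is an illustrative stdlib analogue of the described back-pressure behavior, not Cassandra's actual implementation:

```java
import java.util.concurrent.*;

// Illustrative sketch: a rejection handler that applies back pressure by
// blocking the submitting thread, rather than rejecting the task outright.
class CallerBlocksPolicy implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            // put() blocks until the bounded queue has room, unlike the
            // default AbortPolicy, which throws RejectedExecutionException.
            executor.getQueue().put(r);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("interrupted while waiting for queue space", e);
        }
    }
}
```

With a small bounded queue, producers that outrun the workers simply stall in execute() until the backlog drains, which is the back-pressure effect being described.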
>>
>> Chris
>>
>>
>> On Mar 22, 2018, at 10:44 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>> I have been doing some work on a cluster that is impacted by
>> https://issues.apache.org/jira/browse/CASSANDRA-11363. Reading through
>> the ticket prompted me to take a closer look at
>> org.apache.cassandra.concurrent.SEPExecutor. I am looking at the 3.0.14
>> code. I am a little confused about the Blocked and All Time Blocked columns
>> reported in nodetool tpstats and reported by StatusLogger. I understand
>> that there is a queue for tasks. In the case of RequestThreadPoolExecutor,
>> the size of that queue can be controlled via the
>> cassandra.max_queued_native_transport_requests system property.
>>
>> I have been looking at SEPExecutor.addTask(FutureTask<?> task), and here
>> is my question. If the queue is full, as defined by
>> SEPExector.maxTasksQueued, are tasks rejected? I do not fully grok the
>> code, but it looks like it is possible for tasks to be rejected here (some
>> code and comments omitted for brevity):
>>
>> public void addTask(FutureTask<?> task)
>> {
>>     tasks.add(task);
>>     ...
>>     else if (taskPermits >= maxTasksQueued)
>>     {
>>         WaitQueue.Signal s = hasRoom.register();
>>
>>         if (taskPermits(permits.get()) > maxTasksQueued)
>>         {
>>             if (takeWorkPermit(true))
>>                 pool.schedule(new Work(this));
>>
>>             metrics.totalBlocked.inc();
>>             metrics.currentBlocked.inc();
>>             s.awaitUninterruptibly();
>>             metrics.currentBlocked.dec();
>>         }
>>         else
>>             s.cancel();
>>     }
>> }
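
As an aside, the register / re-check / await shape of that snippet can be illustrated with stdlib primitives. This sketch (hypothetical names and a Condition in place of Cassandra's WaitQueue, so only an approximation) shows why the caller re-checks the permit count after registering: a consumer may have drained the queue in the meantime, and parking without re-checking would risk a lost wakeup.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.*;

// Illustrative sketch of bounded-queue back pressure with a re-check before
// blocking. Not Cassandra's code; names are stand-ins.
class BoundedTaskCounter {
    private final int maxTasksQueued;
    private final AtomicInteger taskPermits = new AtomicInteger();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition hasRoom = lock.newCondition();

    BoundedTaskCounter(int maxTasksQueued) { this.maxTasksQueued = maxTasksQueued; }

    // Producer side: the task is counted first; if that pushes us over the
    // limit, re-check under the lock and block until a consumer signals.
    void addTask() throws InterruptedException {
        if (taskPermits.incrementAndGet() > maxTasksQueued) {
            lock.lock();
            try {
                // Re-check: a consumer may have decremented taskPermits
                // between our increment and acquiring the lock.
                while (taskPermits.get() > maxTasksQueued)
                    hasRoom.await();
            } finally {
                lock.unlock();
            }
        }
    }

    // Consumer side: release a permit and wake one blocked producer.
    void taskDone() {
        taskPermits.decrementAndGet();
        lock.lock();
        try { hasRoom.signal(); } finally { lock.unlock(); }
    }
}
```

The key parallel to the snippet above: the task is accounted for before the capacity check, and the blocked time between await() and the signal is what the Blocked / All Time Blocked metrics would be counting.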
>>
>> The first thing that happens is that the task is added to the tasks
>> queue. pool.schedule() only gets called if takeWorkPermit() returns true. I
>> am still studying the code, but can someone explain what exactly happens
>> when the queue is full?
>>
>>
>> - John
>>
>>
>>
>
>
> --
>
> - John
>
>
>


-- 

- John
