Re: Client-side timeouts after dropping table

Nate McCall Tue, 20 Sep 2016 19:30:55 -0700

If you can get to them in the test env. you want to look in
o.a.c.metrics.CommitLog for:
- TotalCommitlogSize: if this hovers near commitlog_size_in_mb and never
goes down, you are thrashing on segment allocation
- WaitingOnCommit: this is the time spent waiting on calls to sync and will
start to climb real fast if you cant sync within sync_interval
- WaitingOnSegmentAllocation: how long it took to allocate a new commitlog
segment, if it is all over the place it is IO bound


Try turning all the commit log settings way down for low-IO test
infrastructure like this. Maybe total commit log size of like 32mb with 4mb
segments (or even lower depending on test data volume) so they basically
flush constantly and don't try to hold any tables open. Also lower
concurrent_writes substantially while you are at it to add some write
throttling.

On Wed, Sep 21, 2016 at 2:14 PM, John Sanda <john.sa...@gmail.com> wrote:

> I have seen in various threads on the list that 3.0.x is probably best for
> prod. Just wondering though if there is anything in particular in 3.7 to be
> weary of.
>
> I need to check with one of our QA engineers to get specifics on the
> storage. Here is what I do know. We have a blade center running lots of
> virtual machines for various testing. Some of those vm's are running
> Cassandra and the Java web apps I previously mentioned via docker
> containers. The storage is shared. Beyond that I don't have any more
> specific details at the moment. I can also tell you that the storage can be
> quite slow.
>
> I have come across different threads that talk to one degree or another
> about the flush queue getting full. I have been looking at the code in
> ColumnFamilyStore.java. Is perDiskFlushExecutors the thread pool I should
> be interested in? It uses an unbounded queue, so I am not really sure what
> it means for it to get full. Is there anything I can check or look for to
> see if writes are getting blocked?
>
> On Tue, Sep 20, 2016 at 8:41 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> If you haven't yet deployed to prod I strongly recommend *not* using 3.7.
>>
>>
>> What network storage are you using?  Outside of a handful of highly
>> experienced experts using EBS in very specific ways, it usually ends in
>> failure.
>>
>> On Tue, Sep 20, 2016 at 3:30 PM John Sanda <john.sa...@gmail.com> wrote:
>>
>>> I am deploying multiple Java web apps that connect to a Cassandra 3.7
>>> instance. Each app creates its own schema at start up. One of the schema
>>> changes involves dropping a table. I am seeing frequent client-side
>>> timeouts reported by the DataStax driver after the DROP TABLE statement is
>>> executed. I don't see this behavior in all environments. I do see it
>>> consistently in a QA environment in which Cassandra is running in docker
>>> with network storage, so writes are pretty slow from the get go. In my logs
>>> I see a lot of tables getting flushed, which I guess are all of the dirty
>>> column families in the respective commit log segment. Then I seen a whole
>>> bunch of flushes getting queued up. Can I reach a point in which too many
>>> table flushes get queued such that writes would be blocked?
>>>
>>>
>>> --
>>>
>>> - John
>>>
>>
>
>
> --
>
> - John
>



-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Client-side timeouts after dropping table

Reply via email to