Re: Drop tables takes too long

2017-05-08 Thread Bohdan Tantsiura
> … what is thrown at it. It can be related to pending flushes (blocking
> writes), huge Garbage Collection (Stop The World, including writes), due to
> hardware limits (CPU busy with compactions?) or even to a too conservative
> configuration of concurrent_writes.
>
>
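
For reference, the knob mentioned lives in cassandra.yaml; the value below is
the 3.x shipped default, shown only as a baseline to compare against, not a
recommendation:

    # cassandra.yaml (Cassandra 3.x) -- illustrative baseline
    concurrent_writes: 32    # mutation threads; too conservative a value can back up writes
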
>> About 700 InternalResponseStage pending tasks appeared on 2 nodes.
>
>
> I never had issues with this one and so didn't know much about it. But
> according to Chris Lohfink in this post
> https://www.pythian.com/blog/guide-to-cassandra-thread-pools/#InternalResponseStage,
> this thread pool is responsible for "Responding to non-client initiated
> messages, including bootstrapping and schema checking". Which again might be
> related to the huge number of tables in the cluster. How is CPU doing, is
> there any burst in CPU that could be related to these errors?
>
>> About 60 MemtableFlushWriter pending tasks appeared on 3 nodes.
>
>
> What number of MemtableFlushWriter threads are you using? Consider
> increasing it (or maybe the memtable size).
>
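
The cassandra.yaml names behind that advice (3.x; values here are
illustrative, and the real defaults depend on heap size and data directories):

    # cassandra.yaml (3.x) -- flush-path knobs, values illustrative
    memtable_flush_writers: 4           # default is small (2 for a single data directory)
    memtable_heap_space_in_mb: 4096     # defaults to 1/4 of the heap when unset
    memtable_offheap_space_in_mb: 4096  # likewise 1/4 of the heap when unset
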
>
>> There were no blocked tasks, but there were "All time blocked" tasks
>> (they were there before we started dropping tables) ranging from 3 million
>> to 20 million on different nodes.
>
>
> What tasks were dropped?
>
> The cluster doesn't look completely healthy, but I believe it is possible
> to improve things before thinking about splitting the tables across
> multiple clusters. I would definitely not add more tables though...
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2017-04-28 14:35 GMT+01:00 Bohdan Tantsiura <bohdan...@gmail.com>:
>
>> Thanks Alain,
>>
>> > Or is it only happening during drop table actions?
>> Some other schema changes (e.g. adding columns to tables) also take too
>> much time.
>>
>> Link to complete set of GC options: https://pastebin.com/4qyENeyu
>>
>> > Have you had a look at logs, mainly errors and warnings?
>> In logs I found warnings of 3 types:
>> 1) ColumnFamilyStore.java:542 - Failed unregistering mbean:
>> org.apache.cassandra.db:type=Tables,keyspace=...,table=...
>>  from MigrationStage thread
>> 2) Read 1715 live rows and 1505 tombstone cells for query ...
>>  from ReadStage thread
>> 3) GCInspector.java:282 - G1 Young Generation GC in 1725ms.  G1 Eden
>> Space: 38017171456 -> 0; G1 Survivor Space: 2516582400 -> 2650800128;
>> from Service Thread
>>
>> > Are there any pending, blocked or dropped tasks in thread pool stats?
>> About 3000-6000 CompactionExecutor pending tasks appeared on all nodes
>> from time to time. About 1000 MigrationStage pending tasks appeared on 2
>> nodes. About 700 InternalResponseStage pending tasks appeared on 2 nodes.
>> About 60 MemtableFlushWriter pending tasks appeared on 3 nodes.
>> There were no blocked tasks, but there were "All time blocked" tasks
>> (they were there before we started dropping tables) ranging from 3 million
>> to 20 million on different nodes.
>>
>> > Are some resources constrained (CPU / disk IO, ...)?
>> CPU and disk IO are not constrained.
>>
>> Thanks
>>
>> 2017-04-27 11:10 GMT+03:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>
>>> Hi
>>>
>>>
>>>> Long GC Pauses take about one minute. But why does it take so much time
>>>> and how can that be fixed?
>>>
>>>
>>> This is very long. Looks like you are having a major issue, and it is
>>> not just about dropping tables... Or is it only happening during drop
>>> table actions? Knowing the complete set of GC options in use could help
>>> here, could you paste it here (or link to it)?
>>>
>>> Also, GC is often high as a consequence of other issues and not only
>>> when 'badly' tuned.
>>>
>>>
>>>- Have you had a look at logs, mainly errors and warnings?
>>>
>>>$ grep -e "ERROR" -e "WARN" /var/log/cassandra/system.log
>>>
>>>- Are there any pending, blocked or dropped tasks in thread pool stats?
>>>
>>>$ watch -d nodetool tpstats
>>>
>>>- Are some resources constrained (CPU / disk IO, ...)?
>>>
>>>
>>> We have about 60 keyspaces with about 80 tables in each keyspace
>>>
>>> In each keyspace we also have 11 MVs
>>>
>>>
>>> Even if I believe we can dig it and maybe improve things, I agree with
>>> Carlos, this is a lot of tables (4880) and, even more so, a high number
>>> of MVs (660). It might be worth splitting it somehow if possible.

Re: Drop tables takes too long

2017-04-28 Thread Bohdan Tantsiura
Thanks Alain,

> Or is it only happening during drop table actions?
Some other schema changes (e.g. adding columns to tables) also take too
much time.

Link to complete set of GC options: https://pastebin.com/4qyENeyu
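
For readers who cannot reach the pastebin, a typical shape for such a setup
(purely illustrative; the node's actual options are at the link above):

    # jvm.options -- illustrative shape only, not the settings in use
    -Xms64G
    -Xmx64G
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=500
    -XX:InitiatingHeapOccupancyPercent=70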

> Have you had a look at logs, mainly errors and warnings?
In logs I found warnings of 3 types:
1) ColumnFamilyStore.java:542 - Failed unregistering mbean:
org.apache.cassandra.db:type=Tables,keyspace=...,table=...
 from MigrationStage thread
2) Read 1715 live rows and 1505 tombstone cells for query ...
 from ReadStage thread
3) GCInspector.java:282 - G1 Young Generation GC in 1725ms.  G1 Eden Space:
38017171456 -> 0; G1 Survivor Space: 2516582400 -> 2650800128; from Service
Thread
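
A quick way to rank those GCInspector pauses across the whole log (assumes
GNU grep; prints the five longest pauses in milliseconds):

    $ grep 'GCInspector' /var/log/cassandra/system.log \
        | grep -oE '[0-9]+ms' | sed 's/ms$//' | sort -n | tail -5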

> Are there any pending, blocked or dropped tasks in thread pool stats?
About 3000-6000 CompactionExecutor pending tasks appeared on all nodes from
time to time. About 1000 MigrationStage pending tasks appeared on 2 nodes.
About 700 InternalResponseStage pending tasks appeared on 2 nodes. About
60 MemtableFlushWriter pending tasks appeared on 3 nodes.
There were no blocked tasks, but there were "All time blocked" tasks (they
were there before we started dropping tables) ranging from 3 million to 20
million on different nodes.
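
To keep an eye on just the misbehaving pools, the earlier watch can be
narrowed (pool names as printed by nodetool tpstats in 3.10):

    $ watch -d "nodetool tpstats | grep -E 'Pool|MigrationStage|InternalResponseStage|CompactionExecutor|MemtableFlushWriter'"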

> Are some resources constrained (CPU / disk IO, ...)?
CPU and disk IO are not constrained.
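
The usual way to back that claim with numbers (standard Linux tools;
CassandraDaemon is the JVM main-class name, and this assumes one Cassandra
process per host):

    $ iostat -x 5 3                                    # per-device %util and await
    $ top -H -b -n 1 -p "$(pgrep -f CassandraDaemon)"  # busiest JVM threads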

Thanks

2017-04-27 11:10 GMT+03:00 Alain RODRIGUEZ <arodr...@gmail.com>:

> Hi
>
>
>> Long GC Pauses take about one minute. But why does it take so much time
>> and how can that be fixed?
>
>
> This is very long. Looks like you are having a major issue, and it is not
> just about dropping tables... Or is it only happening during drop table
> actions? Knowing the complete set of GC options in use could help here,
> could you paste it here (or link to it)?
>
> Also, GC is often high as a consequence of other issues and not only when
> 'badly' tuned.
>
>
>- Have you had a look at logs, mainly errors and warnings?
>
>$ grep -e "ERROR" -e "WARN" /var/log/cassandra/system.log
>
>- Are there any pending, blocked or dropped tasks in thread pool stats?
>
>$ watch -d nodetool tpstats
>
>- Are some resources constrained (CPU / disk IO, ...)?
>
>
> We have about 60 keyspaces with about 80 tables in each keyspace
>
> In each keyspace we also have 11 MVs
>
>
> Even if I believe we can dig it and maybe improve things, I agree with
> Carlos, this is a lot of tables (4880) and, even more so, a high number of
> MVs (660). It might be worth splitting it somehow if possible.
>
> Cannot achieve consistency level ALL
>
>
> Finally you could try to adjust the corresponding request timeout (not
> sure if it is the global one or the truncate timeout), so it may succeed
> even when nodes are having minute-long GCs, but it is a workaround, as such
> a GC pause will most definitely be an issue for the client queries running
> (the default is a 10 sec timeout, so many queries are probably failing).
>
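The knobs in question, with the 3.x shipped defaults (in milliseconds):

    # cassandra.yaml (3.x defaults)
    request_timeout_in_ms: 10000            # the global 10 sec default mentioned above
    truncate_request_timeout_in_ms: 60000   # the truncate-specific timeout
    write_request_timeout_in_ms: 2000
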
> C*heers,
> -------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2017-04-25 13:58 GMT+02:00 Bohdan Tantsiura <bohdan...@gmail.com>:
>
>> Thanks Zhao Yang,
>>
>> > Could you try some JVM tool to find out which threads are allocating
>> memory or causing GC? Maybe the migration stage thread...
>>
>> I use Cassandra Cluster Manager to locally reproduce the issue. I tried
>> to use VisualVM to find out which threads are allocating memory, but VisualVM
>> does not see cassandra processes and says "Cannot open application with
>> pid". Then I tried to use YourKit Java Profiler. It created a snapshot when
>> the process of one cassandra node failed. http://i.imgur.com/9jBcjcl.png -
>> how CPU is used by threads. http://i.imgur.com/ox5Sozy.png - how memory
>> is used by threads, but the biggest part of memory is used by objects
>> without allocation information. http://i.imgur.com/oqx9crX.png - which
>> objects use the biggest part of memory. Maybe you know some other good JVM
>> tool that can show which threads use the biggest part of memory?
>>
>> > BTW, is your cluster under high load while dropping table?
>>
>> LA5 (the 5-minute load average) was <= 5 on all nodes almost all the time
>> while dropping tables.
>>
>> Thanks
>>
>> 2017-04-21 19:49 GMT+03:00 Jasonstack Zhao Yang <
>> zhaoyangsingap...@gmail.com>:
>>
>>> Hi Bohdan, Carlos,
>>>
>>> Could you try some JVM tool to find out which threads are allocating
>>> memory or causing GC? Maybe the migration stage thread...
>>>
>>> BTW, is your cluster under high load while dropping table?
>>>

Re: Drop tables takes too long

2017-04-25 Thread Bohdan Tantsiura
Thanks Zhao Yang,

> Could you try some JVM tool to find out which threads are allocating
memory or causing GC? Maybe the migration stage thread...

I use Cassandra Cluster Manager to locally reproduce the issue. I tried to
use VisualVM to find out which threads are allocating memory, but VisualVM
does not see cassandra processes and says "Cannot open application with
pid". Then I tried to use YourKit Java Profiler. It created a snapshot when
the process of one cassandra node failed. http://i.imgur.com/9jBcjcl.png - how
CPU is used by threads. http://i.imgur.com/ox5Sozy.png - how memory is used
by threads, but the biggest part of memory is used by objects without
allocation information. http://i.imgur.com/oqx9crX.png - which objects use
the biggest part of memory. Maybe you know some other good JVM tool that can
show which threads use the biggest part of memory?
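
One JDK-only alternative when a GUI profiler fails to attach (assumes a full
JDK on the node and a single Cassandra pid; per-thread allocation profiling
proper would need something like Java Flight Recorder):

    $ pid="$(pgrep -f CassandraDaemon)"
    $ jcmd "$pid" GC.class_histogram | head -25   # heap histogram by class
    $ jcmd "$pid" Thread.print > threads.txt      # thread dump, to correlate threads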

> BTW, is your cluster under high load while dropping table?

LA5 (the 5-minute load average) was <= 5 on all nodes almost all the time
while dropping tables.

Thanks

2017-04-21 19:49 GMT+03:00 Jasonstack Zhao Yang <zhaoyangsingap...@gmail.com
>:

> Hi Bohdan, Carlos,
>
> Could you try some JVM tool to find out which threads are allocating memory
> or causing GC? Maybe the migration stage thread...
>
> BTW, is your cluster under high load while dropping table?
>
> As far as I remember, in older C* versions, it applies the schema mutation
> in memory, i.e. the DROP, then flushes all schema info into an sstable, then
> reads all the on-disk schema back into memory (info for the 5k tables +
> related column info)...
>
> > You also might need to increase the node count if you're resource
> > constrained.
>
> More nodes won't help and would most probably make things worse due to
> coordination.
>
>
> Zhao Yang
>
>
>
> On Fri, 21 Apr 2017 at 21:10 Bohdan Tantsiura <bohdan...@gmail.com> wrote:
>
>> Hi,
>>
>> Problem is still not solved. Does anybody have any idea what to do with
>> it?
>>
>> Thanks
>>
>> 2017-04-20 15:05 GMT+03:00 Bohdan Tantsiura <bohdan...@gmail.com>:
>>
>>> Thanks Carlos,
>>>
>>> In each keyspace we also have 11 MVs.
>>>
>>> It is impossible to reduce the number of tables now. Long GC Pauses take
>>> about one minute. But why does it take so much time and how can that be
>>> fixed?
>>>
>>> Each node in the cluster has 128GB RAM, so resources are not constrained now.
>>>
>>> Thanks
>>>
>>> 2017-04-20 13:18 GMT+03:00 Carlos Rolo <r...@pythian.com>:
>>>
>>>> You have 4800 tables in total? That is a lot of tables. Plus MVs? Or
>>>> are the MVs already counted in the 60*80 figure?
>>>>
>>>> I would recommend reducing the number of tables. Another thing is that
>>>> you need to check your log file for GC pauses, and how long those pauses take.
>>>>
>>>> You also might need to increase the node count if you're resource
>>>> constrained.
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo
>>>> Mobile: +351 918 918 100
>>>> www.pythian.com
>>>>
>>>> On Thu, Apr 20, 2017 at 11:10 AM, Bohdan Tantsiura <bohdan...@gmail.com
>>>> > wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are using cassandra 3.10 in a 10-node cluster with replication = 3.
>>>>> MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60
>>>>> keyspaces with about 80 tables in each keyspace. We had to delete three
>>>>> tables and two materialized views from each keyspace. It began to take
>>>>> more and more time for each next keyspace (for some keyspaces it took
>>>>> about 30 minutes) and then failed with "Cannot achieve consistency
>>>>> level ALL". After restarting, the same repeated. It seems that
>>>>> cassandra hangs on GC. How can that be solved?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>


Re: Drop tables takes too long

2017-04-21 Thread Bohdan Tantsiura
Hi,

Problem is still not solved. Does anybody have any idea what to do with it?

Thanks

2017-04-20 15:05 GMT+03:00 Bohdan Tantsiura <bohdan...@gmail.com>:

> Thanks Carlos,
>
> In each keyspace we also have 11 MVs.
>
> It is impossible to reduce the number of tables now. Long GC Pauses take
> about one minute. But why does it take so much time and how can that be fixed?
>
> Each node in the cluster has 128GB RAM, so resources are not constrained now.
>
> Thanks
>
> 2017-04-20 13:18 GMT+03:00 Carlos Rolo <r...@pythian.com>:
>
>> You have 4800 tables in total? That is a lot of tables. Plus MVs? Or are
>> the MVs already counted in the 60*80 figure?
>>
>> I would recommend reducing the number of tables. Another thing is that you
>> need to check your log file for GC pauses, and how long those pauses take.
>>
>> You also might need to increase the node count if you're resource
>> constrained.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>> linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 918 918 100
>> www.pythian.com
>>
>> On Thu, Apr 20, 2017 at 11:10 AM, Bohdan Tantsiura <bohdan...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We are using cassandra 3.10 in a 10-node cluster with replication = 3.
>>> MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60 keyspaces
>>> with about 80 tables in each keyspace. We had to delete three tables and
>>> two materialized views from each keyspace. It began to take more and more
>>> time for each next keyspace (for some keyspaces it took about 30 minutes)
>>> and then failed with "Cannot achieve consistency level ALL". After
>>> restarting, the same repeated. It seems that cassandra hangs on GC. How
>>> can that be solved?
>>>
>>> Thanks
>>>
>>
>>
>


Re: Drop tables takes too long

2017-04-20 Thread Bohdan Tantsiura
Thanks Carlos,

In each keyspace we also have 11 MVs.

It is impossible to reduce the number of tables now. Long GC Pauses take
about one minute. But why does it take so much time and how can that be fixed?

Each node in the cluster has 128GB RAM, so resources are not constrained now.

Thanks

2017-04-20 13:18 GMT+03:00 Carlos Rolo <r...@pythian.com>:

> You have 4800 tables in total? That is a lot of tables. Plus MVs? Or are
> the MVs already counted in the 60*80 figure?
>
> I would recommend reducing the number of tables. Another thing is that you
> need to check your log file for GC pauses, and how long those pauses take.
>
> You also might need to increase the node count if you're resource
> constrained.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> linkedin.com/in/carlosjuzarterolo
> Mobile: +351 918 918 100
> www.pythian.com
>
> On Thu, Apr 20, 2017 at 11:10 AM, Bohdan Tantsiura <bohdan...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are using cassandra 3.10 in a 10-node cluster with replication = 3.
>> MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60 keyspaces
>> with about 80 tables in each keyspace. We had to delete three tables and
>> two materialized views from each keyspace. It began to take more and more
>> time for each next keyspace (for some keyspaces it took about 30 minutes)
>> and then failed with "Cannot achieve consistency level ALL". After
>> restarting, the same repeated. It seems that cassandra hangs on GC. How
>> can that be solved?
>>
>> Thanks
>>
>
>


Drop tables takes too long

2017-04-20 Thread Bohdan Tantsiura
Hi,

We are using cassandra 3.10 in a 10-node cluster with replication = 3.
MAX_HEAP_SIZE=64GB on all nodes, G1 GC is used. We have about 60 keyspaces
with about 80 tables in each keyspace. We had to delete three tables and
two materialized views from each keyspace. It began to take more and more
time for each next keyspace (for some keyspaces it took about 30 minutes)
and then failed with "Cannot achieve consistency level ALL". After
restarting, the same repeated. It seems that cassandra hangs on GC. How
can that be solved?
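
For concreteness, the per-keyspace operation looks like this (identifiers
hypothetical; note that an MV must be dropped before its base table can be):

    -- CQL, names hypothetical
    DROP MATERIALIZED VIEW IF EXISTS ks_001.mv_one;
    DROP MATERIALIZED VIEW IF EXISTS ks_001.mv_two;
    DROP TABLE IF EXISTS ks_001.table_a;
    DROP TABLE IF EXISTS ks_001.table_b;
    DROP TABLE IF EXISTS ks_001.table_c;
    -- ...repeated for each of the ~60 keyspaces

Each statement is a cluster-wide schema change that every node must apply and
agree on, so a single node stalled in a long GC can stall the whole sequence.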

Thanks