Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Peter Schuller
[4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the JVM cannot do anything else? Hence the node is technically Down? Correct? No; the concurrent mark/sweep phase runs concurrently with your application. CMS will cause a stop-the-world full pause if it fails to complete a CMS

Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Gustavo Scudeler
Hi guys, We have a 6 node Cassandra Cluster under heavy utilization. We have been dealing a lot with garbage collector stop the world event, which can take up to 50 seconds in our nodes, in the meantime Cassandra Node is unresponsive, not even accepting new logins. Extra details: - Cassandra

Concurrent Mark Sweep taking 12 seconds

2011-05-16 Thread Héctor Izquierdo Seliva
Hi everyone. I see in the logs that Concurrent Mark Sweep is taking 12 seconds to do its stuff. Is this normal? There is no stop-the-world GC, it just takes 12 seconds. Configuration: 0.7.5 , 8GB Heap, 16GB machines. 7 * 64 MB memtables.

CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread Yang
I see the following in my GC log 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, real=6.07 secs] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK and we should expect the built
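As a rough illustration of how to read the log line quoted in this post, the Python sketch below pulls the pause time and occupancy figures out of a CMS-initial-mark entry. The regular expression and the function name parse_initial_mark are assumptions made for this example, not part of any Cassandra or JVM tooling. It also assumes the usual reading of such lines (old generation used/capacity first, then whole-heap used/capacity); other JVM versions and logging flags may print a different layout.

```python
import re

# The log line quoted in the post above.
LINE = ("1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] "
        "13749939K(49807360K), 6.0696680 secs] "
        "[Times: user=6.10 sys=0.00, real=6.07 secs]")

# Hypothetical pattern written for this one log layout only.
PATTERN = re.compile(
    r"(?P<uptime>\d+\.\d+): \[GC \[1 CMS-initial-mark: "
    r"(?P<old_used>\d+)K\((?P<old_cap>\d+)K\)\] "
    r"(?P<heap_used>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]"
)


def parse_initial_mark(line):
    """Return a dict of figures from a CMS-initial-mark line, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    old_used = int(m.group("old_used"))
    old_cap = int(m.group("old_cap"))
    heap_used = int(m.group("heap_used"))
    heap_cap = int(m.group("heap_cap"))
    return {
        "uptime_s": float(m.group("uptime")),    # seconds since JVM start
        "pause_s": float(m.group("pause")),      # length of the initial-mark pause
        "old_used_kb": old_used,
        "old_occupancy": old_used / old_cap,     # fraction of old generation in use
        "heap_occupancy": heap_used / heap_cap,  # fraction of whole heap in use
    }


if __name__ == "__main__":
    info = parse_initial_mark(LINE)
    print(f"pause: {info['pause_s']:.2f} s, "
          f"old gen occupancy: {info['old_occupancy']:.1%}, "
          f"heap occupancy: {info['heap_occupancy']:.1%}")
```

Read this way, the old generation was only lightly occupied when the pause occurred, so the length of the pause is not explained by a full old generation.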

Re: Nodes frozen in GC

2011-03-08 Thread Peter Schuller
up to the long stop-the-world pause. -- / Peter Schuller

Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread kurt greaves
Have you tried CMS with that sized heap? G1 is only really worthwhile with 24gb+ heap size, which wouldn't really make sense on machines with 28gb of RAM. In general CMS is found to work better for C*, leaving excess memory to be utilised by the OS page cache

Re: Reduce Cassandra GC

2013-06-17 Thread Joel Samuelsson
at all during the stop-the-world freezes. Was that what you wanted me to try? Uncomment the followings in cassandra-env.sh. Done. Will post results as soon as I get a new stop-the-world gc. If you are unable to find a JIRA, file one Unless this turns out to be a problem on my end, I will.

Re: Concurrent Mark Sweep taking 12 seconds

2011-05-16 Thread Jonathan Ellis
Yes. 2011/5/16 Héctor Izquierdo Seliva izquie...@strands.com: Hi everyone. I see in the logs that Concurrent Mark Sweep is taking 12 seconds to do its stuff. Is this normal? There is no stop-the-world GC, it just takes 12 seconds. Configuration: 0.7.5 , 8GB Heap, 16GB machines. 7 * 64 MB

Re: live data migration from mysql to cassandra

2011-01-14 Thread Edward Capriolo
On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello Dear community please share your experience, home you make live(without stop) migration from mysql or other RDBM to cassandra There is no built in way to do this. I remember hearing at hadoop world this year

Re: Nodes frozen in GC

2011-03-07 Thread ruslan usifov
2011/3/8 Jonathan Ellis jbel...@gmail.com It sounds like you're complaining that the JVM sometimes does stop-the-world GC. You can mitigate this but not (for most workloads) eliminate it with GC option tuning. That's simply the state of the art for Java garbage collection right now. Hm

Re: Issue with removing a node and adding it back

2015-03-30 Thread Robert Coli
either flaky network (AWS) or stop-the-world GC) and fix that OR 2) try tuning streaming_socket_timeout_in_ms =Rob

Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
the world. But I think it is not stop the world, but only stop the new world. For example in case of Cassandra, a large number of in_memory_compaction_limit_in_mb can cause this. This is a limit when a compaction compacts(merges) rows of a key into the latest in memory. So this creates a large byte array

Re: Nodes frozen in GC

2011-03-08 Thread Peter Schuller
this btw, although it has nothing to do with the problem being investigated in this thread: It's not about how *much* time is spent on memory management. That is of course relevant, but the issue here is to avoid long stop-the-world pauses. Even if you're avoiding doing allocation, as long

Re: Reduce Cassandra GC

2013-06-15 Thread Mohit Anchlia
can not be promoted to Old Generation because it requires such a large *contiguous* memory space that is unavailable at the point in time. This is called promotion failure. So it has to wait until concurrent collector collects a large enough space. Thus you experience stop the world. But I

Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
at the point in time. This is called promotion failure. So it has to wait until concurrent collector collects a large enough space. Thus you experience stop the world. But I think it is not stop the world, but only stop the new world. For example in case of Cassandra, a large number

Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
experience stop the world. But I think it is not stop the world, but only stop the new world. For example in case of Cassandra, a large number of in_memory_compaction_limit_in_mb can cause this. This is a limit when a compaction compacts(merges) rows of a key into the latest in memory. So

Re: Predictable low RW latency, SLABS and STW GC

2011-07-24 Thread aaron morton
Restarting the service will drop all the memmapped caches, cassandra caches are saved / persistent and you can also use memcachd if you want. Are you experiencing stop the world pauses? There are some things that can be done to reduce the chance of them happening. Cheers

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread aaron morton
wrote: I see the following in my GC log 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, real=6.07 secs] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK and we should

Re: Cassandra out of Heap memory

2012-06-14 Thread rohit bhatia
generation runs out of memory to migrate objects to the old generation (a so-called concurrent mode failure), leading to stop-the-world full garbage collection. However, with a slightly lower setting of the CMS threshold, we get a bit more headroom, and more stable overall performance. I see

Re: Nodes frozen in GC

2011-03-07 Thread Chris Goffinet
stop-the-world GC. You can mitigate this but not (for most workloads) eliminate it with GC option tuning. That's simply the state of the art for Java garbage collection right now. Hm, but what to do in this cases?? In these moments throughput of cluster degrade, and I misunderstand what

Re: Upgrade to a different version?

2011-03-17 Thread Paul Pak
? I ask because I'm wondering how you have managed to deal with the stop-the-world garbage collection issues that seems to hit most clusters that have significant load and cause application timeouts. Have you found that cassandra scales in read/write capacity reasonably well as you add nodes? Also

Re: RE: batch_mutate failed: out of sequence response

2011-04-07 Thread Héctor Izquierdo Seliva
, and my application is single thread, so I guess this is Pelops fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server?

Re: RE: batch_mutate failed: out of sequence response

2011-04-07 Thread Héctor Izquierdo Seliva
, and my application is single thread, so I guess this is Pelops fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client

Re: Should one expect to see hints being stored/delivered occasionally?

2015-01-20 Thread Robert Coli
of this context is a warning sign that something may be wrong with your cluster. Probably what is happening is that you have stop the world GCs long enough to trigger queueing of hints via timeouts during these GCs. =Rob

Re: How to add a node with zero downtime

2017-03-21 Thread daemeon reiydelle
Possible areas to check: - too few nodes (node overload) - you did not indicate either replication factor, number of nodes. Assume nodes are *rather* full. - network overload (check your TORS's errors, also the tcp stats on the relevant nodes) - look for stop the world garbage collection

RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
scudel...@gmail.com] Sent: Montag, 09. Oktober 2017 13:12 To: user@cassandra.apache.org Subject: Cassandra and G1 Garbage collector stop the world event (STW) Hi guys, We have a 6 node Cassandra Cluster under heavy utilization. We have been dealing a lot with garbage collector stop the world event,

Re: Cassandra GC Settings

2011-01-17 Thread Peter Schuller
column families characterized by fairly fat rows (10 mb mean size, max sizes 150-200 mb, up to a million+ columns per row). My theory is that each row being compacted with the old settings was being promoted to the old generation, thereby running the heap out of space and causing a stop the world

Reduce Cassandra GC

2013-04-16 Thread Joel Samuelsson
. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio during the run

Re: Reduce Cassandra GC

2013-06-17 Thread Joel Samuelsson
anything in the environment config up until now. Also can you take a heap dump at 2 diff points so that we can compare it? I can't access the machine at all during the stop-the-world freezes. Was that what you wanted me to try? Uncomment the followings in cassandra-env.sh. Done. Will post

Re: Reduce Cassandra GC

2013-06-17 Thread Takenori Sato
dump at 2 diff points so that we can compare it? I can't access the machine at all during the stop-the-world freezes. Was that what you wanted me to try? Uncomment the followings in cassandra-env.sh. Done. Will post results as soon as I get a new stop-the-world gc. If you are unable

One node misbehaving (lot's of GC), ideas?

2015-04-15 Thread Erik Forsberg
Hi! We having problems with one node (out of 56 in total) misbehaving. Symptoms are: * High number of full CMS old space collections during early morning when we're doing bulkloads. Yes, bulkloads, not CQL, and only a few thrift insertions. * Really long stop-the-world GC events (I've seen up

RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi, my previously mentioned G1 bug does not seem to be related to your case Thomas From: Gustavo Scudeler [mailto:scudel...@gmail.com] Sent: Montag, 09. Oktober 2017 15:13 To: user@cassandra.apache.org Subject: Re: Cassandra and G1 Garbage collector stop the world event (STW) Hello, @kurt

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-29 Thread Carsten Krebs
efficient the GC (if the application behaves according to the weak generational hypothesis - google it if you want a ref) because less data is promoted to old gen and because the overhead of stop-the-world is lessened. (3) The larger the young generation, the longer the pause times to do

RE: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Moshe Kranc
: Wednesday, May 13, 2015 10:40 PM To: user@cassandra.apache.org Subject: Re: Viewing Cassandra's Internal table Structure in a CQL world I think that you can still use cassandra-cli from 2.0.x to look into internal table structure. Of course you will see bytes instead of readable values but it's

Re: LCS and counters

2013-03-05 Thread Alain RODRIGUEZ
be appreciated. 2013/2/25 Janne Jalkanen janne.jalka...@ecyrd.com At least for our use case (reading slices from varyingly sized rows from 10-100k composite columns with counters and hundreds of writes/second) LCS has a nice ~75% lower read latency than Size Tiered. And compactions don't stop the world

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-10-20 Thread Maxim Potekhin
there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK and we should expect the built-in fault-tolerance of Cassandra handle this? Thanks Yang

Re: better anti OOM

2011-12-27 Thread Edward Capriolo
of a stop the world garbage collection. Also less free space usually means more memory fragmentation and causes your system to work harder CPU. it is counter intuitive to leave free memory because you want to get the large caches etc, but the overhead gives more stability which in the end gives better

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-08 Thread Jonathan Ellis
mlockall can tag the entire heap as don't swap this out on startup. Secondarily whenever the heap resizes upwards the JVM does a stop-the-world gc, but no, not really a big deal when your uptime is in days or weeks. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano

Re: Nodes frozen in GC

2011-03-07 Thread Jonathan Ellis
It sounds like you're complaining that the JVM sometimes does stop-the-world GC. You can mitigate this but not (for most workloads) eliminate it with GC option tuning. That's simply the state of the art for Java garbage collection right now. On Sun, Mar 6, 2011 at 2:18 AM, ruslan usifov

Re: Nodes frozen in GC

2011-03-07 Thread Paul Pak
...@gmail.com It sounds like you're complaining that the JVM sometimes does stop-the-world GC. You can mitigate this but not (for most workloads) eliminate it with GC option tuning. That's simply the state of the art for Java garbage collection right now. Hm, but what

RE: Nodes frozen in GC

2011-03-10 Thread Gregory Szorc
once gave the JVM 30GB of heap and saw it run through the entire heap in a few seconds while doing a compaction! It would continuously blow through the heap, incur a stop-the-world collection, and repeat. Meanwhile, the listed compacted bytes from the JMX interface was never increasing

RE: batch_mutate failed: out of sequence response

2011-04-05 Thread Héctor Izquierdo Seliva
I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of Stop-the-world gc pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. Is any of this wrong? -Original Message- From

RE: batch_mutate failed: out of sequence response

2011-04-05 Thread Héctor Izquierdo Seliva
Update with more info: I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of Stop-the-world gc pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. There is now swapping, and full GCs

Re: RE: batch_mutate failed: out of sequence response

2011-04-07 Thread Dan Washusen
, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out?

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!

Weird GC

2014-01-29 Thread Joel Samuelsson
Hi, We've been trying to figure out why we have so long and frequent stop-the-world GC even though we have basically no load. Today we got a log of a weird GC that I wonder if you have any theories of why it might have happened. A plot of our heap at the time, paired with the GC time from

Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Moshe Kranc
you tune your queries for best performance. To date, I have been using cassandra-cli to view the table's internal structure. But, I get bombarded with all kinds of warnings about how I should switch to CQL and stop using a deprecated product. My question: After the revolution (once Cassandra

Re: scylladb

2017-03-11 Thread Kant Kodali
@Dor 1) You guys have a CPU scheduler? you mean user level thread Scheduler that maps user level threads to kernel level threads? I thought C++ by default creates native kernel threads but sure nothing will stop someone to create a user level scheduling library if that's what you are talking

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Chris Lohfink
> 2MB, you might end up in something called “humongous” allocations, spanning several G1 regions. If this happens in a very short very frequently and depending on your allocation rate in MB/s, a combination of the G1 bug and a small heap, might result going towards OOM.

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Gustavo Scudeler
Regards, Thomas *From:* Gustavo Scudeler [mailto:scudel...@gmail.com] *Sent:* Montag, 09. Oktober 2017 13:12 *To:* user@cassandra.apache.org *Subject:* Cassandra and G1 Garbage collector stop the world event (STW) H

Re: Cassandra out of Heap memory

2012-06-17 Thread rohit bhatia
I am using 1.0.5 . The logs suggest that it was one single instance of failure and I'm unable to reproduce it. From the logs, In a span of 30 seconds, heap usage went from 4.8 gb to 8.8 gb With stop-the-world gc running 20 times. I believe that parNew was unable to clean up memory due to some

Re: One node misbehaving (lot's of GC), ideas?

2015-04-15 Thread Michal Michalski
...@opera.com wrote: Hi! We having problems with one node (out of 56 in total) misbehaving. Symptoms are: * High number of full CMS old space collections during early morning when we're doing bulkloads. Yes, bulkloads, not CQL, and only a few thrift insertions. * Really long stop-the-world GC

Re: Nodes frozen in GC

2011-03-07 Thread Paul Pak
wrote: It sounds like you're complaining that the JVM sometimes does stop-the-world GC. You can mitigate this but not (for most workloads) eliminate it with GC option tuning. That's simply the state of the art for Java garbage collection right now. On Sun, Mar 6, 2011 at 2:18 AM, ruslan

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Jonathan Haddad
, 2015 10:40 PM To: user@cassandra.apache.org Subject: Re: Viewing Cassandra's Internal table Structure in a CQL world I think that you can still use cassandra-cli from 2.0.x to look into internal table structure. Of course you will see bytes instead of readable values but it's better than

Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-06 Thread Jonathan Ellis
. Either way: concurrent mode failure is the easy GC problem. Hopefully you really are seeing mostly that -- this means the JVM didn't start CMS early enough, so it ran out of space before it could finish the concurrent collection, so it falls back to stop-the-world. The fix is a combination of reducing

RE: Reduce Cassandra GC

2013-04-16 Thread Viktor Jevdokimov
get no Heap is X full... messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key

Re: Reduce Cassandra GC

2013-04-16 Thread Joel Samuelsson
is X full... messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread Peter Schuller
I see the following in my GC log 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, real=6.07 secs] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK  and we should expect

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-09-25 Thread Yang
...@infidyne.com wrote: I see the following in my GC log 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00, real=6.07 secs] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK  and we

TimedOutException caused by Stop the world activity

2012-05-27 Thread Jason Tang
. And when I have 1G memory 32 bit cassandra on standalone model, I didn't find so frequently Stop the world behavior. So I wonder what kind of operation will hang the cassandra system. How to collect information for tuning. From the system log and document, I guess there are three type operations: 1

Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-06 Thread C. Scott Andreas
logs suggest that calling nodetool snapshot on a node is triggering 12 to 16 second CMS GCs and a promotion failure resulting in a full stop-the-world collection, during which the node is marked dead by the ring until re-joining shortly after. Here's a log from one of the nodes, along with system

Re: Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

2016-02-19 Thread daemeon reiydelle
timeouts. You may not be seeing a 2.0 vs. 2.1 issue, rather a 2.1 issue proper. While others did not find this associated with stop-the-world GC, I saw some evidence of same (using Cassandra stress, but I recently reproduce the issue with YCSB!) *...* *Daemeon C.M. ReiydelleUSA (+1

Re: CMS GC initial-mark taking 6 seconds , bad?

2011-10-20 Thread Todd Burruss
: user=6.10 sys=0.00, real=6.07 secs] so there is a stop-the-world period of 6 seconds. does this sound bad ? or 6 seconds is OK and we should expect the built-in fault-tolerance of Cassandra handle this? Thanks Yang

Re: Garbage collection freezes cassandra node

2011-12-19 Thread Peter Schuller
and is not in and of itself indicative of a stop-the-world 10 second pause. It is fully expected using the CMS collector that you'll have a sawtooth pattern as young gen is being collected, and then a sudden drop as CMS does its job concurrently without pausing the application for a long period of time. I will second

Re: Cassandra stress test and max vs. average read/write latency.

2011-12-19 Thread Peter Schuller
workload is such that you are suffering from fragmentation and eventually seeing Cassandra fall back to full compacting GC:s (stop-the-world) for the old generation. I would start by adjusting young gen so that your frequent pauses are at an acceptable level, and then see whether or not you can sustain

RE: Garbage collection freezes cassandra node

2011-12-23 Thread Rene Kochen
expected, and most of CMS is concurrent and implies only short pauses. A full pause can happen, but that log entry is expected and is not in and of itself indicative of a stop-the-world 10 second pause. It is fully expected using the CMS collector that you'll have a sawtooth pattern as young gen

Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread ruslan usifov
on nodetool -h localhost cfhistograms i calc avg row size 70KB I setup row cache only for one CF with follow settings: update column family building with rows_cached=1 and row_cache_provider='SerializingCacheProvider'; When i setup row cache i got promotion failure in GC (with stop the world pause

Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread ruslan usifov
with rows_cached=1 and row_cache_provider='SerializingCacheProvider'; When i setup row cache i got promotion failure in GC (with stop the world pause about 30secs) with almost HEAP filled. I very confused with this behavior. PS: i use cassandra 1.0.10, with JNA 3.4.0 on ubuntu lucid (kernel

Re: Cassandra out of Heap memory

2012-06-17 Thread aaron morton
the highest peak heap usage. The problem with this is that it raises the possibility that during the CMS cycle, a collection of the young generation runs out of memory to migrate objects to the old generation (a so-called concurrent mode failure), leading to stop-the-world full garbage collection

Node doesn't rejoin ring after restart

2012-08-03 Thread Edward Sargisson
on the assumption that a small outage on one node shouldn't cause extraordinary action. Nor do I want to have to stop every node before bringing them up one by one. What am I missing? Am I forced into those time consuming methods every time I want to restart? Thoughts? Cheers, Edward -- Edward

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-09 Thread Peter Schuller
The main reason to set Xms=Xmx is so mlockall can tag the entire heap as don't swap this out on startup.  Secondarily whenever the heap resizes upwards the JVM does a stop-the-world gc, but no, not really a big deal when your uptime is in days or weeks. I'm not sure where this is coming from

Re: Nodes frozen in GC

2011-03-08 Thread Peter Schuller
shouldn't be seeing very frequent long-term pauses due to GC. If your workload is not doing something particularly unusual, even if you do end up triggering a GC fallback to full stop-the-world GC, it should happen comparatively infrequently and you should be seeing many CMS cycles in between each

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
how you have managed to deal with the stop-the-world garbage collection issues that seems to hit most clusters that have significant load and cause application timeouts. Have you found that cassandra scales in read/write capacity reasonably well as you add nodes? Also, you may also want

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
case a typical web app or something like a scientific/data mining app? I ask because I'm wondering how you have managed to deal with the stop-the-world garbage collection issues that seems to hit most clusters that have significant load and cause application timeouts. Have you found that cassandra

Re: batch_mutate failed: out of sequence response

2011-04-05 Thread Jonathan Ellis
Step 1: disable swap. 2011/4/5 Héctor Izquierdo Seliva izquie...@strands.com: Update with more info: I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of Stop-the-world gc pauses. I'm writing into three column families

Disable Swap? batch_mutate failed: out of sequence response

2011-04-05 Thread Jonathan Colby
at a time, and I'm having lots of Stop-the-world gc pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. There is now swapping, and full GCs are taking around 5 seconds. I'm running cassandra with a heap of 8 GB. Should I tune

Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-07 Thread ruslan usifov
to stop-the-world. The fix is a combination of reducing XX:CMSInitiatingOccupancyFraction and (possibly) increasing heap capacity if your heap is simply too full too much of the time. You can also mitigate it by increasing the phi threshold for the failure detector, so the node doing the GC doesn't

Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

2011-04-07 Thread Jonathan Ellis
the JVM didn't start CMS early enough, so it ran out of space before it could finish the concurrent collection, so it falls back to stop-the-world. The fix is a combination of reducing XX:CMSInitiatingOccupancyFraction and (possibly) increasing heap capacity if your heap is simply too full too much
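The reasoning in this reply (start CMS earlier so it finishes before the old generation fills) can be sketched as a back-of-the-envelope check. The Python below is a deliberately simplified model, not how the JVM decides anything: the function name cms_headroom_ok and all the numbers are hypothetical, and it ignores fragmentation, promotion failure and variation in the promotion rate. Note that the JVM flag itself takes a percentage (for example 75), while this sketch uses a fraction (0.75).

```python
def cms_headroom_ok(old_gen_mb, initiating_fraction, promotion_mb_per_s, cms_cycle_s):
    """Return True if the old-gen space still free when CMS starts is enough
    to absorb the objects promoted while the concurrent cycle is running.

    Simplified model: a False result corresponds to the situation described
    above, where the heap runs out of space before the concurrent collection
    finishes (concurrent mode failure).
    """
    if not 0.0 < initiating_fraction <= 1.0:
        raise ValueError("initiating_fraction must be in (0, 1]")
    headroom_mb = old_gen_mb * (1.0 - initiating_fraction)  # free space at CMS start
    needed_mb = promotion_mb_per_s * cms_cycle_s            # promoted during the cycle
    return headroom_mb >= needed_mb


if __name__ == "__main__":
    # Hypothetical workload: 6000 MB old gen, 100 MB/s promoted, 12 s CMS cycle.
    for fraction in (0.90, 0.75):
        ok = cms_headroom_ok(6000, fraction, 100, 12)
        print(f"start CMS at {fraction:.0%} occupancy -> enough headroom: {ok}")
```

In this made-up example, starting at 90% occupancy leaves about 600 MB for roughly 1200 MB of promotions, while starting at 75% leaves 1500 MB. That is the sense in which lowering the initiating occupancy, or adding heap, buys headroom.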

RE: batch_mutate failed: out of sequence response

2011-04-12 Thread Stephen McKamey
not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out?

Re: RE: batch_mutate failed: out of sequence response

2011-04-18 Thread Dan Washusen
application is single thread, so I guess this is Pelops fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client

Re: RE: batch_mutate failed: out of sequence response

2011-04-18 Thread Jonathan Ellis
a single connection from multiple threads. don't do that. I'm not using thrift directly, and my application is single thread, so I guess this is Pelops fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!

Re: Tombstones and memtable_operations

2011-04-19 Thread shimi
, Héctor Izquierdo Seliva wrote: Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Eric Halpern
. During this failure concurrent GC completely stops java program (i.e. cassandra) and does a GC cycle. Other cassandra nodes discover, that node is not responding and considering it dead. If concurrent GC is properly tuned, it should never do stop-the-world and GC ( thats why it is called

Re: Weird GC

2014-01-29 Thread Benedict Elliott Smith
consider it a bug and we may be able to fix it. -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote: Hi, We've been trying to figure out why we have so long and frequent stop-the-world GC even though we

Re: Weird GC

2014-01-31 Thread Joel Samuelsson
:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote: Hi, We've been trying to figure out why we have so long and frequent stop-the-world GC even though we have basically no load. Today we got a log of a weird GC that I

Re: unstable write performance

2014-03-26 Thread Marcin Cabaj
ParNew GC (used by default in cassandra) uses 'stop-the-world' algorithm, which means your application has to be stopped to do gc. You can run jstat command to monitor gc activity and check if your write performance is related to GC, eg: $ jstat -gc CASSANDRA_PID 1s But it shouldn't drop

Re: What % of cassandra developers are employed by Datastax?

2014-05-23 Thread Redmumba
Another thing to keep in mind--even core pieces like the Linux kernel are dominated by corporations. Less than 20% of contributions last year were made by non-corporate sponsored contributors. Obviously, this is a bit different, but many parts of the open source world depend on upstream

Linux containers, docker, SSD, and RAID.

2014-06-04 Thread Kevin Burton
Hey guys. Question about using container with Cassandra. I think we will eventually deploy on containers… lxc with docker probably. Our first config will have one cassandra daemon per box. Of course there are issues here. Larger per VM heap means more GC time and potential stop the world

Re: Wide rows best practices and GC impact

2014-12-03 Thread Robert Coli
30GB of RAM on the machine, you could consider investigating large-heap configurations, rbranson from Instagram has some slides out there on the topic. What you pay is longer stop the world GCs, IOW latency if you happen to be talking to a replica node when it pauses.

Re: Should one expect to see hints being stored/delivered occasionally?

2015-01-30 Thread Vasileios Vlachos
. Probably what is happening is that you have stop the world GCs long enough to trigger queueing of hints via timeouts during these GCs. =Rob

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread DuyHai Doan
. Only by keeping an eye on the physical structure can you tune your queries for best performance. To date, I have been using cassandra-cli to view the table's internal structure. But, I get bombarded with all kinds of warnings about how I should switch to CQL and stop using a deprecated

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Robert Coli
must admit: > I have not tried running rebuild. > I admit I haven't been following this thread closely, perhaps I have missed what exactly it is you're trying to do. It's possible you'd need to : 1) join the node with auto_bootstrap=false 2) immediately stop it 3) re-start it with join_ri

Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Justin Lin
to cassandra in most case. (Sometimes it still can connect to cassandra). And from cassandra log, we can see it takes roughly 3 seconds to do gc when there is an incoming connection. And the gc is the only difference between the timeout connection and the successful connection. So we suspect this Stop

Re: scylladb

2017-03-11 Thread Kant Kodali
e aren't available as transparent hugepages). On 03/11/2017 10:26 PM, Kant Kodali wrote: @Dor 1) You guys have a CPU scheduler? you mean user level thread Scheduler that maps user level threads to kernel level threads? I thought C++ by default creates native k

Re: scylladb

2017-03-11 Thread Avi Kivity
scheduler? you mean user level thread Scheduler that maps user level threads to kernel level threads? I thought C++ by default creates native kernel threads but sure nothing will stop someone to create a user level scheduling library if that's what you are talking about? 2) How can one create

Cassandra crashes....

2017-08-22 Thread Thakrar, Jayesh
the TWCS compaction properties to have min/max compaction sstables = 4 and by drastically reducing the size of the New/Eden space (to 5% of heap space = 800 MB). Its been about 12 hours and our stop-the-world gc pauses are under 90 ms. Since the servers have more than sufficient resources, we
