I ran into this. I also tried log_ring_state=false, which did not help.
The way I got through this was to stop the entire cluster and start the nodes
one-by-one.
I realize this is not a practical solution for everyone, but if you can afford
to stop the cluster for a few minutes, it's
Hi -
From what I understand, Peter's recommendation should work for you. They
have both worked for me. No need to copy anything by hand on the new node.
Bootstrap/repair does that for you. From the Wiki:
If a node goes down entirely, then you have two options:
(Recommended approach)
Hi - sorry if this was asked before but I couldn't find any answers about it.
Is the upgrade path from 0.7.6 to 0.8.4 possible via a simple rolling restart?
Are nodes with these different versions compatible - i.e., can one node be
upgraded in order to see if we run into any problems
A simple kill without -9 should work. Have you tried that?
On , Jason Pell jasonmp...@gmail.com wrote:
Check out the RPM packages for Cassandra; they have init.d scripts that
work very nicely. There are debs as well for Ubuntu.
Sent from my iPhone
On Jul 27, 2011, at 3:19, Priyanka
One of the main reasons for regularly running repair is to make sure
deletes are propagated in the cluster, i.e., that data is not resurrected if a
node never received the delete.
And read repair takes care of fixing inconsistencies on the fly.
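A minimal sketch of scheduling those regular repairs from cron, assuming one repair per node per week (well inside the default GCGraceSeconds of 10 days); the script path and cron entry are made up:

# /usr/local/bin/weekly-repair.sh (hypothetical path), run from cron, e.g.:
#   0 3 * * 0  /usr/local/bin/weekly-repair.sh >> /var/log/cassandra/repair.log 2>&1
# Repairing well within GCGraceSeconds ensures tombstones reach every replica
# before they become eligible for purging.
nodetool -h localhost repair

Staggering the cron day per node keeps only one repair running in the cluster at a time.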
So if I were to set a universal TTL on all
Good points, Aaron. I realize now how expensive read repairs are. I'm
going to keep running repairs regularly but still set a max TTL on all
columns to make sure we don't have really old data we no longer need
buried in the cluster.
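For reference, a sketch of writing columns with a TTL so old data ages out on its own; the keyspace, column family, and key names are made up, and the WITH TTL clause assumes a CLI version recent enough to support it (client libraries such as Hector can also set a per-column TTL):

# TTL is in seconds; here roughly 30 days.
cassandra-cli -h localhost <<'EOF'
use Mail;
set Messages['msg-42']['body'] = 'hello world' with ttl = 2592000;
EOF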
On , aaron morton aa...@thelastpickle.com wrote:
I regularly run repair on my cassandra cluster. However, I often see that
during the repair operation very large amounts of data are transferred to other
nodes.
My question is, if only some data is out of sync, why are entire Data files
being transferred?
situation.
Thanks. Looking forward to the release where these 2 things are fixed.
On , Jonathan Ellis jbel...@gmail.com wrote:
On Thu, Jul 21, 2011 at 9:14 AM, Jonathan Colby
jonathan.co...@gmail.com wrote:
I regularly run repair on my cassandra cluster. However, I often see
that during
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 23 Jun 2011, at 19:58, Jonathan Colby wrote:
Hi -
I'd like to understand more how the token is hashed with the key to
determine on which node the data is stored - called decorating in cassandra
speak.
Can
Hi -
I'd like to understand more how the token is hashed with the key to determine
on which node the data is stored - called decorating in cassandra speak.
Can anyone share any documentation on this or describe this more in detail?
Yes, I could look at the code, but I was hoping to be able
A compaction will be triggered when the min threshold number of similar-sized
SSTable files is found. So what's actually the purpose of the max part of the
threshold?
On Jun 23, 2011, at 12:55 AM, aaron morton wrote:
Setting them to 2 and 2 means compaction can only ever compact 2 files at a
time, so
The way compaction works, x same-sized files are merged into a new SSTable.
This repeats itself and the SSTables get bigger and bigger.
So what is the upper limit? If you are not deleting stuff fast enough,
wouldn't the SSTable sizes grow indefinitely?
I ask because we have some rather
to hit a dead end.
On Jun 22, 2011, at 6:50 PM, Eric tamme wrote:
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
jonathan.co...@gmail.com wrote:
The way compaction works, x same-sized files are merged into a new
SSTable. This repeats itself and the SSTables get bigger and bigger.
So
and avoid very large SSTables/node if possible.
Edward
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby jonathan.co...@gmail.com
wrote:
The way compaction works, x same-sized files are merged into a new
SSTable. This repeats itself and the SSTables get bigger and bigger.
So what
Thanks Ryan. Done that : ) 1 TB is the striped size. We might look into
bigger disks for our blades.
On Jun 22, 2011, at 7:09 PM, Ryan King wrote:
On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby
jonathan.co...@gmail.com wrote:
Thanks for the explanation. I'm still a bit skeptical
Awesome tip on TTL. We can really use this as a catch-all to make sure all
columns are purged based on time. Fits our use case well. I forgot this
feature existed.
On Jun 22, 2011, at 7:11 PM, Eric tamme wrote:
Second, compacting such large files is an IO killer. What can be tuned
I just took a look at the demo. This is really great stuff! I will try this
on our cluster as soon as possible. I like this because it gives people not
too familiar with the cassandra CLI or Thrift a way to query cassandra data.
On Jun 20, 2011, at 10:56 AM, Markus Wiesenbacher |
jsvc is not very flexible. Check out the wrapper software; we swear by it.
http://wrapper.tanukisoftware.com/doc/english/download.jsp
On Jun 17, 2011, at 2:52 AM, Ken Brumer wrote:
Anton Belyaev anton.belyaev at gmail.com writes:
I guess it is not trivial to modify the package to make
? What would be the difference between cleanup and
compactions?
On Sat, Jun 11, 2011 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote:
Yes.
On Sat, Jun 11, 2011 at 6:08 AM, Jonathan Colby
jonathan.co...@gmail.com wrote:
I've been reading inconsistent descriptions of what major
I've been reading inconsistent descriptions of what major and minor compactions
do. So my question for clarification:
Are tombstones purged (i.e., space reclaimed) for minor AND major compactions?
Thanks.
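The short answer upthread was yes: tombstones older than GCGraceSeconds are candidates for removal during compaction. If needed, a major compaction can be forced by hand; a sketch, run against the local node:

# Force a major compaction on the local node. Expired tombstones (older than
# GCGraceSeconds, a per-column-family setting) are dropped in the process.
nodetool -h localhost compact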
I'm seeing this in my logs. We are storing emails in cassandra and some of
them might be rather large.
Is this bad? What exactly is happening when this appears?
INFO [CompactionExecutor:1] 2011-06-11 13:39:19,217 CompactionIterator.java
(line 150) Compacting large row
When I run repair on a node in my 0.7.6-2 cluster, the repair starts to stream
data and activity is seen in the logs.
However, after a while (a day or so) it seems like everything freezes up. The
repair command is still running (the command prompt has not returned) and
netstats shows output
I got myself into a situation where one node (10.47.108.100) has a lot more
data than the other nodes. In fact, the 1 TB disk on this node is almost
full. I added 3 new nodes and let cassandra automatically calculate new tokens
by taking ranges from the most heavily loaded nodes. Unfortunately there is
balancing should be an iteration on the above steps moving
through the range.
On 6/9/11 6:21 AM, Jonathan Colby wrote:
I got myself into a situation where one node (10.47.108.100) has a lot more
data than the other nodes. In fact, the 1 TB disk on this node is almost
full. I added 3 new nodes
I'm trying to run a repair on a 0.7.6-2 node. After running the repair command,
this line shows up in the cassandra.log, but nothing else. It's been hours.
Nothing is seen in the logs from other servers or with nodetool commands like
netstats or tpstats.
How do I know if the repair is
OK, it seems a phantom node (one that was removed from the cluster)
kept being passed around in gossip as a down endpoint and was messing
up the gossip algorithm. I had the luxury of being able to stop the
entire cluster and bring the nodes up one by one. That purged the bad
node from gossip.
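A sketch of that workaround, assuming SSH access and init scripts like the Debian/RPM ones mentioned elsewhere in the thread; the host names are hypothetical and the whole cluster is down for the duration:

NODES="cass1 cass2 cass3 cass4"

# Stop every node first so the phantom endpoint stops being gossiped around.
for n in $NODES; do ssh "$n" sudo /etc/init.d/cassandra stop; done

# Bring the nodes back one at a time, checking the ring before moving on.
for n in $NODES; do
  ssh "$n" sudo /etc/init.d/cassandra start
  sleep 60   # crude wait; confirm the node shows Up in "nodetool -h $n ring" before continuing
done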
It might just not have occurred to me in the previous 0.7.4 version,
but when I do a repair on a node in v0.7.6, it seems like data is also
synced with neighboring nodes.
My understanding of repair is that the data is reconciled on the node
being repaired, i.e., data is removed or added to that
Hi -
Operations like repair and bootstrap on nodes in our cluster (average
load 150GB each) take a very long time.
By long I mean 1-2 days. With nodetool netstats I can see the
progress percentage advancing very slowly.
I guess there are some throttling mechanisms built into cassandra.
And yes
Thanks Ed! I was thinking about surrendering more memory to mmap
operations. I'm going to try bringing the Xmx down to 4G
On Fri, May 27, 2011 at 5:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Fri, May 27, 2011 at 9:08 AM, Jonathan Colby jonathan.co...@gmail.com
wrote:
Hi
rounds and then disappears.
Hope that helps.
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 26 May 2011, at 19:58, Jonathan Colby wrote:
@Aaron -
Unfortunately I'm still seeing messages like: is down, removing
you check from the other nodes in the cluster to see if they are
receiving the stream?
cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 26 May 2011, at 00:42, Jonathan Colby wrote:
I recently
I recently removed a node (with decommission) from our cluster.
I added a couple new nodes and am now trying to rebalance the cluster using
nodetool move.
However, netstats shows that the node being moved is trying to stream data
to the node that I already decommissioned yesterday.
The
I'm not sure if this is the absolute best advice, but perhaps
running cleanup on the data will help remove any data that isn't assigned
to this token - in case you've moved the cluster around before.
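Roughly what that amounts to, run on the node holding the stray data; cleanup rewrites the node's SSTables and drops rows that no longer fall in its token ranges, so it needs temporary disk headroom much like a compaction:

# Run after any moves/bootstraps have finished; affects only the local node.
nodetool -h localhost cleanup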
Any exceptions in the logs, e.g. EOF? I experienced this and it caused the
repairs to trip
, Jonathan Colby wrote:
I recently removed a node (with decommission) from our cluster.
I added a couple new nodes and am now trying to rebalance the cluster
using nodetool move.
However, netstats shows that the node being moved is trying to stream
data to the node that I already
On each of our nodes we have an average of 80 - 100 GB actual cassandra data on
1 TB disks. There is normally plenty of capacity on the nodes. Swap is OFF.
OS is Debian 64 bit.
Every once in a while, the disk usage will skyrocket to 500+ GB, even once
filling up the 1 TB disk (at least
We use the Java Service Wrapper from Tanuki Software and are very happy
with it. It's a lot more robust than jsvc.
http://wrapper.tanukisoftware.com/doc/english/download.jsp
The free community version will be enough in most cases.
Jon
On May 11, 2011 10:30pm, Anton Belyaev
Your questions are pretty fundamental. I recommend reading through the
documentation to get a better understanding of how Cassandra works.
Here's good documentation from DataStax:
http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity
In a nutshell: you only bootstrap new
I've been struggling with these kinds of exceptions for some time now. I
thought it might have been a one-time thing, so on the 2 nodes where I saw this
problem I pulled in fresh data with a repair on an empty data directory.
Unfortunately, this problem is now coming up on a new node that has,
This is normal when you just add single nodes. When no token is assigned,
the new node takes a portion of the ring from the most heavily loaded node.
As a consequence of this, the nodes will be out of balance.
In other words, when you double the number of nodes you would not have this
?
Thanks.
On Tue, Apr 12, 2011 at 5:15 PM, Jonathan Colby jonathan.co...@gmail.com
wrote:
This is normal when you just add single nodes. When no token is assigned,
the new node takes a portion of the ring from the most heavily loaded node.
As a consequence of this, the nodes
There are a few other threads related to problems with the nodetool repair in
0.7.4. However I'm not seeing any errors, just never getting a message that
the repair completed successfully.
In my production and test cluster (with just a few MB of data) the nodetool
repair prompt never returns
hang if a neighbour dies and fails to send a requested stream. It
will time out after 24 hours (I think).
Aaron
On 12 Apr 2011, at 23:39, Karl Hiramoto wrote:
On 12/04/2011 13:31, Jonathan Colby wrote:
There are a few other threads related to problems with the nodetool repair
in 0.7.4
does a repair just compare the existing data from sstables on the node being
repaired, or will it figure out which data this node should have and copy it
in?
I'm trying to refresh all the data for a given node (without reassigning the
token) starting with an emptied out data directory.
I
/6003676843 - 0%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0           5765
Responses                       n/a         0           9811
On Apr 12, 2011, at 4:59 PM, Jonathan Colby wrote:
does a repair just compare the existing data
When the down data center comes back up, the Quorum reads will result in a
read-repair, so you will get valid data. Besides that, hinted handoff will
take care of getting data replicated to a previously down node.
Your example is a little unrealistic because you could theoretically have a
How long has it been in Leaving status? Is the cluster under stress test load
while you are doing the decommission?
On Apr 12, 2011, at 6:53 PM, Baskar Duraikannu wrote:
I have setup a 4 node cluster for testing. When I setup the cluster, I have
setup initial tokens in such a way that each
Your JVM heap has reached 78%, so cassandra automatically flushes its memtables.
You need to explain more about your configuration: 32 or 64 bit OS, what is the
max heap, how much RAM is installed?
If this happens under stress test conditions it's probably understandable. You
should look into
cool! and I thought I made that one up myself : )
On Apr 13, 2011, at 2:13 AM, Chris Burroughs wrote:
On 04/12/2011 11:11 AM, Jonathan Colby wrote:
I'm not sure if this is the kosher way to rebuild the sstable data, but it
seemed to work.
http://wiki.apache.org/cassandra/Operations
or Stored remote tree depending on which
returns first at DEBUG level
3) Queuing comparison
If we do not have the 3rd log then we did not get a reply from either local
or remote.
Aaron
On 13 Apr 2011, at 00:57, Jonathan Colby wrote:
There is no Repair session message either. It just
Thanks for the answer Aaron.
There are Data, Index, Filter, and Statistics files associated with SSTables.
What files must be physically moved/deleted?
I tried just moving the Data file and Cassandra would not start. I see this
exception:
WARN [WrapperSimpleAppMain] 2011-04-11
Seeing these exceptions on a node during the bootstrap phase of a move.
Cassandra 0.7.4. Can anyone shed more light on what may be causing this?
btw - the move was done to assign a new token; the decommission phase seemed to
have gone OK. Bootstrapping is still in progress (I hope)
INFO
My seed node (1 of 4), which has the wraparound range (token 0), needs to be
replaced.
Should I bootstrap the node with a new IP, then add it back as a seed?
Should I run removetoken on another node to take over the range?
I shutdown cassandra, deleted (with a backup) the contents of the data
directory and did a nodetool move 0. It seems to be populating the node
with its range of data. Hope that was a good idea.
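A sketch of that rebuild procedure with the Debian default paths (adjust to your install); keeping the system keyspace is a cautious variant so the node retains its token and identity, and all of this only works if the replicas on the other nodes are healthy:

/etc/init.d/cassandra stop
cd /var/lib/cassandra/data
for d in *; do
  [ "$d" = "system" ] && continue        # optionally keep ring/token metadata
  mv "$d" "/var/lib/cassandra/backup-$d"
done
/etc/init.d/cassandra start

# Pull this node's range back from its replicas. 'repair' keeps the current
# token; 'move 0' additionally (re)assigns the wraparound token as above.
nodetool -h localhost repair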
On Apr 11, 2011, at 10:38 PM, Jonathan Colby wrote:
My seed node (1 of 4) having
the earlier EOF error during bootstrap?
Aaron
On 12 Apr 2011, at 08:42, Jonathan Colby wrote:
I shutdown cassandra, deleted (with a backup) the contents of the data
directory and did a nodetool move 0. It seems to be populating the node
with its range of data. Hope that was a good idea
It appears we have several unserializable or unreadable rows. These were not
fixed even after doing a scrub on all nodes - even though the scrub seemed
to have completed successfully.
I'm trying to fix these by doing a repair, but these exceptions are thrown
exactly when doing a repair.
I can't explain the technical reason why it's not advisable to bootstrap a
seed. However, from what I've read you would bootstrap the node as a non-seed
first, then add it as seed once it has finished bootstrapping.
On Apr 8, 2011, at 9:30 PM, mcasandra wrote:
in yaml:
# Set to true to
.
This is similar to https://issues.apache.org/jira/browse/CASSANDRA-2156 but
that ticket will not cover this case. I've added this use case to the
comments, please check there if you want to follow along.
Cheers
Aaron
On 6 Apr 2011, at 16:26, Jonathan Colby wrote:
thanks for the response Aaron
It seems on my cluster there are a few unserializable rows. I'm trying to run
a repair on the nodes, but it also seems that the replica nodes have unreadable
or unserializable rows. The problem is, I cannot determine if the repair is
still going on, or if it was interrupted because of these
nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 7 Apr 2011 00:10, Jonathan Colby jonathan.co...@gmail.com wrote:
Let's say you have RF of 3 and a write was written to 2 nodes. 1 was not
written because the node had a network hiccup (but came
These types of exceptions are seen sporadically in our cassandra logs. They
occur especially after running a repair with nodetool.
I assume there are a few corrupt rows. Is this cause for panic?
Will a repair fix this, or is it best to do a decommission + bootstrap via a
move for
and what's the RF?
Aaron
On 6 Apr 2011, at 01:16, Jonathan Colby wrote:
When doing a move, decommission, loadbalance, etc. data is streamed to the
next node in such a way that it really strains the receiving node - to the
point where it has a problem serving requests.
Any way
good to see a discussion on this.
This also has practical use for business continuity, where you can ensure that
the clients in a given data center first write replicas to their own data center,
then to the other data center for backup. If I understand correctly, a write
takes the token into
Let's say you have RF of 3 and a write was written to 2 nodes. 1 was not
written because the node had a network hiccup (but came back online again).
My question is, if you are reading a key with a CL of ONE, and you happen to
land on that node that didn't get the write, will the read fail
as previously.
Am I missing something or am I just reading the docs wrong?
Cheers
Aaron
On 4 Apr 2011, at 22:20, Jonathan Colby wrote:
hi Aaron -
The Datastax documentation brought to light the fact that over time,
major compactions will be performed on bigger and bigger
when doing a nodetool move, after about 15 minutes I got the below
exception. The cassandra log seems to indicate that the move is still
ongoing. Is this anything to worry about?
Exception in thread "main" java.rmi.UnmarshalException: Error unmarshaling
return header; nested exception is:
Hi Jonathan -
Would you recommend disabling system swap as a rule? I'm running on Debian
64-bit and am seeing light swapping:
             total       used       free     shared    buffers     cached
Mem:          8003       7969         33          0          0       4254
-/+ buffers/cache:
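A sketch of turning swap off entirely on Debian, which is the usual recommendation for Cassandra nodes since a swapped-out JVM leads to long pauses; adapt the fstab edit to your system:

sudo swapoff -a                                  # disable swap now
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab        # keep it off across reboots

# A softer alternative is to discourage swapping rather than remove it:
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
sudo sysctl -w vm.swappiness=0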
I've seen the other posts about memory consumption, but I'm seeing some weird
behavior with 0.7.4 with 5 GB heap size (64 bit system with 8 GB ram
total)...
note the virtual mem used 20.6 GB ?! and Shared 8.4 GB ?!
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
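Those VIRT/SHR numbers are usually dominated by mmapped SSTables rather than heap; one way to check, with the PID lookup being an assumption about how the daemon shows up in the process list:

PID=$(pgrep -f CassandraDaemon)
pmap "$PID" | grep -c 'Data.db'                                     # how many data files are mapped
pmap "$PID" | awk '/Data.db/ {s += $2} END {print s, "kB mapped"}'  # total mapped size
ps -o rss=,vsz= -p "$PID"                                           # resident vs virtual size

The heap itself stays bounded by -Xmx (MAX_HEAP_SIZE in cassandra-env.sh on packaged installs); the mmapped file pages are page cache the kernel can reclaim under pressure.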
I added a node to the cluster and I am having a difficult time reassigning the
new tokens.
It seems after a while nothing shows up in the new node's logs and it just
stays in status Leaving. nodetool netstats on all nodes shows Nothing
streaming to/from.
There is no activity in the other
SSTableReader.java (line 154) Opening /var/lib/cassandra/data/DFS/main-f-129
INFO [CompactionExecutor:1] 2011-04-05 22:46:02,228 SSTableReader.java (line 154) Opening /var/lib/cassandra/data/DFS/main-f-130
On Apr 5, 2011, at 10:46 PM, Jonathan Colby wrote:
I added a node to the cluster and I am
thresholds are applied per bucket of files that share a similar
size; there are normally more smaller files and fewer larger files.
Aaron
On 2 Apr 2011, at 01:45, Jonathan Colby wrote:
I discovered that a Garbage collection cleans up the unused old SSTables.
But I still wonder whether cleanup
, but is a little messy.
Depending on your setup it may also be possible to copy / move the nodes
manually by moving sstable files.
I've not done it myself; are you able to run a test?
Hope that helps.
Aaron
On 1 Apr 2011, at 02:04, Jonathan Colby wrote:
From my understanding of replica
I ran nodetool cleanup on a node in my cluster and discovered the disk usage went
from 3.3 GB to 5.4 GB. Why is this?
I thought cleanup just removed hinted handoff information. I read that
*during* cleanup extra disk space will be used, similar to a compaction. But I
was expecting the disk
I discovered that a Garbage collection cleans up the unused old SSTables. But
I still wonder whether cleanup really does a full compaction. This would be
undesirable if so.
On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote:
I ran node cleanup on a node in my cluster and discovered the disk
From my understanding of replica copies, cassandra picks which nodes to
replicate the data based on replication strategy, and those same replica
partner nodes are always used according to token ring distribution.
If you change the replication strategy, does cassandra pick new nodes to
Silly question: would every cassandra installation need to have manual repairs
done on it?
It would seem cassandra's read repair and regular compaction would take care
of keeping the data clean.
Am I missing something?
On Mar 30, 2011, at 7:46 PM, Peter Schuller wrote:
I just wanted to
Peter -
Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me why
it is so important to run a repair before the GCGraceSeconds kicks in. Does
this mean a delete does not get replicated? In other words, when I delete
something on a node, doesn't cassandra set tombstones
I'm a little unclear on the differences between the nodetool operations:
- compaction
- repair
- clean
I understand that compaction consolidates the SSTables and physically performs
deletes by taking into account the tombstones. But what do clean and repair
do then?
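For reference, the three operations as nodetool commands; very roughly, compact merges SSTables, repair reconciles data between replicas, and cleanup discards data the node no longer owns:

nodetool -h localhost compact   # major compaction: merge SSTables, drop expired tombstones
nodetool -h localhost repair    # anti-entropy: compare Merkle trees with replicas, stream differences
nodetool -h localhost cleanup   # rewrite SSTables, dropping rows outside this node's token ranges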
Cacti and Munin are great for graphing; Nagios is good for monitoring.
I wrote a very simple JMX proxy that you can send a request to and it retrieves
the desired JMX beans.
There are JMX proxies out there if you don't want to write your own, for example
Does anyone know how cassandra chooses the nodes for its other replica copies?
The first node gets the first copy because its token is assigned for that key.
But what about the other copies of the data?
Do the replica nodes stay the same based on the token range? Or are the
other
Hi -
Our cluster is spread between 2 datacenters. We have a straightforward IP
assignment so that OldNetworkTopology (rack-inferring snitch) works well. We
have cassandra clients written in Hector in each of those data centers. The
Hector clients all have a list of all cassandra nodes
, 2011, at 2:02 PM, Jonathan Colby wrote:
Hi -
Our cluster is spread between 2 datacenters. We have a straight-forward IP
assignment so that OldNetworkTopology (rackinferring snitch) works well.
We have cassandra clients written in Hector in each of those data centers.
The Hector
According to the Wiki Page on compaction: once compaction is finished, the old
SSTable files may be deleted*
* http://wiki.apache.org/cassandra/MemtableSSTable
I thought the old SSTables would be deleted automatically, but this wiki page
got me thinking otherwise.
Question is, if it is true
It seems some settings like memtable_throughput_in_mb are keyspace-specific
(at least with 0.7.4).
How can these settings best be changed on a running cluster?
PS - preferably by a sysadmin using nodetool or cassandra-cli
Thanks!
Jon
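If I recall correctly, in 0.7 these memtable thresholds are per-column-family attributes stored in the schema, so they can be changed on a live cluster from cassandra-cli (or over JMX) rather than by editing cassandra.yaml. A rough sketch, with made-up keyspace and column family names; the exact attribute names may vary between 0.7 minor versions:

cassandra-cli -h localhost <<'EOF'
use Mail;
update column family Messages with memtable_throughput = 128;
EOF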
itself if it detects that it
is low on space. A compaction marker is also added to obsolete
sstables so they can be deleted on startup if the server does not
perform a GC before being restarted.
On Tue, Mar 22, 2011 at 8:30 AM, Jonathan Colby
jonathan.co...@gmail.com wrote:
According
Hi -
On our recently live cassandra cluster of 5 nodes, we've noticed that the
latency readings, especially reads, have gone up drastically.
TotalReadLatencyMicros 5413483
TotalWriteLatencyMicros 1811824
I understand these are in microseconds, but what meaning do they have
This is a two part question ...
1. If you have cassandra nodes with different sized hard disks, how do you
deal with assigning the token ring such that the nodes with larger disks get
more data? In other words, given equally distributed token ranges, when the
smaller disk nodes run out of
We use Puppet to manage the cassandra.yaml in a different location from the
installation. Ours is in /etc/cassandra/cassandra.yaml
You can set the environment variable CASSANDRA_CONF (I believe that's the name;
check cassandra.in.sh) and the startup script will pick this up as the configuration
file to
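A sketch of that, with hypothetical install and config paths; cassandra.in.sh respects CASSANDRA_CONF and puts it on the classpath, so cassandra.yaml and the log4j properties are read from there:

export CASSANDRA_CONF=/etc/cassandra
/opt/cassandra/bin/cassandra -f        # -f stays in the foreground; omit it to daemonize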
Hi -
If a seed crashes (i.e., suddenly unavailable due to HW problem), what is the
best way to replace the seed in the cluster?
I've read that you should not bootstrap a seed. Therefore I came up with this
procedure, but it seems pretty complicated. any better ideas?
1. update the seed
Hi -
I have a question. Obviously there is no purpose in running
OldNetworkTopologyStrategy in one data center. However, we want to
share the same configuration in our production (multiple data centers)
and pre-production (one data center) environments.
My question is will
According to the Cassandra Wiki and the O'Reilly book, supposedly there is a
contrib directory within the cassandra download containing the
Python stress test script stress.py. It's not in the binary tarball
of 0.7.3.
Anyone know where to find it?
Anyone know of other, maybe better stress testing
, but
not sufficient.
The real test is the JMX values.
Dave Viner
On Mon, Dec 20, 2010 at 6:25 AM, Jonathan Colby jonathan.co...@gmail.com
wrote:
I was unable to find an example or documentation on my question. I'd like to
know the best way to group a cluster of cassandra nodes behind
Hi cassandra experts -
We're planning a cassandra cluster across 2 datacenters
(datacenter-aware, random partitioning) with QUORUM consistency.
It seems to me that with 2 datacenters, if one datacenter is lost,
the reads/writes to cassandra will fail in the surviving datacenter
because of the
Thanks a lot Peter. So basically we would need to choose a
consistency other than QUORUM. I think in our case consistency is
not necessarily an issue since our data is write-once, read-many
(immutable data). I suppose having a replication factor of 4 would
result in two nodes in each
I have a very basic question, the answer to which I have been unable to find in
the online documentation on cassandra.
It seems like every node in a cassandra cluster contains all the data
ever stored in the cluster (i.e., all nodes are identical). I don't
understand how you can scale this on commodity servers
sense
for R to be close to N, in which case cassandra is useful so the database
doesn't have a single point of failure but not so much because of the
size of the data. But for large clusters it rarely makes sense to have N=R;
usually N > R.
On Thu, Dec 9, 2010 at 12:28 PM, Jonathan Colby
awesome! Thank you guys for the really quick answers and the links to
the presentations.
On Thu, Dec 9, 2010 at 12:06 PM, Sylvain Lebresne sylv...@yakaz.com wrote:
This helps a little but unfortunately it's still a bit fuzzy for me. So is it
not true that each node contains all the data in the