I'm looking into performance issues on a 0.6.1 cluster. I see two symptoms:
1. Reads and writes are slow
2. One of the hosts is doing a lot of GC.
1 is slow in the sense that in its normal state the cluster used to make around
3-5k reads and writes per second (6-10k operations per second), but how
I think you may have found the "eventually" in eventually consistent. With a
replication factor of 1, you are allowing the client thread to continue to
the read on node 2 before the write is replicated to node 2. Try setting your
replication factor higher for different results.
Jonathan
On Tue, May 4, 2010 at
:) I think this is simpler and I am just stupid
I retried with clean data and commit log directories and everything works
well.
I must have missed something (maybe when I upgraded from 0.5.1 to 0.6), but
anyway, I am just testing.
On Tue, May 4, 2010 at 8:47 AM, Jonathan Shook
Only major compactions can clean out obsolete tombstones.
On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis jbel...@gmail.com wrote:
On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami kazuki.aran...@gmail.com
wrote:
Let me rephrase my question.
How does Cassandra deal with bloom filter's false
As you haven't specified all the details pertaining to filters and your data
layout (structure), at a very high level what I can suggest is that you need
to create a separate CF for each filter.
On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan rakes...@gmail.com wrote:
I am evaluating cassandra to
Hey,
History repeating itself a bit, here: one delay in getting Cassandra into
the open source world was removing its use of the Trove collections library,
as the license (LGPL) is not compatible with the Apache 2.0 license.
Later,
Jeff
On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta
I'm facing the same issue with swap. It only occurs when I perform read
operations (writes are very fast :)). So I can't help you with the memory
problem.
But to balance the load evenly between nodes in the cluster, just manually fix
their tokens (the formula is i * 2^127 / nb_nodes).
Jordzn
On Tue,
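The token formula quoted above can be computed directly; a minimal sketch, assuming the RandomPartitioner's 0..2^127 token space from 0.6 (the helper name is mine, not a Cassandra API):

```python
# Evenly spaced InitialToken values for the RandomPartitioner,
# following the formula i * 2**127 / nb_nodes.

def initial_tokens(nb_nodes):
    return [i * (2 ** 127) // nb_nodes for i in range(nb_nodes)]

for i, token in enumerate(initial_tokens(4)):
    print(f"node {i}: InitialToken = {token}")
```

Each value would go into the corresponding node's InitialToken setting before first startup.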
Hi
In a Cassandra cluster, if we update a key/value and then run a fetch query
on that same key, we get old/stale data. This can be because of Read Repair.
Is there any way to fetch the latest updated data from the cluster, as old
data has no significance and showing it to the client
If R + W > N, where R, W, and N are respectively the read replica count, the
write replica count, and the replication factor, all client reads will see
the most recent write.
On Tue, May 4, 2010 at 4:39 PM, vineet daniel vineetdan...@gmail.com wrote:
Hi
In a cluster of cassandra if we are
LGPL is listed as one of the forbidden licenses for Apache projects
(see Excluded Licenses in http://www.apache.org/legal/3party.html)...
On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher ham...@cloudera.com wrote:
Hey,
History repeating itself a bit, here: one delay in getting Cassandra
Hi,
I am very new to Cassandra 0.6.1. I have set up two nodes on two different
servers. I would like to know how data distribution and replication work.
Node 1 IP: 43.193.211.215
Node 2 IP: 43.193.213.160
Node 1 configuration: <Seeds><Seed>43.193.211.215</Seed></Seeds>
Node 2 configuration: <Seeds>
All other parameters are identical in both servers. I have added some data
from both nodes,
but I am confused about which node the data is stored on. Is it stored on both
nodes, or only on the node from which it was added? I can retrieve data
from both nodes,
but sometimes I cannot. Not sure
I may be wrong here. Someone please correct me if I am.
There may be a race condition if you aren't increasing your replication
factor.
If you insert to node A with replication factor 1, and then get from node B
with replication factor 1, it should be possible (and even more likely in
uneven
Thanks for the prompt reply.
As per your reply, my configuration should be like:
Node 1 configuration:
<Seeds>
  <Seed>43.193.211.215</Seed>
  <Seed>43.193.213.160</Seed>
</Seeds>
Node 2 configuration:
<Seeds>
  <Seed>43.193.211.215</Seed>
  <Seed>43.193.213.160</Seed>
</Seeds>
About replication - In my case
1. When initially starting up your nodes, please plan the InitialToken of each
node evenly.
2. <DiskAccessMode>standard</DiskAccessMode>
On Tue, May 4, 2010 at 9:09 PM, Boris Shulman shulm...@gmail.com wrote:
I think that the extra (more than 4GB) memory usage comes from the
mmaped io, that is why it
I canceled mmap and indeed memory usage is sane again. So far performance
hasn't been great, but I'll wait and see.
I'm also interested in a way to cap mmap so I can take advantage of it but
not swap the host to death...
On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung kyu...@discovereads.com wrote:
You could try mmap_index_only - this would restrict mmap usage to the
index files.
-Nate
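For reference, DiskAccessMode lives in storage-conf.xml; a sketch of the relevant fragment, assuming the 0.6-era value names:

```xml
<!-- storage-conf.xml: restrict mmap to the index files only.
     Other accepted values include auto, mmap, and standard. -->
<DiskAccessMode>mmap_index_only</DiskAccessMode>
```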
On Tue, May 4, 2010 at 11:57 AM, Ran Tavory ran...@gmail.com wrote:
I canceled mmap and indeed memory usage is sane again. So far performance
hasn't been great, but I'll wait and see.
I'm also interested
Are you using 32 bit hosts? If not don't be scared of mmap using a
lot of address space, you have plenty. It won't make you swap more
than using buffered i/o.
On Tue, May 4, 2010 at 1:57 PM, Ran Tavory ran...@gmail.com wrote:
I canceled mmap and indeed memory usage is sane again. So far
it's a 64bit host.
when I cancel mmap I see less memory used and zero swapping, but it's slowly
growing so I'll have to wait and see.
Performance isn't much better, not sure what's the bottleneck now (could
also be the application).
Now on the same host I see:
top - 15:43:59 up 12 days, 4:23, 1
I have a ColumnFamily with a small number of keys, but each key has a
large number of columns.
What's the best way to get just the keys back? I don't want to load all
the columns if I don't have to. There also aren't necessarily any column
names in common between the different rows.
Cheers,
Hello,
We stored about 47 million keys in one Cassandra node, and here is what a
memory dump shows for one of the SSTableReaders:
SSTableReader: 386MB. Of this 386MB, the IndexSummary takes about 231MB,
but the BloomFilter takes 155MB, with an embedded huge long[19.4 million] array.
It seems that BloomFilter is
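Those numbers are internally consistent; a quick back-of-the-envelope check, using only the sizes reported in the message above:

```python
# 19.4 million longs at 8 bytes each is roughly the reported 155 MB,
# and 19.4e6 * 64 bits spread over 47e6 keys gives the bits spent
# per key by the bloom filter.

longs = 19_400_000
keys = 47_000_000

filter_mb = longs * 8 / 1e6        # ~155 MB, matching the dump
bits_per_key = longs * 64 / keys   # ~26 bits per key

print(round(filter_mb), round(bits_per_key, 1))
```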
I'll be running a day-long Cassandra training class on Friday, May 21.
I'll cover
- Installation and configuration
- Application design
- Basics of Cassandra internals
- Operations
- Tuning and troubleshooting
Details at http://riptanobayarea20100521.eventbrite.com/
--
Jonathan Ellis
Project
Thanks Jonathan.
After looking at the Lucandra code I realized my confusion has to do with
get_range_slices
and the RandomPartitioner. When I switched to the OPP I got the expected
behaviour.
I was noticing cases under the random partitioner where keys I expected to
be returned
were not.
On Tue, May 4, 2010 at 4:17 PM, aaron aa...@thelastpickle.com wrote:
I was noticing cases under the random partitioner where keys I expected to
be returned
were not. Can you give a little advice on the expected behaviour of
get_range_slices
with the RP and I'll try to write a JUnit for it.
Hi,
As of 0.6.1, I can't find sstable2json.bat. I don't know if I missed
anything?
It would be good if we could have one, which would help import/export data
into and out of a development machine.
Thanks,
Regards,
Dop
;) ya, it was painful
On Tue, May 4, 2010 at 10:53 AM, Avinash Lakshman
avinash.laksh...@gmail.com wrote:
Hahaha, Jeff - I remember scampering to remove those references to the
Trove maps, I think around 2 years ago.
Avinash
On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher
Ah! Thank you.
Explained better here:
http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency
On Tue, May 4, 2010 at 8:38 PM, Robert Coli rc...@digg.com wrote:
On 5/4/10 7:16 AM, Jonathan Shook wrote:
I may be wrong here. Someone please correct me if I
I want to export data from one cassandra cluster (production) to
another (development). This is not a case of replication, because I
just want a snapshot, not a continuous synchronization. I guess my
options include 'nodetool snapshot' and 'sstable2json'. In our case,
however, the development
get_range_slices with an empty list of column names should work
On Tue, May 4, 2010 at 3:02 PM, Chris Dean ctd...@sokitomi.com wrote:
I have a ColumnFamily with a small number of keys, but each key has a
large number of columns.
What's the best way to get just the keys back? I don't want to
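A toy model of the idea in plain Python (NOT the real Thrift API) showing why an empty column-name list returns only the keys:

```python
# Toy in-memory stand-in for a column family: a range slice requested
# with an empty column-name list yields each row key with no column
# data, so only the keys come back.

rows = {
    "user1": {"age": "30", "city": "SF"},
    "user2": {"age": "25"},
    "user3": {"city": "NY"},
}

def get_range_slices(store, column_names):
    """Return (key, columns) pairs for every row; an empty
    column_names list means fetch no column values."""
    out = []
    for key in sorted(store):
        cols = {c: store[key][c] for c in column_names if c in store[key]}
        out.append((key, cols))
    return out

keys_only = [key for key, cols in get_range_slices(rows, [])]
print(keys_only)
```

With the real API the same effect comes from a SlicePredicate naming no columns, so no column data crosses the wire.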
I've been reading everything I can get my hands on about Cassandra and
it sounds like a possibly very good framework for our data needs; I'm
about to take the plunge and do some prototyping, but I thought I'd
see if I can get a reality check here on whether it makes sense.
Our schema should be