Cassandra Disk storage capacity

2014-04-07 Thread Hari Rajendhran
Hi Team, We have a 3-node Apache Cassandra 2.0.4 cluster installed in our lab. We have set the data directory to /var/lib/cassandra/data. What is the maximum disk space that Cassandra will use for data storage? Note: the /var partition has a storage capacity of 40GB. My question is

Re: Cassandra Disk storage capacity

2014-04-07 Thread Jan Kesten
Hi Hari, C* will use your entire space, which is something one should monitor. Depending on your choice of compaction strategy, your data_dir should not be filled up entirely: in the worst case compaction will need as much space as the sstables already on disk, so 50% should be kept free.
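As a rough illustration of the 50% guideline above, here is a minimal Python sketch of a free-space check. The function names and the default threshold are hypothetical and not part of Cassandra; the only library call used is shutil.disk_usage from the Python standard library.

```python
import shutil


def has_compaction_headroom(total_bytes, free_bytes, min_free_fraction=0.5):
    """Return True if the free share of the filesystem meets the threshold.

    min_free_fraction=0.5 mirrors the worst-case guideline quoted above;
    it is an assumption and depends on the compaction strategy in use.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes >= min_free_fraction


def path_has_compaction_headroom(path, min_free_fraction=0.5):
    """Check the filesystem that holds `path` (for example a data directory)."""
    usage = shutil.disk_usage(path)
    return has_compaction_headroom(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    # "." is a placeholder; on a node this would be the data directory.
    print(path_has_compaction_headroom("."))
```

On a node, the path argument would be the configured data directory, for example /var/lib/cassandra/data from the original question.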

Re: Cassandra Disk storage capacity

2014-04-07 Thread Prem Yadav
You can specify multiple data directories in cassandra.yaml, e.g.: data_file_directories: - /var/lib/cass1 - /var/lib/cass2 - /some_other_mountpoint On Mon, Apr 7, 2014 at 12:10 PM, Jan Kesten j.kes...@enercast.de wrote: Hi Hari, C* will use your entire space - that is something one

RE: Cassandra Disk storage capacity

2014-04-07 Thread Romain HARDOUIN
Hi, See data_file_directories and commitlog_directory in the settings file cassandra.yaml. Cheers, Romain Hari Rajendhran hari.rajendh...@tcs.com wrote on 07/04/2014 12:56:37: From: Hari Rajendhran hari.rajendh...@tcs.com To: user@cassandra.apache.org, Date: 07/04/2014 12:58 Subject

Re: Cassandra Disk storage capacity

2014-04-07 Thread Hari Rajendhran
Hi, Thanks for the update. I still have a few queries that need to be clarified: 1) I am confused why Cassandra uses the entire disk space (/ directory) even when we specify /var/lib/cassandra/data as the directory in the cassandra.yaml file. 2) Is it only during compaction that Cassandra will use the

Re: Cassandra Disk storage capacity

2014-04-07 Thread Jan Kesten
On 07.04.2014 13:24, Hari Rajendhran wrote: 1) I am confused why Cassandra uses the entire disk space (/ directory) even when we specify /var/lib/cassandra/data as the directory in the cassandra.yaml file. 2) Is it only during compaction that Cassandra will use the entire disk space? 3) What is the

Re: Cassandra Disk storage capacity

2014-04-07 Thread Bèrto ëd Sèra
I guess there is a misunderstanding here: "I am confused why Cassandra uses the entire disk space (/ directory) even when we specify /var/lib/cassandra/data as the directory in the cassandra.yaml file." C* will use the entire MOUNTPOINT, which is not necessarily your entire total disk space. If you

Why is my cluster imbalanced ?

2014-04-07 Thread Oleg Dulin
I added two more nodes on Friday, and moved tokens around. For four nodes, the tokens should be: Node #1: 0 Node #2: 42535295865117307932921825928971026432 Node #3: 85070591730234615865843651857942052864 Node #4:
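The token values listed above match an even split of a ring of size 2**127, which is the token range used by RandomPartitioner. The following is a minimal Python sketch of that arithmetic; the function name is hypothetical, and the use of RandomPartitioner is an assumption inferred from the numbers rather than stated in the message.

```python
# Assumption: RandomPartitioner, whose tokens range over 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print one token per node, numbered from 1 as in the message above.
    for index, token in enumerate(balanced_tokens(4), start=1):
        print(f"Node #{index}: {token}")
```

Even spacing only balances ownership of the ring; it says nothing about how replicas are placed across racks.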

Re: Why is my cluster imbalanced ?

2014-04-07 Thread Tupshin Harper
Your us-east datacenter has RF=2 and 2 racks, which is the right way to do it (I would rarely recommend using a different number of racks than your RF). But by having three nodes on one rack (1b) and only one on the other (1a), you are telling Cassandra to distribute the data so that no two
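To see why this layout is imbalanced, here is a deliberately simplified Python sketch. It assumes that, when the number of racks equals the replication factor, each rack ends up holding one full copy of the data, spread evenly over the nodes in that rack. This is a reading of the truncated explanation above, not Cassandra's actual placement algorithm, and it ignores token positions. The node names and the function name are hypothetical.

```python
from collections import Counter


def replica_share_per_node(rack_of_node, replication_factor):
    """Estimate the fraction of the full data set stored on each node.

    Simplified model: with as many racks as replicas, every rack holds
    one complete copy, divided evenly among the nodes in that rack.
    """
    nodes_per_rack = Counter(rack_of_node.values())
    if len(nodes_per_rack) != replication_factor:
        raise ValueError("this model only covers racks == replication factor")
    return {
        node: 1.0 / nodes_per_rack[rack]
        for node, rack in rack_of_node.items()
    }


if __name__ == "__main__":
    # Layout from the thread: one node in rack 1a, three nodes in rack 1b.
    layout = {"node1": "1a", "node2": "1b", "node3": "1b", "node4": "1b"}
    for node, share in replica_share_per_node(layout, 2).items():
        print(f"{node}: {share:.0%} of the data set")
```

Under this model the single node in rack 1a carries a full copy of the data while each node in rack 1b carries about a third, which is consistent with the imbalance described in the thread.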

Inserting with large number of column

2014-04-07 Thread Fasika Daksa
We are running different workload tests on Cassandra and Redis for benchmarking. We wrote a Java client to read, write, and measure the elapsed time of different test cases. Cassandra was doing great until we introduced 20,000 columns. The insertion is running for a day and then I

Re: Inserting with large number of column

2014-04-07 Thread Tupshin Harper
More details would be helpful (exact schema, method of inserting data, etc.), but you can try just dropping the indices and recreating them after the import is finished. -Tupshin On Apr 7, 2014 8:53 AM, Fasika Daksa cassandra.d...@gmail.com wrote: We are running different workload test on

Re: Why is my cluster imbalanced ?

2014-04-07 Thread Oleg Dulin
Excellent, thanks. On 2014-04-07 12:23:51 +, Tupshin Harper said: Your us-east datacenter, has RF=2, and 2 racks, which is the right way to do it (I would rarely recommend using a different number of racks than your RF). But by having three nodes on one rack (1b) and only one on the

Re: Inserting with large number of column

2014-04-07 Thread Fasika Daksa
Thanks for your response. Currently we are inserting the data line by line, and soon we will implement bulk insertion. The metadata used to generate the data is: No. of Boolean cols: 20,000; No. of Int cols: 0; No. of Rows: 100,000 (we use only bool or integer variables). Attached you can find the

Re: Why is my cluster imbalanced ?

2014-04-07 Thread Oleg Dulin
Tupshin: For EC2, 3 us-east, would you recommend RF=3 ? That would make sense, wouldn't it... That's what I'll do for production. Oleg On 2014-04-07 12:23:51 +, Tupshin Harper said: Your us-east datacenter, has RF=2, and 2 racks, which is the right way to do it (I would rarely

Re: Why is my cluster imbalanced ?

2014-04-07 Thread Tupshin Harper
I recommend RF=3 for most situations, and it would certainly be appropriate here. Just remember to add a third rack, and maintain the same number of nodes in each rack. -Tupshin On Apr 7, 2014 9:49 AM, Oleg Dulin oleg.du...@gmail.com wrote: Tupshin: For EC2, 3 us-east, would you recommend

Re: Transaction Timeout on get_count

2014-04-07 Thread Lukas Steiblys
Deleting a column simply produces a tombstone for that column, as far as I know. It’s probably going through all the columns with tombstones and timing out. Compacting more often should help, but maybe Cassandra isn’t the best choice overall for what you’re trying to do. Lukas From: Yulian

Re: Drop in node replacements.

2014-04-07 Thread Robert Coli
On Sat, Apr 5, 2014 at 5:10 PM, Anand Somani meatfor...@gmail.com wrote: Have you tried nodetool rebuild for that node? I have seen that work when repair failed. While rebuild may work in cases when repair doesn't, they do different things and are not mutually substitutable. rebuild is

Re: Transaction Timeout on get_count

2014-04-07 Thread Tupshin Harper
Constant deletes and rewrites are a very poor pattern to use with Cassandra. It would be better to write to a new row and partition every minute and use a TTL to auto-expire the old data. -Tupshin On Apr 6, 2014 2:55 PM, Yulian Oifa oifa.yul...@gmail.com wrote: Hello I have a row in which
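The pattern suggested above (a new partition every minute plus a TTL) can be sketched roughly as follows in Python. The table name, column names, and helper names are hypothetical, and the %s placeholder style is an assumption about the client driver; only the INSERT ... USING TTL clause is standard CQL.

```python
def minute_bucket(epoch_seconds):
    """Map a Unix timestamp to a per-minute partition bucket."""
    return int(epoch_seconds) // 60


def build_ttl_insert(table, record_id, epoch_seconds, ttl_seconds):
    """Build a CQL insert that writes into the current minute's partition.

    The row expires on its own after ttl_seconds, so no explicit delete
    (and no client-generated tombstone) is needed.
    `table` must be a trusted identifier, since it is interpolated directly.
    """
    cql = (
        f"INSERT INTO {table} (bucket, record_id) "
        f"VALUES (%s, %s) USING TTL {int(ttl_seconds)}"
    )
    return cql, (minute_bucket(epoch_seconds), record_id)


if __name__ == "__main__":
    statement, params = build_ttl_insert("active_records", "abc", 125, 120)
    print(statement, params)
```

Reading the currently active records then means querying the last few minute buckets rather than counting columns in one long-lived row.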

Migrating to new datacenter

2014-04-07 Thread Brandon McCauslin
We're currently running a small 5 node 2.0.5 cluster in a single datacenter using the SimpleStrategy replication strategy with replication factor of 3. We want to migrate our data from our current datacenter to a new datacenter, without incurring any downtime or data loss. There is no plan to

Re: Migrating to new datacenter

2014-04-07 Thread Mark Reddy
I would go with option 1. I think it is the safer of the two options, involves less work and if something were to go wrong mid migration you can remove the second DC from your keyspace replication and have a clean break. SimpleStrategy will work across DCs. It is generally advised to not use it

Re: Transaction Timeout on get_count

2014-04-07 Thread Yulian Oifa
Thanks for your replies. 1) I cannot create a row every X time, since it will not allow me to get a complete list of currently active records (this is the only reason I keep this row initially). 2) As for compaction, I thought that only row ids are cached and not the columns themselves. I have completed

Re: Migrating to new datacenter

2014-04-07 Thread Brandon McCauslin
Thanks for the confirmation on the approach. The new dc is not yet ready, but while I'm waiting I was thinking about updating the existing dc's replication strategy from SimpleStrategy to NetworkTopologyStrategy. I also assume I'll need to update my snitch from the current SimpleSnitch to

Setting gc_grace_seconds to zero and skipping nodetool repair (was RE: Timeseries with TTL)

2014-04-07 Thread Donald Smith
This statement is significant: “BTW if you never delete and only ttl your values at a constant value, you can set gc=0 and forget about periodic repair of the table, saving some space, IO, CPU, and an operational step.” Setting gc_grace_seconds to zero has the effect of not storing hinted

Re: Migrating to new datacenter

2014-04-07 Thread Robert Coli
On Mon, Apr 7, 2014 at 10:48 AM, Brandon McCauslin bm3...@gmail.com wrote: Thanks for the confirmation on the approach. The new dc is not yet ready, but while I'm waiting I was thinking about updating the existing dc's replication strategy from SimpleStrategy to NetworkTopologyStrategy. I

Re: Setting gc_grace_seconds to zero and skipping nodetool repair (was RE: Timeseries with TTL)

2014-04-07 Thread Laing, Michael
Perhaps following this recent thread would help clarify things: http://mail-archives.apache.org/mod_mbox/cassandra-user/201401.mbox/%3ccakgmdnfk3pa-w+ltusm88a15jdg275o31p4ujwol1b7bkaj...@mail.gmail.com%3E Cheers, Michael On Mon, Apr 7, 2014 at 2:00 PM, Donald Smith

Re: Migrating to new datacenter

2014-04-07 Thread Brandon McCauslin
Rob, If I read your response in that first URL correctly, it seems changing both the snitch and replication strategy at the same time is not advisable and could lead to partial data loss. Is your suggestion of dumping and reloading the data into the new cluster still recommended for these

Re: Migrating to new datacenter

2014-04-07 Thread Robert Coli
On Mon, Apr 7, 2014 at 1:41 PM, Brandon McCauslin bm3...@gmail.com wrote: If I read your response in that 1st URL correctly, it seems changing both the snitch and replication strategy at the same time is not advisable and could lead to partial data loss. Is your suggestion of dumping an

Re: Auto-Bootstrap not Auto-Bootstrapping?

2014-04-07 Thread Greg Bone
If seed nodes do not auto bootstrap, what is the procedure for replacing a node in a three node cluster, with all of them identified as seed nodes? Here's what I am thinking: 1) Add a 4th node to the cluster which is not a seed node 2) Decommission one of the seed nodes when data

Re: Auto-Bootstrap not Auto-Bootstrapping?

2014-04-07 Thread Jonathan Lacefield
Hello Not sure I follow the auto bootstrap question, but seeds are only used on startup. Also, what do you mean by convert the node to a seed node? You could simply add the 4th node IP address to the seed list of the other nodes in the .yaml file. Hope that helps Jonathan On Apr 7,

Re: Auto-Bootstrap not Auto-Bootstrapping?

2014-04-07 Thread Paul Charles Leddy
Trick the seed node by removing itself from the yaml file, then start it up. On Mon, Apr 7, 2014 at 7:22 PM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello Not sure I follow the auto bootstrap question, but seeds are only used on startup. Also, what do you mean by convert the

Re: Fwd: using hadoop + cassandra for CF mutations (delete)

2014-04-07 Thread Suraj Nayak
Good way of experimenting, Will. Share your observations :) Adding the Cassandra user group for the community's input on the num_tokens setting in cassandra.yaml. Thanks Suraj On 07-Apr-2014 6:20 PM, William Oberman ober...@civicscience.com wrote: If that works, it's a neat/fancy trick. But,

Re: Auto-Bootstrap not Auto-Bootstrapping?

2014-04-07 Thread Robert Coli
On Mon, Apr 7, 2014 at 6:17 PM, Greg Bone gbon...@gmail.com wrote: If seed nodes do not auto bootstrap, what is the procedure for replacing a node in a three node cluster, with all of them identified as seed nodes? No one seems to know the answer to your question [1], but the current

Opscenter usage in Development server

2014-04-07 Thread Hari Rajendhran
Hi Team, I need a clarification on OpsCenter community version usage for testing, development, and production servers. Can the community version be used without any license on production servers? Best Regards Hari Krishnan Rajendhran Hadoop Admin DESS-ABIM, Chennai BIGDATA Galaxy Tata

Re: Opscenter usage in Development server

2014-04-07 Thread Michael Shuler
On 04/07/2014 11:59 PM, Hari Rajendhran wrote: I need a clarification on OpsCenter community version usage for testing, development, and production servers. Can the community version be used without any license on production servers? When you are using it with a subscription to DSE, there