Re: Need help json2sstable

2011-07-20 Thread Sasha Dolgy
You are missing after On Wed, Jul 20, 2011 at 8:03 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote: Hi All, Here Is my Json structure. {Fetch_CC :{                 cc:{ :1000,                     :ICICI,                          :,               

node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
When I started using Cassandra, I had no idea that I should run node repair frequently, so basically, I have 3 nodes with RF=3 and have not run node repair for months; the data size is 20G. The problem is that when I start running node repair now, it eats up all disk IO and the server load became

RE: How to keep only exactly column of key

2011-07-20 Thread Lior Golan
Thanks, Sylvain. Can you please point us to the interface that should be implemented in order to write our own custom compaction? And how is it supposed to be configured? -Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Tuesday, July 19, 2011 11:40 AM To:

best example of indexing

2011-07-20 Thread CASSANDRA learner
Hi Guys, Can you please give me the best example of creating an index on a column family? As I am completely new to this, can you please give me a simple and good example?

Re: Need help json2sstable

2011-07-20 Thread Nilabja Banerjee
Yes. Actually, I was just asking you guys to give me one example with one sample of a small JSON structure. Thank you in advance :) On 20 July 2011 11:53, Sasha Dolgy sdo...@gmail.com wrote: You are missing after On Wed, Jul 20, 2011 at 8:03 AM, Nilabja Banerjee

Re: Need help json2sstable

2011-07-20 Thread Nilabja Banerjee
On 20 July 2011 11:33, Nilabja Banerjee nilabja.baner...@gmail.com wrote: Hi All, Here Is my Json structure. {Fetch_CC :{ cc:{ :1000, :ICICI, :, city:{

Re: best example of indexing

2011-07-20 Thread Sasha Dolgy
Examples exist in the conf directory of the distribution... On Jul 20, 2011 11:48 AM, CASSANDRA learner cassandralear...@gmail.com wrote: Hi Guys, Can you please give me the best example of creating index on a column family. As I am completely new to this, Can you please give me a simple and

Re: best example of indexing

2011-07-20 Thread CASSANDRA learner
Where can I get that? Can you please help me out? On Wed, Jul 20, 2011 at 3:39 PM, Sasha Dolgy sdo...@gmail.com wrote: Examples exist in the conf directory of the distribution... On Jul 20, 2011 11:48 AM, CASSANDRA learner cassandralear...@gmail.com wrote: Hi Guys, Can you please give me

Cassandra CLOUD . How its related

2011-07-20 Thread CASSANDRA learner
Hi Guys, When we talk about Cassandra, we somehow always connect it to the cloud. I don't understand how it is connected to the cloud. What is this Cassandra Cloud?

2800 file descriptors?

2011-07-20 Thread cbert...@libero.it
Hi all, I wonder if it is normal that Cassandra (5 nodes, 0.75) has more than 2800 fd open and growing. I still have the problem that during repair I run into the "too many open files" error. Best regards

What is the nodeId for?

2011-07-20 Thread Boris Yen
Hi, I think we might have screwed our data up. I saw multiple columns inside record: System.NodeIdInfo.CurrentLocal. It makes our Cassandra dead forever. I was wondering if anyone could tell me what the NodeId is for, so that I might be able to duplicate this. Thanks in advance Boris

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
Just found this: https://issues.apache.org/jira/browse/CASSANDRA-2156 but it seems to be available only for 0.8, and people submitted a patch for 0.6. I am using 0.7.4; do I need to dig into the code and make my own patch? Does adding a compaction throttle solve the IO problem? Thanks! On Wed, Jul 20, 2011 at

Re: 2800 file descriptors?

2011-07-20 Thread Boris Yen
For the too many open files issue, maybe you could try: ulimit -n 5000 && path to cassandra executable. On Wed, Jul 20, 2011 at 6:47 PM, cbert...@libero.it cbert...@libero.it wrote: Hi all, I wonder if is normal that Cassandra (5 nodes, 0.75) has more than 2800 fd open and growing. I still

Re: What is the nodeId for?

2011-07-20 Thread Sam Overton
The NodeId is used in counter replication. Counters are stored on each replica as a set of shards, where each shard corresponds to the local count of one of the replicas for that counter, as identified by the NodeId. A NodeId is generated the first time cassandra starts, and might be renewed
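The shard idea described above can be pictured with a toy model. This is only a sketch under a stated assumption: a counter is treated as a plain mapping from NodeId to that replica's local count, and the counter's value as the sum of all shards. The names `CounterShards`, `add`, and `total` are hypothetical and are not Cassandra's internal classes.

```python
from collections import defaultdict


class CounterShards:
    """Toy model (hypothetical, not Cassandra code): one shard per NodeId."""

    def __init__(self):
        # node_id -> local count contributed by that replica
        self.shards = defaultdict(int)

    def add(self, node_id, delta):
        # A replica only ever updates the shard keyed by its own NodeId.
        self.shards[node_id] += delta

    def total(self):
        # The visible counter value is the sum over all shards.
        return sum(self.shards.values())


counter = CounterShards()
counter.add("node-a", 3)
counter.add("node-b", 2)
counter.add("node-a", 1)
print(counter.total())
```

In this picture, a node that ends up with more than one "current" NodeId would no longer know which shard is its own, which may be why duplicate entries are a problem; that reading is a guess, not something stated in the thread.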

network bandwidth question

2011-07-20 Thread Arijit Mukherjee
Hi All We're trying to set up a Cassandra cluster (initially with 3 nodes). Each node will generate data @ 32MB per second. What would be the likely network usage for this (say with a replication factor of 3)? I mean, if I use simple arithmetic, I can say 32MBps per node, and hence 96MBps in
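The "simple arithmetic" in the question can be written out as a back-of-the-envelope sketch. The assumptions are mine, not the poster's: each node coordinates its own writes and is itself one of the replicas, and gossip, reads, repair, and compression are ignored. The function name `replication_traffic` is made up for illustration.

```python
def replication_traffic(nodes, replication_factor, write_mb_per_s):
    """Estimate replica traffic only (assumption: the writing node is a replica)."""
    # Copies that must cross the network for every locally generated write.
    remote_copies = min(replication_factor, nodes) - 1
    outbound_per_node = write_mb_per_s * remote_copies
    # Every node sends the same amount, so the aggregate is a simple product.
    cluster_total = outbound_per_node * nodes
    return outbound_per_node, cluster_total


per_node, total = replication_traffic(nodes=3, replication_factor=3, write_mb_per_s=32)
print(per_node, total)
```

Under these assumptions each node would send about 64 MB/s and also receive about 64 MB/s from its two peers, for roughly 192 MB/s of inter-node traffic in aggregate.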

R: Re: 2800 file descriptors?

2011-07-20 Thread cbert...@libero.it
For the too many open files issue, maybe you could try: ulimit -n 5000 && path to cassandra executable. Ok, thanks for the tip but I get this error running nodetool repair and not during cassandra execution. I however wonder if this is normal or not ... in production do you get similar

Re: network bandwidth question

2011-07-20 Thread Jonathan Ellis
You can assume that's negligible compared to the data traffic. On Wed, Jul 20, 2011 at 7:02 AM, Arijit Mukherjee ariji...@gmail.com wrote: Hi All We're trying to set up a Cassandra cluster (initially with 3 nodes). Each node will generate data @ 32MB per second. What would be the likely

Re: Re: 2800 file descriptors?

2011-07-20 Thread Jonathan Ellis
Repair does normally stream lots of small sstables. It's normal to set open fd to unlimited, but a higher limit like 64K would also be reasonable. On Wed, Jul 20, 2011 at 7:02 AM, cbert...@libero.it cbert...@libero.it wrote: For the too many open files issue, maybe you could try:  ulimit -n
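To see which limits a process actually runs under, a small check with Python's standard `resource` module (Unix only) may help. This is a sketch: it reports the limits of the Python process itself, not of a running Cassandra JVM, and the helper `can_raise_to` and the `TARGET` constant are names invented for this example.

```python
import resource

# The soft limit is what the process is held to right now;
# the hard limit is the ceiling an unprivileged process can raise it to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)

TARGET = 65536  # the "64K" figure mentioned above


def can_raise_to(target, hard_limit):
    """True if the soft limit could be raised to target without extra privileges."""
    return hard_limit == resource.RLIM_INFINITY or target <= hard_limit


print(can_raise_to(TARGET, hard))
```

If the check prints False, the hard limit itself has to be raised by an administrator before `ulimit -n` in the start script can take effect.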

Re: Repair taking a long, long time

2011-07-20 Thread David Boxenhorn
I have this problem too, and I don't understand why. I can repair my nodes very quickly by looping through all my data (when you read your data it does read-repair), but nodetool repair takes forever. I understand that nodetool repair builds merkle trees, etc. etc., so it's a different algorithm,

Re: What is the nodeId for?

2011-07-20 Thread Boris Yen
Hi Sam, Thanks for the explanation. The NodeIds do appear in the Local row of NodeIdInfo, and after manually deleting two of them (I had three before I deleted them) from the CurrentLocal row, Cassandra can be restarted now. I was just wondering what could be the possible cause for this, and

Re: Repair taking a long, long time

2011-07-20 Thread Maxim Potekhin
I can re-load all data that I have in the cluster, from a flat-file cache I have on NFS, many times faster than the nodetool repair takes. And that's not even accurate because as others noted nodetool repair eats up disk space for breakfast and takes more than 24hrs on 200GB data load, at which

Re: Repair taking a long, long time

2011-07-20 Thread Boris Yen
We also got the same problem when using 0.8.0. As far as I know, a few issues related to 'repair' have been marked as resolved in 0.8.1. Hope this could really solve our problem. On Wed, Jul 20, 2011 at 8:47 PM, David Boxenhorn da...@citypath.com wrote: I have this problem too, and I

Re: Repair taking a long, long time

2011-07-20 Thread David Boxenhorn
As I indicated below (but didn't say specifically) another option is to set read repair chance to 1.0 for all your CFs and loop over all your data, since read triggers a read repair. On Wed, Jul 20, 2011 at 4:58 PM, Maxim Potekhin potek...@bnl.gov wrote: I can re-load all data that I have

disable compaction

2011-07-20 Thread Nikolai Kopylov
Hi everyone, having found out recently that Cassandra has no upper limit on how large sstable files can grow, I decided to switch to deleting whole CFs with obsolete data. That way I will not remove columns and there is no need for compaction at all. How can I completely disable the compaction process? Thanx for your

Re: disable compaction

2011-07-20 Thread Edward Capriolo
On Wed, Jul 20, 2011 at 11:13 AM, Nikolai Kopylov kopy...@gmail.com wrote: Hi everyone, finding out recently that cassandra have no upper limit for sstable files to grow, I decided to move to deletion of CF with obsolete data. So that I will not remove columns and there is no need in

Re: What is the nodeId for?

2011-07-20 Thread Sylvain Lebresne
Possibly, you've hit this: https://issues.apache.org/jira/browse/CASSANDRA-2824 Should be fixed in the next minor release. In the meantime, your fix should be alright. -- Sylvain On Wed, Jul 20, 2011 at 3:47 PM, Boris Yen yulin...@gmail.com wrote: Hi Sam, Thanks for the explanation. The

Re: best example of indexing

2011-07-20 Thread Konstantin Naryshkin
In the Cassandra CLI tutorial (http://wiki.apache.org/cassandra/CassandraCli), there is an example of creating a secondary index. Konstantin - Original Message - From: CASSANDRA learner cassandralear...@gmail.com To: user@cassandra.apache.org Sent: Wednesday, July 20, 2011 9:47:28 AM

My nodetool in Java

2011-07-20 Thread cbert...@libero.it
Hi all, I'd like to build something like nodetool to show the status of the ring (nodes up-down, info on single node) all via JAVA. Do you have any tip for this? (I don't want to run the nodetool through java and capture the output ...). I have really no idea on how to do it ... :-)

Re: My nodetool in Java

2011-07-20 Thread Jeremy Hanna
If you look at the bin/nodetool file, it's just a shell script to run org.apache.cassandra.tools.NodeCmd. You could probably call that directly from your code. On Jul 20, 2011, at 3:18 PM, cbert...@libero.it wrote: Hi all, I'd like to build something like nodetool to show the status of the

b-tree

2011-07-20 Thread Eldad Yamin
Hello, Is there any good way of storing a binary-tree in Cassandra? I wonder if someone has already implemented something like that and how they accomplished it without transaction support (while the tree keeps evolving)? I'm asking because I want to save geospatial-data, and SimpleGeo did it using

Re: b-tree

2011-07-20 Thread Jeffrey Kesselman
I'm not sure if I have an answer for you anyway, but I'm curious. A b-tree and a binary tree are not the same thing. A binary tree is a basic fundamental data structure; a b-tree is an approach to storing and indexing data on disc for a database. Which do you mean? On Wed, Jul 20, 2011 at

Re: disable compaction

2011-07-20 Thread Nikolai Kopylov
Thanx a lot Edward, will follow your advice. On Wed, Jul 20, 2011 at 7:28 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Wed, Jul 20, 2011 at 11:13 AM, Nikolai Kopylov kopy...@gmail.com wrote: Hi everyone, finding out recently that cassandra have no upper limit for sstable files to

Re: b-tree

2011-07-20 Thread aaron morton
Just throwing out a (half baked) idea, perhaps the Nested Set Model of trees would work http://en.wikipedia.org/wiki/Nested_set_model * Every row would represent a set with a left and right encoded into the key * Members are inserted as columns into *every* set / row they are a member. So we
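For readers unfamiliar with the Nested Set Model, the left/right numbering it relies on can be sketched in a few lines. This only illustrates the numbering that would be encoded into a row key; it does not show any Cassandra layout, and the function names and sample tree are hypothetical.

```python
def nested_sets(tree, root):
    """Assign (left, right) bounds by a depth-first walk.

    tree maps a node to the list of its children (illustrative structure).
    """
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter            # number handed out on the way down
        for child in tree.get(node, []):
            visit(child)
        counter += 1              # number handed out on the way back up
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def is_descendant(bounds, node, ancestor):
    # A node belongs to an ancestor's set when its interval is strictly nested.
    left, right = bounds[node]
    a_left, a_right = bounds[ancestor]
    return a_left < left and right < a_right


tree = {"world": ["europe", "asia"], "europe": ["france"]}
bounds = nested_sets(tree, "world")
print(bounds)
```

The catch with this model is that inserting a node renumbers every bound to its right, which is worth keeping in mind for a tree that keeps evolving without transactions.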

Re: Cassandra CLOUD . How its related

2011-07-20 Thread Sameer Farooqui
Are you talking about cloudsandra.com? Check out their website. Cassandra is a database. Cloud is just a fancy term for remote hosting. The two aren't really related. On Wed, Jul 20, 2011 at 3:19 AM, CASSANDRA learner cassandralear...@gmail.com wrote: Hi Guys, When we talk about cassandra,

Re: best example of indexing

2011-07-20 Thread Sameer Farooqui
More info: http://www.datastax.com/docs/0.8/data_model/secondary_indexes http://www.datastax.com/docs/0.8/data_model/cfs_as_indexes On Wed, Jul 20, 2011 at 10:49 AM, Konstantin Naryshkin konstant...@a-bb.net wrote: In the Cassandra CLI tutorial(

Re: Data Visualization Best Practices

2011-07-20 Thread aaron morton
This project may provide some inspiration https://github.com/driftx/chiton Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 21 Jul 2011, at 06:36, Selcuk Bozdag wrote: Hi, Cassandra provides a flexible scheme-less data

Re: Repair taking a long, long time

2011-07-20 Thread aaron morton
The first thing to do is understand what the server is doing. As Edward said, there are two phases to the repair: first the differences are calculated and then they are shared between the neighbours. Let's add a third step: once the neighbour gets the data it has to rebuild the indexes and bloom

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Aaron Morton
If you have never run repair, also check the section on repair on this page http://wiki.apache.org/cassandra/Operations about how frequently it should be run. There is an issue where repair can stream too much data, and this can lead to excessive disk use. My non scientific approach to the

Re: PHPCassa get number of rows

2011-07-20 Thread Aaron Morton
Cassandra does not provide a way to count the number of rows; the best you can do is a series of range calls and count them on the client side http://thobbs.github.com/phpcassa/tutorial.html If this is something you need in your app consider creating a custom secondary index to store the row
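The "series of range calls" approach can be sketched generically. The callable `fetch_range(start_key, count)` is a hypothetical stand-in for whatever range call the client library offers, assumed to return up to `count` keys starting at `start_key` inclusive; the in-memory `fake_fetch` below exists only to make the sketch runnable and returns keys in sorted order, which a real cluster need not do.

```python
import bisect


def count_rows(fetch_range, page_size=100):
    """Count rows client-side by paging through range calls."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    total = 0
    start = ""            # assumption: empty key means "from the beginning"
    first_page = True
    while True:
        keys = fetch_range(start, page_size)
        if not first_page:
            keys = keys[1:]   # the start key was already counted on the previous page
        if not keys:
            break
        total += len(keys)
        start = keys[-1]      # resume from the last key seen
        first_page = False
    return total


# Stand-in data source for illustration only.
KEYS = sorted(f"row{i:03d}" for i in range(250))


def fake_fetch(start_key, count):
    i = bisect.bisect_left(KEYS, start_key)
    return KEYS[i:i + count]


print(count_rows(fake_fetch, page_size=100))
```

Because each page restarts at the last key already seen, every page after the first carries one duplicate, which is why the loop drops the first key. The count is only a snapshot if rows are being written concurrently.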

Re: What is the nodeId for?

2011-07-20 Thread Boris Yen
Not sure if this is the same. I saw exceptions like this: INFO 15:33:49,336 Finished reading /root/commitlog_tmp/CommitLog-1311135088656.log ERROR 15:33:49,336 Exception encountered during startup. java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError

with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
This time it is another node: the node went down during repair and came back but never came up. I changed the log level to DEBUG and found that it prints the following message endlessly: DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123) collecting 0 of 2147483647:

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
Thank you very much for the help. I will try to adjust minor compaction and also deal with a single CF at a time. On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton aa...@thelastpickle.com wrote: If you have never run repair also check the section on repair on this page

Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
Sorry for the misunderstanding. I saw many "N of 2147483647" messages where N=0 and thought it was not doing anything. My node was very unbalanced and I intended to rebalance it with nodetool move after a node repair; does that make the slices much larger? Address Status State Load

Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread aaron morton
Personally I would do a repair first if you need to do one, just so you are confident everything is where it should be. Then do the move as described in the wiki. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 21 Jul 2011, at

Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
Thanks for the reply. Now the problem is how I can get rid of the "N of 2147483647" messages; it seems to never end, and the node never goes UP. Last time this happened I ran node cleanup, and it turned out some data was lost (not sure if caused by cleanup). On Thu, Jul 21, 2011 at 11:37 AM, aaron morton

Cassandra Storage Sizing

2011-07-20 Thread Todd Burruss
I put together a blog post on Cassandra Storage Sizing so I don't need to keep figuring it out again and again. Hope everyone finds it useful, and give feedback if you find errors. http://btoddb-cass-storage.blogspot.com/ ... enjoy