Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?
In addition, if you don't know how many rows will be needed, you can store the key of the next row in each row, just like a linked list. Or have one main row that holds the keys of all your other rows: first select the main row (with the keys), then select the other rows.

On Mon, Jul 23, 2012 at 3:40 PM, rohit bhatia rohit2...@gmail.com wrote:
You should probably try to break the one-row scheme into a 2*Number_of_nodes rows scheme. This should ensure proper distribution of rows and still allow you to query from a small, fixed number of rows. How you do it depends on how you are going to choose your 200-500 columns when reading (try to keep them in the same row). Even if you are forced to put them in separate rows, you can make the row key some modulus of a hash of the column name, ensuring symmetry and easy access to columns.

On Mon, Jul 23, 2012 at 6:02 PM, Ertio Lew ertio...@gmail.com wrote:
Any ideas/suggestions please?
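A minimal sketch of the "modulus of a hash of the column name" idea from the reply above. It is plain Python with no Cassandra client involved; the cluster size, the key format `base:bucket`, and the helper names are assumptions made up for illustration, not anything from the thread.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 3                # assumed cluster size, for illustration only
NUM_BUCKETS = 2 * NUM_NODES  # the "2*Number_of_nodes rows" suggestion


def bucket_row_key(base_key, column_name, buckets=NUM_BUCKETS):
    """Derive a row key from a stable hash of the column name.

    hashlib is used instead of the built-in hash(), which is randomised
    per process for strings and so would not give repeatable keys.
    """
    digest = hashlib.md5(column_name.encode("utf-8")).hexdigest()
    return "%s:%d" % (base_key, int(digest, 16) % buckets)


def group_by_row(base_key, column_names):
    """Group the wanted column names by the row that would hold them."""
    rows = defaultdict(list)
    for name in column_names:
        rows[bucket_row_key(base_key, name)].append(name)
    return dict(rows)


# Reading 500 named columns touches at most NUM_BUCKETS rows.
wanted = ["col_%d" % i for i in range(500)]
grouped = group_by_row("user42", wanted)
```

Each entry in `grouped` would become one multi-column read against one row, so the number of rows per query stays fixed no matter how many columns exist in total.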
Re: Cassandra London: failure modes and HBase
Hi Dave,
Unfortunately, some very interested people and I won't be able to get all the way to London. Could you please consider using a video streaming service? I recommend Watchitoo.com (I used to work there). At the moment it's free.
Thanks!

On Tue, Aug 16, 2011 at 12:47 PM, Dave Gardner d...@cruft.co wrote:
Hi all, I'm pleased to announce our next Cassandra meetup on 5th September in London. http://www.meetup.com/Cassandra-London/events/29668191/ We will be looking at failure modes in Cassandra (how it deals with nodes failing and returning etc.) as well as a comparison with HBase. It's a great opportunity to meet other users of Cassandra, so please come along!
Dave
Re: Best practices when deploying/upgrading a cassandra cluster
Is there any good reason why we shouldn't build the latest version from source?
Thanks!

On Fri, Aug 12, 2011 at 12:18 AM, aaron morton aa...@thelastpickle.com wrote:
In a non-dev system it's a lot easier to use the packages:
http://wiki.apache.org/cassandra/DebianPackaging
http://www.datastax.com/docs/0.8/install/packaged_releases
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 12 Aug 2011, at 02:30, Martin Lansler wrote:
(Note: This is a repost from another thread which did not have a relevant subject, sorry for the spamming)
Hi Eldad / All,

On Wed, Aug 10, 2011 at 8:32 AM, Eldad Yamin elda...@gmail.com wrote:
Can you please explain how you upgraded, something like step-by-step? Thanks!

I took the liberty of replying to the group as it would be interesting to hear how other folks out there are doing it. I'm *not* running a prod system, just a test system of three nodes on my laptop. So it would be nice to hear about real setups.
Here is my test setup:
  apache-cassandra -> apache-cassandra-0.8.3
  apache-cassandra-0.8.2/
  apache-cassandra-0.8.3/
  node1/
  node2/
  node3/
All nodeX look like:
  bin -> ../apache-cassandra/bin/
  commitlog/
  conf/
  data/
  interface -> ../apache-cassandra/interface/
  lib -> ../apache-cassandra/lib/
  saved_caches/
The 'conf' directory is copied into each node from the virgin cassandra distribution. I then create a local GIT repo and add the 'conf' directory so I can track any configuration changes on a node. Then the relevant node-specific configuration settings are set. The 'commitlog', 'data' and 'saved_caches' directories are created by cassandra and must be configured in 'cassandra.yaml' for each node.
When I upgrade I do the following:
1. Make a diff of the conf files from the new version so that I get new parameters etc. I use emacs ediff-mode.
2. Remove the old apache-cassandra symlink and point it to the new cassandra dist.
3. In a rolling fashion, stop one node and then restart it. As the symlink has changed, it will then boot with the upgraded cassandra dist (remember to cd out of the bin/ dir, otherwise you will still be in the old directory).
(4). Should something break, just re-create the old symlink and restart the node (provided cassandra has not made any non-backwards-compatible changes to the db files; this should be noted in the README).
That's pretty much it. On a prod setup one would probably use a tool such as puppet (www.puppetlabs.com/) to ease setting up on many nodes. But there are many ways to do this, for instance pssh (http://code.google.com/p/parallel-ssh/).
Regards,
-Martin
Re: Planet Cassandra (an aggregation site for Cassandra News)
Great! If possible, please blog about full-text-search options and how to use them (Solandra, Elastic Search, Sphinx, etc.).
Thanks!

On Sun, Aug 7, 2011 at 5:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Thu, Aug 4, 2011 at 5:12 AM, Boris Yen yulin...@gmail.com wrote:
Looking forward to it. ^^

On Thu, Aug 4, 2011 at 1:56 PM, Eldad Yamin elda...@gmail.com wrote:
Great! I hope it will be open soon!

On Wed, Aug 3, 2011 at 10:33 PM, Ed Anuff e...@anuff.com wrote:
Awesome, great news!

On Wed, Aug 3, 2011 at 11:53 AM, Lynn Bender line...@gmail.com wrote:
Greetings all, I just wanted to send a note out to let everyone know about Planet Cassandra, an aggregation site for Cassandra news and blogs. Andrew Llavore from DataStax and I built the site. We are currently waiting for approval from the Apache Software Foundation before we publicly launch. However, in the meantime, we'd love to hear from you. If you have any favorite Cassandra-related blogs, or blogs that frequently contain quality Cassandra content, please send us the URL, so that we can contact the author about including a site feed. If you have any questions or comments, please send them to pla...@geekaustin.org.
-Lynn Bender
--
-Lynn Bender http://geekaustin.org http://linuxagainstpoverty.org http://twitter.com/linearb http://twitter.com/geekaustin

I have started a blog to support the High Performance Cassandra Cookbook: http://www.jointhegrid.com/highperfcassandra/ I am going to use the blog to continue writing about features and tips for Cassandra in the writing style used for the book. Lynn, please consider it for syndication. All others, please enjoy.
Re: Install Cassandra on EC2
Hi Aaron,
Thanks for your reply. I've already seen that, but at the moment I'm interested in installing Cassandra from scratch, because I want to learn. Yesterday I installed 1 node; now I'm looking at how to add more nodes and reading more about Cassandra's tools (node reaper etc.).
Thanks!

On Thu, Aug 4, 2011 at 1:23 AM, aaron morton aa...@thelastpickle.com wrote:
Pre-built AMI here: http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 4 Aug 2011, at 03:24, Jeremy Hanna wrote:
Some quick thoughts that might be helpful:
- use ephemeral instances and RAID0 over the local volumes for both cassandra's data as well as the log directory. The log directory because if you crash due to heap size, the heap dump will be stored in the log directory. You don't want that to go in your root/OS partition.
- probably want to stripe across AZs so that a single AZ failure doesn't affect you as much.
- for seeds, it's nice to use elastic ips so that your seed configuration doesn't have to change if a node is replaced.
- the ec2snitch makes it so each AZ appears as a rack wrt topology - simpler as it inspects the ec2 metadata. If you need more than one DC in your cluster (we need a second virtual DC for analytics), you'll probably want to use the property file snitch. There's a cross region ec2snitch that's coming in 1.0.
It would probably be good to add some ec2 specific tips in the wiki. The page that dave mentioned is a good step-by-step, but there's been a lot of community knowledge accumulated about best practices in the year since that was done.

On Aug 3, 2011, at 8:28 AM, Eldad Yamin wrote:
Hi, Is there any manual or important notes I should know before I try to install Cassandra on EC2? Thanks!
Re: cassandra consistency level
So what you're saying is that no matter what consistency level I'm using, the data will be written to all CF nodes right away; the consistency level is just for making sure that all CF nodes are UP and all data is written. In other words, if one of the nodes is down, the write (or read) will fail.
I'm asking because I'm a bit worried about consistency. For example: every action that my client performs is stored in CF.x, in a specific column keyed by his user_id. I do that by deserializing the data already stored in the column, adding the new data (the action), then serializing and storing it again. So I'm worried that some user actions will be dropped due to low consistency when there are lots of changes to a specific column in a short period of time.
I know that I can solve this in a different way by storing each action in a new column etc., but this is just an example that explains my question in a simple way.
Thanks!

On Wed, Aug 3, 2011 at 3:21 AM, aaron morton aa...@thelastpickle.com wrote:
Not sure I understand your question exactly, but will take a shot…
Writes are sent to every UP node; the consistency level is how many nodes we require to complete before we say the request completed successfully. So we also make sure that CL nodes are UP before we start the request. If you run CL ALL then Replication Factor nodes must be up for each key you are writing.
With the exception of CL ONE, reads are also sent to all UP replicas.
Hope that helps.
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 3 Aug 2011, at 09:32, Eldad Yamin wrote:
Does consistency level ALL for writes actually guarantee that my data is updated on all of my nodes? Does it apply to read actions as well? I've read it on the wiki, I just want to make sure.
Thanks!
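A small simulation of the two approaches mentioned in the message above: rewriting one serialized column versus storing each action in its own column. A plain dict stands in for a row, and the function and column names are invented for the sketch. It only illustrates that interleaved read-modify-write cycles can overwrite each other; it does not model replicas or consistency levels.

```python
import json


def append_action_blob(row, action, snapshot):
    """Read-modify-write: deserialize a previously read value, append, write back."""
    actions = json.loads(snapshot)
    actions.append(action)
    row["actions"] = json.dumps(actions)  # overwrites whatever is stored now


def append_action_column(row, action_id, action):
    """One column per action: writers never touch each other's columns."""
    row[action_id] = action


# Blob approach: both clients read the same snapshot before either writes.
blob_row = {"actions": json.dumps([])}
snapshot_a = blob_row["actions"]
snapshot_b = blob_row["actions"]
append_action_blob(blob_row, "login", snapshot_a)
append_action_blob(blob_row, "click", snapshot_b)
blob_result = json.loads(blob_row["actions"])  # the first action is gone

# Column-per-action approach: both actions survive.
column_row = {}
append_action_column(column_row, "t1-clientA", "login")
append_action_column(column_row, "t2-clientB", "click")
```

In the blob case the second writer never saw the first writer's change, so the loss comes from the access pattern itself; the per-action columns avoid it because neither write depends on a prior read.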
Re: HOW TO select a column or all columns that start with X
Thanks!

On Wed, Aug 3, 2011 at 3:03 PM, aaron morton aa...@thelastpickle.com wrote:
and AsciiType
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 3 Aug 2011, at 16:35, eldad87 wrote:
Thank you! Will this work only for the UTF8Type comparator?

On Wed, Aug 3, 2011 at 4:50 AM, Tyler Hobbs ty...@datastax.com wrote:
A minor correction: To get all columns starting with ABC_, you would set column_start=ABC_ and column_finish=ABC` (the '`' character comes after '_'), and ignore the last column in your results if it happened to be ABC`. column_finish, or the slice end in other clients, is inclusive. You could of course use ABC_~ as column_finish and avoid the check if you know that you don't have column names like ABC_~FOO that you want to include.

On Tue, Aug 2, 2011 at 7:17 PM, aaron morton aa...@thelastpickle.com wrote:
Yup, that's a pretty common pattern. How exactly depends on the client you are using. Say you were using pycassa, you would do a get() http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get with column_start=ABC_, count set to whatever, and column_finish not provided. You can also provide a finish and use the highest encoded character, e.g. ascii 126 is ~, so if you used column_finish=ABC_~ you would get everything that starts with ABC_
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 3 Aug 2011, at 09:28, Eldad Yamin wrote:
Hello, I wonder if I can select a column or all columns that start with X. E.g., I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns that start with ABC_ - is that possible? Thanks!

--
Tyler Hobbs Software Engineer, DataStax http://datastax.com/ Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra Python client library
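The prefix trick in the replies above can be checked without a cluster. The sketch below simulates an inclusive column slice over a sorted list of ASCII names; `slice_columns` and `columns_with_prefix` are hypothetical helpers written for this illustration, not pycassa calls, and a non-empty ASCII prefix is assumed.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, column_start, column_finish):
    """Names with column_start <= name <= column_finish (finish is inclusive)."""
    lo = bisect_left(sorted_names, column_start)
    hi = bisect_right(sorted_names, column_finish)
    return sorted_names[lo:hi]


def columns_with_prefix(sorted_names, prefix):
    """All names starting with prefix, using the 'next character' finish."""
    # Bump the last character of the prefix: 'ABC_' becomes 'ABC`'.
    finish = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    result = slice_columns(sorted_names, prefix, finish)
    # The finish is inclusive, so drop it if a column with that exact name exists.
    return [name for name in result if name != finish]


names = sorted(["ABC_1", "ABC_2", "ZZZ_1", "ABC`", "ABC_~FOO"])
by_next_char = columns_with_prefix(names, "ABC_")
by_tilde = slice_columns(names, "ABC_", "ABC_~")
```

Here `by_next_char` picks up `ABC_~FOO` while `by_tilde` misses it, which is the case the correction in the thread warns about.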
Install Cassandra on EC2
Hi, Is there any manual or important notes I should know before I try to install Cassandra on EC2? Thanks!
Re: Install Cassandra on EC2
Thanks! But I prefer to learn how to install it first, if you have any good references (I didn't find any, not even a general installation guide for an EC2/regular machine). I'm also going to try to install Solandra; I hope that Whirr will support it in the near future.

On Wed, Aug 3, 2011 at 5:43 PM, John Conwell j...@iamjohn.me wrote:
One thing you might want to look at is the Apache Whirr project (which is awesome by the way!). It automagically handles spinning up a cluster of resources on EC2 (or rackspace for that matter), installing and configuring cassandra, and starting it. One thing to be aware of if you go this route: by default in the yaml file all data is written under the /var folder. But on a server started by Whirr, this folder only has something like 4gb. Most of the hard disk space is under the /mnt folder. So you'll either need to change what folders are pointed to what drives (not sure if you can or not...I'm sure you could), or change the yaml file to point to the /mnt folder.

On Wed, Aug 3, 2011 at 6:28 AM, Eldad Yamin elda...@gmail.com wrote:
Hi, Is there any manual or important notes I should know before I try to install Cassandra on EC2? Thanks!

-- Thanks, John C
Cassandra and Solandra Installation guide
Hi, I'd like to get tutorials on how to install Cassandra and Solandra; I couldn't find anything helpful. In addition, tutorials on how to use Solandra (indexing/searching) would be great. Thanks!
Installation Exception
Hi,
I'm trying to install Cassandra on Amazon EC2 without success. This is what I did:
1. Created a new Small EC2 instance (this is just for testing) running Ubuntu, using a custom AMI (ami-596f3c1c) from: http://uec-images.ubuntu.com/releases/11.04/release/
2. Installed Java:
   # sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
   # sudo apt-get update
   # sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts openjdk-6-jre
3. Upgraded:
   # sudo apt-get upgrade
4. Downloaded Cassandra:
   # cd /usr/src/
   # sudo wget http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz
   # sudo tar xvfz apache-cassandra-*
   # cd apache-cassandra-*
5. Config (according to README.txt):
   # sudo mkdir -p /var/log/cassandra
   # sudo chown -R `whoami` /var/log/cassandra
   # sudo mkdir -p /var/lib/cassandra
   # sudo chown -R `whoami` /var/lib/cassandra
6. Ran Cassandra:
   # bin/cassandra -f
Then I got this exception:
ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$ bin/cassandra -f
Exception in thread main java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.
Any idea what is wrong?
Thanks!
Re: Installation Exception
Thanks Jonathan, I saw the EC2 AMI that was made by datastax. I prefer not to use it because I want to learn how to install Cassandra first.

On Wed, Aug 3, 2011 at 8:03 PM, Jonathan Ellis jbel...@gmail.com wrote:
http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami

On Wed, Aug 3, 2011 at 10:44 AM, Eldad Yamin elda...@gmail.com wrote:
Hi,
I'm trying to install Cassandra on Amazon EC2 without success. This is what I did:
Created a new Small EC2 instance (this is just for testing) running Ubuntu, using a custom AMI (ami-596f3c1c) from: http://uec-images.ubuntu.com/releases/11.04/release/
Installed Java:
   # sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
   # sudo apt-get update
   # sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts openjdk-6-jre
Upgraded:
   # sudo apt-get upgrade
Downloaded Cassandra:
   # cd /usr/src/
   # sudo wget http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz
   # sudo tar xvfz apache-cassandra-*
   # cd apache-cassandra-*
Config (according to README.txt):
   # sudo mkdir -p /var/log/cassandra
   # sudo chown -R `whoami` /var/log/cassandra
   # sudo mkdir -p /var/lib/cassandra
   # sudo chown -R `whoami` /var/lib/cassandra
Ran Cassandra:
   # bin/cassandra -f
Then I got this exception:
ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$ bin/cassandra -f
Exception in thread main java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.
Any idea what is wrong?
Thanks!

--
Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Installation Exception
Thanks! I missed that lol! BTW, how do I compile it?
Thanks!

On Wed, Aug 3, 2011 at 6:51 PM, samal sa...@wakya.in wrote:
Did you compile the source code? :) You have downloaded the source code, not the binary. Try with the binary.

On Wed, Aug 3, 2011 at 9:14 PM, Eldad Yamin elda...@gmail.com wrote:
Hi,
I'm trying to install Cassandra on Amazon EC2 without success. This is what I did:
1. Created a new Small EC2 instance (this is just for testing) running Ubuntu, using a custom AMI (ami-596f3c1c) from: http://uec-images.ubuntu.com/releases/11.04/release/
2. Installed Java:
   # sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
   # sudo apt-get update
   # sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts openjdk-6-jre
3. Upgraded:
   # sudo apt-get upgrade
4. Downloaded Cassandra:
   # cd /usr/src/
   # sudo wget http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz
   # sudo tar xvfz apache-cassandra-*
   # cd apache-cassandra-*
5. Config (according to README.txt):
   # sudo mkdir -p /var/log/cassandra
   # sudo chown -R `whoami` /var/log/cassandra
   # sudo mkdir -p /var/lib/cassandra
   # sudo chown -R `whoami` /var/lib/cassandra
6. Ran Cassandra:
   # bin/cassandra -f
Then I got this exception:
ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$ bin/cassandra -f
Exception in thread main java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.
Any idea what is wrong?
Thanks!
Solandra
Hello, I have a cluster of 3 Cassandra nodes and I would like to start using Solandra. 1. How can I install Solandra and make it use the existing nodes? 2. Would it be better to install Solandra on a new node and add it to the existing cluster? 3. How does Solandra index? Does it operate automatically, or do I need to tell Solandra to index CF.keys every time a key is created or updated? Thanks!
Question about eventually consistent in Cassandra
Hi, Let’s say that I have 2 datacenters, and a key is changed in both of them at the exact same time (or even within 1-2 seconds of each other). Datacenter #1 adds column abc with value X; Datacenter #2 adds column abc with value Y. What is the result of that situation? Is there any difference if the changes are made within the same datacenter? Thanks! Eldad Yamin
HOW TO select a column or all columns that start with X
Hello, I wonder if I can select a column or all columns that start with X. E.g I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns that start with ABC_ - is that possible? Thanks!
cassandra consistency level
Does consistency level ALL for writes actually guarantee that my data is updated on all of my nodes? Does it apply to read actions as well? I've read it on the wiki, I just want to make sure. Thanks!
geo-data in Cassandra
Hello, I'm trying to save geo-data in Cassandra. According to SimpleGeo, they did that using a nested tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving). In addition, what consistency level did they use? Thanks!
Question about eventually consistent
Hi, Let’s say that I have 2 datacenters, and a key is changed in both of them at the exact same time (or even within 1-2 seconds of each other). Datacenter #1 removes a column and Datacenter #2 adds 2 new columns. Is there any problem with consistency, or will Cassandra handle this situation easily? Thanks!
Re: b-tree
In order to split the nodes. SimpleGeo has a maximum of 1,000 records (i.e. places) on each node in the tree; when a node reaches 1,000 they split it. To prevent more than one process from editing/splitting the same node, a transaction is needed.

On Jul 22, 2011 1:01 AM, aaron morton aa...@thelastpickle.com wrote:
But how will you be able to maintain it while it evolves and new data is added without transactions?
What is the situation you think you need transactions for?
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 22 Jul 2011, at 00:06, Eldad Yamin wrote:
Aaron, Nested set is exactly what I had in mind. But how will you be able to maintain it while it evolves and new data is added without transactions? Thanks!

On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com wrote:
Just throwing out a (half baked) idea, perhaps the Nested Set Model of trees would work: http://en.wikipedia.org/wiki/Nested_set_model
* Every row would represent a set, with a left and right encoded into the key.
* Members are inserted as columns into *every* set / row they are a member of. So we are de-normalising and trading space for time.
* May need to maintain a custom secondary index of the materialised sets, e.g. slice a row to get the first column = the left value you are interested in; that is the key for the set.
I've not thought it through much further than that; a lot would depend on your data. The top sets may get very big.
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:
I'm not sure if I have an answer for you anyway, but I'm curious. A b-tree and a binary tree are not the same thing. A binary tree is a basic fundamental data structure; a b-tree is an approach to storing and indexing data on disc for a database. Which do you mean?

On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving). I'm asking because I want to save geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks!

-- It's always darkest just before you are eaten by a grue.
Re: b-tree
Aaron, Nested set is exactly what I had in mind. But how will you be able to maintain it while it evolves and new data is added without transactions? Thanks!

On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com wrote:
Just throwing out a (half baked) idea, perhaps the Nested Set Model of trees would work: http://en.wikipedia.org/wiki/Nested_set_model
* Every row would represent a set, with a left and right encoded into the key.
* Members are inserted as columns into *every* set / row they are a member of. So we are de-normalising and trading space for time.
* May need to maintain a custom secondary index of the materialised sets, e.g. slice a row to get the first column = the left value you are interested in; that is the key for the set.
I've not thought it through much further than that; a lot would depend on your data. The top sets may get very big.
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:
I'm not sure if I have an answer for you anyway, but I'm curious. A b-tree and a binary tree are not the same thing. A binary tree is a basic fundamental data structure; a b-tree is an approach to storing and indexing data on disc for a database. Which do you mean?

On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving). I'm asking because I want to save geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks!

-- It's always darkest just before you are eaten by a grue.
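One possible reading of the half-baked nested set idea quoted above, as a plain-Python sketch. Rows are keyed by a (left, right) interval and a member is written as a column into every row whose interval covers it. The integer positions, the interval values, and the helper names are all assumptions for illustration; the thread does not specify any of them, and nothing here addresses the concurrent-split problem that was raised.

```python
def sets_containing(position, sets):
    """Keys (left, right) of every set whose interval covers position."""
    return [(left, right) for (left, right) in sets if left <= position <= right]


def insert_member(rows, sets, position, member):
    """Denormalised insert: one column per enclosing set/row."""
    for key in sets_containing(position, sets):
        rows.setdefault(key, {})[position] = member


# A root set (1, 10) with two nested child sets.
sets = [(1, 10), (2, 5), (6, 9)]
rows = {}
insert_member(rows, sets, 3, "cafe")
insert_member(rows, sets, 7, "park")

# Reading everything under a set is then a single row read.
whole_tree = rows[(1, 10)]
left_branch = rows[(2, 5)]
```

The trade is the one named in the quoted message: every insert writes to all enclosing rows, and the top-level row grows with the whole data set, in exchange for reading any subtree from a single row.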
Re: b-tree
Hi Jeffrey, I meant a binary tree. Go and watch the video (in my first email); it will give you a better understanding.
Eldad

On Wed, Jul 20, 2011 at 11:33 PM, Jeffrey Kesselman jef...@gmail.com wrote:
I'm not sure if I have an answer for you anyway, but I'm curious. A b-tree and a binary tree are not the same thing. A binary tree is a basic fundamental data structure; a b-tree is an approach to storing and indexing data on disc for a database. Which do you mean?

On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving). I'm asking because I want to save geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks!

-- It's always darkest just before you are eaten by a grue.
Re: how to stop the whole cluster, start the whole cluster like in hadoop/hbase?
I wonder whether this will cause problems... Has anyone done it already?

On Jul 21, 2011 10:39 PM, Jonathan Ellis jbel...@gmail.com wrote:
dsh -c -g cassandra /etc/init.d/cassandra stop
http://www.netfort.gr.jp/~dancer/software/dsh.html.en
P.S. mostly people are concerned about making sure their entire cluster does NOT stop at the same time :)

On Thu, Jul 21, 2011 at 2:23 PM, Dean Hiller d...@alvazan.com wrote:
Is there a framework for stopping all nodes/starting all nodes for cassandra? I am okay with something like the password-less ssh setup that the hadoop scripts used...just something that allows me to start and stop the whole cluster.
thanks, Dean

--
Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
b-tree
Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving). I'm asking because I want to save geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks!
Re: Cassandra Secondary index/Twissandra
Hi Aaron,
Thank you again for your response. I've read the article but I didn't understand everything. It would be great if the benchmark included the actual CLI/Python commands (that way it would be easier to understand the query), and also an explanation of row pages: what are they?
Anyway, for a sense of scale, we can take as an example the average Facebook/Twitter user, who can have 100K columns (Userline). What is needed is to take the first 50 columns (ordered by TimeUUID), then columns 51 to 100, 101 to 150, etc. Any idea how fast it will be? Or how would you recommend configuring Cassandra? Or is there a different way of achieving that goal?
Thanks, Eldad.

On Sun, Jul 10, 2011 at 8:31 PM, aaron morton aa...@thelastpickle.com wrote:
Can you recommend on a better way of doing that or a way to tune Cassandra to support those 2 CF?
A select with no start or finish column name, a column count and not in reversed order is about the fastest read query. You will need to do a reversed query, which will be a little slower. But it may still be plenty fast enough, depending on scale and throughput and all those other things. See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
Cheers
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 10 Jul 2011, at 00:14, Eldad Yamin wrote:
Aaron - Thank you for the fast response!
Does performance decrease (significantly) if the uniqueness of the column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Depends on what sort of operations you are doing. Some read operations have to pay a constant cost to decode the row level column index; this can be tuned though. AFAIK the comparator type has very little to do with the performance.
In Twissandra, the columns are used as an alternative index for the Userline/Timeline, therefore the operation I'm going to do is slice_range. I'm going to get (for example) the first 50 columns (using a comparator of TimeUUID/LONG). Can you recommend a better way of doing that, or a way to tune Cassandra to support those 2 CFs? Thanks!

On Sun, Jul 10, 2011 at 3:26 AM, aaron morton aa...@thelastpickle.com wrote:
Is there a limit on the number of columns in a single column family that serve as secondary indexes?
AFAIK there is no coded limit, however every index is implemented as another (hidden) Column Family that inherits the settings of the parent CF. So under 0.7 you may run out of memory, under 0.8 you may flush a lot. Also, when an indexed column is updated there are potentially 3 operations that have to happen: read the old value, delete the old value, write the new value. More indexes == more index updating, just like any other database.

Does performance decrease (significantly) if the uniqueness of the column’s values is high?
Low cardinality is recommended: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html

The CF for Userline/Timeline - have comparator of LONG_TYPE and not TimeUUID?
Probably just to make the demo easier. It's used to order tweets in the user and public timelines by the current time: https://github.com/twissandra/twissandra/blob/master/cass.py#L204

Does performance decrease (significantly) if the uniqueness of the column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Depends on what sort of operations you are doing. Some read operations have to pay a constant cost to decode the row level column index; this can be tuned though. AFAIK the comparator type has very little to do with the performance.
Hope that helps.
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 9 Jul 2011, at 12:15, Eldad Yamin wrote:
Hi, I have a few questions:
*Secondary index*
1. Is there a limit on the number of columns in a single column family that serve as secondary indexes?
2. Does performance decrease (significantly) if the uniqueness of the column’s values is high?
*Twissandra*
1. Why, in the source (or any tutorial I've read), does the CF for Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID? https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
2. Does performance decrease (significantly) if the uniqueness of the column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Thanks! Eldad
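A sketch of the 50-columns-at-a-time paging asked about above, newest first. A dict with integer column names stands in for a Userline row ordered by time; `get_page` imitates a reversed slice with an inclusive start and a count. Both helpers are hypothetical and written for this illustration; a real client would issue the equivalent reversed slice query instead.

```python
def get_page(columns, start=None, count=50):
    """Simulate a reversed slice: newest first, from start (inclusive)."""
    names = sorted(columns, reverse=True)
    if start is not None:
        names = [name for name in names if name <= start]
    return names[:count]


def iter_pages(columns, page_size=50):
    """Yield pages of column names, newest first."""
    start = None
    while True:
        # Ask for one extra column; it becomes the start of the next page.
        batch = get_page(columns, start, page_size + 1)
        if not batch:
            return
        yield batch[:page_size]
        if len(batch) <= page_size:
            return
        start = batch[page_size]


# 120 columns named 0..119, where a larger name means a newer entry.
userline = {t: "tweet %d" % t for t in range(120)}
pages = list(iter_pages(userline))
```

Fetching `page_size + 1` columns avoids re-reading the last column of the previous page, and each request stays a bounded slice regardless of how many columns the row holds.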
Re: Cassandra Secondary index/Twissandra
Aaron - Thank you for the fast response!
1. Does performance decrease (significantly) if the uniqueness of the column’s name is high when the comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Depends on what sort of operations you are doing. Some read operations have to pay a constant cost to decode the row-level column index; this can be tuned, though. AFAIK the comparator type has very little to do with the performance.
In Twissandra, the columns are used as an alternative index for the Userline/Timeline, so the operation I'm going to do is slice_range. I'm going to get (for example) the first 50 columns (using a comparator of TimeUUID/LONG). Can you recommend a better way of doing that, or a way to tune Cassandra to support those 2 CFs? Thanks!
On Sun, Jul 10, 2011 at 3:26 AM, aaron morton aa...@thelastpickle.com wrote:
1. Is there a limit on the number of columns in a single column family that serve as secondary indexes?
AFAIK there is no coded limit; however, every index is implemented as another (hidden) Column Family that inherits the settings of the parent CF. So under 0.7 you may run out of memory, and under 0.8 you may flush a lot. Also, when an indexed column is updated there are potentially 3 operations that have to happen: read the old value, delete the old value, write the new value. More indexes == more index updating, just like any other database.
1. Does performance decrease (significantly) if the uniqueness of the column’s values is high?
Low cardinality is recommended: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
1. The CFs for Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID?
Probably just to make the demo easier. It's used to order tweets in the user and public timelines by the current time: https://github.com/twissandra/twissandra/blob/master/cass.py#L204
1. Does performance decrease (significantly) if the uniqueness of the column’s name is high when the comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Depends on what sort of operations you are doing. Some read operations have to pay a constant cost to decode the row-level column index; this can be tuned, though. AFAIK the comparator type has very little to do with the performance.
Hope that helps.
Aaron Morton, Freelance Cassandra Developer, @aaronmorton, http://www.thelastpickle.com
On 9 Jul 2011, at 12:15, Eldad Yamin wrote:
Hi, I have a few questions:
*Secondary index*
1. Is there a limit on the number of columns in a single column family that serve as secondary indexes?
2. Does performance decrease (significantly) if the uniqueness of the column’s values is high?
*Twissandra*
1. Why, in the source (and in every tutorial I've read), do the CFs for Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID? https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
2. Does performance decrease (significantly) if the uniqueness of the column’s name is high when the comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Thanks! Eldad
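For the slice_range read described in this message (fetch 50 columns from a timeline row ordered by a long or TimeUUID column name), a small model of the access pattern may help. This is only a sketch under assumptions: WideRow and its methods are hypothetical names, the row lives in a Python list rather than in Cassandra, and the count / reverse arguments are merely meant to mirror the count and reversed fields of a Thrift SliceRange. Column names here are plain integers standing in for long timestamps.

```python
import bisect


class WideRow:
    """Toy model (hypothetical) of one wide row whose columns are kept
    sorted by column name, which is the job the comparator does."""

    def __init__(self):
        self._names = []   # column names in sorted order
        self._values = {}  # column name -> column value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start=None, count=50, reverse=False):
        """Return up to count (name, value) pairs beginning at start
        (inclusive), walking the sorted columns forwards or backwards."""
        names = self._names
        if reverse:
            end = len(names) if start is None else bisect.bisect_right(names, start)
            picked = names[max(0, end - count):end][::-1]
        else:
            begin = 0 if start is None else bisect.bisect_left(names, start)
            picked = names[begin:begin + count]
        return [(name, self._values[name]) for name in picked]


# One user's timeline: column name = timestamp, value = tweet id.
timeline = WideRow()
for i in range(200):
    timeline.insert(1000 + i, "tweet-%d" % i)

# Newest 50 tweets: slice from the end of the row, reversed.
page = timeline.slice(count=50, reverse=True)
# Next page: start just below the oldest column already seen.
next_page = timeline.slice(start=page[-1][0] - 1, count=50, reverse=True)
print(page[0], page[-1], next_page[0])
```

The read only touches a contiguous run of 50 sorted columns, so its cost depends on the slice size rather than on how unique the column names are, which fits the reply above that the comparator type has little to do with performance.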
Re: Pre-CassandraSF Happy Hour on Sunday
Can you please use Watchitoo.com (it's free) and broadcast the event?
On Fri, Jul 8, 2011 at 8:54 PM, Richard Low r...@acunu.com wrote:
Hi all, If you're in San Francisco for CassandraSF on Monday 11th, then come and join fellow Cassandra users and committers on Sunday evening. Starting at 6:30pm at ThirstyBear, the famous brewing company. We'll have drinks, food and more. RSVP at Eventbrite: http://pre-cassandrasf-happyhour.eventbrite.com/ Hope you can join us!
-- Richard Low Acunu | http://www.acunu.com | @acunu
Cassandra Secondary index/Twissandra
Hi, I have a few questions:
*Secondary index*
1. Is there a limit on the number of columns in a single column family that serve as secondary indexes?
2. Does performance decrease (significantly) if the uniqueness of the column’s values is high?
*Twissandra*
1. Why, in the source (and in every tutorial I've read), do the CFs for Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID? https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
2. Does performance decrease (significantly) if the uniqueness of the column’s name is high when the comparator is LONG_TYPE/TimeUUID and each row has lots of columns?
Thanks! Eldad
Re: Any meet ups in southern california
You can use Watchitoo.com (like GoToMeeting/WebEx) to host an event. Using that tool, everyone around the world can join and take part. The great thing about it is that it's FREE!
On Wed, Jul 6, 2011 at 10:25 PM, Mike Rapuano mikerapu...@gmail.com wrote:
Hi all, Are there any active Cassandra meetups in southern California? Thanks, Mike