Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Eldad Yamin
In addition, if you don't know how many rows will be needed, you can store
the key of the next row in each row,
just like in a linked list.

OR

Have one row that holds the keys of all your other rows.
First select the main row (with the keys), then select the other rows.



On Mon, Jul 23, 2012 at 3:40 PM, rohit bhatia rohit2...@gmail.com wrote:

 You should probably try to break the one-row scheme into a
 2*Number_of_nodes rows scheme. This should ensure proper distribution
 of rows and still allow you to query from a small, fixed number of rows.
 How you do it depends on how you are going to choose your 200-500 columns
 during reading (try having them in the same row).

 Even if you are forced to put them in separate rows, you can make the row
 key some modulus of a hash of the column name, ensuring symmetry and
 easy access to columns...
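
The bucketing idea above could look roughly like the following. This is a minimal sketch only: MD5 as the hash, the `wide:` key prefix, the cluster size of 3 and both helper names are assumptions for illustration, not something specified in the thread.

```python
import hashlib
from collections import defaultdict


def bucket_row_key(column_name, num_buckets, prefix="wide"):
    """Map a column name to one of num_buckets row keys via hash modulus."""
    digest = hashlib.md5(column_name.encode("utf-8")).hexdigest()
    return "%s:%d" % (prefix, int(digest, 16) % num_buckets)


def group_by_row(column_names, num_buckets):
    """Group the columns of one read so that each row is queried only once."""
    rows = defaultdict(list)
    for name in column_names:
        rows[bucket_row_key(name, num_buckets)].append(name)
    return dict(rows)


num_nodes = 3                       # hypothetical cluster size
num_buckets = 2 * num_nodes         # the 2*Number_of_nodes rows suggested above
wanted = ["col_%d" % i for i in range(500)]
plan = group_by_row(wanted, num_buckets)   # row key -> columns to fetch from it
```

A read of 500 columns then touches at most `num_buckets` rows, and each row key can be recomputed from the column name alone.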

 On Mon, Jul 23, 2012 at 6:02 PM, Ertio Lew ertio...@gmail.com wrote:
  Any ideas/suggestions please?



Re: Cassandra London: failure modes and HBase

2011-08-17 Thread Eldad Yamin
Hi Dave,
Unfortunately, some very interested people and I won't be able to
make it all the way to London.
Could you please consider using a video streaming service?

I recommend Watchitoo.com (I used to work there).
At the moment it's free.

Thanks!

On Tue, Aug 16, 2011 at 12:47 PM, Dave Gardner d...@cruft.co wrote:

 Hi all,

 I'm pleased to announce our next Cassandra meetup on 5th September in
 London.

 http://www.meetup.com/Cassandra-London/events/29668191/

 We will be looking at failure modes in Cassandra (how it deals with nodes
 failing and returning etc..) as well as a comparison with HBase.  It's a
 great opportunity to meet other users of Cassandra, so please come along!


 Dave



Re: Best practices when deploying upgrading a cassandra cluster

2011-08-14 Thread Eldad Yamin
Is there any good reason why we shouldn't build the latest version from
source?

Thanks!
On Fri, Aug 12, 2011 at 12:18 AM, aaron morton aa...@thelastpickle.com wrote:

 In a non dev system it's a lot easier to use the packages
 http://wiki.apache.org/cassandra/DebianPackaging
 http://www.datastax.com/docs/0.8/install/packaged_releases

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12 Aug 2011, at 02:30, Martin Lansler wrote:

 (Note: This is a repost from another thread which did not have a
 relevant subject, sorry for the spamming)

 Hi Eldad / All,

 On Wed, Aug 10, 2011 at 8:32 AM, Eldad Yamin elda...@gmail.com wrote:

 Can you please explain how you upgraded? Something like a step-by-step.

 Thanks!


 I took the liberty of replying to the group as it would be interesting
 to hear how other folks out there are doing it...

 I'm *not* running a prod system, just a test system of three nodes on
 my laptop. So it would be nice to hear about real setups. Here is my
 test setup:

 apache-cassandra -> apache-cassandra-0.8.3
 apache-cassandra-0.8.2/
 apache-cassandra-0.8.3/
 node1/
 node2/
 node3/

 All nodeX look like:
 bin -> ../apache-cassandra/bin/
 commitlog/
 conf/
 data/
 interface -> ../apache-cassandra/interface/
 lib -> ../apache-cassandra/lib/
 saved_caches/

 The 'conf' directory is copied into each node from the virgin
 cassandra distribution. I then create a local GIT repo and add the
 'conf' directory so I can track any configuration changes on a node.
 Then relevant node specific configuration settings are set. The
 'commitlog', 'data' and 'saved_caches' are created by cassandra and
 must be configured in 'cassandra.yaml' for each node.

 When I upgrade I do the following:

 1. Make a diff of the conf files from the new version so that I get
 new parameters etc... I use emacs ediff-mode.
 2. Remove the old apache-cassandra symlink and point it to the new cassandra
 dist.
 3. In a rolling fashion stop one node, and then restart it... as the
 symlink has changed it will then boot with the upgraded cassandra dist.
 (Remember to cd out of and back into the bin/ dir, otherwise you will still be
 in the old directory.)
 (4). Should something break... just re-create the old symlink and restart
 the node (provided cassandra has not performed any non-backwards-compatible
 changes to the db files, which should be noted in the README).

 That's pretty much it.
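
The symlink switch in steps 2 and (4) can be sketched as below. This is an illustration under assumptions, not the script used in the thread: the helper name `point_symlink` and the temporary directory are made up, and stopping or restarting the nodes is left out entirely.

```python
import os
import tempfile


def point_symlink(link_path, target):
    """Repoint link_path at target, replacing any existing symlink."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link_path)   # rename the new link over the old one


# Throwaway layout mirroring the directory listing above.
base = tempfile.mkdtemp()
for version in ("apache-cassandra-0.8.2", "apache-cassandra-0.8.3"):
    os.mkdir(os.path.join(base, version))
link = os.path.join(base, "apache-cassandra")

point_symlink(link, "apache-cassandra-0.8.2")   # state before the upgrade
point_symlink(link, "apache-cassandra-0.8.3")   # step 2: point at the new dist
upgraded = os.readlink(link)
point_symlink(link, "apache-cassandra-0.8.2")   # step (4): roll back
rolled_back = os.readlink(link)
```

Creating the new link under a temporary name and renaming it means the `apache-cassandra` name never disappears while a node is starting.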

 On a prod setup one would probably use a tool such as puppet
 (www.puppetlabs.com/) to ease setting up on many nodes... But there
 are many ways to do this, for instance pssh
 (http://code.google.com/p/parallel-ssh/).

 Regards,
 -Martin





Re: Planet Cassandra (an aggregation site for Cassandra News)

2011-08-07 Thread Eldad Yamin
Great!
If possible, please blog about full-text-search options + how to use
them (Solandra, Elastic Search, Sphinx etc).

Thanks!

On Sun, Aug 7, 2011 at 5:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:



 On Thu, Aug 4, 2011 at 5:12 AM, Boris Yen yulin...@gmail.com wrote:

 Looking forward to it. ^^

 On Thu, Aug 4, 2011 at 1:56 PM, Eldad Yamin elda...@gmail.com wrote:

 Great! I hope it will be open soon!


 On Wed, Aug 3, 2011 at 10:33 PM, Ed Anuff e...@anuff.com wrote:

 Awesome, great news!


 On Wed, Aug 3, 2011 at 11:53 AM, Lynn Bender line...@gmail.com wrote:

 Greetings all,

 I just wanted to send a note out to let everyone know about Planet
 Cassandra -- an aggregation site for Cassandra news and blogs. Andrew
 Llavore from DataStax and I built the site.

 We are currently waiting for approval from the Apache Software
 Foundation before we publicly launch. However, in the meantime, we'd love to
 hear from you. If you have any favorite Cassandra-related blogs, or blogs
 that frequently contain quality Cassandra content, please send us the URL,
 so that we can contact the author about including a site feed.

 If you have any questions or comments, please send them to
 pla...@geekaustin.org.

 -Lynn Bender

 --
 -Lynn Bender
 http://geekaustin.org
 http://linuxagainstpoverty.org
 http://twitter.com/linearb
 http://twitter.com/geekaustin







 I have started a blog to support the High Performance Cassandra Cookbook:

 http://www.jointhegrid.com/highperfcassandra/

 I am going to use the blog to continue writing about features and tips for
 Cassandra in the writing style used for the book.

 Lynn, please consider it for syndication. All others, please enjoy.




Re: Install Cassandra on EC2

2011-08-04 Thread Eldad Yamin
Hi Aaron,
Thanks for your reply.

I've already seen that, but at the moment I'm interested in installing
Cassandra from scratch - I want to learn.
Yesterday I installed 1 node - now I'm looking at how to add more
nodes and reading more about Cassandra's tools (node reaper etc.)

Thanks!

On Thu, Aug 4, 2011 at 1:23 AM, aaron morton aa...@thelastpickle.com wrote:

 Pre-built AMI here

 http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 4 Aug 2011, at 03:24, Jeremy Hanna wrote:

 Some quick thoughts that might be helpful:

 - use ephemeral instances and RAID0 over the local volumes for both
 cassandra's data as well as the log directory.  The log directory because if
 you crash due to heap size, the heap dump will be stored in the log
 directory.  you don't want that to go in your root/OS partition.

 - probably want to stripe across AZs so that a single AZ failure doesn't
 affect you as much.

 - for seeds, it's nice to use elastic ips so that your seed configuration
 doesn't have to change if a node is replaced.

 - the ec2snitch makes it so each AZ appears as a rack wrt topology -
 simpler as it inspects the ec2 metadata.  if you need more than one DC in
 your cluster (we need a second virtual DC for analytics), you'll probably
 want to use the property file snitch.  there's a cross region ec2snitch
 that's coming in 1.0.

 would probably be good to add some ec2 specific tips in the wiki.  the page
 that dave mentioned is a good step-by-step, but there's been a lot of
 community knowledge accumulated about best practices in the year since that
 was done.

 On Aug 3, 2011, at 8:28 AM, Eldad Yamin wrote:

 Hi,

 Is there any manual or important notes I should know before I try to
 install Cassandra on EC2?


 Thanks!






Re: cassandra consistency level

2011-08-03 Thread Eldad Yamin
So what you're saying is that no matter what consistency level I'm using,
the data will be written to all CF nodes right away, the consistency level
is just for making sure that all CF nodes are UP and all data is written.
In other words, if one of the nodes is down - the write (or read) will fail.

I'm asking because I'm a bit worried about consistency. For example:
Every action that my client performs is stored in CF.x, in a specific
column keyed by their user_id.
I do that by de-serializing the data already found in the column,
adding the new data (the action), then serializing and storing the result.
So I'm worried that some of the user's actions will be dropped due to
low consistency when there are lots of changes to a specific column in a
short period of time.
I know that I can solve this in a different way, by storing each
action in a new column etc., but this is just an example that explains my
question in a simple way.
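
To make the worry concrete, here is a small simulation of the two approaches described above. It is only a sketch: plain Python dicts stand in for a column family, JSON stands in for whatever serialization is actually used, and the function names and column names `t1`/`t2` are invented for the example.

```python
import json


def append_action_rmw(store, user_id, snapshot, action):
    """Read-modify-write: one serialized blob per user is overwritten."""
    actions = json.loads(snapshot) if snapshot else []
    actions.append(action)
    store[user_id] = json.dumps(actions)


def append_action_column(store, user_id, column_name, action):
    """One column per action: concurrent writers never overwrite each other."""
    store.setdefault(user_id, {})[column_name] = action


# Two clients read the same (empty) blob before either one writes back.
blob_store = {}
snapshot_a = blob_store.get("user1")
snapshot_b = blob_store.get("user1")
append_action_rmw(blob_store, "user1", snapshot_a, "login")
append_action_rmw(blob_store, "user1", snapshot_b, "click")
surviving = json.loads(blob_store["user1"])   # only the last writer's list

# Same two actions, each stored under its own column name.
column_store = {}
append_action_column(column_store, "user1", "t1", "login")
append_action_column(column_store, "user1", "t2", "click")
```

In the blob variant the "login" action is lost even though every write succeeded, which is why the column-per-action layout mentioned above avoids the problem regardless of consistency level.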

Thanks!



On Wed, Aug 3, 2011 at 3:21 AM, aaron morton aa...@thelastpickle.com wrote:

 Not sure I understand your question exactly, but will take a shot…

 Writes are sent to every UP node, the consistency level is how many nodes
 we require to complete before we say the request completed successfully. So
 we also make sure that CL nodes are UP before we start the request. If you
 run CL ALL then Replication Factor nodes must be up for each key you are
 writing.

 With the exception of CL ONE reads are also sent to all UP replicas.

 Hope that helps.
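
 The rule described above (the consistency level is how many replicas must be up and must acknowledge) can be sketched as follows. This is an illustration rather than Cassandra's own code: the function names are made up, and QUORUM, which the thread does not discuss, is included using its usual definition of floor(RF / 2) + 1.

```python
def required_replicas(consistency_level, replication_factor):
    """Number of replicas that must respond for the request to succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unknown consistency level: %r" % consistency_level)


def can_serve(consistency_level, replication_factor, replicas_up):
    """True if enough replicas for the key are UP to start the request."""
    return replicas_up >= required_replicas(consistency_level, replication_factor)
```

 With a replication factor of 3 and one replica down, `can_serve("ALL", 3, 2)` is false while `can_serve("QUORUM", 3, 2)` is true; the write is still sent to every replica that is up in both cases.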

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 3 Aug 2011, at 09:32, Eldad Yamin wrote:

  Does consistency level ALL for writes actually guarantee that my data is
 updated on all of my nodes?
  Does it apply to read actions as well?
 
  I've read it on the wiki, I just want to make sure.
  Thanks!




Re: HOW TO select a column or all columns that start with X

2011-08-03 Thread Eldad Yamin
Thanks!

On Wed, Aug 3, 2011 at 3:03 PM, aaron morton aa...@thelastpickle.com wrote:

 and AsciiType


 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 3 Aug 2011, at 16:35, eldad87 wrote:

 Thank you!
 Will this situation work only for UTF8Type comparator?


 On Wed, Aug 3, 2011 at 4:50 AM, Tyler Hobbs ty...@datastax.com wrote:

 A minor correction:

 To get all columns starting with ABC_, you would set column_start=ABC_
 and column_finish=ABC` (the '`' character comes after '_'), and ignore the
 last column in your results if it happened to be ABC`.

 column_finish, or the slice end in other clients, is inclusive.  You
 could of course use ABC_~ as column_finish and avoid the check if you know
 that you don't have column names like ABC_~FOO that you want to include.
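
 The inclusive slice bounds described above can be simulated without a cluster. The sketch below assumes column names compare like plain ASCII strings (as with AsciiType, or UTF8Type restricted to ASCII names) and uses a sorted Python list in place of a row; `slice_columns` and `columns_with_prefix` are hypothetical helpers, not pycassa functions.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, column_start, column_finish):
    """Return names with column_start <= name <= column_finish (inclusive)."""
    lo = bisect_left(sorted_names, column_start)
    hi = bisect_right(sorted_names, column_finish)
    return sorted_names[lo:hi]


def columns_with_prefix(sorted_names, prefix):
    """All names starting with prefix, using the 'next character' finish."""
    # "ABC_" -> "ABC`": bump the last character of the prefix by one.
    finish = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    # The finish bound is inclusive, so drop that exact name if it exists.
    return [n for n in slice_columns(sorted_names, prefix, finish) if n != finish]


names = sorted(["ABC_1", "ABC_2", "ABC_~FOO", "ABC`", "ZZZ_1"])
by_prefix = columns_with_prefix(names, "ABC_")
by_tilde = slice_columns(names, "ABC_", "ABC_~")
```

 Here `by_prefix` keeps ABC_~FOO and drops the ABC` column, while `by_tilde` misses ABC_~FOO, which is the trade-off between the two finish values.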


 On Tue, Aug 2, 2011 at 7:17 PM, aaron morton aa...@thelastpickle.com wrote:

 Yup, that's a pretty common pattern. How exactly depends on the client you
 are using.

 Say you were using pycassa, you would do a get()
 http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get

 with column_start=ABC_ , count to whatever, and column_finish not
 provided.

 You can also provide a finish and use the highest encoded character, e.g.
 ascii 126 is ~ so if you used column_finish = ABC_~ you would get
 everything that starts with ABC_

 Cheers

  -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 3 Aug 2011, at 09:28, Eldad Yamin wrote:

 Hello,
 I wonder if I can select a column or all columns that start with X.
 E.g I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns
 that start with ABC_ - is that possible?



 Thanks!





 --
 Tyler Hobbs
 Software Engineer, DataStax http://datastax.com/
 Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
 Python client library






Install Cassandra on EC2

2011-08-03 Thread Eldad Yamin
Hi,
Is there any manual or important notes I should know before I try to install
Cassandra on EC2?

Thanks!


Re: Install Cassandra on EC2

2011-08-03 Thread Eldad Yamin
Thanks!
But I'd prefer to learn how to install it myself first. If you have any good
references, please share them (I didn't find any, not even a general
installation guide for an EC2/regular machine).
I'm also going to try to install Solandra; I hope that Whirr will support
it in the near future.

On Wed, Aug 3, 2011 at 5:43 PM, John Conwell j...@iamjohn.me wrote:

 One thing you might want to look at is the Apache Whirr project (which is
 awesome by the way!).  It automagically handles spinning up a cluster of
 resources on EC2 (or rackspace for that matter), installing and configuring
 cassandra, and starting it.

 One thing to be aware of if you go this route.  By default in the yaml file
 all data is written under the /var folder.  But on a server started by
 Whirr, this folder only has something like 4gb.  Most of the  hard disk
 space is under the /mnt folder.  So you'll either need to change what
 folders are pointed to what drives (not sure if you can or not...I'm sure
 you could), or change the yaml file to point the /mnt folder.


 On Wed, Aug 3, 2011 at 6:28 AM, Eldad Yamin elda...@gmail.com wrote:

 Hi,
 Is there any manual or important notes I should know before I try to
 install Cassandra on EC2?

 Thanks!




 --

 Thanks,
 John C




Cassandra and Solandra Installation guide

2011-08-03 Thread Eldad Yamin
Hi,
I'd like to get tutorials on how to install Cassandra and Solandra - I
couldn't find anything helpful.
In addition, tutorials on how to use Solandra (index/search) would be great.


Thanks!


Installation Exception

2011-08-03 Thread Eldad Yamin
Hi,
I'm trying to install Cassandra on Amazon EC2 without success, this is what
I did:

   1. Created a new Small EC2 instance (this is just for testing), running
   Ubuntu OS - custom AMI (ami-596f3c1c) from:
   http://uec-images.ubuntu.com/releases/11.04/release/
   2. Installed Java:
   # sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
   # sudo apt-get update
   # sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts openjdk-6-jre
   3. Upgraded:
   # sudo apt-get upgrade
   4. Downloaded Cassandra:
   # cd /usr/src/
   # sudo wget http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz
   # sudo tar xvfz apache-cassandra-*
   # cd apache-cassandra-*
   5. Config (according to README.txt)
   # sudo mkdir -p /var/log/cassandra
   # sudo chown -R `whoami` /var/log/cassandra
   # sudo mkdir -p /var/lib/cassandra
   # sudo chown -R `whoami` /var/lib/cassandra
   6. RUN CASSANDRA
   # bin/cassandra -f

Then I got an exception:
ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$ bin/cassandra -f
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.


Any idea what is wrong?
Thanks!


Re: Installation Exception

2011-08-03 Thread Eldad Yamin
Thanks Jonathan,
I saw the EC2 AMI that was made by DataStax - I prefer not to use it because
I want to learn how to install Cassandra first.

On Wed, Aug 3, 2011 at 8:03 PM, Jonathan Ellis jbel...@gmail.com wrote:


 http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami

 On Wed, Aug 3, 2011 at 10:44 AM, Eldad Yamin elda...@gmail.com wrote:
  Hi,
  I'm trying to install Cassandra on Amazon EC2 without success, this is
 what
  I did:
 
  Created new Small EC2 instance (this is just for testing), running
 Ubuntu
  OS - custom AIM (ami-596f3c1c) from:
  http://uec-images.ubuntu.com/releases/11.04/release/
  Installed Java:
  # sudo add-apt-repository deb http://archive.canonical.com/ lucid
 partner
  # sudo apt-get update
  # sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts
  openjdk-6-jre
  Upgraded:
  # sudo apt-get upgrade
  Downloaded Cassandra:
  # cd /usr/src/
  # sudo wget
 
 http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz
  # sudo tar xvfz apache-cassandra-*
  # cd apache-cassandra-*
  Config (according to README.txt)
  # sudo mkdir -p /var/log/cassandra
  # sudo chown -R `whoami` /var/log/cassandra
  # sudo mkdir -p /var/lib/cassandra
  # sudo chown -R `whoami` /var/lib/cassandra
  RUN CASSANDRA
  # bin/cassandra -f
 
  The I got Exception:
  ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$
 bin/cassandra
  -f
  Exception in thread main java.lang.NoClassDefFoundError:
  org/apache/cassandra/thrift/CassandraDaemon
  Caused by: java.lang.ClassNotFoundException:
  org.apache.cassandra.thrift.CassandraDaemon
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
  Could not find the main class:
 org.apache.cassandra.thrift.CassandraDaemon.
  Program will exit.
 
  Any idea what is wrong?
  Thanks!



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Installation Exception

2011-08-03 Thread Eldad Yamin
Thanks! I missed that lol!
BTW, how do I compile it?

Thanks!

On Wed, Aug 3, 2011 at 6:51 PM, samal sa...@wakya.in wrote:

 did u compile source code? :)
 you have downloaded source code not binary.

 try with binary.

 On Wed, Aug 3, 2011 at 9:14 PM, Eldad Yamin elda...@gmail.com wrote:

 Hi,
 I'm trying to install Cassandra on Amazon EC2 without success, this is
 what I did:

1. Created new Small EC2 instance (this is just for testing),
running Ubuntu OS - custom AIM (ami-596f3c1c) from:
http://uec-images.ubuntu.com/releases/11.04/release/
2. Installed Java:
# sudo add-apt-repository deb http://archive.canonical.com/ lucid
partner
# sudo apt-get update
# sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts
openjdk-6-jre
3. Upgraded:
# sudo apt-get upgrade
4. Downloaded Cassandra:
# cd /usr/src/
# sudo wget

 http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz

# sudo tar xvfz apache-cassandra-*
# cd apache-cassandra-*
5. Config (according to README.txt)
# sudo mkdir -p /var/log/cassandra
# sudo chown -R `whoami` /var/log/cassandra
# sudo mkdir -p /var/lib/cassandra
# sudo chown -R `whoami` /var/lib/cassandra
6. RUN CASSANDRA
# bin/cassandra -f

 The I got Exception:
 ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$
 bin/cassandra -f
 Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/cassandra/thrift/CassandraDaemon
 Caused by: java.lang.ClassNotFoundException:
 org.apache.cassandra.thrift.CassandraDaemon
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
 Could not find the main class:
 org.apache.cassandra.thrift.CassandraDaemon. Program will exit.


 Any idea what is wrong?
 Thanks!





Solandra

2011-08-03 Thread Eldad Yamin
Hello,
I have a cluster of 3 Cassandra nodes and I would like to start using
Solandra.
1. How can I install Solandra and make it use the existing nodes?
2. Would it be better to install Solandra on a new node and add it to the
existing cluster?
3. How does Solandra index? Does it operate automatically, or do I need to tell
Solandra to index CF.keys every time a new key is created or updated?

Thanks!


Question about eventually consistent in Cassandra

2011-08-02 Thread Eldad Yamin
Hi,
Let’s say that I have 2 datacenters, and a key is changed in both of my
datacenters at the exact same time (or even within 1-2 seconds of each other).
Datacenter #1 adds column abc with value X; Datacenter #2 adds column abc
with value Y.

What is the result of that situation?
Is there any difference if the changes are made within the same data
center?

Thanks!
Eldad Yamin


HOW TO select a column or all columns that start with X

2011-08-02 Thread Eldad Yamin
Hello,
I wonder if I can select a column or all columns that start with X.
E.g I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns that
start with ABC_ - is that possible?



Thanks!


cassandra consistency level

2011-08-02 Thread Eldad Yamin
Does consistency level ALL for writes actually guarantee that my data is updated
on all of my nodes?
Does it apply to read actions as well?

I've read it on the wiki, I just want to make sure.
Thanks!


geo-data in Cassandra

2011-08-02 Thread Eldad Yamin
Hello,
I'm trying to save geo-data in Cassandra.
According to SimpleGeo, they did that using a nested tree:
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php


I wonder if someone has already implemented something like that, and how they
accomplished it without transaction support (while the tree keeps
evolving).
In addition, what consistency level did they use?

Thanks!


Question about eventually consistent

2011-07-31 Thread Eldad Yamin
Hi,

Let’s say that I have 2 datacenters, and a key is changed in both of my
datacenters at the exact same time (or even within 1-2 seconds of each other).

Datacenter #1 removes a column and Datacenter #2 adds 2 new columns.
Is there any problem with consistency, or will Cassandra handle this
situation easily?



Thanks!


Re: b-tree

2011-07-22 Thread Eldad Yamin
In order to split the nodes.
SimpleGeo has a maximum of 1,000 records (i.e. places) on each node in the tree; if
the number exceeds 1,000 they split the node.
To prevent more than one process from editing/splitting the node at the same
time, a transaction is needed.
On Jul 22, 2011 1:01 AM, aaron morton aa...@thelastpickle.com wrote:
 But how will you be able to maintain it while it evolves and new data is
added without transactions?

 What is the situation you think you need transactions for ?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22 Jul 2011, at 00:06, Eldad Yamin wrote:

 Aaron,
 Nested set is exactly what I had in mind.
 But how will you be able to maintain it while it evolves and new data is
added without transactions?

 Thanks!

 On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com
wrote:
 Just throwing out a (half baked) idea, perhaps the Nested Set Model of
trees would work http://en.wikipedia.org/wiki/Nested_set_model

 * Every row would represent a set with a left and right encoded into the
key
 * Members are inserted as columns into *every* set / row they are a
member. So we are de-normalising and trading space for time.
 * May need to maintain a custom secondary index of the materialised sets.
e.g. slice a row to get the first column = the left value you are
interested in, that is the key for the set.

 I've not thought it through much further than that, a lot would depend on
your data. The top sets may get very big.
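
 One possible reading of the nested set idea above, as a rough sketch: each set is a (left, right) interval whose bounds form the row key, and a member is written as a column into every set whose interval contains its position. The example tree, the zero-padded key format and the helper names are all assumptions made for illustration; a dict of dicts stands in for the column family.

```python
def row_key(left, right):
    """Encode a set's left/right bounds into a sortable row key."""
    return "%08d:%08d" % (left, right)


def containing_sets(sets, position):
    """Every set whose [left, right] interval contains the position."""
    return [(left, right) for (left, right) in sets if left <= position <= right]


def insert_member(rows, sets, position, member):
    """De-normalised insert: write the member into *every* containing set."""
    for left, right in containing_sets(sets, position):
        rows.setdefault(row_key(left, right), {})[position] = member


sets = [(1, 8), (2, 5), (6, 7)]   # hypothetical tree: a root with two children
rows = {}
insert_member(rows, sets, 3, "cafe")
insert_member(rows, sets, 6, "museum")
```

 Reading everything under a set is then a single row read, and the root row `00000001:00000008` holds every member, which is the "top sets may get very big" concern.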

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:

 Im not sure if I have an answer for you, anyway, but I'm curious

 A b-tree and a binary tree are not the same thing. A binary tree is a
basic fundamental data structure, A b-tree is an approach to storing and
indexing data on disc for a database.

 Which do you mean?

 On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
 Hello,
 Is there any good way of storing a binary-tree in Cassandra?
 I wonder if someone already implement something like that and how
accomplished that without transaction supports (while the tree keep
evolving)?

 I'm asking that becouse I want to save geospatial-data, and SimpleGeo
did it using b-tree:
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php

 Thanks!



 --
 It's always darkest just before you are eaten by a grue.





Re: b-tree

2011-07-21 Thread Eldad Yamin
Aaron,
Nested set is exactly what I had in mind.
But how will you be able to maintain it while it evolves and new data is
added without transactions?

Thanks!

On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com wrote:

 Just throwing out a (half baked) idea, perhaps the Nested Set Model of
 trees would work  http://en.wikipedia.org/wiki/Nested_set_model

 * Every row would represent
 a set with a left and right encoded into the key
 * Members are inserted as columns into *every* set / row they are a member.
 So we are de-normalising and trading space for time.
 * May need to maintain a custom secondary index of the materialised sets.
 e.g. slice a row to get the first column = the left value you are
 interested in, that is the key for the set.

 I've not thought it through much further than that, a lot would depend on
 your data. The top sets may get very big.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:

 Im not sure if I have an answer for you, anyway, but I'm curious

 A b-tree and a binary tree are not the same thing.  A binary tree is a
 basic fundamental data structure,  A b-tree is an approach to storing and
 indexing data on disc for a database.

 Which do you mean?

 On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:

 Hello,
 Is there any good way of storing a binary-tree in Cassandra?
 I wonder if someone already implement something like that and how
 accomplished that without transaction supports (while the tree keep
 evolving)?

 I'm asking that becouse I want to save geospatial-data, and SimpleGeo did
 it using b-tree:
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php

 Thanks!




 --
 It's always darkest just before you are eaten by a grue.





Re: b-tree

2011-07-21 Thread Eldad Yamin
Hi Jeffrey,
I meant a binary tree. Go and watch the video (in my first email); it will
give you a better understanding.

Eldad

On Wed, Jul 20, 2011 at 11:33 PM, Jeffrey Kesselman jef...@gmail.com wrote:

 Im not sure if I have an answer for you, anyway, but I'm curious

 A b-tree and a binary tree are not the same thing.  A binary tree is a
 basic fundamental data structure,  A b-tree is an approach to storing and
 indexing data on disc for a database.

 Which do you mean?


 On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:

 Hello,
 Is there any good way of storing a binary-tree in Cassandra?
 I wonder if someone already implement something like that and how
 accomplished that without transaction supports (while the tree keep
 evolving)?

 I'm asking that becouse I want to save geospatial-data, and SimpleGeo did
 it using b-tree:
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php

 Thanks!




 --
 It's always darkest just before you are eaten by a grue.



Re: how to stop the whole cluster, start the whole cluster like in hadoop/hbase?

2011-07-21 Thread Eldad Yamin
I wonder whether that would cause problems...
Has anyone done it already?
 On Jul 21, 2011 10:39 PM, Jonathan Ellis jbel...@gmail.com wrote:
 dsh -c -g cassandra /etc/init.d/cassandra stop

 http://www.netfort.gr.jp/~dancer/software/dsh.html.en

 P.S. mostly people are concerned about making sure their entire
 cluster does NOT stop at the same time :)

 On Thu, Jul 21, 2011 at 2:23 PM, Dean Hiller d...@alvazan.com wrote:
 Is there a framework for stopping all nodes/starting all nodes for
 cassandra?  I am okay with something like password-less ssh setup that
 hadoop scripts did...just something that allows me to start and stop the
 whole cluster.

 thanks,
 Dean




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


b-tree

2011-07-20 Thread Eldad Yamin
Hello,
Is there any good way of storing a binary tree in Cassandra?
I wonder if someone has already implemented something like that, and how they
accomplished it without transaction support (while the tree keeps
evolving).

I'm asking because I want to save geospatial data, and SimpleGeo did it
using a b-tree:
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php

Thanks!


Re: Cassandra Secondary index/Twissandra

2011-07-11 Thread Eldad Yamin
Hi Aaron,
Thank you again for your response.

I've read the article but I didn't understand everything. It would be great
if the benchmark included the actual CLI/Python commands (that way it
would be easier to understand the query), as well as an explanation of
row pages - what are they?

Anyway, for a sense of scale, we can take as an example
the average Facebook/Twitter user, who can have 100K columns
(Userline).
So what is needed is to take the first 50 columns (ordered by TimeUUID), then
columns 51 to 100, 101 to 150, etc.
Any idea how fast that will be? How would you recommend configuring
Cassandra? Or is there a different way of achieving that goal?
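
The paging pattern described above can be sketched as follows, assuming each page is requested by column name (start after the last column already seen) rather than by numeric offset. A sorted Python list of integers stands in for time-ordered column names; `get_page` and `iter_pages` are hypothetical helpers, not client library calls.

```python
from bisect import bisect_right


def get_page(sorted_columns, page_size, after=None):
    """Return up to page_size column names that sort after `after`."""
    start = 0 if after is None else bisect_right(sorted_columns, after)
    return sorted_columns[start:start + page_size]


def iter_pages(sorted_columns, page_size):
    """Yield successive pages, resuming from the last column of each page."""
    last_seen = None
    while True:
        page = get_page(sorted_columns, page_size, after=last_seen)
        if not page:
            return
        yield page
        last_seen = page[-1]


userline = list(range(1, 126))      # stand-in for 125 time-ordered columns
pages = list(iter_pages(userline, 50))
```

Each request only needs the last column name of the previous page, so the client never has to count past the first 50, 100 or 150 columns again.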

Thanks,
Eldad.

On Sun, Jul 10, 2011 at 8:31 PM, aaron morton aa...@thelastpickle.com wrote:

 Can you recommend a better way of doing that, or a way to tune Cassandra
 to support those 2 CFs?

 A select with no start or finish column name, a column count and not in
 reversed order is about the fastest read query.

 You will need to do a reversed query, which will be a little slower. But
 may still be plenty fast enough, depending on scale and throughput and all
 those other things. see
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 Cheers


 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 10 Jul 2011, at 00:14, Eldad Yamin wrote:

 Aaron - Thank you for the fast response!


1. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
lots of columns?

 Depends on what sort of operations you are doing. Some read operations
 have to pay a constant cost to decode the row level column index, this can
 be tuned though. AFAIK the comparator type has very little to do with the
 performance.

 In Twissandra, the columns are used as alternative index for the
 Userline/Timeline. therefore the operation I'm going to do is slice_range.
 I'm going to get (for example) the first 50  columns (using comparator of
 TimeUUID/LONG).
 Can you recommend a better way of doing that, or a way to tune Cassandra
 to support those 2 CFs?


 Thanks!

 On Sun, Jul 10, 2011 at 3:26 AM, aaron morton aa...@thelastpickle.com wrote:


1. Is there a limit on the number of columns in a single column family
that serve as secondary indexes?

 AFAIK there is no coded limit, however every index is implemented as
 another (hidden) Column Family that inherits the settings of the parent CF.
 So under 0.7 you may run out of memory, under 0.8 you may flush a lot.
 Also, when an indexed column is updated there are potentially 3 operations
 that have to happen: read the old value, delete the old value, write the new
 value. More indexes == more index updating, just like any other database.


1. Does performance decrease (significantly) if the uniqueness of the
column’s values is high?

 Low cardinality is recommended

 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html


1. Why does the CF for Userline/Timeline have a comparator of LONG_TYPE
and not TimeUUID?

 Probably just to make the demo easier. It's used to order tweets in the
 user and public timelines by the current time
 https://github.com/twissandra/twissandra/blob/master/cass.py#L204


1. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
lots of columns?

 Depends on what sort of operations you are doing. Some read operations
 have to pay a constant cost to decode the row level column index, this can
 be tuned though. AFAIK the comparator type has very little to do with the
 performance.

 Hope that helps.

 -
  -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 9 Jul 2011, at 12:15, Eldad Yamin wrote:

 Hi,
 I have a few questions:

 *Secondary index*

1. Is there a limit on the number of columns in a single column family
that serve as secondary indexes?
2. Does performance decrease (significantly) if the uniqueness of the
column’s values is high?


 *Twissandra*

1. Why, in the source (or any tutorial I've read), does the CF for
Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID?
 https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
2. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
lots of columns?


 Thanks!
 Eldad







Re: Cassandra Secondary index/Twissandra

2011-07-10 Thread Eldad Yamin
Aaron - Thank you for the fast response!


   1. Does performance decrease (significantly) if the uniqueness of the
   column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
   lots of columns?

Depends on what sort of operations you are doing. Some read operations have
to pay a constant cost to decode the row level column index, this can be
tuned though. AFAIK the comparator type has very little to do with the
performance.

In Twissandra, the columns are used as an alternative index for the
Userline/Timeline, so the operation I'm going to do is slice_range.
I'm going to get (for example) the first 50 columns (using a comparator of
TimeUUID/LONG).
Can you recommend a better way of doing that, or a way to tune Cassandra
to support those 2 CFs?


Thanks!

On Sun, Jul 10, 2011 at 3:26 AM, aaron morton aa...@thelastpickle.com wrote:


1. Is there a limit on the number of columns in a single column family
that serve as secondary indexes?

 AFAIK there is no coded limit, however every index is implemented as
 another (hidden) Column Family that inherits the settings of the parent CF.
 So under 0.7 you may run out of memory, under 0.8 you may flush a lot.
 Also, when an indexed column is updated there are potentially 3 operations
 that have to happen: read the old value, delete the old value, write the new
 value. More indexes == more index updating, just like any other database.
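
The three operations mentioned above can be illustrated with a toy model in plain Python. This is only a sketch under loud assumptions: `IndexedCF` is a made-up class, the "hidden column family" is modelled as a dict from (column, value) to row keys, and nothing here reflects how Cassandra actually stores or updates its indexes internally.

```python
class IndexedCF:
    """Toy column family with one hidden index per indexed column."""

    def __init__(self):
        self.rows = {}    # row key -> {column: value}
        self.index = {}   # (column, value) -> set of row keys
        self.ops = []     # log of index maintenance steps, for illustration

    def insert(self, key, column, value):
        # 1. Read the old value so the stale index entry can be found.
        old = self.rows.get(key, {}).get(column)
        self.ops.append("read")
        # 2. Delete the old index entry, if there was one.
        if old is not None:
            self.index[(column, old)].discard(key)
            self.ops.append("delete")
        # 3. Write the new index entry, then the column itself.
        self.index.setdefault((column, value), set()).add(key)
        self.ops.append("write")
        self.rows.setdefault(key, {})[column] = value


cf = IndexedCF()
cf.insert("user1", "state", "CA")   # first write: read + write
cf.insert("user1", "state", "NY")   # update: read + delete + write
```

In this model every additional indexed column repeats those steps on each update, which is the "more indexes == more index updating" point.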


1. Does performance decrease (significantly) if the uniqueness of the
column’s values is high?

 Low cardinality is recommended

 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html


1. Why does the CF for Userline/Timeline have a comparator of LONG_TYPE
and not TimeUUID?

 Probably just to make the demo easier. It's used to order tweets in the
 user and public timelines by the current time
 https://github.com/twissandra/twissandra/blob/master/cass.py#L204
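
One practical difference between the two comparators, beyond demo convenience, is what happens when two entries share the same timestamp. Column names are unique within a row, so the second write with an identical long timestamp replaces the first, whereas two time-based UUIDs stay distinct. A minimal sketch, assuming a row is modelled as a plain Python dict and using only the standard `uuid` module (the timestamp value is an arbitrary example):

```python
import uuid

# One wide row modelled as a dict: column name -> value.
ts_micros = 1310200000000000  # arbitrary example timestamp in microseconds

long_row = {}
long_row[ts_micros] = "tweet A"
long_row[ts_micros] = "tweet B"      # same name, so this overwrites tweet A

uuid_row = {}
uuid_row[uuid.uuid1()] = "tweet A"
uuid_row[uuid.uuid1()] = "tweet B"   # distinct version 1 UUIDs, both kept

print(len(long_row), len(uuid_row))
```

For a single-user demo a collision like this is unlikely, which fits the guess that the long comparator was chosen for simplicity.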


1. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
lots of columns?

 Depends on what sort of operations you are doing. Some read operations have
 to pay a constant cost to decode the row level column index, this can be
 tuned though. AFAIK the comparator type has very little to do with the
 performance.

 Hope that helps.

 -
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 9 Jul 2011, at 12:15, Eldad Yamin wrote:

 Hi,
 I have a few questions:

 *Secondary index*

1. Is there a limit on the number of columns in a single column family
that serve as secondary indexes?
2. Does performance decrease (significantly) if the uniqueness of the
column’s values is high?


 *Twissandra*

1. Why, in the source (or any tutorial I've read), does the CF for
Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID?
 https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
2. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
lots of columns?


 Thanks!
 Eldad





Re: Pre-CassandraSF Happy Hour on Sunday

2011-07-09 Thread Eldad Yamin
Can you please use Watchitoo.com (it's free) and broadcast the event?

On Fri, Jul 8, 2011 at 8:54 PM, Richard Low r...@acunu.com wrote:

 Hi all,

 If you're in San Francisco for CassandraSF on Monday 11th, then come
 and join fellow Cassandra users and committers on Sunday evening.
 Starting at 6:30pm at ThirstyBear, the famous brewing company.  We'll
 have drinks, food and more.

 RSVP at Eventbrite: http://pre-cassandrasf-happyhour.eventbrite.com/

 Hope you can join us!

 --
 Richard Low
 Acunu | http://www.acunu.com | @acunu



Cassandra Secondary index/Twissandra

2011-07-09 Thread Eldad Yamin
Hi,
I have a few questions:

*Secondary index*

   1. Is there a limit on the number of columns in a single column family
   that serve as secondary indexes?
   2. Does performance decrease (significantly) if the uniqueness of the
   column’s values is high?


*Twissandra*

   1. Why, in the source (or any tutorial I've read), does the CF for
   Userline/Timeline have a comparator of LONG_TYPE and not TimeUUID?
   https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
   2. Does performance decrease (significantly) if the uniqueness of the
   column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
   lots of columns?


Thanks!
Eldad


Re: Any meet ups in southern california

2011-07-07 Thread Eldad Yamin
You can use Watchitoo.com (like GoToMeeting/WebEx) to host an event.
Using that tool, everyone around the world can join and take part.

The great thing about it is that it's FREE!

On Wed, Jul 6, 2011 at 10:25 PM, Mike Rapuano mikerapu...@gmail.com wrote:

 Hi all

 Are there any active cassandra meet ups in southern California?

 Thanks
 Mike