Re: Understanding Indexes

2011-02-24 Thread Javier Canillas
I don't think I got the point of your question. But if you are thinking about key indexes (like PKs), keep in mind that Cassandra will manage keys using the partitioning strategy. By doing so, it will be able to determine on which node the row with such a key should be held. So, in other words, inside
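To make the partitioning idea concrete, here is a minimal Python sketch of a token ring, assuming RandomPartitioner-style MD5 tokens and a hypothetical three-node cluster with hand-picked, evenly spaced tokens. The token range is simplified to the full 128-bit MD5 space; Cassandra's own range and code differ in detail.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # simplified token space: the full 128-bit MD5 range

def md5_token(key: str) -> int:
    """RandomPartitioner-style token: the key's MD5 digest read as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

class TokenRing:
    """Each node owns the tokens after its predecessor's token, up to and including its own."""

    def __init__(self, node_tokens: dict):
        # Sort by token so the ring can be binary-searched.
        self.entries = sorted((token, name) for name, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def node_for(self, key: str) -> str:
        token = md5_token(key)
        # First node whose token is >= the key's token; wrap to the first node past the end.
        i = bisect.bisect_left(self.tokens, token)
        return self.entries[i % len(self.entries)][1]

# Hypothetical three-node cluster with evenly spaced tokens.
ring = TokenRing({"node1": 0, "node2": RING_SIZE // 3, "node3": 2 * RING_SIZE // 3})
print(ring.node_for("xyz"))
```

Note that locating the node costs one hash plus a binary search over the node tokens, so it does not grow with the number of rows stored.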

Re: Understanding Indexes

2011-02-24 Thread mcasandra
Generally no. But yes, if retrieving the key through an index is faster than going through the hash buckets. Currently I am thinking there could be hundreds of millions or billions of rows, and in that case, if we have to retrieve a row, which will be faster: going through the hash bucket or the index? I am

Re: Understanding Indexes

2011-02-24 Thread Ed Anuff
If you mean "does it make sense to have a CF where each row contains a set of keys to rows in another CF", then yes, that's a common design pattern, although usually it's because you're creating collections of those rows (e.g. a Groups CF where each row consists of a set of keys to rows in the
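A minimal sketch of that pattern, modeling each column family as a Python dict of row key to columns. The Users and Groups data, and the group_members helper, are hypothetical; the point is that reading a group takes one read of the Groups row plus one read per referenced row.

```python
# Hypothetical column families modeled as: row key -> {column name: value}
users = {
    "alice": {"email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
    "carol": {"email": "carol@example.com"},
}

# Each Groups row stores the keys of its member rows as column names;
# the column values are unused here, so they are left empty.
groups = {
    "admins": {"alice": "", "carol": ""},
    "editors": {"bob": ""},
}

def group_members(group_key: str) -> dict:
    """Read the Groups row, then fetch each referenced Users row."""
    member_keys = groups.get(group_key, {})
    return {k: users[k] for k in sorted(member_keys) if k in users}

print(group_members("admins"))
```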

Re: Understanding Indexes

2011-02-24 Thread mcasandra
What I am trying to ask is: what if there are billions of row keys (e.g. abc, def, xyz in the example below) and the client then does a lookup/query on a row, say xyz (get all columns for row xyz)? Since there are billions of rows, is a lookup using the hash mechanism going to be slow? What algorithm will

Re: Understanding Indexes

2011-02-24 Thread Javier Canillas
I really don't see the point. Again, suppose a cluster with 3 nodes, where there is a ColumnFamily holding data whose key is basically a word of 2 letters (pretty simple). That makes a total of 729 possible keys. RandomPartitioner then will tokenize each key and assign them
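Javier's example can be simulated. The count of 729 implies a 27-symbol alphabet (27 * 27 = 729); the sketch below assumes a-z plus "ñ", hashes each two-letter key with MD5 and splits a simplified 128-bit token space evenly across three hypothetical nodes. It only illustrates that hashed tokens spread keys roughly evenly; it is not Cassandra's code.

```python
import hashlib
import itertools
import string
from collections import Counter

# Assumption: 27 symbols (a-z plus "ñ"), since 27 * 27 = 729.
ALPHABET = string.ascii_lowercase + "ñ"
RING = 2 ** 128  # simplified MD5 token space

def md5_token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def owner(token: int, n_nodes: int) -> int:
    # Hypothetical even split: node i owns [i * RING / n, (i + 1) * RING / n).
    return token * n_nodes // RING

keys = ["".join(p) for p in itertools.product(ALPHABET, repeat=2)]
counts = Counter(owner(md5_token(k), 3) for k in keys)
print(len(keys), dict(sorted(counts.items())))
```

Each node should end up with roughly a third of the keys, even though neighbouring keys such as "aa" and "ab" land on unrelated nodes.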

Re: Understanding Indexes

2011-02-24 Thread mcasandra
Thanks! I am thinking more in terms of cases where you have millions of keys (rows), e.g. a UUID as a row key, or there could be millions of users. So are we saying that we should NOT create column families with this many keys? What are the other options in such cases? UserProfile = { // this is a

Re: Understanding Indexes

2011-02-24 Thread Javier Canillas
I'm not saying you shouldn't. If you feel there is a problem, you may think of splitting the column family into N column families. But I think you won't run into that problem. You can read about RowCacheSize and KeyCache support in 0.7.X of Cassandra; if your rows are small, you may cache a lot of them and avoid a
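As a rough illustration of why caching small, hot rows helps, here is a generic LRU cache in Python. It is not Cassandra's row or key cache implementation; read_row_from_disk is a hypothetical stand-in for a real read. Repeat reads of a cached key are served from memory, and the least recently used entry is evicted once capacity is exceeded.

```python
from collections import OrderedDict

class LRUCache:
    """Keep the most recently used rows in memory; evict the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = load(key)  # slow path, e.g. a read from disk
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry
        return value

def read_row_from_disk(key):  # hypothetical stand-in for a real row read
    return {"username": key}

cache = LRUCache(capacity=2)
for k in ["abc", "def", "abc", "xyz", "def"]:
    cache.get(k, read_row_from_disk)
print(cache.hits, cache.misses)
```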

Re: Understanding Indexes

2011-02-24 Thread Ed Anuff
It all depends on what you're trying to do. What you're proposing doing, by definition, is creating a secondary index. The primary index is your row key. Depending on the partitioner, it might or might not be a conveniently iterable index or sorted index. If you need your keys sorted in a
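A minimal sketch of the hand-built secondary index Ed describes, with column families modeled as Python dicts. The UserProfile data, the username_index CF and both helper functions are hypothetical: the index row key is the username, and its column names are the UserProfile row keys.

```python
# Hypothetical column families modeled as: row key -> {column name: value}
user_profile = {}
# Manual secondary index CF: row key = a username, column names = UserProfile row keys.
username_index = {}

def insert_user(row_key, columns):
    """Write the row, then write the index entry (the application keeps both in sync)."""
    user_profile[row_key] = columns
    username_index.setdefault(columns["username"], {})[row_key] = ""

def find_by_username(username):
    """One read of the index row, then one read per referenced UserProfile row."""
    return [user_profile[k] for k in username_index.get(username, {})]

insert_user("user1", {"username": "jdoe", "email": "jdoe@example.com"})
insert_user("user2", {"username": "asmith", "email": "asmith@example.com"})
print(find_by_username("jdoe"))
```

The trade-off of doing it by hand is that the application must also remove stale index entries when an indexed value changes; this sketch does not handle updates.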

Re: Understanding Indexes

2011-02-24 Thread mcasandra
I wasn't aware that there is an index on the primary key (that is, the row keys). So from what I understand there is by default an index on for eg: , in the example below? Where can I read more about it? UserProfile = { // this is a ColumnFamily { // this is the key to this Row inside the
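On the question of how a lookup by row key can stay fast with billions of keys: one common structure is a sorted key index plus a small in-memory sample of it. The sketch below assumes that layout (SAMPLE_EVERY, key_index and the offsets are hypothetical) and is not a description of Cassandra's on-disk format. A lookup binary-searches the sample, then scans at most SAMPLE_EVERY index entries, so the search cost grows with the logarithm of the key count (log2 of 10^9 is about 30).

```python
import bisect

SAMPLE_EVERY = 4  # hypothetical sampling interval

# Sorted "on-disk" key index: (row key, offset of the row in a data file).
key_index = [(f"key{i:03d}", i * 100) for i in range(20)]

# In-memory sample: every SAMPLE_EVERY-th key of the full index.
sample_keys = [key_index[i][0] for i in range(0, len(key_index), SAMPLE_EVERY)]

def find_offset(key):
    # Binary-search the sample for the last sampled key <= key.
    s = bisect.bisect_right(sample_keys, key) - 1
    if s < 0:
        return None
    start = s * SAMPLE_EVERY
    # Scan at most SAMPLE_EVERY entries of the full index from there.
    for k, offset in key_index[start:start + SAMPLE_EVERY]:
        if k == key:
            return offset
    return None

print(find_offset("key007"))
```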

Re: Understanding Indexes

2011-02-24 Thread Edward Capriolo
On Thu, Feb 24, 2011 at 3:34 PM, mcasandra mohitanch...@gmail.com wrote: I wasn't aware that there is an index on primary key (that is row keys). So from what I understand there is by default an index on for eg: , in below example? Where can I read more about it? UserProfile = { //

Re: Understanding Indexes

2011-02-24 Thread mcasandra
Either I am not explaining properly or I don't understand the data model just yet. Please check again. In the example below, this is what I understand: 1) UserProfile is a CF; 2) is a row key; 3) username is a column. Each row (eg ) has a username column. My understanding is that secondary
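The model described here (a CF maps row keys to rows, and each row maps column names to values) can be sketched with nested Python dicts. The row keys below are hypothetical placeholders. The contrast worth noticing is that fetching by row key is a direct keyed access, while finding rows by a column value without a secondary index means examining every row.

```python
from typing import Dict, List

Row = Dict[str, str]           # column name -> column value
ColumnFamily = Dict[str, Row]  # row key -> row

# Hypothetical UserProfile data; the row keys are placeholders.
user_profile: ColumnFamily = {
    "row-key-1": {"username": "jdoe", "email": "jdoe@example.com"},
    "row-key-2": {"username": "asmith", "email": "asmith@example.com"},
}

def get_row(cf: ColumnFamily, row_key: str) -> Row:
    """Lookup by row key: a direct keyed access."""
    return cf[row_key]

def keys_with_username(cf: ColumnFamily, username: str) -> List[str]:
    """Lookup by column value without an index: every row must be examined."""
    return [key for key, row in cf.items() if row.get("username") == username]

print(get_row(user_profile, "row-key-1")["username"])
print(keys_with_username(user_profile, "asmith"))
```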

Re: Understanding Indexes

2011-02-24 Thread Edward Capriolo
On Thu, Feb 24, 2011 at 3:55 PM, mcasandra mohitanch...@gmail.com wrote: Either I am not explaining properly or I don't understand the data model just yet. Please check again: In below example this is what I understand: 1) UserProfile is a CF 2) is a row key 3) username is a column.

Re: Understanding Indexes

2011-02-24 Thread mcasandra
Thanks! I just started reading about Bloom filters. Is this something that is built in by default, or is it something that needs to be explicitly configured?

Re: Understanding Indexes

2011-02-24 Thread Tyler Hobbs
On Thu, Feb 24, 2011 at 3:07 PM, mcasandra mohitanch...@gmail.com wrote: Thanks! I just started reading about Bloom Filter. Is this something that is inbuilt by default or is it something that needs to be explicitly configured? It's built in, no configuration needed. -- Tyler Hobbs
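For anyone new to the structure: Cassandra keeps a Bloom filter per SSTable so a read can skip files that cannot contain the requested key. The sketch below is a generic Python Bloom filter; the salted SHA-256 hashing, the bit count and the number of hashes are illustrative choices, not Cassandra's. A "no" answer is definitive, while a "yes" may occasionally be a false positive.

```python
import hashlib

class BloomFilter:
    """Set-membership test that can return false positives but never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by hashing the key with different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False: the key was definitely never added. True: it probably was.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter(num_bits=1024, num_hashes=3)
for key in ["abc", "def", "xyz"]:
    bf.add(key)
print(bf.might_contain("xyz"))  # True
```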

Re: Understanding Indexes

2011-02-24 Thread Michal Augustýn
Retrieving data by row key is the primary way to get data from Cassandra, so it's highly optimized. First, the node responsible for the row is computed using the partitioner. You can use RandomPartitioner (distributes MD5 hashes of keys) or OrderPreservingPartitioner (keys must be UTF-8 strings). Then the
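The practical difference between the two partitioners Michal mentions can be sketched in Python. These functions are simplified illustrations, not Cassandra's implementations: a RandomPartitioner-style token is derived from the key's MD5 digest, so token order is unrelated to key order, while an OrderPreservingPartitioner-style token is the UTF-8 key itself, so token order equals key order (decoding fails on keys that are not valid UTF-8).

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    # Token derived from the key's MD5 digest: neighbouring keys
    # land at unrelated positions on the ring.
    return int.from_bytes(hashlib.md5(key).digest(), "big")

def order_preserving_token(key: bytes) -> str:
    # The key (which must be valid UTF-8) is its own token,
    # so token order equals key order.
    return key.decode("utf-8")

keys = [b"aa", b"ab", b"ac", b"ad"]
print(sorted(keys, key=random_partitioner_token))
print(sorted(keys, key=order_preserving_token))
```

The hashed ordering is what spreads load evenly; the preserved ordering is what makes key ranges iterable in sorted order.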