I don't think I got the point of your question. But if you are thinking
about key indexes (like PKs), keep in mind that Cassandra manages
keys using the partition strategy. That is how it
determines which node should hold the row with a given key.
So, in other words, inside
Generally no. But yes, if retrieving the key through an index is faster than
going through the hash buckets.
Currently I am thinking there could be hundreds of millions or billions of
rows. In that case, if we have to retrieve a row, which will be faster: going
through the hash bucket or the index? I am
If you mean does it make sense to have a CF where each row contains a set of
keys to other rows in another CF, then yes, that's a common design pattern,
although usually it's because you're creating collections of those rows
(i.e. a Groups CF where each row consists of a set of keys to rows in the
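The pattern described here can be sketched with plain Python dicts standing in for column families. This is only an illustration of the idea, not Cassandra client code: the `users` and `groups` names, the sample rows, and the `members` helper are all hypothetical.

```python
# Hypothetical "Users" column family: row key -> columns.
users = {
    "abc": {"username": "alice"},
    "def": {"username": "bob"},
    "xyz": {"username": "carol"},
}

# Hypothetical "Groups" column family: each row is a set of keys into
# "Users". The column names are the user row keys; the values are unused.
groups = {
    "admins": {"abc": "", "xyz": ""},
}

def members(group_key):
    """Read the index row, then fetch each referenced user row by key."""
    index_row = groups.get(group_key, {})
    return [users[user_key] for user_key in index_row if user_key in users]

print(members("admins"))
```

Each member lookup is still a plain get-by-row-key on the other column family; the index row only tells the client which keys to ask for.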
What I am trying to ask is: what if there are billions of row keys (e.g.
abc, def, xyz in the example below) and a client then does a lookup/query on
a row, say xyz (get all cols for row xyz)? Since there are billions of rows,
is a lookup using the hash mechanism going to be slow? What algorithm will
I really don't see the point. Again, suppose a cluster with 3 nodes, where
there is a ColumnFamily holding data whose key is basically
a two-letter word (pretty simple). That makes a total of 729 possible
keys.
RandomPartitioner will then tokenize each key and assign them
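A minimal sketch of that scenario, assuming an MD5-based token and three evenly spaced node tokens. The node names, the ring size, and the modulo step are simplifications chosen for illustration, not Cassandra's actual code. With the 26-letter ASCII alphabet the sketch generates 676 two-letter keys.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from itertools import product
from string import ascii_lowercase

RING_SIZE = 2 ** 127  # assumed token space for this sketch

def token(key):
    """Map a row key to a position on the ring via its MD5 hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

# Three hypothetical nodes with evenly spaced tokens on the ring.
NODE_TOKENS = [i * RING_SIZE // 3 for i in range(3)]
NODE_NAMES = ["node-a", "node-b", "node-c"]

def node_for(key):
    """Pick the first node whose token is >= the key's token, wrapping around."""
    idx = bisect_left(NODE_TOKENS, token(key))
    return NODE_NAMES[idx % len(NODE_NAMES)]

keys = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
counts = Counter(node_for(k) for k in keys)
print(len(keys), dict(counts))
```

Finding the owning node costs one hash plus a binary search over the node tokens, so it depends on the number of nodes, not on how many rows the column family holds.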
Thanks! I am thinking more of cases where you have millions of keys (rows),
e.g. a UUID as the row key, or millions of users.
So are we saying that we should NOT create column families with this many
keys? What are the other options in such cases?
UserProfile = { // this is a
I'm not saying you shouldn't. If you feel there is a problem, you may
think of splitting the column family into N. But I don't think you will hit
that problem. You can read about RowCacheSize and KeyCache support in
Cassandra 0.7.x; if your rows are small, you can cache a lot of them and avoid a
It all depends on what you're trying to do. What you're proposing, by
definition, is creating a secondary index. The primary index is your row
key. Depending on the partitioner, it might or might not be a conveniently
iterable index or sorted index. If you need your keys sorted in a
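To illustrate why the partitioner decides whether the row-key index is usefully sorted, here is a small sketch comparing key order with MD5-token order. The sample keys and the helper names are made up for illustration, and the two orderings only model the idea behind an order-preserving versus a hashing partitioner.

```python
import hashlib

def md5_token(key):
    """Token a hashing partitioner might derive from a key (simplified)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

keys = ["abc", "def", "xyz", "abd", "abe"]

# Order-preserving view: rows are laid out in key order, so a key range
# is a contiguous slice.
by_key = sorted(keys)

# Hashing view: rows are laid out in token order, which has no relation
# to the key order.
by_token = sorted(keys, key=md5_token)

def key_range(start, end):
    """Range scan that is only cheap when rows are stored in key order."""
    return [k for k in by_key if start <= k <= end]

print(by_key)
print(by_token)
print(key_range("abc", "abe"))
```

Under the hashed layout, neighbouring keys such as abc and abd can land far apart on the ring, which is why a range of keys cannot be read as one contiguous slice there.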
I wasn't aware that there is an index on primary key (that is row keys). So
from what I understand there is by default an index on for eg: , in
below example? Where can I read more about it?
UserProfile = { // this is a ColumnFamily
{ // this is the key to this Row inside the
On Thu, Feb 24, 2011 at 3:34 PM, mcasandra mohitanch...@gmail.com wrote:
I wasn't aware that there is an index on primary key (that is row keys). So
from what I understand there is by default an index on for eg: , in
below example? Where can I read more about it?
UserProfile = { //
Either I am not explaining properly or I don't understand the data model just
yet. Please check again:
In the example below, this is what I understand:
1) UserProfile is a CF
2) is a row key
3) username is a column. Each row (eg ) has username column
My understanding is that secondary
On Thu, Feb 24, 2011 at 3:55 PM, mcasandra mohitanch...@gmail.com wrote:
Either I am not explaining properly or I don't understand the data model just
yet. Please check again:
In the example below, this is what I understand:
1) UserProfile is a CF
2) is a row key
3) username is a column.
Thanks! I just started reading about Bloom filters. Is this something that is
built in by default, or does it need to be explicitly configured?
On Thu, Feb 24, 2011 at 3:07 PM, mcasandra mohitanch...@gmail.com wrote:
Thanks! I just started reading about Bloom Filter. Is this something that is
inbuilt by default or is it something that need to be explicitly configured?
It's built in, no configuration needed.
--
Tyler Hobbs
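For readers new to the idea, here is a minimal Bloom filter sketch. It is not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of salted MD5 are all assumptions chosen to keep the example short. The point is that a filter can say "definitely not here" cheaply, so a read can skip data files that cannot contain the requested row key.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "possibly".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for row_key in ("abc", "def", "xyz"):
    bf.add(row_key)

print(bf.might_contain("xyz"))
```

A `True` answer still requires checking the underlying data, while a `False` answer lets the lookup skip it entirely.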
Retrieving data by row key is the primary way to get data from
Cassandra, so it's highly optimized.
First, the node responsible for the row is computed using the partitioner.
You can use RandomPartitioner (which distributes keys by their MD5 hash) or
OrderPreservingPartitioner (keys must be UTF-8 strings).
Then the