Re: Correct way to design a cassandra database

2012-12-21 Thread Hiller, Dean
I you have a way to partition tables, relational can be ok.  Thing of a 
business that has trillions of clients as customers and clients have a whole 
slew of things they are related to.  Partitioning by client can be a good way 
to go.  Here are some patterns we have seen in nosql and perhaps they can help 
your situation….

https://github.com/deanhiller/playorm/wiki/Patterns-Page

Later,
Dean

From: David Mohl d...@dave.cxmailto:d...@dave.cx
Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Date: Friday, December 21, 2012 4:49 AM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Correct way to design a cassandra database

Hello!

I've recently started learning cassandra but still have troubles understanding 
the best way to design a cassandra database.
I've posted my question already on stackoverflow but because this would very 
likely result in a discussion, it got closed. Orginal question here: 
http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


Assuming you have 3 types of objects: User, Photo and Album. Obviously a photo 
belongs to a user and can be part of a album. For querying, assume we just want 
to order by last goes first. Paging by 10 elements should be possible.

Would you go like every document has all the informations needed for a correct 
output. Something like this:

-- User
   | -- Name
   | -- ...
   | -- Photos
| -- Photoname
| -- Uploaded at

Or go a more relational way (while having a secondary index on the belongs_to 
columns:

-- User (userid is the row key)
   | -- Name
   | -- ...

-- Photoid
   | -- belongs_to (userid)
   | -- belongs_to_album (albumid)
   | -- ...

-- Albumid
   | -- belongs_to (userid)
   | -- ...

Another way that came in my mind would be kind of a mix:

-- User
   | -- Name
   | -- ...
   | -- Photoids (e.g. 1,2,3,4,5)
   | -- Albumids (e.g. 1,2,3,4,5)

-- Photoid (photoid is the row key)
   | -- Name
   | -- Uploaded at
   | -- ...

-- Albumid (albumid is the row key)
   | -- Name
   | -- Photoids (e.g. 1,2,3,4,5)
   | -- ...

When using a random partitioner, the last example would be (IMO) the way to go. 
I can query the user object (out of a session id or something) and would get 
all the row keys I need for fetching photo / album data. However this would 
result in veeery large columns. Another down point would be inconsistency and 
identification problems. A photo (or a album) could not be identified by the 
row itself.
Example: If I fetch a photo with ID 3456, I don't know in which albums it is 
part nor which user owns it. Adding this kind of information would result in a 
fairly large stack of points I have to alter on creation / update.

The second example has all the information needed. However, if I want to fetch 
all photos that are part of album x, I have to query by a secondary index that 
COULD contain millions of entries over the whole cluster. And I guess I can 
forget the random partitioner on this example.

Am I thinking to relational?
It'd be great to hear some other opinions on this topic

---
David



Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
.







On Fri, Dec 21, 2012 at 5:07 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I you have a way to partition tables, relational can be ok.  Thing of a
 business that has trillions of clients as customers and clients have a
 whole slew of things they are related to.  Partitioning by client can be a
 good way to go.  Here are some patterns we have seen in nosql and perhaps
 they can help your situation….

 https://github.com/deanhiller/playorm/wiki/Patterns-Page

 Later,
 Dean

 From: David Mohl d...@dave.cxmailto:d...@dave.cx
 Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
 user@cassandra.apache.orgmailto:user@cassandra.apache.org
 Date: Friday, December 21, 2012 4:49 AM
 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
 user@cassandra.apache.orgmailto:user@cassandra.apache.org
 Subject: Correct way to design a cassandra database

 Hello!

 I've recently started learning cassandra but still have troubles
 understanding the best way to design a cassandra database.
 I've posted my question already on stackoverflow but because this would
 very likely result in a discussion, it got closed. Orginal question here:
 http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


 Assuming you have 3 types of objects: User, Photo and Album. Obviously a
 photo belongs to a user and can be part of a album. For querying, assume we
 just want to order by last goes first. Paging by 10 elements should be
 possible.

 Would you go like every document has all the informations needed for a
 correct output. Something like this:

 -- User
| -- Name
| -- ...
| -- Photos
 | -- Photoname
 | -- Uploaded at

 Or go a more relational way (while having a secondary index on the
 belongs_to columns:

 -- User (userid is the row key)
| -- Name
| -- ...

 -- Photoid
| -- belongs_to (userid)
| -- belongs_to_album (albumid)
| -- ...

 -- Albumid
| -- belongs_to (userid)
| -- ...

 Another way that came in my mind would be kind of a mix:

 -- User
| -- Name
| -- ...
| -- Photoids (e.g. 1,2,3,4,5)
| -- Albumids (e.g. 1,2,3,4,5)

 -- Photoid (photoid is the row key)
| -- Name
| -- Uploaded at
| -- ...

 -- Albumid (albumid is the row key)
| -- Name
| -- Photoids (e.g. 1,2,3,4,5)
| -- ...

 When using a random partitioner, the last example would be (IMO) the way
 to go. I can query the user object (out of a session id or something) and
 would get all the row keys I need for fetching photo / album data. However
 this would result in veeery large columns. Another down point would be
 inconsistency and identification problems. A photo (or a album) could not
 be identified by the row itself.
 Example: If I fetch a photo with ID 3456, I don't know in which albums it
 is part nor which user owns it. Adding this kind of information would
 result in a fairly large stack of points I have to alter on creation /
 update.

 The second example has all the information needed. However, if I want to
 fetch all photos that are part of album x, I have to query by a secondary
 index that COULD contain millions of entries over the whole cluster. And I
 guess I can forget the random partitioner on this example.

 Am I thinking to relational?
 It'd be great to hear some other opinions on this topic

 ---
 David




Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
Ok.. So here is my latest thinking... Including that index:

CREATE TABLE Users (
user_name text,
password text,
PRIMARY KEY (user_name)
);

^ Same as before

CREATE TABLE Photos(
user_name text,
photo_id uuid,
created_time timestamp,
data text,
PRIMARY KEY (user_name, photo_id, created_time)
) WITH CLUSTERING ORDER BY (created_time DESC);

^ Note the addition of a photo id and using that in the PK def with the
created_time
Data is a JSON like this:
{
thumbnail: url,
standard_resolution:url
}


CREATE TABLE PhotosAlbums (
user_name text,
album_name text,
poster_image_url text,
data text
PRIMARY KEY (user_name, album_name)
);

^ Same as before, data represents a JSON array of the photos:
[{photo_id:..., thumbnail:url, standard_resolution:url},
{photo_id:..., thumbnail:url, standard_resolution:url},
{photo_id:..., thumbnail:url, standard_resolution:url},
{photo_id:..., thumbnail:url, standard_resolution:url}]

CREATE TABLE PhotosAlbumsIndex (
user_name text,
photo_id uuid,
album_name text,
created_time timestamp
PRIMARY KEY (user_name, photo_id, album_name)
);

The create_time column here is because you need to have at least 1 column
that is not part of the PK. Or that's what it looks like in my quick test.

^ Each photo added to an album needs to be added to this index row


As before, your application will need to keep the order of the array in
tact as your users modify the order of things. Now however if they delete a
photo you need to fetch the PhotoAlbums the photo existed in and update
them accordingly:

SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
photo_id=uuid

This should return to you all of the albums that the photo was a part of.
Now you need to:

SELECT * FROM PhotosAlbums where user_name = the_user and album_name IN
(name1, name2, name3 )

name1,2,3 are the album names you selected from the PhotosAlbumsIndex query

So now you have all of the photo albums, you would then iterate over those
in your application, deserializing the JSON data locating the photo ID was
was removed and taking it out of the array, then reserializing to JSON and
updating the record.

When that is complete you need to remove the Photo from the
PhotosAlbumsIndex. Now there is where I het stuck a little.. because this
will fail:

DELETE FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
photo_id=uuid;

It seems to want the album name as well since it's part of the PK.
Admittedly, I don't know how to get around that and just delete everything
where the first 2 components of the PK are true.

You would already possess the list of album names though, so it could be
BATCH that you need to perform for the deletes, specifying 1 delete per
album_name.


Anyway, that's my current thinking. I would love to know if it's possible
to get around the DELETE issue another way.



On Fri, Dec 21, 2012 at 7:15 AM, Adam Venturella aventure...@gmail.comwrote:

 Hmmm it just occurred to me that in my examples, there is no convenient
 way to delete a photo and also remove that photo from the albums it is a
 part of.

 As it stands, you would need to iterate over all of the users albums to
 locate the photo and remove it; that's no good.

 Probably need another table that holds just the photo / album identifiers,
 an index. So when the user deletes a photo, you ask the index which albums
 that photo belongs too and just fetch those to update the album with that
 photo removed.

 :: mobile emails ::

 On Dec 21, 2012, at 3:50, David Mohl d...@dave.cx wrote:

  Hello!

 I've recently started learning cassandra but still have troubles
 understanding the best way to design a cassandra database.
 I've posted my question already on stackoverflow but because this would
 very likely result in a discussion, it got closed. Orginal question here:
 http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


 Assuming you have 3 types of objects: User, Photo and Album. Obviously a
 photo belongs to a user and can be part of a album. For querying, assume we
 just want to order by last goes first. Paging by 10 elements should be
 possible.

 Would you go like every document has all the informations needed for a
 correct output. Something like this:

 -- User
| -- Name
| -- ...
| -- Photos
 | -- Photoname
 | -- Uploaded at

 Or go a more relational way (while having a secondary index on the
 belongs_to columns:

 -- User (userid is the row key)
| -- Name
| -- ...

 -- Photoid
| -- belongs_to (userid)
| -- belongs_to_album (albumid)
| -- ...

 -- Albumid
| -- belongs_to (userid)
| -- ...

 Another way that came in my mind would be kind of a mix:

 -- User
| -- Name
| -- ...
| -- Photoids (e.g. 1,2,3,4,5)
| -- Albumids (e.g. 1,2,3,4,5)

 -- Photoid (photoid is the row key

Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
One more link that might be helpful. It's a similar system to photo's but
instead of Photos/Albums it's Songs/Playlists:

http://www.datastax.com/dev/blog/cql3-for-cassandra-experts.

It's not exactly 1:1 but it covers related concepts in making it work.



On Fri, Dec 21, 2012 at 8:02 AM, Adam Venturella aventure...@gmail.comwrote:

 Ok.. So here is my latest thinking... Including that index:

 CREATE TABLE Users (
 user_name text,
 password text,
 PRIMARY KEY (user_name)
 );

 ^ Same as before

 CREATE TABLE Photos(
 user_name text,
 photo_id uuid,
 created_time timestamp,
 data text,
 PRIMARY KEY (user_name, photo_id, created_time)
 ) WITH CLUSTERING ORDER BY (created_time DESC);

 ^ Note the addition of a photo id and using that in the PK def with the
 created_time
 Data is a JSON like this:
 {
 thumbnail: url,
 standard_resolution:url
 }


 CREATE TABLE PhotosAlbums (
 user_name text,
 album_name text,
 poster_image_url text,
 data text
 PRIMARY KEY (user_name, album_name)
 );

 ^ Same as before, data represents a JSON array of the photos:
 [{photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url}]

 CREATE TABLE PhotosAlbumsIndex (
 user_name text,
 photo_id uuid,
 album_name text,
 created_time timestamp
 PRIMARY KEY (user_name, photo_id, album_name)
 );

 The create_time column here is because you need to have at least 1 column
 that is not part of the PK. Or that's what it looks like in my quick test.

 ^ Each photo added to an album needs to be added to this index row


 As before, your application will need to keep the order of the array in
 tact as your users modify the order of things. Now however if they delete a
 photo you need to fetch the PhotoAlbums the photo existed in and update
 them accordingly:

 SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
 photo_id=uuid

 This should return to you all of the albums that the photo was a part of.
 Now you need to:

 SELECT * FROM PhotosAlbums where user_name = the_user and album_name IN
 (name1, name2, name3 )

 name1,2,3 are the album names you selected from the PhotosAlbumsIndex query

 So now you have all of the photo albums, you would then iterate over those
 in your application, deserializing the JSON data locating the photo ID was
 was removed and taking it out of the array, then reserializing to JSON and
 updating the record.

 When that is complete you need to remove the Photo from the
 PhotosAlbumsIndex. Now there is where I het stuck a little.. because this
 will fail:

 DELETE FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
 photo_id=uuid;

 It seems to want the album name as well since it's part of the PK.
 Admittedly, I don't know how to get around that and just delete everything
 where the first 2 components of the PK are true.

 You would already possess the list of album names though, so it could be
 BATCH that you need to perform for the deletes, specifying 1 delete per
 album_name.


 Anyway, that's my current thinking. I would love to know if it's possible
 to get around the DELETE issue another way.



 On Fri, Dec 21, 2012 at 7:15 AM, Adam Venturella aventure...@gmail.comwrote:

 Hmmm it just occurred to me that in my examples, there is no convenient
 way to delete a photo and also remove that photo from the albums it is a
 part of.

 As it stands, you would need to iterate over all of the users albums to
 locate the photo and remove it; that's no good.

 Probably need another table that holds just the photo / album
 identifiers, an index. So when the user deletes a photo, you ask the index
 which albums that photo belongs too and just fetch those to update the
 album with that photo removed.

 :: mobile emails ::

 On Dec 21, 2012, at 3:50, David Mohl d...@dave.cx wrote:

  Hello!

 I've recently started learning cassandra but still have troubles
 understanding the best way to design a cassandra database.
 I've posted my question already on stackoverflow but because this would
 very likely result in a discussion, it got closed. Orginal question here:
 http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


 Assuming you have 3 types of objects: User, Photo and Album. Obviously a
 photo belongs to a user and can be part of a album. For querying, assume we
 just want to order by last goes first. Paging by 10 elements should be
 possible.

 Would you go like every document has all the informations needed for a
 correct output. Something like this:

 -- User
| -- Name
| -- ...
| -- Photos
 | -- Photoname
 | -- Uploaded at

 Or go a more relational way (while having a secondary index on the
 belongs_to columns:

 -- User (userid is the row key)
| -- Name

Re: Correct way to design a cassandra database

2012-12-21 Thread Edward Capriolo
You could store the order as the first part of a composite string say first
picture as A and second as B. To insert one between call it AA. If you
shuffle alot the strings could get really long.

Might be better to store the order in a separate column.

Neither solution mentioned deals with concurrent access well.

On Friday, December 21, 2012, Adam Venturella aventure...@gmail.com wrote:
 One more link that might be helpful. It's a similar system to photo's but
instead of Photos/Albums it's Songs/Playlists:
 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts.

 It's not exactly 1:1 but it covers related concepts in making it work.


 On Fri, Dec 21, 2012 at 8:02 AM, Adam Venturella aventure...@gmail.com
wrote:

 Ok.. So here is my latest thinking... Including that index:
 CREATE TABLE Users (
 user_name text,
 password text,
 PRIMARY KEY (user_name)
 );
 ^ Same as before
 CREATE TABLE Photos(
 user_name text,
 photo_id uuid,
 created_time timestamp,
 data text,
 PRIMARY KEY (user_name, photo_id, created_time)
 ) WITH CLUSTERING ORDER BY (created_time DESC);
 ^ Note the addition of a photo id and using that in the PK def with the
created_time
 Data is a JSON like this:
 {
 thumbnail: url,
 standard_resolution:url
 }

 CREATE TABLE PhotosAlbums (
 user_name text,
 album_name text,
 poster_image_url text,
 data text
 PRIMARY KEY (user_name, album_name)
 );
 ^ Same as before, data represents a JSON array of the photos:
 [{photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url}]

 CREATE TABLE PhotosAlbumsIndex (
 user_name text,
 photo_id uuid,
 album_name text,
 created_time timestamp
 PRIMARY KEY (user_name, photo_id, album_name)
 );
 The create_time column here is because you need to have at least 1 column
that is not part of the PK. Or that's what it looks like in my quick test.
 ^ Each photo added to an album needs to be added to this index row

 As before, your application will need to keep the order of the array in
tact as your users modify the order of things. Now however if they delete a
photo you need to fetch the PhotoAlbums the photo existed in and update
them accordingly:
 SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
photo_id=uuid
 This should return to you all of the albums that the photo was a part of.
Now you need to:
 SELECT * FROM PhotosAlbums where user_name = the_user and album_name IN