One way to handle this is to denormalize in both directions. Keep two tables:

SongsAndPlaylists
PlaylistsAndSongs

This way your client software is responsible for keeping the data in sync. When you remove a song from PlaylistsAndSongs, you do a read for that song in SongsAndPlaylists. If the number of playlists still holding that song is now 0, the song can be removed.

On Wed, Feb 26, 2014 at 11:17 AM, Edward Capriolo <[email protected]> wrote:

> Right, the problem with building a list of counts in a batch is what
> happens if a song is added while you are building the counts.
>
> On Wed, Feb 26, 2014 at 10:32 AM, Green, John M (HP Education) <
> [email protected]> wrote:
>
>> Edward,
>>
>> Thanks for your insight.
>>
>> One other thought I had was to store a reference count with the "song".
>> When the last "playlist" referencing the "song" is deleted the "song" will
>> also be deleted because the reference count decrements to zero. However,
>> this would create some nastiness when it comes to reliably maintaining
>> reference counts. I'm not sure if it would help to split the reference
>> count into two monotonically increasing counters (number of references
>> added, and number of references deleted).
>>
>> In my case, users cannot browse a repository of "songs" to build a
>> playlist from scratch. They can only import "songs" themselves or create
>> references to "songs" other users have explicitly made available to them.
>> Once a "song" is not referred to by any "playlist" it will never be
>> re-discovered so it should be deleted. This could be done in some sort of
>> background data maintenance job that runs periodically. Even if it is a
>> low-priority background job it looks like it will create a lot of overhead
>> (scanning and producing counts).
>>
>> John
>>
>> *From:* Edward Capriolo [mailto:[email protected]]
>> *Sent:* Wednesday, February 26, 2014 5:56 AM
>> *To:* [email protected]
>> *Subject:* Re: Naive question about orphan rows
>>
>> It is probably OK to have redundant songs in playlists; Cassandra is
>> about denormalization.
>>
>> Dealing with this issue is going to be hard, since the only way to deal
>> with it would be scanning through the first CF and producing counts, then
>> using that information to delete in the second table. However, that
>> information can change rapidly and will then fall out of sync fast.
>>
>> The only ways to handle this are
>>
>> 1) never delete songs
>> 2) store copies of songs in playlists
>>
>> On Friday, February 21, 2014, Green, John M (HP Education) <
>> [email protected]> wrote:
>> > I'm very much a newbie so this may be a silly question but ...
>> >
>> > I have a situation similar to the music service example (
>> > http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html)
>> > of songs and playlists. However, in my case, the "songs" would be
>> > considered orphans that should be deleted when no "playlists" refer to
>> > them. Relational databases have mechanisms to manage this relationship so
>> > that a "song" could be deleted as soon as the last "playlist" referencing
>> > it is deleted. While I do NOT need to manage this as an atomic
>> > transaction, I'm wondering what is the best way to delete orphaned rows
>> > (i.e., "songs" not referenced by any "playlists") using Cassandra.
>> >
>> > I guess an alternative approach would be to store "songs" directly in
>> > the "playlists" but this could lead to many redundant copies of the same
>> > "song" which is something I'm hoping to avoid. In my case the "playlists"
>> > could have thousands of entries and the "songs" might be blobs of 10s of
>> > Mbytes. Maybe I'm just having a hard time abandoning my relational roots?
>> >
>> > John
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
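
As a rough illustration of the two-table bookkeeping described at the top of this thread, here is a minimal single-process sketch in Python. Plain dictionaries stand in for the SongsAndPlaylists and PlaylistsAndSongs tables; the function names (add_song, remove_song) and the song_blobs store are hypothetical and not from the thread, and no Cassandra driver is involved.

```python
# Hypothetical in-memory stand-ins for the two denormalized tables.
playlists_and_songs = {}   # playlist_id -> set of song_ids
songs_and_playlists = {}   # song_id -> set of playlist_ids
song_blobs = {}            # song_id -> song data (assumed separate store)


def add_song(playlist_id, song_id, blob=None):
    """Record the reference in both directions; store the blob once."""
    if blob is not None:
        song_blobs.setdefault(song_id, blob)
    playlists_and_songs.setdefault(playlist_id, set()).add(song_id)
    songs_and_playlists.setdefault(song_id, set()).add(playlist_id)


def remove_song(playlist_id, song_id):
    """Remove the reference from both tables.

    Returns True if the song had no remaining playlists and was deleted.
    """
    playlists_and_songs.get(playlist_id, set()).discard(song_id)
    remaining = songs_and_playlists.get(song_id, set())
    remaining.discard(playlist_id)
    # Read the reverse table: if no playlist still holds the song, drop it.
    if not remaining:
        songs_and_playlists.pop(song_id, None)
        song_blobs.pop(song_id, None)
        return True
    return False
```

Note that this sketch ignores concurrency: another client adding the song to a playlist between the read and the delete is not handled here, which is the same kind of race raised earlier in the thread for batch-built counts.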
