Re: Naive question about orphan rows

Edward Capriolo Wed, 26 Feb 2014 11:13:50 -0800

One way to handle this is to de-normalize the data into two tables. For
example:


SongsAndPlaylists
PlaylistsAndSongs

In this way your client software is charged with keeping data in sync.

When you remove a song from PlaylistsAndSongs, you do a read for that song
in SongsAndPlaylists. If the number of playlists with that song is now 0, the
song can be removed.
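
A minimal sketch of that flow in Python, with in-memory dicts standing in for
the two tables. The names add_song, remove_song and the songs dict are made up
for illustration; none of this is Cassandra API. In a real cluster the
read-then-delete is not atomic, so a concurrent add could still race with it.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two de-normalized tables.
playlists_and_songs = defaultdict(set)  # playlist_id -> {song_id, ...}
songs_and_playlists = defaultdict(set)  # song_id -> {playlist_id, ...}
songs = {}                              # song_id -> song blob


def add_song(playlist_id, song_id, blob=None):
    """Write both sides of the relationship; the client keeps them in sync."""
    if blob is not None:
        songs.setdefault(song_id, blob)
    playlists_and_songs[playlist_id].add(song_id)
    songs_and_playlists[song_id].add(playlist_id)


def remove_song(playlist_id, song_id):
    """Drop the reference, then delete the song if no playlist refers to it.

    Returns True if the song was orphaned and removed, False otherwise.
    """
    playlists_and_songs[playlist_id].discard(song_id)
    songs_and_playlists[song_id].discard(playlist_id)
    # Read the reverse table: is the song still in any playlist?
    if not songs_and_playlists[song_id]:
        del songs_and_playlists[song_id]
        songs.pop(song_id, None)
        return True
    return False
```

Usage: add_song("p1", "s1", b"...") followed by add_song("p2", "s1") leaves the
song in place after remove_song("p1", "s1"), and it is only deleted once
remove_song("p2", "s1") drops the last reference.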




On Wed, Feb 26, 2014 at 11:17 AM, Edward Capriolo <[email protected]> wrote:

> Right, the problem with building a list of counts in a batch is what
> happens if a song is added while you are building the counts.
>
>
> On Wed, Feb 26, 2014 at 10:32 AM, Green, John M (HP Education) <
> [email protected]> wrote:
>
>>  Edward,
>>
>>
>> Thanks for your insight.
>>
>>
>>
>> One other thought I had was to store a reference count with the "song".
>> When the last "playlist" referencing the "song" is deleted the "song" will
>> also be deleted because the reference count decrements to zero.   However,
>> this would create some nastiness when it comes to reliably maintaining
>> reference counts.   I'm not sure if it would help to split the reference
>> count into two monotonically increasing counters (number of references
>> added, and number of references deleted).
>>
>>
>>
>> In my case, users cannot browse a repository of "songs" to build a
>> playlist from scratch.  They can only import "songs" themselves or create
>> references to "songs" other users have explicitly made available to them.
>> Once a "song" is not referred to by any "playlist" it will never be
>> re-discovered so it should be deleted.   This could be done in some sort of
>> background data maintenance job that runs periodically.   Even if it is a
>> low-priority background job it looks like it will create a lot of overhead
>> (scanning and producing counts).
>>
>>
>>
>> John
>>
>> *From:* Edward Capriolo [mailto:[email protected]]
>> *Sent:* Wednesday, February 26, 2014 5:56 AM
>> *To:* [email protected]
>> *Subject:* Re: Naive question about orphan rows
>>
>>
>>
>> It is probably OK to have redundant songs in playlists; Cassandra is
>> about denormalization.
>>
>> Dealing with this issue is going to be hard, since the only way to deal
>> with it would be scanning through the first CF and producing counts, then
>> using that information to delete in the second table. However, that
>> information can change rapidly and will then fall out of sync fast.
>>
>> The only ways to handle this are:
>>
>> 1) never delete songs
>> 2) store copies of songs in the playlist
>>
>> On Friday, February 21, 2014, Green, John M (HP Education) <
>> [email protected]> wrote:
>> > I'm very much a newbie so this may be a silly question but ...
>> >
>> >
>> >
>> > I have a situation similar to the music service example (
>> http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html)
>> of songs and playlists.  However, in my case, the "songs" would be
>> considered orphans that should be deleted when no "playlists" refer to
>> them.  Relational databases have mechanisms to manage this relationship so
>> that a "song" could be deleted as soon as the last "playlist" referencing
>> it is deleted.    While I do NOT need to manage this as an atomic
>> transaction, I'm wondering what is the best way to delete orphaned rows
>> (i.e., "songs" not referenced by any "playlists")  using Cassandra.
>> >
>> >
>> >
>> > I guess an alternative approach would be to store "songs" directly in
>> the "playlists" but this could lead to many redundant copies of the same
>> "song" which is something I'm hoping to avoid.  In my case the "playlists"
>> could have thousands of entries and the "songs" might be blobs of 10s of
>> Mbytes.    Maybe I'm just having a hard time abandoning my relational roots?
>> >
>> >
>> >
>> > John
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
