RE: Naive question about orphan rows

Green, John M (HP Education) Wed, 26 Feb 2014 07:34:50 -0800

Edward,

Thanks for your insight.

One other thought I had was to store a reference count with the "song".  When 
the last "playlist" referencing the "song" is deleted the "song" will also be 
deleted because the reference count decrements to zero.   However, this would 
create some nastiness when it comes to reliably maintaining reference counts.   
I'm not sure if it would help to split the reference count into two 
monotonically increasing counters (number of references added, and number of 
references deleted).
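
A minimal in-memory sketch of the two-counter idea, for illustration only. The class and method names (SongRefs, add_reference, remove_reference) are hypothetical, and the sketch does not talk to Cassandra or address the reliability problem of keeping counts accurate across retried or lost writes. It only shows how the live count falls out of two counters that each only ever increase.

```python
from dataclasses import dataclass


@dataclass
class SongRefs:
    """Hypothetical per-song record holding two grow-only counters."""
    added: int = 0    # total number of references ever added
    removed: int = 0  # total number of references ever deleted

    def add_reference(self) -> None:
        # A playlist starts referring to the song.
        self.added += 1

    def remove_reference(self) -> None:
        # A playlist that referred to the song is deleted.
        self.removed += 1

    @property
    def live(self) -> int:
        # Current reference count, derived rather than stored.
        return self.added - self.removed

    def is_orphan(self) -> bool:
        # A song counts as orphaned only after it has been referenced
        # at least once and every one of those references is gone.
        return self.added > 0 and self.live <= 0


refs = SongRefs()
refs.add_reference()              # playlist A adds the song
refs.add_reference()              # playlist B adds the song
refs.remove_reference()           # playlist A is deleted
still_used = refs.is_orphan()     # one reference remains
refs.remove_reference()           # playlist B is deleted
now_orphan = refs.is_orphan()     # no references remain
```

Because neither counter is ever decremented, a reader that sees an old value of one counter sees a count that was true at some earlier point, which may be easier to reason about than a single counter moving in both directions.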

In my case, users cannot browse a repository of "songs" to build a playlist 
from scratch.  They can only import "songs" themselves or create references to 
"songs" other users have explicitly made available to them.  Once a "song" is 
not referred to by any "playlist" it will never be re-discovered so it should 
be deleted.   This could be done in some sort of background data maintenance 
job that runs periodically.   Even if it is a low-priority background job, it 
looks like it will create a lot of overhead (scanning and producing counts).
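
A rough sketch of what such a maintenance pass might look like, using plain Python dictionaries as stand-ins for the two tables. The function name find_orphans and the sample data are hypothetical. A real job would page through the playlist table rather than hold everything in memory, which is where the scanning overhead comes from.

```python
from typing import Dict, Iterable, Set


def find_orphans(song_ids: Iterable[str],
                 playlists: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the ids of songs that no playlist refers to."""
    referenced: Set[str] = set()
    # Full scan of every playlist: this is the expensive part.
    for entries in playlists.values():
        referenced.update(entries)
    # Anything stored but never referenced is an orphan.
    return set(song_ids) - referenced


# Hypothetical stand-ins for the "songs" and "playlists" tables.
songs = {"s1": b"blob-1", "s2": b"blob-2", "s3": b"blob-3"}
playlists = {"p1": ["s1", "s2"], "p2": ["s2"]}

orphans = find_orphans(songs, playlists)

# Sweep: delete every song found to be unreferenced.
for song_id in orphans:
    del songs[song_id]
```

One caveat with this approach: a song imported just before the scan, but not yet added to any playlist, would look like an orphan. A job like this would probably need some grace period before deleting.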

John

From: Edward Capriolo [mailto:[email protected]]
Sent: Wednesday, February 26, 2014 5:56 AM
To: [email protected]
Subject: Re: Naive question about orphan rows

It is probably OK to have redundant songs in playlists; Cassandra is about 
denormalization.

Dealing with this issue is going to be hard, since the only way to deal with 
it would be scanning through the first CF and producing counts, then using that 
information to delete from the second table. However, that information can 
change rapidly and will then fall out of sync fast.

The only ways to handle this are:

1) never delete songs
2) store copies of songs in playlists

On Friday, February 21, 2014, Green, John M (HP Education) 
<[email protected]<mailto:[email protected]>> wrote:
> I'm very much a newbie so this may be a silly question but ...
>
>
>
> I have a situation similar to the music service example 
> (http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html)
>  of songs and playlists.  However, in my case, the "songs" would be 
> considered orphans that should be deleted when no "playlists" refer to them.  
> Relational databases have mechanisms to manage this relationship so that a 
> "song" could be deleted as soon as the last "playlist" referencing it is 
> deleted.    While I do NOT need to manage this as an atomic transaction, I'm 
> wondering what is the best way to delete orphaned rows (i.e., "songs" not 
> referenced by any "playlists")  using Cassandra.
>
>
>
> I guess an alternative approach would be to store "songs" directly in the 
> "playlists" but this could lead to many redundant copies of the same "song" 
> which is something I'm hoping to avoid.  In my case the "playlists" could 
> have thousands of entries and the "songs" might be blobs of 10s of Mbytes.    
> Maybe I'm just having a hard time abandoning my relational roots?
>
>
>
> John

--
Sorry this was sent from mobile. Will do less grammar and spell check than 
usual.
