A layer violation?  Seriously?  Technical solutions exist to solve business 
problems and I’m 100% fine with introducing the former to solve the latter.

Look, if the goal is to purge information out of the DB as quickly as possible 
from a lot of accounts, the fastest way to do it is to hijack the fact that 
you’re constantly rewriting data through compaction and (ab)use it.  It avoids 
the overhead of tombstones, and can be implemented in a way that allows you to 
perform a single write, edit a text file, or use some other trivial mechanism 
and immediately start removing customer data.  It’s an incredibly efficient way 
of bulk removing customer data.
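
To make the idea concrete, here is a minimal sketch in Python of filtering during a compaction-style rewrite.  It is a toy model, not the DeletingCompactionStrategy API: `Row`, `compact`, and `purge_accounts` are hypothetical names, and the "SSTables" are just in-memory lists.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Row(NamedTuple):
    account_id: str   # partition key (hypothetical: one partition per account)
    key: str          # clustering key
    timestamp: int    # write time, higher wins
    value: str


def compact(sstables: Iterable[List[Row]], purge_accounts: Set[str]) -> List[Row]:
    """Merge several toy SSTables into one, dropping purged accounts.

    Rows for accounts in purge_accounts are never written to the output,
    so no tombstone is needed to get rid of them.
    """
    newest: Dict[Tuple[str, str], Row] = {}
    for sstable in sstables:
        for row in sstable:
            if row.account_id in purge_accounts:
                continue  # skipped during the rewrite, gone from the new file
            ident = (row.account_id, row.key)
            current = newest.get(ident)
            if current is None or row.timestamp > current.timestamp:
                newest[ident] = row
    return sorted(newest.values(), key=lambda r: (r.account_id, r.key))


old = [Row("acme", "email", 1, "a@x"), Row("bob", "email", 1, "b@x")]
new = [Row("acme", "email", 2, "a2@x")]
print(compact([old, new], purge_accounts={"bob"}))
```

The only input that changes when a new deletion request arrives is the purge set, which is why a single write or a one-line edit to a file is enough to start the removal.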

The wording around “The Right To Be Forgotten” is a little vague [1], and I 
don’t know if “the right to be forgotten entitles the data subject to have the 
data controller erase his/her personal data” means that tombstones are OK.  If 
you tombstone some row using TWCS, it will literally *never* be deleted off 
disk, as opposed to using DeletingCompactionStrategy, where it could easily be 
removed without leaving data lying around in SSTables.  I’ve done this already 
for this *exact* use case and know it works and works very well.
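
The tombstone problem can be illustrated with a deliberately simplified Python model.  It assumes compaction only ever merges data written in the same time window, and it ignores TTL expiry, gc_grace, and everything else a real cluster does.  All names (`Cell`, `compact_by_window`, `values_on_disk`) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    partition: str
    timestamp: int
    value: Optional[str]  # None marks a tombstone


def compact_by_window(cells: List[Cell], window_size: int) -> Dict[int, List[Cell]]:
    """Keep the newest cell per partition, but only within each time window."""
    windows: Dict[int, Dict[str, Cell]] = {}
    for cell in cells:
        bucket = windows.setdefault(cell.timestamp // window_size, {})
        current = bucket.get(cell.partition)
        if current is None or cell.timestamp > current.timestamp:
            bucket[cell.partition] = cell
    return {w: list(b.values()) for w, b in windows.items()}


def values_on_disk(windows: Dict[int, List[Cell]]) -> List[str]:
    """Every real value still physically present after compaction."""
    return [c.value for cells in windows.values() for c in cells if c.value is not None]


# Data written in window 0, tombstone written later in window 1.
late_delete = [Cell("bob", 5, "secret"), Cell("bob", 105, None)]
print(values_on_disk(compact_by_window(late_delete, window_size=100)))
```

In this model the tombstone hides the row from reads, but the original value stays in the older window's file because the two are never merged.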

The debate around what is the “correct” way to solve the problem is a dogmatic 
one and I don’t have any interest in pursuing it any further.  I’ve simply 
offered a solution that I know works because I’ve done it, which is what the OP 
asked for.

[1] https://www.eugdpr.org/key-changes.html 

> On Feb 9, 2018, at 10:33 AM, Dor Laor <d...@scylladb.com> wrote:
> 
> I think you're introducing a layer violation. GDPR is a business requirement 
> and compaction is an implementation detail. 
> 
> IMHO it's enough to delete the partition using regular CQL.
> It's true that it won't be deleted immediately but it will be eventually 
> deleted (welcome to eventual consistency ;).
> 
> Even with user defined compaction, compaction may not be running instantly, 
> repair will be required, there are other nodes in the cluster, maybe 
> partitioned nodes with the data. There is data in snapshots and backups.
> 
> The business idea is to delete the data in a fast, reasonable time for humans 
> and make it first unreachable and later delete completely. 
> 
> On Fri, Feb 9, 2018 at 8:51 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> That might be fine for a one off but is totally impractical at scale or when 
> using TWCS. 
> On Fri, Feb 9, 2018 at 8:39 AM DuyHai Doan <doanduy...@gmail.com> wrote:
> Or use the new user-defined compaction option recently introduced, provided 
> you can determine over which SSTables a partition is spread
> 
> On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad <j...@jonhaddad.com> wrote:
> Give this a read through:
> 
> https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy
> 
> Basically you write your own logic for how stuff gets forgotten, then you can 
> recompact every sstable with upgradesstables -a.  
> 
> Jon
> 
> 
>> On Feb 9, 2018, at 8:10 AM, Nicolas Guyomar <nicolas.guyo...@gmail.com> wrote:
>> 
>> Hi everyone,
>> 
>> Because of GDPR we really face the need to support “Right to Be Forgotten” 
>> requests => https://gdpr-info.eu/art-17-gdpr/ stating that "the controller 
>> shall have the obligation to erase personal data without undue delay"
>> 
>> Because I usually meet customers that do not have that many clients, 
>> modeling one partition per client is almost always possible, easing deletion 
>> by partition key.
>> 
>> Then, apart from triggering a manual compaction on impacted tables using 
>> STCS, I do not see how I can be GDPR compliant.
>> 
>> I'm kind of surprised not to find any thread on that matter on the ML, do 
>> you guys have any modeling strategy that would make it easier to get rid of 
>> data ? 
>> 
>> Thank you for any given advice
>> 
>> Nicolas
> 
> 
> 
