Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Andrés de la Peña
> This type of feature is very useful, but it may be easier to analyze this
> proposal if it’s compared with other DDM implementations from other
> databases? Would it be reasonable to add a table to the proposal comparing
> syntax and output from e.g. Azure SQL vs Cassandra vs whatever?


Good idea. I have added a section at the end of the document briefly
describing how some other databases deal with data masking, with links
to their documentation on the topic. I am not an expert in any of those
databases, so please take my comments there with a grain of salt.

On Fri, 19 Aug 2022 at 17:30, Jeff Jirsa wrote:


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Jeff Jirsa
This type of feature is very useful, but it may be easier to analyze this
proposal if it’s compared with other DDM implementations from other databases?
Would it be reasonable to add a table to the proposal comparing syntax and
output from e.g. Azure SQL vs Cassandra vs whatever?


> On Aug 19, 2022, at 4:50 AM, Andrés de la Peña wrote:


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Dinesh Joshi
Sounds interesting. I would like to understand a couple of things here. If the
column names are the same for masked and unmasked data, masking would impact
existing applications. I am curious what the transition plan looks like for
applications that expect unmasked data.

For example, let’s say you store SSNs and birth dates. Upon enabling this
feature, let’s say the app user is not given the UNMASK permission. Now the app
is receiving masked values for these columns. This is fine for most read-only
applications. However, these columns are often used as primary keys, or as
part of primary keys, in other tables, so masking them would break existing
applications.
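A small Python illustration of this concern, using hypothetical data and a hypothetical mask that keeps only the last four digits (neither comes from the proposal): an application that reads a masked SSN and then uses it as a key into another table simply finds nothing, and in this model nothing signals that the value was masked.

```python
# Hypothetical illustration: a masked value used as a key into another table.

def mask_ssn(ssn):
    """Hypothetical mask that keeps only the last four digits."""
    return "***-**-" + ssn[-4:]

users = {1: {"ssn": "123-45-6789"}}
tax_records = {"123-45-6789": {"year": 2021}}  # keyed by the real SSN

def read_ssn(user_id, has_unmask):
    """What the application receives for the ssn column."""
    ssn = users[user_id]["ssn"]
    return ssn if has_unmask else mask_ssn(ssn)

def lookup_tax_record(user_id, has_unmask):
    """Application logic: use the SSN it read as the key into tax_records."""
    return tax_records.get(read_ssn(user_id, has_unmask))

print(lookup_tax_record(1, has_unmask=True))   # {'year': 2021}
print(lookup_tax_record(1, has_unmask=False))  # None: '***-**-6789' matches no key
```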

How would this work in mixed mode, when new nodes in the cluster are masking
data and others aren’t? How would it impact the driver?

How would the application learn that the column values are masked? This is
important in case a user has the UNMASK permission and it is later taken away.
Again, this would break a lot of applications.

Dinesh

> On Aug 19, 2022, at 4:50 AM, Andrés de la Peña wrote:


Re: Is this an MV bug?

2022-08-19 Thread Benedict
You mean entirely distinct CQL statements issued by the same client 
“concurrently”?

If they’re submitted to the same coordinator then M2 will have a higher 
timestamp than M1, so if M2 applies first then M1 will be a no-op and should 
not generate any view update.

If they are submitted to different coordinators with server-issued timestamps,
then, unless the timestamps clash, one of them will win, but it may not be M2.
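A simplified Python model of the reconciliation described above, assuming each write to a cell carries a timestamp and the highest timestamp wins regardless of arrival order. The timestamp values, and breaking ties by comparing values, are assumptions of this sketch rather than Cassandra's exact rules.

```python
# Simplified last-write-wins model; not Cassandra's actual reconciliation code.

def reconcile(current, write):
    """Return the surviving (value, timestamp) cell after applying `write`."""
    if current is None:
        return write
    # Highest timestamp wins; ties are broken by value so the result does not
    # depend on arrival order (a simplification).
    return max(current, write, key=lambda cell: (cell[1], cell[0]))

def final_state(writes):
    state = None
    for write in writes:
        state = reconcile(state, write)
    return state

# Same coordinator: M2 is issued after M1, so it gets the higher timestamp.
m1 = ("d", 100)
m2 = ("d'", 101)
assert final_state([m2, m1]) == final_state([m1, m2]) == m2  # M1 is a no-op if it lands second

# Different coordinators: clock skew can give M1 the higher timestamp.
m1_skewed = ("d", 102)
assert final_state([m2, m1_skewed]) == m1_skewed  # M1 wins even though it was issued first
```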

> On 19 Aug 2022, at 11:14, Claude Warren, Jr via dev wrote:
> 
> Perhaps my diagram was not clear. I am starting with mutations on the base
> table. I assume they are not bundled together, i.e. they come from separate
> CQL statements.


[DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Andrés de la Peña
Hi everyone,

I'd like to start a discussion about this proposal for dynamic data
masking:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking

Dynamic data masking makes it possible to obscure sensitive information
without changing the stored data. It would be based on a set of native CQL
functions providing different types of masking, such as replacing the
column value by "". These functions could be used as regular functions
or attached to table columns with CREATE/ALTER TABLE. There would be a new
UNMASK permission, so only the users with this permission would be able to
see the unmasked column values. It would be possible to customize masking
by using UDFs as masking functions.
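To make the proposed semantics concrete, here is a minimal Python sketch of the behaviour described above: masking functions are attached to columns, the stored values never change, and the mask is applied at read time unless the reader holds UNMASK. The names `replace_mask`, `partial_mask`, `MaskedTable` and `has_unmask` are hypothetical illustrations, not the CQL functions or permission checks the proposal defines; any Python callable stands in for a UDF used as a custom mask.

```python
# Hypothetical model of dynamic data masking; not the CEP-20 implementation.

def replace_mask(value, replacement="****"):
    """Hypothetical mask: replace the whole value with a fixed string."""
    return None if value is None else replacement


def partial_mask(value, keep_end=4, pad="*"):
    """Hypothetical mask: hide everything except the last `keep_end` characters."""
    if value is None or len(value) <= keep_end:
        return value
    return pad * (len(value) - keep_end) + value[-keep_end:]


class MaskedTable:
    def __init__(self, masks):
        self.rows = []       # stored values are never modified by masking
        self.masks = masks   # column name -> masking callable (native function or "UDF")

    def insert(self, row):
        self.rows.append(dict(row))

    def select(self, has_unmask):
        """Return rows as a reader with or without UNMASK would see them."""
        if has_unmask:
            return [dict(row) for row in self.rows]
        return [
            {col: self.masks[col](val) if col in self.masks else val
             for col, val in row.items()}
            for row in self.rows
        ]


users = MaskedTable({"name": replace_mask, "ssn": partial_mask})
users.insert({"id": 1, "name": "Alice", "ssn": "123-45-6789"})

print(users.select(has_unmask=False))  # [{'id': 1, 'name': '****', 'ssn': '*******6789'}]
print(users.select(has_unmask=True))   # [{'id': 1, 'name': 'Alice', 'ssn': '123-45-6789'}]
```

Keeping the masks in a per-column mapping mirrors the idea of attaching a masking function to a column with CREATE/ALTER TABLE, while the same callables could also be invoked directly, like a regular function in a query.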

Thanks,


Re: Is this an MV bug?

2022-08-19 Thread Claude Warren, Jr via dev
Perhaps my diagram was not clear. I am starting with mutations on the base
table. I assume they are not bundled together, i.e. they come from separate
CQL statements.

On Fri, Aug 19, 2022 at 11:11 AM Claude Warren, Jr wrote:

> If each mutation comes from a separate CQL statement, they would be separate, no?


Re: Is this an MV bug?

2022-08-19 Thread Claude Warren, Jr via dev
If each mutation comes from a separate CQL statement, they would be separate, no?


On Fri, Aug 19, 2022 at 10:17 AM Benedict wrote:

> If M1 and M2 both operate over the same partition key they won’t be
> separate mutations, they should be combined into a single mutation before
> submission to SP.mutate


Re: Is this an MV bug?

2022-08-19 Thread Benedict
If M1 and M2 both operate over the same partition key, they won’t be separate
mutations; they should be combined into a single mutation before submission to
SP.mutate.
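As an illustration only, here is a Python sketch of what combining updates by partition key means, assuming the updates arrive in the same request (for example one batch): every update for a given partition is folded into a single mutation object before submission. The `(partition_key, clustering_key, cells)` tuple shape is an assumption of the sketch, and letting a later update to the same cell overwrite an earlier one is a simplification.

```python
# Illustrative sketch: fold per-row updates into one mutation per partition.
from collections import defaultdict

def combine_by_partition(updates):
    """updates: iterable of (partition_key, clustering_key, {column: value})."""
    mutations = defaultdict(dict)  # partition_key -> {clustering_key: {column: value}}
    for partition_key, clustering_key, cells in updates:
        row = mutations[partition_key].setdefault(clustering_key, {})
        row.update(cells)  # simplification: a later write to the same cell overwrites
    return dict(mutations)

updates = [
    ("p1", "c1", {"E": "e'"}),
    ("p1", "c1", {"D": "d'"}),
    ("p2", "c1", {"D": "x"}),
]
mutations = combine_by_partition(updates)
print(len(mutations))   # 2: one mutation for p1, one for p2
print(mutations["p1"])  # {'c1': {'E': "e'", 'D': "d'"}}
```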

> On 19 Aug 2022, at 10:05, Claude Warren, Jr via dev wrote:



Is this an MV bug?

2022-08-19 Thread Claude Warren, Jr via dev
# Table definitions

Table [ Primary key ] other data
base  [ A B C ] D E
MV    [ D C ] A B E


# Initial data
base           -> MV
[ a b c ] d e  -> [ d c ] a b e
[ a' b c ] d e -> [ d c ] a' b e


## Mutations -> expected outcome

M1: base [ a b c ] d e'  -> MV [ d c ] a b e'
M2: base [ a b c ] d' e -> MV [ d' c ] a b e

## Processing bug
Assume the lock cannot be obtained while M1 is being processed.

Mutation M1 sleeps while waiting for the lock (trunk, Keyspace.java line 601).

Assume M2 obtains the lock and executes.

MV is now:
[ d' c ] a b e

M1 then obtains the lock and executes.

MV is now:
[ d c ] a b e'
[ d' c ] a b e

base is:
[ a b c ] d e'

MV entry "[ d' c ] a b e" is orphaned
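For reasoning about this scenario, here is a simplified Python model of view maintenance, not Cassandra's implementation. It assumes that each base write reads the existing row (the pre-image) only after acquiring the lock, that cells carry timestamps reconciled by last-write-wins, and that M1 and M2 carry timestamps 1 and 2. View rows are modelled as (D, C, A, B, E) tuples so that the two initial base rows, which share D and C, stay distinct; that is also an assumption of the sketch. Under these assumptions, applying M1 and M2 in either order yields the same base and view, with no orphaned view row. Whether the real write path around Keyspace.java meets these assumptions is not something the sketch can settle.

```python
# Simplified model of materialized-view maintenance; not Cassandra's code.
import itertools
import threading

def reconcile(old, new):
    """Cells are (value, timestamp): highest timestamp wins, ties broken by value."""
    if old is None:
        return new
    return max(old, new, key=lambda cell: (cell[1], cell[0]))

def view_row(pk, row):
    """View row for a base row [A B C] D E, modelled as (D, C, A, B, E)."""
    a, b, c = pk
    return (row["D"][0], c, a, b, row["E"][0])

class BaseWithView:
    def __init__(self):
        self.base = {}                # (A, B, C) -> {"D": cell, "E": cell}
        self.view = set()             # view rows
        self.lock = threading.Lock()

    def apply(self, pk, cells):
        with self.lock:               # writes to the partition are serialized
            old = self.base.get(pk)   # pre-image read after acquiring the lock
            new = dict(old or {})
            for col, cell in cells.items():
                new[col] = reconcile(new.get(col), cell)
            if old is not None:
                self.view.discard(view_row(pk, old))  # drop the view row for the pre-image
            self.base[pk] = new
            self.view.add(view_row(pk, new))

    def orphans(self):
        """View rows that no longer correspond to any base row."""
        return self.view - {view_row(pk, row) for pk, row in self.base.items()}

def run(mutations):
    t = BaseWithView()
    t.apply(("a", "b", "c"), {"D": ("d", 0), "E": ("e", 0)})   # initial data
    t.apply(("a'", "b", "c"), {"D": ("d", 0), "E": ("e", 0)})
    for pk, cells in mutations:
        t.apply(pk, cells)
    return t

M1 = (("a", "b", "c"), {"D": ("d", 1), "E": ("e'", 1)})
M2 = (("a", "b", "c"), {"D": ("d'", 2), "E": ("e", 2)})

for order in itertools.permutations([M1, M2]):
    t = run(order)
    assert not t.orphans()
    print(sorted(t.view))
# Both orders print: [('d', 'c', "a'", 'b', 'e'), ("d'", 'c', 'a', 'b', 'e')]
```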