[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137157#comment-16137157 ]

Richard Ding commented on ATLAS-1955:
-------------------------------------

[~davidrad] and [~mandy_chessell] suggested using _PrimitiveDefs_ instead of _attributeTypeDefs_. I think it is a better name for custom data types.

> Validation for Attributes
> -------------------------
>
> Key: ATLAS-1955
> URL: https://issues.apache.org/jira/browse/ATLAS-1955
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Affects Versions: 0.9-incubating
> Reporter: Israel Varea
> Assignee: Richard Ding
> Fix For: 0.9-incubating
>
>
> It would be very nice if the Atlas model could contain a way to represent attribute validation.
> A simple example is that we would like to model a Person, with attributes Name, Email and Country. Now we would like to specify that Email has to follow a specific regular expression, so it would be nice if we could set Email -> hasValidation -> EmailRegex, with EmailRegex having:
> Name: Email Regular Expression
> Expression: /[0-9a-z]+@[0-9a-z]+.[0-9a-z]+/
> For more complex types of validation, e.g. checking card number validity, some external validator function/service could be added.
> Name: Credit Card Number Validator
> Validator: org.apache.atlas.validators.creditcard or https://host:port/creditCardValidator
> For validations from a reference table, for example a country name, it could be:
> Name: Country Name Ref Validator
> Reference Column:
> where would be an instance of type Hive_Column or HBase_Column.
> Since this is a kind of Standardization, it could be placed in [Area 5|https://cwiki.apache.org/confluence/display/ATLAS/Area+5+-+Standards].
> A similar approach is followed in software [Kylo|https://github.com/Teradata/kylo/tree/master/integrations/spark/spark-validate-cleanse]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136014#comment-16136014 ]

Richard Ding commented on ATLAS-1955:
-------------------------------------

Thanks [~davidrad] and [~ivarea] for your comments and suggestions.

It seems that what we want here is a new custom attribute type defined as a top-level type in the Atlas type system. For example, we can have email and credit card attribute types:

{code}
"attributeTypeDefs": [
  {
    "name": "email",
    "typeVersion": "1.0",
    "baseType": "string",
    "validationType": "regex",
    "validator": "[0-9a-z]@[0-9a-z].[0-9a-z]+"
  },
  {
    "name": "credit_card",
    "typeVersion": "1.0",
    "baseType": "string",
    "validationType": "class",
    "validator": "org.apache.atlas.model.validataion.CreditCardValidator"
  }
]
{code}

And these custom attribute types can then be used in entity attribute definitions:

{code}
"entityDefs": [
  {
    "name": "Person",
    "superTypes": [ "Referenceable" ],
    "typeVersion": "1.0",
    "attributeDefs": [
      {
        "name": "emailAddresses",
        "typeName": "email",
        "cardinality": "SET",
        "isIndexable": true,
        "isOptional": false,
        "isUnique": false
      },
      {
        "name": "creditCardNumbers",
        "typeName": "credit_card",
        "cardinality": "SET",
        "isIndexable": true,
        "isOptional": true,
        "isUnique": false
      },
      ……
  }
]
{code}

Here _attributeTypeDefs_ is used to avoid confusion with the _attributeDefs_ defined inside _entityDefs_.
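Editor's note: the _attributeTypeDefs_ proposal above can be read as a small dispatch on _validationType_. The following Python sketch is only an illustration of that reading, not Atlas code. The `AttributeTypeDef` class, `build_validator`, the `CLASS_VALIDATORS` table, the `example.validators.CreditCardValidator` key and the Luhn check are all assumptions made for the example; the email pattern also adds quantifiers and an escaped dot, which the pattern in the comment does not have.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

Validator = Callable[[str], bool]


@dataclass(frozen=True)
class AttributeTypeDef:
    """Hypothetical in-memory form of one attributeTypeDefs entry."""
    name: str
    base_type: str
    validation_type: str  # "regex" or "class", as in the proposal
    validator: str        # a pattern, or the name of a validator class


def luhn_valid(value: str) -> bool:
    """Luhn checksum, used here as a stand-in credit card validator."""
    compact = value.replace(" ", "").replace("-", "")
    if not compact.isdigit() or len(compact) < 12:
        return False
    total = 0
    for i, ch in enumerate(reversed(compact)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Assumption: class-based validators are looked up by name in a table.
CLASS_VALIDATORS: Dict[str, Validator] = {
    "example.validators.CreditCardValidator": luhn_valid,
}


def build_validator(type_def: AttributeTypeDef) -> Validator:
    """Turn a type definition into a callable that checks one value."""
    if type_def.validation_type == "regex":
        pattern = re.compile(type_def.validator)
        # fullmatch: the whole value must conform, not just a substring
        return lambda value: pattern.fullmatch(value) is not None
    if type_def.validation_type == "class":
        if type_def.validator not in CLASS_VALIDATORS:
            raise ValueError(f"unknown validator class: {type_def.validator}")
        return CLASS_VALIDATORS[type_def.validator]
    raise ValueError(f"unknown validationType: {type_def.validation_type}")


email = AttributeTypeDef(
    "email", "string", "regex", r"[0-9a-z]+@[0-9a-z]+\.[0-9a-z]+")
credit_card = AttributeTypeDef(
    "credit_card", "string", "class", "example.validators.CreditCardValidator")

is_email = build_validator(email)
is_card = build_validator(credit_card)
```

With these definitions, `is_email("rding@example.org")` and `is_card("4111 1111 1111 1111")` both accept their input, while a value such as `"a@bXc"` is rejected because the dot is matched literally.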
[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130275#comment-16130275 ]

David Radley commented on ATLAS-1955:
-------------------------------------

[~ivarea] I think we want reusable validation: one email type that everyone can use. Atlas will then ship some of these types so that they can be used by the shipped models. I am thinking of url and image being very useful. We could have new attribute types to police valid names for Hive tables and the like. I think option 2 or a variant will be very powerful.
[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130253#comment-16130253 ]

Israel Varea commented on ATLAS-1955:
-------------------------------------

I can't find a strong argument to decide between the two options. A good question for choosing between them is: must validations be reusable for different attributes? If the answer is yes, then go with Option 2. If the answer is "A reusable validation is not so important", then go with Option 1, since it is simpler. I think both options will cover most of the common use cases anyway.
[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130087#comment-16130087 ]

David Radley commented on ATLAS-1955:
-------------------------------------

[~rding] I agree with option 2. We could simplify the naming and add AttributeDefs as a top-level object to TypeDefs, with AttributeDef elements. We would then add the optional regex pattern to AttributeDef. If we take this approach, I suggest we also add an optional description to AttributeDef (which we want in order to support RDF standards). Adding these AttributeDefs like this would also allow people to alias existing primitiveTypes. We should only allow primitive types as the type of the attributeDefs in typeDefs.
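Editor's note: the comment above combines three ideas: a top-level AttributeDef restricted to primitive types, an optional regex pattern, and an optional description. A minimal Python sketch of how those constraints could fit together follows. It is an assumption-laden illustration, not the Atlas type system: the `AttributeDef` class shown here, its field names, the `accepts` method and the `PRIMITIVE_TYPES` subset are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of primitive type names (an assumption for this sketch).
PRIMITIVE_TYPES = frozenset(
    {"boolean", "int", "long", "float", "double", "string", "date"})


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical top-level attribute definition."""
    name: str
    type_name: str                     # must name a primitive type
    pattern: Optional[str] = None      # optional regex constraint
    description: Optional[str] = None  # optional human-readable text

    def __post_init__(self) -> None:
        # Only primitive types may be aliased or constrained.
        if self.type_name not in PRIMITIVE_TYPES:
            raise ValueError(
                f"{self.name}: {self.type_name!r} is not a primitive type")
        if self.pattern is not None:
            re.compile(self.pattern)  # fail early on a malformed pattern

    def accepts(self, value: str) -> bool:
        """A def without a pattern is a plain alias and accepts any value."""
        if self.pattern is None:
            return True
        return re.fullmatch(self.pattern, value) is not None


# A constrained type and a plain alias of an existing primitive type.
url = AttributeDef("url", "string", pattern=r"https?://\S+",
                   description="An HTTP or HTTPS URL")
label = AttributeDef("label", "string")
```

Under this sketch, `AttributeDef("owner", "Person")` raises `ValueError`, which mirrors the suggestion that only primitive types be allowed.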
[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098207#comment-16098207 ]

Israel Varea commented on ATLAS-1955:
-------------------------------------

I think these two validations will cover most of the basic validation cases. In the second validation, from a reference table, it would be nice to be able to provide a reference to a column of a table, since columns are already modeled. I think importing a reference table into Atlas enums will duplicate data, and you will have to keep the two synchronized, so I think it will be much simpler if we just point to a column. However, both alternatives can successfully solve the same modelling use case :)
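Editor's note: the synchronization concern raised above can be made concrete with a small Python sketch. Everything in it is hypothetical: the `EnumValidator` and `ReferenceColumnValidator` classes, the `fetch_column_values` helper, the column name `default.countries.name@cluster` and the in-memory `columns` dictionary that stands in for a real Hive or HBase column.

```python
from typing import Callable, Dict, Iterable, List

# Stand-in for the referenced table column (an assumption for this sketch).
COLUMN = "default.countries.name@cluster"
columns: Dict[str, List[str]] = {COLUMN: ["Spain", "France"]}


def fetch_column_values(qualified_name: str) -> Iterable[str]:
    """Hypothetical lookup of the current values of a modeled column."""
    return columns[qualified_name]


class EnumValidator:
    """Validates against a copy of the values taken at import time."""

    def __init__(self, values: Iterable[str]) -> None:
        self._values = frozenset(values)

    def accepts(self, value: str) -> bool:
        return value in self._values


class ReferenceColumnValidator:
    """Validates against the column itself, read at validation time."""

    def __init__(self, qualified_name: str,
                 fetch: Callable[[str], Iterable[str]]) -> None:
        self._qualified_name = qualified_name
        self._fetch = fetch

    def accepts(self, value: str) -> bool:
        return value in set(self._fetch(self._qualified_name))


snapshot = EnumValidator(fetch_column_values(COLUMN))
by_reference = ReferenceColumnValidator(COLUMN, fetch_column_values)

# The reference table later gains a row; the enum copy is now stale.
columns[COLUMN].append("Portugal")
```

After the update, `by_reference.accepts("Portugal")` is true while `snapshot.accepts("Portugal")` is false, which is the duplication problem the comment describes. The trade-off is that the reference approach needs access to the column data whenever a value is checked.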
Re: [jira] [Commented] (ATLAS-1955) Validation for Attributes
On 23/07/2017 10:36, David Radley (JIRA) wrote:
>
> [~ivarea] some thoughts:
> I think there is a value in Atlas having the capability to validate that attribute values conform to a certain pattern.

I think there are two forms of validation here:

a) Capturing validation rules in Atlas, i.e. more metadata that we might relate to business terms and apply to assets. For example, to define the fact that whenever we refer to a credit card number it needs to be in a certain format.
b) Validating the metadata itself.

Both seem entirely valid, but I think Israel is referring to a) and David, you are referring to b). Two JIRAs? I could be wrong though; if so, a) is an additional idea :-)
[jira] [Commented] (ATLAS-1955) Validation for Attributes
[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093242#comment-16093242 ]

Nigel Jones commented on ATLAS-1955:
------------------------------------

I believe here you are modeling the fact that an email has to follow a certain format, so that metadata should be captured in Atlas.

However, the actual validation for instances of this data, i.e. a customer record being stored in a DB, would typically be outside Atlas. In addition, the validation may differ, as it would be specific to the data processing system being used: an ETL engine, HBase, a filesystem, different languages.

So I think the model is more similar to that of policies and rules, and how Ranger works. In Atlas we have a business-centric definition of a policy, but the actual implementation sits at the enforcement point (in this case a Ranger rule).

I'm interested in being able to add capability to capture metadata from Ranger so we can then tie the rule implementation back to the policy, to aid in compliance checks and reporting, as well as allow Ranger to query Atlas for policies when a security admin is creating a rule.

So I wonder if the same pattern applies here with validation?