[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-08-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137157#comment-16137157
 ] 

Richard Ding commented on ATLAS-1955:
-

[~davidrad] and [~mandy_chessell] suggested using _PrimitiveDefs_ instead of 
_attributeTypeDefs_. I think it is a better name for custom data types.

> Validation for Attributes
> -
>
> Key: ATLAS-1955
> URL: https://issues.apache.org/jira/browse/ATLAS-1955
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Affects Versions: 0.9-incubating
> Reporter: Israel Varea
> Assignee: Richard Ding
> Fix For: 0.9-incubating
>
>
> It would be very nice if the Atlas model could contain a way to represent 
> attribute validation. 
> A simple example: we would like to model a Person with the attributes 
> Name, Email and Country. We would like to specify that Email has to 
> follow a specific regular expression, so it would be nice if we could set 
> Email -> hasValidation -> EmailRegex, with EmailRegex having:
> Name: Email Regular Expression
> Expression: /[0-9a-z]+@[0-9a-z]+.[0-9a-z]+/
> For more complex types of validation, e.g. checking card number validity, 
> an external validator function or service could be added:
> Name: Credit Card Number Validator
> Validator: org.apache.atlas.validators.creditcard or 
> https://host:port/creditCardValidator
> For validations from a reference table, for example a country name, it could 
> be:
> Name: Country Name Ref Validator
> Reference Column: 
> where  would be an instance of type Hive_Column or 
> HBase_Column.
> Since this is a kind of standardization, it could be placed in 
> [Area 5|https://cwiki.apache.org/confluence/display/ATLAS/Area+5+-+Standards].
> A similar approach is followed in 
> [Kylo|https://github.com/Teradata/kylo/tree/master/integrations/spark/spark-validate-cleanse].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-08-21 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136014#comment-16136014
 ] 

Richard Ding commented on ATLAS-1955:
-

Thanks [~davidrad] and [~ivarea] for your comments and suggestions.

It seems that what we want here is a new custom attribute type, defined as a 
top-level type in the Atlas type system. For example, we can have email and 
credit card attribute types:
{code}
"attributeTypeDefs": [
  {
    "name": "email",
    "typeVersion": "1.0",
    "baseType": "string",
    "validationType": "regex",
    "validator": "[0-9a-z]+@[0-9a-z]+\\.[0-9a-z]+"
  },
  {
    "name": "credit_card",
    "typeVersion": "1.0",
    "baseType": "string",
    "validationType": "class",
    "validator": "org.apache.atlas.model.validation.CreditCardValidator"
  }
]
{code}
These custom attribute types can then be used in entity attribute 
definitions:
{code}
"entityDefs": [
  {
    "name": "Person",
    "superTypes": [
      "Referenceable"
    ],
    "typeVersion": "1.0",
    "attributeDefs": [
      {
        "name": "emailAddresses",
        "typeName": "email",
        "cardinality": "SET",
        "isIndexable": true,
        "isOptional": false,
        "isUnique": false
      },
      {
        "name": "creditCardNumbers",
        "typeName": "credit_card",
        "cardinality": "SET",
        "isIndexable": true,
        "isOptional": true,
        "isUnique": false
      },
      ……
    ]
  }
]
{code}
The name _attributeTypeDefs_ is used here to avoid confusion with the 
_attributeDefs_ defined inside _entityDefs_.
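
To make the intent concrete, here is a minimal Python sketch of how a server might apply such definitions when an entity is created. It is not Atlas code: the function names (build_validators, validate_entity) are hypothetical, only the "regex" validation type is handled, and the pattern used is an assumed variant with "+" quantifiers and an escaped dot. A "class" validator would have to be loaded on the server side and is left out.

```python
import re

# Illustrative copy of the proposed top-level section (assumption: regex
# uses "+" quantifiers and an escaped dot).
ATTRIBUTE_TYPE_DEFS = [
    {
        "name": "email",
        "baseType": "string",
        "validationType": "regex",
        "validator": r"[0-9a-z]+@[0-9a-z]+\.[0-9a-z]+",
    },
]

# Trimmed-down attributeDefs of the Person entity type.
PERSON_ATTRIBUTE_DEFS = [
    {"name": "emailAddresses", "typeName": "email", "cardinality": "SET"},
    {"name": "name", "typeName": "string", "cardinality": "SINGLE"},
]


def build_validators(type_defs):
    """Map each custom type name to a predicate over a single value."""
    validators = {}
    for type_def in type_defs:
        if type_def["validationType"] == "regex":
            pattern = re.compile(type_def["validator"])
            # fullmatch: the whole value must conform, not just a substring.
            validators[type_def["name"]] = (
                lambda value, p=pattern: isinstance(value, str)
                and p.fullmatch(value) is not None
            )
    return validators


def validate_entity(attribute_defs, values, validators):
    """Return (attribute name, offending value) pairs for one entity."""
    errors = []
    for attr in attribute_defs:
        check = validators.get(attr["typeName"])
        raw = values.get(attr["name"])
        if check is None or raw is None:
            continue  # built-in type, or attribute not supplied
        multi = attr.get("cardinality") in ("SET", "LIST")
        for item in (raw if multi else [raw]):
            if not check(item):
                errors.append((attr["name"], item))
    return errors


validators = build_validators(ATTRIBUTE_TYPE_DEFS)
person = {"name": "Ann", "emailAddresses": ["ann@example.org", "not-an-email"]}
print(validate_entity(PERSON_ATTRIBUTE_DEFS, person, validators))
```

Because the validator is attached to the type name rather than to the attribute, any other entity type that declares an attribute of type "email" gets the same check without repeating the pattern.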



[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-08-17 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130275#comment-16130275
 ] 

David Radley commented on ATLAS-1955:
-

[~ivarea], I think we want reusable validation: one email type that everyone 
can use. Atlas could then ship some of these types for use by the shipped 
models; I am thinking url and image would be very useful. We could also have 
new attribute types to police valid names for Hive tables and the like. I 
think option 2, or a variant of it, will be very powerful.



[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-08-17 Thread Israel Varea (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130253#comment-16130253
 ] 

Israel Varea commented on ATLAS-1955:
-

I can't find a strong argument to decide between the two options. A good 
question to help choose is: must validations be reusable across different 
attributes?
If the answer is yes, then go with Option 2. If the answer is "a reusable 
validation is not so important", then go with Option 1, since it is simpler.
I think both options will cover most of the common use cases anyway.





[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-08-17 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130087#comment-16130087
 ] 

David Radley commented on ATLAS-1955:
-

[~rding] I agree with option 2. We could simplify the naming and add 
AttributeDefs as a top-level object in TypeDefs, containing AttributeDef 
elements. We would then add the optional regex pattern to AttributeDef. If 
we take this approach, I suggest we also add an optional description to 
AttributeDef (which we want in order to support RDF standards).

Adding AttributeDefs like this would also allow people to alias existing 
primitive types.

We should only allow primitive types as the type of these top-level 
AttributeDefs.
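
As a rough illustration of the shape described above, the following Python sketch models a top-level AttributeDef with an optional pattern and description, and rejects non-primitive base types. The class, its field names and the PRIMITIVE_TYPES set are assumptions made for the example, not existing Atlas API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed set of primitive type names, for illustration only.
PRIMITIVE_TYPES = {
    "boolean", "byte", "short", "int", "long",
    "float", "double", "string", "date",
}


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical top-level attribute definition."""
    name: str
    base_type: str
    pattern: Optional[str] = None       # optional regex constraint
    description: Optional[str] = None   # optional human-readable text

    def __post_init__(self):
        # Only primitive types may be used as the base type.
        if self.base_type not in PRIMITIVE_TYPES:
            raise ValueError(
                f"{self.name}: base type {self.base_type!r} is not primitive"
            )
        # Fail early if the pattern is not a valid regular expression.
        if self.pattern is not None:
            re.compile(self.pattern)

    def accepts(self, value: str) -> bool:
        """A definition without a pattern is a plain alias and accepts anything."""
        if self.pattern is None:
            return True
        return re.fullmatch(self.pattern, value) is not None


url = AttributeDef("url", "string", pattern=r"https?://\S+",
                   description="An HTTP or HTTPS address")
label = AttributeDef("label", "string")  # alias of an existing primitive type
print(url.accepts("https://atlas.apache.org"), label.accepts("anything"))
```

A definition with no pattern behaves as a pure alias of its primitive base type, which is the aliasing case mentioned above.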




[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-07-24 Thread Israel Varea (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098207#comment-16098207
 ] 

Israel Varea commented on ATLAS-1955:
-

I think these two validations will cover most of the basic validation cases.
For the second validation, from a reference table, it would be nice to be able 
to provide a reference to a column of a table, since columns are already 
modeled. I think importing a reference table into Atlas enums would duplicate 
data, and you would have to keep the two synchronized, so I think it will be 
much simpler if we just point to a column. However, either of the two 
alternatives can successfully solve the same modelling use case :)
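
A small Python sketch of the "point to a column" idea, under stated assumptions: the qualified name format, the make_reference_validator helper and the in-memory FAKE_TABLES lookup are all hypothetical stand-ins, since how the column values would actually be fetched is not defined in this issue.

```python
from typing import Callable, Iterable


def make_reference_validator(
    column_qualified_name: str,
    fetch_column_values: Callable[[str], Iterable[str]],
) -> Callable[[str], bool]:
    """Build a validator that checks values against a referenced column."""
    def is_valid(value: str) -> bool:
        # Values are read at validation time, so there is no second copy
        # (such as an enum) that has to be kept in sync with the table.
        return value in set(fetch_column_values(column_qualified_name))
    return is_valid


# In-memory stand-in for a real reference table (hypothetical name).
FAKE_TABLES = {"default.countries.name@cluster": ["Spain", "France"]}


def fetch_from_fake_tables(qualified_name: str) -> Iterable[str]:
    return FAKE_TABLES.get(qualified_name, [])


is_country = make_reference_validator(
    "default.countries.name@cluster", fetch_from_fake_tables
)
print(is_country("Spain"), is_country("Atlantis"))
```

The trade-off is that every check depends on the referenced store being reachable, whereas an imported enum is self-contained but can go stale.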




Re: [jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-07-24 Thread Nigel Jones

On 23/07/2017 10:36, David Radley (JIRA) wrote:
>
> [~ivarea] some thoughts:
> I think there is a value in Atlas having the capability to validate 
> that attribute values conform to a certain pattern.

I think there are two forms of validation here:

a) Capturing validation rules in Atlas, i.e. more metadata that we might 
relate to business terms and apply to assets. For example, defining the 
fact that whenever we refer to a credit card number it needs to be in a 
certain format.

b) Validating the metadata itself.

Both seem entirely valid, but I think Israel is referring to a) and 
David is referring to b). Two JIRAs? I could be wrong though; if 
so, a) is an additional idea :-)




[jira] [Commented] (ATLAS-1955) Validation for Attributes

2017-07-19 Thread Nigel Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093242#comment-16093242
 ] 

Nigel Jones commented on ATLAS-1955:


I believe you are modeling the fact that an email has to follow a certain 
format, so that metadata should be captured in Atlas.
However, the actual validation of instances of this data, i.e. a customer 
record being stored in a DB, would typically happen outside Atlas. In 
addition, the validation may differ, as it would be specific to the data 
processing system being used: an ETL engine, HBase, a filesystem, different 
languages.
So I think the model is more similar to that of policies and rules, and how 
Ranger works.
In Atlas we have a business-centric definition of a policy, but the actual 
implementation sits at the enforcement point (in this case a Ranger rule).
I'm interested in adding the capability to capture metadata from Ranger so 
that we can tie the rule implementation back to the policy, to aid compliance 
checks and reporting, as well as to allow Ranger to query Atlas for policies 
when a security admin is creating a rule.
So I wonder if the same pattern applies here with validation?
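
A sketch of that split, in Python and with entirely hypothetical names: the catalog only holds the rule as metadata (CATALOG_RULES and lookup_rule stand in for a query to Atlas), while an ETL step acts as the enforcement point and decides what to do with records that do not conform.

```python
import re

# Stand-in for rule metadata held in the catalog (hypothetical content).
CATALOG_RULES = {"email": {"format": r"[0-9a-z]+@[0-9a-z]+\.[0-9a-z]+"}}


def lookup_rule(term):
    """Pretend catalog query: return the rule for a business term, if any."""
    return CATALOG_RULES.get(term)


def etl_filter(records, field, term):
    """Enforcement point: split records into (accepted, rejected)."""
    rule = lookup_rule(term)
    if rule is None:
        return list(records), []  # no rule captured, nothing to enforce
    pattern = re.compile(rule["format"])
    accepted, rejected = [], []
    for record in records:
        value = str(record.get(field, ""))
        target = accepted if pattern.fullmatch(value) else rejected
        target.append(record)
    return accepted, rejected


rows = [{"id": 1, "email": "ann@example.org"}, {"id": 2, "email": "broken"}]
ok, bad = etl_filter(rows, "email", "email")
print(len(ok), len(bad))
```

A different processing system could read the same rule and enforce it in its own way, which is the point of keeping the definition and the enforcement separate.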
