[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-05-25 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025052#comment-16025052
 ] 

David Radley commented on ATLAS-1410:
-

[~mandy_chessell] Looks really good. Some thoughts: 

- 210 I wonder if language should be a code table value or, more generally, a 
valid value from reference data.
- 210 I am wondering about usage. Should this also be a code table? It seems 
more structural than the description.
- 220 I suggest the supercategory-to-subcategory relationship be a composition 
(filled-in diamond) relationship.
- 230 I think the GlossaryCategory role name should be categories rather than 
category.
- 240 I wonder about the "to" and "from" ends of the related term, as they imply 
a direction; for a SYNONYM and TRANSLATION there is no direction. It is almost 
as if synonyms and translations should be in a synonym group or translation 
group respectively. Maybe we introduce an equivalence group concept, where 
everything in the group is related to everything else in the group. This would 
help with tag propagation for these terms.
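
To make the equivalence group idea from the 240 comment concrete, here is a minimal sketch in plain Java. The class name EquivalenceGroups and its methods are hypothetical and are not part of the Atlas model or API; the sketch only assumes that relating two terms merges their groups, so no "to" or "from" end is needed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: every term in a group is equivalent to every other term in it. */
class EquivalenceGroups {
    // Maps each term name to the shared group it belongs to.
    private final Map<String, Set<String>> groupOf = new HashMap<>();

    /** Declares a and b equivalent, merging any groups they already belong to. */
    void relate(String a, String b) {
        Set<String> merged = new LinkedHashSet<>(members(a));
        merged.addAll(members(b));
        for (String term : merged) {
            groupOf.put(term, merged);
        }
    }

    /** Returns every term equivalent to the given one, including the term itself. */
    Set<String> members(String term) {
        Set<String> group = groupOf.get(term);
        if (group == null) {
            Set<String> single = new LinkedHashSet<>();
            single.add(term);
            return single;
        }
        return Collections.unmodifiableSet(group);
    }

    public static void main(String[] args) {
        EquivalenceGroups synonyms = new EquivalenceGroups();
        synonyms.relate("Client", "Customer");
        synonyms.relate("Customer", "Patron");
        // Prints [Client, Customer, Patron]: no direction is recorded.
        System.out.println(synonyms.members("Client"));
    }
}
```

With a structure like this, a tag applied to one member could be propagated by iterating over members(term), which is the propagation benefit suggested above.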

I don't think we have a way in the current Atlas model to constrain the number 
of classifications to 0..1. 

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf, Atlas Glossary V2 proposal v1.2.pdf, Atlas Glossary V2 
> proposal v1.3.pdf, Atlas Glossary V2 proposal v1.4.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem. 
> There are newer, more functional versions of this capability in the atlas-intg 
> project. This Jira is changing over the glossary implementation to the newer 
> entity / type classes.  
> Instead of the instanceProperties and collectionProperties in the 
> BaseResourceDefinition we should use something in this sort of style:  
> "
> AtlasEntityDef deptTypeDef =
>     AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, 
>         "Department"+_description, ImmutableSet.of(),
>         AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>         new AtlasAttributeDef("employees", 
>             String.format("array<%s>", "Person"), true,
>             AtlasAttributeDef.Cardinality.SINGLE, 0, 1, 
>             false, false,
>             Collections.emptyList()));
> AtlasEntityDef personTypeDef = 
>     AtlasTypeUtil.createClassTypeDef("Person", "Person"+_description, 
>         ImmutableSet.of(),
>         AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>         AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>         AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>         AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>         AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>         AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>         AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>         AtlasTypeUtil.createOptionalAttrDef("age", "float"),
> "
> For the parent-child relationships with glossary categories and terms we 
> should be able to have the type system manage edge deletion. As part of this, 
> we will need to investigate whether we could get rid of the disconnect and 
> connect methods added in ATLAS-1186. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-05-25 Thread Mandy Chessell (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024937#comment-16024937
 ] 

Mandy Chessell commented on ATLAS-1410:
---

A proposed model for the Apache Atlas Glossary is shown on wiki page: 
https://cwiki.apache.org/confluence/display/ATLAS/Area+2+-+Glossary



[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-30 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949427#comment-15949427
 ] 

David Radley commented on ATLAS-1410:
-

Hi [~Stefhan] and [~clyned]. Thank you for your feedback. From recent 
discussions, prompted by your feedback, we are thinking we need:
- relationships to be top-level relationships. I have raised this as a subtask.
- relationships to have modelling flags to indicate composition, the names of 
each end and of the relationship, cardinality and the like. 
- default names, such as has-a, for association ends that are not named. I 
think this is a nice compromise: it gives some default relationship names, but 
also encourages custom names and the use of the modelling flags to see what the 
real meaning is. 
- We could introduce default names like has-an. I think has-a and has-an are a 
bit confusing, as in English "an" is used when the noun starts with a vowel. 
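
As a rough illustration of the points above, the following plain-Java sketch shows a top-level relationship definition that carries a composite flag and falls back to default end names when the modeller supplies none. RelationshipDefSketch is a hypothetical class, not an existing Atlas type, and the default for the second end ("is-of") is an assumption borrowed from the 'Is Of / Has A' pairing discussed earlier in this thread.

```java
/** Hypothetical top-level relationship definition; not an existing Atlas class. */
class RelationshipDefSketch {
    final String name;
    final String end1Name;
    final String end2Name;
    final boolean composite; // modelling flag: true means end 2 cannot exist without end 1

    RelationshipDefSketch(String name, String end1Name, String end2Name, boolean composite) {
        this.name = name;
        // Assumed defaults, used only when the modeller leaves an end unnamed.
        this.end1Name = isBlank(end1Name) ? "has-a" : end1Name;
        this.end2Name = isBlank(end2Name) ? "is-of" : end2Name;
        this.composite = composite;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** The flag, not the end names, carries the real modelling meaning. */
    String describe() {
        return name + ": " + end1Name + " / " + end2Name
                + (composite ? " [composite]" : " [not composite]");
    }

    public static void main(String[] args) {
        System.out.println(new RelationshipDefSketch("HouseRoom", "rooms", "house", true).describe());
        System.out.println(new RelationshipDefSketch("HouseOccupant", null, null, false).describe());
    }
}
```

Keeping the meaning in the flag rather than in the name is what lets custom end names be encouraged without losing the semantics.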




> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf, Atlas Glossary V2 proposal v1.2.pdf, Atlas Glossary V2 
> proposal v1.3.pdf


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-29 Thread Deirdre Clyne (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947758#comment-15947758
 ] 

Deirdre Clyne commented on ATLAS-1410:
--

Hi Stefhan, I was interested in your idea of a "has-an" relationship. Using 
your example of a house, a house could "has an" color, occupants, construction 
type, lot type, power source type and so many other things. There is probably a 
nearly infinite list of things you could relate to a concept using this 
relationship. 

I wonder if another way to look at this is that an occupant is a role or 
relationship played by the concept of person or customer. So, the underlying 
customer goes somewhere else and takes on a new relationship of occupier to a 
new house if the original house somehow "disappears". The other examples I came 
up with are all reference data types that would exist independently in the 
glossary anyway. 

I'm not sure if there is an implicit desire here to stick to pre-defined 
relationships and if this approach might encourage too many custom 
relationships. 




[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-29 Thread Deirdre Clyne (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947703#comment-15947703
 ] 

Deirdre Clyne commented on ATLAS-1410:
--

David, I just reviewed the V1.3 and the document is shaping up well. I have a 
couple of comments - the first is this idea of having two terms with the same 
name in the same workspace. Your example of this regarding replacement prompts 
the question - is this a response to not having a versioning solution and thus 
adding complexity instead of handling the workflow issue around changing a 
definition? If we think there are other reasons for having non-unique term 
names, how do we understand their separate contexts or decide which to use when?

My 2nd comment is around the use of the word taxonomy. We have the term 
glossary to describe a set of terms and categories around a line of business or 
other grouping. What would the definition of a taxonomy be to differentiate it 
from a glossary? We should only use the two terms if we define them differently 
and if they each have a different purpose. 



[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-26 Thread Stefhan van Helvoirt (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942395#comment-15942395
 ] 

Stefhan van Helvoirt commented on ATLAS-1410:
-

With regard to the term relationships: 

I think there is a difference between conceptual relationships, such as 'Is a 
Type Of / Has Types' and 'Is Of / Has A', and contextual relationships, where a 
contextual relationship could be an 'Is a'. 

The relationship 'Is a' is very specific and directing, and therefore can only be 
used when describing a specific context or in a scenario in which the 
relationship is always true. An example is the relationship described in the 
document, 'Customer is a Person'. In most situations this might not be 
universally true, as for example an Organization can also be a Customer and not 
every Person is a Customer. It should not be possible to have more than one 
'Is a' of a particular type. For example, it should not be possible to have both 
'Customer is a Person' and 'Customer is an Organization', as both Person and 
Organization are from the same taxonomy and per instance it can be either the 
one or the other but not both.

Furthermore, I think it's worthwhile to add a new relationship similar to 
'Has a', namely 'Has an'. 
Example: 
- House has a Room
- House has an Occupant
The 'Has a' is a composition, which implies that the child object 
cannot live without the context of its parent. Destroy the house and the rooms 
disappear. 
The 'Has an' is an aggregation, which implies that the child can exist on its 
own. Destroy the house and the occupant goes elsewhere. 
In database modelling the 'Has an' can be seen as a foreign key instead of a 
normal attribute. 
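
The difference between the two relationships can be sketched in a few lines of plain Java. HouseModelSketch and its fields are hypothetical names used only for illustration; the sketch assumes that destroying the parent removes composed children ('Has a') and merely unlinks aggregated ones ('Has an').

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of one house; not part of any Atlas API. */
class HouseModelSketch {
    // Every object that currently exists in the model.
    final Set<String> existing = new HashSet<>();
    final List<String> rooms = new ArrayList<>();     // 'Has a': composition
    final List<String> occupants = new ArrayList<>(); // 'Has an': aggregation

    void addRoom(String room) {
        existing.add(room);
        rooms.add(room);
    }

    void addOccupant(String person) {
        existing.add(person);
        occupants.add(person);
    }

    /** Composed children are deleted with the house; aggregated ones are only unlinked. */
    void destroyHouse() {
        existing.removeAll(rooms);
        rooms.clear();
        occupants.clear();
    }

    public static void main(String[] args) {
        HouseModelSketch house = new HouseModelSketch();
        house.addRoom("Kitchen");
        house.addOccupant("Alice");
        house.destroyHouse();
        // Prints [Alice]: the room is gone, the occupant still exists.
        System.out.println(house.existing);
    }
}
```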



[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-22 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936571#comment-15936571
 ] 

David Radley commented on ATLAS-1410:
-

Responding to [~eostic] 
Use Case 1 (page 5), with reference also to the comment on page 7: "It is 
important that duplicate Glossary Term names can be defined in the glossary, 
each with their own context..". That "context" is important and, in fact, 
should be what makes the Term unique. It might be a "duplicate" in the highest 
sense of the word, for the whole "collection" of Terms in a glossary, but 
parentage is indeed important, and keeps things separate. Enterprises can't 
always agree on Term names or their categorization. One company might, in an 
insurance example, spend the time to clarify their terms and be sure to have 
"Automobile Accident Claim" along with "Homeowners Claim" in the same Glossary, 
but another site might just as easily have the same Term "Claim" existing many 
times: once in a "Personal Lines/Claims/Auto/Accident" category and again in a 
"Personal Lines/Claims/Homeowners" category.  <>   
This is a complex topic, however, so it might be beneficial for the model 
and the APIs to allow full duplication within the higher Glossary level, 
without requiring parentage definition, leaving it to the "implementer" of any 
GUI to support (or not) some further level of identity, but it should be 
strongly recommended. 
Use Case 9. Classifying terms rather than assets sounds very natural, but 
shouldn't be a "requirement". Is this implying that direct asset classification 
wouldn't be permitted? <>There will be times when the classifications are only 
applicable to assets<>, because a Term does not yet exist, or 
because the type of classification isn't fine grained enough, or other equally 
creative reasons. 
Page 7. Kinds of Glossary Terms. It isn't clear why there is a need for 
different formal "types" of mutually exclusive Terms. Relationships determine 
the "use" of a particular Term, and if it is important for a consumer to have, 
for their users and model, a set of "Semantic" terms vs "Classifying" terms, it 
can be done other ways, such as by putting the Terms into their own separate 
categories or parental structures. <>
Page 10 and 11 this discussion is more clear now, since the removal of the 
idea of an "Attribute Term". On page 11 specifically, it may be beneficial to 
also show how "has-a" can cascade. Customer "has_a" address, and then address 
could also "has_a" City, State and Zip (and so forth...). 
<> 
Page 13. Great that you brought up the need for custom relationships. We need 
to ensure that this capability as "hoped for", remains intact. ; ) 
Page 14. Perhaps it needs more explanation, but I found the definition of 
Has-type and Types to be a bit confusing. "Has-types/is-a-type-of" seems more 
natural, and perhaps these could be combined into one. <> 
Page 14. Synonyms. Perhaps needs more explanation? Synonyms are difficult to 
have any kind of "owner". They are all peers in a "collection" of similar 
concepts. Having one owner, in the model itself, could create issues if/when 
that owner is deleted. <> 
Page 14. Antonyms. This needs further definition so that it is explained 
separately from Synonyms. In this case, there could be many Terms that are 
opposites, but they themselves are not necessarily antonyms of each other. This 
one seems ok to have an "owner" concept. <>
Page 14 Homonyms. This one is more like Synonyms, where they can be peers of 
each other.
Page 15. Preferred Term. Great concept. Especially important for enterprises 
that are overloading the glossary to meet a lot of their governance objectives, 
but still want to retain the idea that "this term" is "the one to use" for 
specific alternate name, or priority reference purposes. Specifically critical 
to scenarios where Terms are seen as a "replacement for names in retrieval 
requests or reporting tool interfaces". <> 
Page 15. Collections. Very important concept, but is it part of the Glossary 
specification? <>  ...or should it be reviewed at 
a much higher Atlas perspective? Certainly the glossary could have a set of 
Terms, qualified in some way as a "Collection Glossary" and then more 
generically use "assigned assets" [including other Terms] as a generic 
relationship, but it may be that this is overloading the Glossary too much.
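
To illustrate the Use Case 1 point that context is what makes a Term unique, here is a small plain-Java sketch. TermRegistrySketch, the "Insurance" glossary name and the slash-separated qualified name format are all assumptions made for the example; they are not taken from the proposal document or from the Atlas API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry in which a term's identity is its name plus its context. */
class TermRegistrySketch {
    private final Map<String, String> definitions = new HashMap<>();

    /** Builds an identity from glossary, category path and term name (assumed format). */
    static String qualifiedName(String glossary, String categoryPath, String term) {
        return glossary + "/" + categoryPath + "/" + term;
    }

    /** Returns false when the same name already exists in exactly the same context. */
    boolean add(String glossary, String categoryPath, String term, String definition) {
        String key = qualifiedName(glossary, categoryPath, term);
        return definitions.putIfAbsent(key, definition) == null;
    }

    public static void main(String[] args) {
        TermRegistrySketch registry = new TermRegistrySketch();
        // The same name "Claim" is accepted twice because the parentage differs.
        System.out.println(registry.add("Insurance",
                "Personal Lines/Claims/Auto/Accident", "Claim", "An auto accident claim"));
        System.out.println(registry.add("Insurance",
                "Personal Lines/Claims/Homeowners", "Claim", "A homeowners claim"));
        // A true duplicate (same name, same context) is rejected.
        System.out.println(registry.add("Insurance",
                "Personal Lines/Claims/Homeowners", "Claim", "Another definition"));
    }
}
```

A model that allows full duplication without parentage, as suggested above, would need some other key (for example a generated identifier) in place of the qualified name.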

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf, Atlas Glossary V2 proposal v1.2.pdf

[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-20 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932746#comment-15932746
 ] 

David Radley commented on ATLAS-1410:
-

Thank you very much for the feedback, [~Stefhan]. Some responses:
Page 5:
Use case 1: Changed as discussed

Use case 2 and Page 6: 
<>  

Page 7: <>
Unclear why there is a need to have different term types. <>

Page 8 "A Classification points to one entity and can have many associated 
term." I don't think I fully understand this statement. It would be wiser to 
have the classification point to one or more terms and the term point 
to one or more entities. To be further discussed. <>

Also it should be possible to have multiple classifications pointing to the 
same object. <>

Page 9 "The classification associated with the term should not be automatically 
cascaded by Atlas to the assigned assets." Agree that Atlas does not 
necessarily need to do the cascading because logic might need to be involved.  
<> However, the result might need to be made available in Atlas 
and shown in a distinct way. If Atlas is seen as the single source of truth 
then it must be possible for an end user to see solely from Atlas that a 
classification is 'Derived from'. The derivation itself can be performed 
by a different service. <>

Changes will be in the next document.  
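
One way to picture the Page 9 point about showing a 'Derived from' classification distinctly is the plain-Java sketch below. ClassificationSketch and its fields are hypothetical; the sketch assumes only that a classification records the term it was derived from, if any, while the derivation itself happens in some other service.

```java
/** Hypothetical classification instance that remembers where it came from. */
class ClassificationSketch {
    final String name;
    final String derivedFromTerm; // null when applied directly to the asset

    private ClassificationSketch(String name, String derivedFromTerm) {
        this.name = name;
        this.derivedFromTerm = derivedFromTerm;
    }

    /** A classification applied directly to an asset. */
    static ClassificationSketch direct(String name) {
        return new ClassificationSketch(name, null);
    }

    /** A classification computed elsewhere from a term and then stored. */
    static ClassificationSketch derived(String name, String term) {
        return new ClassificationSketch(name, term);
    }

    boolean isDerived() {
        return derivedFromTerm != null;
    }

    /** Derived classifications are rendered differently from direct ones. */
    String display() {
        return isDerived() ? name + " (derived from term '" + derivedFromTerm + "')" : name;
    }

    public static void main(String[] args) {
        System.out.println(ClassificationSketch.direct("Confidential").display());
        System.out.println(ClassificationSketch.derived("Confidential", "Customer").display());
    }
}
```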

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-20 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932552#comment-15932552
 ] 

David Radley commented on ATLAS-1410:
-

Responding to [~jonesn]
6- Ok so a term belongs in a glossary, but can be categorized by one in 
another. I understand this from an object perspective but am trying to think of 
an example as to why that is needed. I guess I'm not clear on the meaning of 
having multiple glossaries that are interlinked. << David: this allows us to 
separate terms into subject areas. It also allows duplicate term names to occur 
by putting them in different glossaries; something Stefhan is keen on. Also 
glossaries are the owning entity for terms, so there is no need for a single 
owning category. >>  

8- Are you saying that additional attribute values can be stored with the 
classification object? I'm thinking here of the example tag based policies 
covered at section 8.2 of 
https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies where 
"EXPIRES_ON" is referred to. << David: the classification type would be 
subclassed and new attributes added. >> 
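
The answer to point 8, subclassing a classification type to add attributes, can be sketched in plain Java as follows. ClassificationTypeSketch is a hypothetical stand-in rather than an Atlas class, and the attribute names "owner" and "expiry_date" are assumptions chosen for the example; only the EXPIRES_ON name comes from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical classification type with single inheritance of attributes. */
class ClassificationTypeSketch {
    final String name;
    final ClassificationTypeSketch superType; // null for a root type
    private final Map<String, String> ownAttributes = new LinkedHashMap<>();

    ClassificationTypeSketch(String name, ClassificationTypeSketch superType) {
        this.name = name;
        this.superType = superType;
    }

    /** Adds an attribute (name and type name) declared on this type itself. */
    ClassificationTypeSketch attribute(String attrName, String typeName) {
        ownAttributes.put(attrName, typeName);
        return this;
    }

    /** Attributes of this type plus everything inherited from its supertypes. */
    Map<String, String> allAttributes() {
        Map<String, String> all = new LinkedHashMap<>();
        if (superType != null) {
            all.putAll(superType.allAttributes());
        }
        all.putAll(ownAttributes);
        return all;
    }

    public static void main(String[] args) {
        ClassificationTypeSketch base =
                new ClassificationTypeSketch("Classification", null).attribute("owner", "string");
        ClassificationTypeSketch expiresOn =
                new ClassificationTypeSketch("EXPIRES_ON", base).attribute("expiry_date", "date");
        // Prints {owner=string, expiry_date=date}
        System.out.println(expiresOn.allAttributes());
    }
}
```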




Re: [jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-19 Thread Russell Anderson


These points that Mandy raises need to be addressed.

Russ

> On Feb 19, 2017, at 6:37 AM, Mandy Chessell (JIRA) 
wrote:
>
>
>
[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873650#comment-15873650
 ]

>
> Mandy Chessell commented on ATLAS-1410:
> ---
>
> Comments on V1.0
>
> - Page numbers would help to tie these comments to the document.
> - Page 2 - Asset type - defined in terms of itself.  How are they used?
or is this not relevant to this paper?
> - Page 2 - Why do we need to know about V1 and V2?  I think it is because
the current interface works with V1 and the new one will work with V2 - it
would be helpful to state this explicitly.
> - Page 4 - bullets 4-5 - has-a and is-a relationships are semantic
relationships.
> - Page 4 - missing from list - ability to associate a semantic meaning to
a classification (v2), trait (v1)?
> - Page 4 - Missing from the list - "typed-by" relationship to associate
terms that include meaning in context with terms that describe more pure
objects.  For example Home Address is typed by Address.
> - Page 5 - Figure 1 - I am not comfortable with terms being owned by
categories.  I think each term should be owned by a glossary and linked
into 0, 1 or more categories as appropriate.  This creates a much simpler
deletion rule for the API/End user - particularly when you look at Figure 2
where terms are owned by multiple categories. IE, delete term from its
glossary and it is deleted.  In the proposed design, it raises such
questions as "Is the term deleted when unlinked from all categories - or
the first category it is linked to?"
> - Page 6 - Figure 3 - I need more detail to understand the "classifies"
relationship and how it relates to a classification.  It seems redundant.
Would you not relate a term to a classification which is in itself
semantically classified by its definition term?
> - Page 6 - Bullet 6) - What is the alternative to using Gremlin queries?
> - Page 6 - Bullet 7) - is this an incomplete sentence - or is the
paragraph that follows supposed to be a nested bullet list?  Assuming it is
a follow on point.  My confusion is that I do not understand why the
term/category hierarchy is relevant to the enhancement of classifications?
The Classification object is defining the type of classification and its
meaning is coming from the term?  Is this suggesting that the relationships
between classifications is coming from the term relationships in the same
way we do this in IGC today?  If so it may help to show an example?
> - Page 7 - Figure 4 and 5 - what is the difference between
"Classification" and "Classification Relationship"?
> - Page 7 - Maybe strange examples - the Glossaries would be for different
subject areas - for example, there may be a marketing glossary, a customer
care glossary, a banking glossary.  These may be used for associating
meaning to data assets (ie data assets).  There may also be glossaries for
different regulations, or standard governance approaches, and these may
include terms that can be used to describe classification for data that
drive operational governance?
> - Page 8 - I am not sure what the proposed enhancements are - it just
seems to list the problems with the current model.  All relationships in
metadata are bi-directional.  It should be the default.  This mechanism
seems complicated.  Really need to define relationships independent of
entities so we can define attributes on these relationships.  The
Classification is actually an example of an independently defined
relationship that includes the GUID of the 2 entities it connects.   This
should be the common style of relationship.
> - Page 9 - on discussion point - a Taxonomy is a hierarchy of categories
that the terms are placed in - I thought this was included in the proposal
and we do need this for organising terms so that people can find them - and
the category hierarchies (taxonomies) help to provide context to terms too.
Also, the semantic relationships discussed would mean we could support a
simple ontology.
> - Page 9 - Fully-qualified name - What a grandparent or parent term?
What does a fully qualified name mean and when is it used?  The unique name
is its GUID.  Its path name (there may be many) is the navigation to the
term through the category hierarchies.
> - Page 9 - Why do Atlas terms need to follow the schema defined at
this link:
https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html ?
It seems to imply a lifecycle which is not included in this proposal and
a very specific modelling of the IBM industry models that have mandatory
fields that are not always applicable to all glossaries.  I think this doc
should describe the schema of the glossary term explicitly and explain the
fields.
> - page 10 - Figure 7 shows the navigation relationships and 1 
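
The deletion rule suggested in the Page 5 comment above (terms owned by a glossary and only linked into categories) could be sketched roughly as follows. This is a minimal illustration, not Atlas code: the class GlossarySketch and all of its methods are hypothetical names, and categories are kept inside the glossary purely to keep the example short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Atlas classes: a glossary owns its terms,
// while categories only hold links to them.
class GlossarySketch {
    private final Set<String> ownedTerms = new HashSet<>();
    private final Map<String, Set<String>> categoryLinks = new HashMap<>();

    void addTerm(String term) {
        ownedTerms.add(term);
    }

    // Link an existing term into a category (0, 1 or more categories per term).
    void link(String category, String term) {
        if (!ownedTerms.contains(term)) {
            throw new IllegalArgumentException("unknown term: " + term);
        }
        categoryLinks.computeIfAbsent(category, c -> new HashSet<>()).add(term);
    }

    // Unlinking never deletes: the term stays owned by the glossary.
    void unlink(String category, String term) {
        Set<String> terms = categoryLinks.get(category);
        if (terms != null) {
            terms.remove(term);
        }
    }

    // Deleting from the glossary is the only way to delete a term;
    // every category link to it is dropped as a side effect.
    void deleteTerm(String term) {
        ownedTerms.remove(term);
        for (Set<String> terms : categoryLinks.values()) {
            terms.remove(term);
        }
    }

    boolean hasTerm(String term) {
        return ownedTerms.contains(term);
    }

    int categoryCount(String term) {
        int count = 0;
        for (Set<String> terms : categoryLinks.values()) {
            if (terms.contains(term)) {
                count++;
            }
        }
        return count;
    }
}
```

With this shape the question "is the term deleted when unlinked from all categories?" does not arise, because unlink() only touches links.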

[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-19 Thread Stefhan van Helvoirt (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931905#comment-15931905
 ] 

Stefhan van Helvoirt commented on ATLAS-1410:
-

Page 5:
Use case 1: "It is important that duplicate glossary terms names can be defined 
in the glossary", No. Within a single glossary, it should not be possible to 
have multiple terms with identical names. If the requirement arises to maintain 
duplicates then this should be done across multiple glossaries. This is in 
alignment with the idea of having 'departmental glossaries'.

Use case 2: There might be a need to have two different types of 
categorizations. One for scoping / context and another for adding generic 
characteristics on the set of contained terms. In other glossaries this is 
sometimes referred to as a Parent Category and Referencing Categories. Each 
term has only one Parent Category but could be referenced by multiple other 
categories. These referencing categories have a scoping purpose, while the 
parent category could also have tags / characteristics that will be inherited 
by the contained terms. 
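
To make the parent / referencing distinction concrete, here is a rough sketch under stated assumptions: CategoryModel and its methods are hypothetical names (nothing here is an Atlas API), and tags are plain strings. Only the single parent category contributes inherited tags; referencing categories are recorded for scoping only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Atlas classes.
class CategoryModel {
    private final Map<String, Set<String>> tagsByCategory = new HashMap<>();
    private final Map<String, String> parentCategoryOfTerm = new HashMap<>();
    private final Map<String, Set<String>> referencingCategoriesOfTerm = new HashMap<>();

    void tagCategory(String category, String tag) {
        tagsByCategory.computeIfAbsent(category, c -> new HashSet<>()).add(tag);
    }

    // A term has exactly one parent category; setting it again replaces it.
    void setParent(String term, String category) {
        parentCategoryOfTerm.put(term, category);
    }

    // Any number of other categories may reference the term for scoping.
    void reference(String term, String category) {
        referencingCategoriesOfTerm.computeIfAbsent(term, t -> new HashSet<>()).add(category);
    }

    boolean isReferencedBy(String term, String category) {
        Set<String> categories = referencingCategoriesOfTerm.get(term);
        return categories != null && categories.contains(category);
    }

    // Only the parent category contributes inherited tags.
    Set<String> inheritedTags(String term) {
        String parent = parentCategoryOfTerm.get(term);
        Set<String> tags = tagsByCategory.get(parent);
        if (tags == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(tags);
    }
}
```

Because a term has a single parent, two categories with conflicting characteristics cannot both apply to it.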

Page 6: Regarding discussion point 'There does not appear to be a need for a 
Glossary Term to have a special “parent” category, as the Glossary owns the 
Glossary Term': if you want to manage a collection of terms in a similar way 
within a glossary then some form of parent category or unique structuring 
method is needed. If there is no uniqueness then multiple groupings with 
different characteristics can collide.

Page 7: "Glossary Term names might not be unique in a Glossary." No, see 
earlier comments. 
"This is a name containing a term’s inheritance and the Glossary it comes from." 
Only Glossary + Term name should be sufficient; there is no need to add parent 
terms in the fully qualified name. 
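
A minimal sketch of that naming rule, assuming a "." separator and a hypothetical TermRegistry class (neither comes from the proposal): names must be unique within one glossary, the same name may appear in different glossaries, and the qualified name is just glossary plus term name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not an Atlas class.
class TermRegistry {
    private final Map<String, Set<String>> termsByGlossary = new HashMap<>();

    // Registers a term and returns its qualified name.
    // Throws if the glossary already holds a term with that name.
    String register(String glossary, String termName) {
        Set<String> names = termsByGlossary.computeIfAbsent(glossary, g -> new HashSet<>());
        if (!names.add(termName)) {
            throw new IllegalStateException(
                "duplicate term '" + termName + "' in glossary '" + glossary + "'");
        }
        return qualifiedName(glossary, termName);
    }

    // The "." separator is an assumption made for this sketch.
    static String qualifiedName(String glossary, String termName) {
        return glossary + "." + termName;
    }
}
```

Registering "Customer" in both a Marketing and a Sales glossary succeeds; registering it twice in Marketing is rejected.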

Unclear why there is a need to have different term types. From a business 
perspective there is only one type of term. These various types such as Concept 
and Attribute suggest something technical which is not relevant from a term 
perspective as they are written from a business view. Also, a term can be a 
concept in one context and an attribute in another, how is that handled with 
this setup? E.g. 'Email address' is an attribute of 'Customer' and a Concept in 
the structure 'Location' --> 'Address' --> 'Electronic address' --> 'Email 
address'.


Page 8 "A Classification points to one entity and can have many associated 
term." I don't think I fully understand this statement. It would be wiser to 
have the classification point to one or more terms and have the term point 
to one or more entities. To be further discussed. Also it should be possible to 
have multiple classifications pointing to the same object. 

Page 9 "The classification associated with the term should not be automatically 
cascaded by Atlas to the assigned assets." Agree that Atlas does not 
necessarily need to do the cascading because logic might need to be involved. 
However, the result might need to be made available in Atlas and shown in a 
distinct way. If Atlas is seen as the single source of truth then it must be 
possible for an end user to see solely from Atlas that a classification is 
'Derived from'. The derivation itself can be performed by a different 
service. 

Stopped after page 11. Will continue to review remaining pages in the coming 
days. 

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefintion class from typesystem. 
> There are newer more funcitonal versions of this capability in the atlas-intg 
> project. This Jira is changing over the glossary implementation to the newer 
> entity / type classes.  
> Instread of the instanceProperties and collectionProperties in the 
> BaseResourceDefintions we should use something in this sort of style :  
> "
>  AtlasEntityDef deptTypeDef =
> AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, 
> "Department"+_description, ImmutableSet.of(),
> AtlasTypeUtil.createRequiredAttrDef("name", "string"),
> new AtlasAttributeDef("employees", 
> String.format("array<%s>", "Person"), true,
> AtlasAttributeDef.Cardinality.SINGLE, 0, 1, 
> false, false,
> 
> Collections.emptyList()));
> AtlasEntityDef personTypeDef = 
> AtlasTypeUtil.createClassTypeDef("Person", "Person"+_description, 
> ImmutableSet.of(),
> AtlasTypeUtil.createRequiredAttrDef("name", "string"),
> 

[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-16 Thread Nigel Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928488#comment-15928488
 ] 

Nigel Jones commented on ATLAS-1410:


Opened RANGER-1464


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-16 Thread Nigel Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928475#comment-15928475
 ] 

Nigel Jones commented on ATLAS-1410:


Note also I think we should open up an additional JIRA to add support for the 
v2 glossary to the ranger atlas plugin. See 
https://cwiki.apache.org/confluence/display/RANGER/ATLAS+Plugin#ATLASPlugin-AtlasAccessPermissions
 for an example of its use with taxonomy today (albeit simple)


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-16 Thread Nigel Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928458#comment-15928458
 ] 

Nigel Jones commented on ATLAS-1410:


Thanks David
 * 6- Ok so a term belongs in a glossary, but can be categorized by a category in 
another. I understand this from an object perspective but I am trying to think of 
an example as to why that is needed. I guess I'm not clear on the meaning of 
having multiple glossaries that are interlinked.
 * ok to defer internationalization.. though as well as display names 
relationships like homonyms could be affected since they are sound/dialect as 
well as country/language specific (this is somewhat peripheral for most I agree)
 * 8- Are you saying that additional attribute values can be stored with the 
classification object? I'm thinking here of the example tag based policies 
covered at section 8.2 of 
https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies where 
"EXPIRES_ON" is referred to
 * ranger tagsync - yes I think we have what's needed. See the referenced ATLAS 
Jira I opened on a new interface to support the new glossary (including 
flattening the structure down to simple tags). An example of the JSON that ends 
up being sent to the ranger server (after extracting from atlas... and we'll 
use a new API for this... and then going through tagsync) is 
https://github.com/apache/ranger/blob/master/tagsync/src/main/resources/etc/ranger/data/tags.json
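
One possible reading of "flattening the structure down to simple tags" is sketched below. It is an assumption for illustration only: TagFlattener is a hypothetical class, terms are plain strings, and the output is just a list of tag names rather than the tags.json format used by tagsync. A term is expanded into itself plus every ancestor term, so a policy written against a broad tag also covers the narrower terms beneath it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Atlas or Ranger.
class TagFlattener {
    // child term -> parent term
    private final Map<String, String> parentOf = new HashMap<>();

    void addChild(String parent, String child) {
        parentOf.put(child, parent);
    }

    // Returns the term followed by its ancestors, nearest first.
    // The 'seen' set guards against accidental cycles in the hierarchy.
    List<String> flatten(String term) {
        List<String> tags = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = term;
        while (current != null && seen.add(current)) {
            tags.add(current);
            current = parentOf.get(current);
        }
        return tags;
    }
}
```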


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-16 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928409#comment-15928409
 ] 

David Radley commented on ATLAS-1410:
-

Thank you  [~zimnymc] for your feedback. It is great. Here are my responses
ad. Use Case
use case 1 
It shouldn't be possible to define two terms with exactly the same name. 
<> 
It can be possible to do it only through synonyms if definition stays the same. 
<>
If we have different definition then we also must have different name for each 
term. 
If we will allow same naming we will probably enormously stress glossary 
integrity.  <>  
use cases 2 and 3
I agree that Categories are needed to give more control over terms organization 
but I think I need a bit more thinking if 
categories should help in creating hierarchies. It might be the case but then 
we should allow terms to only be leaves and every kind
of grouping should be done via category. This would mean that categories should 
also have classifiers. <>  
use case 7
Do you mean collections ? <>
use case 11
this sounds a bit too high level and would probably be nice to describe it in 
more details <>
I'm explicitly missing two things:
1. ability to inherit classifiers <>
2. are there any models between terms and assets or is it only about term to 
asset ? 
we might want to include couple of levels of models (like LDM and/or PDM for 
particular technology)
at least one is already there - by connecting terms to other terms we are 
creating concepts which should 
be visualized in some way for easier navigation <>
ad. discussion point on p. 5
yes, that's how I also see it - Taxonomy is the name of the hierarchy of 
Glossary Categories
but does this mean that Taxonomy is a name of Glossary instance ? > 
ad. Glossary Terms and Glossary Categories
discussion point - can there be a term without Category ? if not will there 
always be at least one prime category for each Glossary ?
if yes what is the difference then between Glossary and prime category ? is 
there any at all ? <> 
point for discussion - should it be allowed that term from one Glossary is 
inside Category from another Glossary ?
I think we should not allow this kind of situations as those increase the risk 
of losing integrity for particular Glossary. <>
I'd say that there should be a copy of that term done to the other Glossary 
with some kind of a marker "inspired by".
Otherwise we will create tight connection between two Glossaries and their 
maintenance will be more difficult (e.g. upgrades). <> 
ad. Glossary Term identification and names
Glossary Term names might not be unique in a Glossary. For example, there could 
be 2 definitions of customer. - just NO  :-) <> 
"we do not allow 2 Glossary Terms of the same name inheriting from a parent 
Glossary Term" - so we do allow or not ? or I missed something ?
I need an example for this one to properly understand it.
ad. Glossary Term context
I'd like to create clear distinction between what is here meant by context and 
the term business context (being a term to term relations that
create business context) - I just don't like using word "context" for both. 
<>
ad. page 9, example
In general I do agree with the line of thinking but I have a question:
both customer and attributes are terms right ? if so then is "has-a" 
relationship the best one to do term-to-term assignments ? <> 
ad. Owning relationships
this "Concept Glossary Terms own Attribute Glossary Terms." I've some doubts 
about (see above remark for page 9, example)
I'm not saying don't go there, I just want to explore it more to understand it 
better <> 
ad. Discussion point – maybe we should consider defining the Glossary Term 
attributes using the
type system rather than relationships - yes we should
ad. Discussion point: we could add homophones as well – if there was a need.
I don't think there is a need now to do that. <> 
ad. Discussion point preferred-term attribute could be stored in the entity, 
AtlasObjectId or
classification. I suggest storing it in the entity.
I agree.
ad. Discussion point: We will enable collection types to be created. 
Additionally, we may want to
consider including a Collection type that has one attribute called contents 
with multiple
values of the top-level type.
Do you mean nesting Collections ? <>
ad. Discussion point Introduction of bidirectional relationships, could be done 
separately from
the Glossary enhancement.
We may take a step-by-step approach but I'd say we need this from nearly the 
very beginning.  <>
ad. Discussion point: We may wish to take a more revolutionary approach and 
allow
relationships to be defined as top level artifacts, of 

[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-16 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928083#comment-15928083
 ] 

David Radley commented on ATLAS-1410:
-

Many thanks [~jonesn].
Responding to your comments for pages
6 : the intent is that a term is owned by one glossary but can be categorized 
by categories from any glossary. Do you think I need to be more explicit in the 
text about this?
7 : Very valid concerns relating to globalization, which I suggest we deal with 
separately, as per my exclusions at the top of the document. 
We have talked of display attributes on the dev list. I have not looked into 
TitanDB's encoding, whether this is affected by which store is used, whether the 
String data type in Titan supports Unicode or UTF-8, and how this fits with 
indexes.  
p8 - in figure 3 "hive column" is meant to be an instance - so could be worth 
using an example like "employee salary" or similar to avoid confusion with type 
definitions. <>Also on this page it would be worth comparing to 
the v1 implementation. The association there between the column (entity) & term 
(trait) is the trait instance, which also carries additional information - 
parameters. That’s how we might capture the level of SPI, whilst I think with 
this new design that is done through the hierarchy of glossary terms <>. An example may help?  or just a link to page 16. Question 
for other reviewers - is this sufficient (I think it's simpler, but do we lose 
additional attributes?) <>
13 : yes there is scope to add new semantics relationships. 
I agree on your search comment
16/17 agreed.
On Ranger Tag sync. I am suggesting we continue to expose classifications as 
tags. Now V2 Classifications are enhanced by
* having a guid (as the name cannot be relied upon to be unique)
* having associated Glossary Terms, including the classifying term. 
I hope this is sufficient to meet the needs of tag sync; or do you think more 
is required? 
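
As a rough illustration of the two enhancements listed above, the sketch below models a classification that carries its own guid and its associated terms while still exposing a plain tag name. ClassificationSketch and its fields are hypothetical and are not the Atlas classification model; the guid is simply a random UUID here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical shape, not the Atlas classification class.
final class ClassificationSketch {
    final String guid;                  // identity; names may collide
    final String name;                  // what is exposed as the tag
    final List<String> associatedTerms; // classifying term first

    ClassificationSketch(String name, String classifyingTerm, List<String> otherTerms) {
        this.guid = UUID.randomUUID().toString();
        this.name = name;
        List<String> all = new ArrayList<>();
        all.add(classifyingTerm);
        all.addAll(otherTerms);
        this.associatedTerms = Collections.unmodifiableList(all);
    }

    // Tag sync would keep seeing a simple tag name.
    String asTag() {
        return name;
    }
}
```

Two classifications that share a name remain distinguishable by guid, which is the property tag sync would need if names cannot be relied upon to be unique.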




[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-16 Thread Mike Nicpan (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927857#comment-15927857
 ] 

Mike Nicpan commented on ATLAS-1410:


Comments on v1.1

ad. Use Case
use case 1 
It shouldn't be possible to define two terms with exactly the same 
name. 
It can be possible to do it only through synonyms if definition stays 
the same. 
If we have different definition then we also must have different name 
for each term.
If we will allow same naming we will probably enormously stress 
glossary integrity.

use cases 2 and 3
I agree that Categories are needed to give more control over terms 
organization but I think I need a bit more thinking if 
categories should help in creating hierarchies. It might be the case 
but then we should allow terms to only be leaves and every kind
of grouping should be done via category. This would mean that 
categories should also have classifiers.

use case 7
Do you mean collections ?

use case 11
this sounds a bit too high level and it would probably be nice to describe 
it in more detail

I'm explicitly missing two things:
1. ability to inherit classifiers
2. are there any models between terms and assets or is it only about term to 
asset ? 
we might want to include couple of levels of models (like LDM and/or 
PDM for particular technology)
at least one is already there - by connecting terms to other terms we 
are creating concepts which should 
be visualized in some way for easier navigation

ad. discussion point on p. 5
yes, that's how I also see it - Taxonomy is the name of the hierarchy of 
Glossary Categories
but does this mean that Taxonomy is a name of Glossary instance ?

ad. Glossary Terms and Glossary Categories
discussion point - can there be a term without Category ? if not will there 
always be at least one prime category for each Glossary ?
if yes what is the difference then between Glossary and prime category 
? is there any at all ?

point for discussion - should it be allowed that term from one Glossary is 
inside Category from another Glossary ?
I think we should not allow this kind of situations as those increase 
the risk of losing integrity for particular Glossary.
I'd say that there should be a copy of that term done to the other 
Glossary with some kind of a marker "inspired by".
Otherwise we will create tight connection between two Glossaries and 
their maintenance will be more difficult (e.g. upgrades).

ad. Glossary Term identification and names
Glossary Term names might not be unique in a Glossary. For example, there could 
be 2 definitions of customer. - just NO :)
"we do not allow 2 Glossary Terms of the same name inheriting from a parent 
Glossary Term" - so we do allow or not ? or I missed something ?
I need an example for this one to properly understand it.

ad. Glossary Term context
I'd like to create clear distinction between what is here meant by context and 
the term business context (being a term to term relations that
create business context) - I just don't like using word "context" for 
both.

ad. page 9, example
In general I do agree with the line of thinking but I have a question:
both customer and attributes are terms right ? if so then is "has-a" 
relationship the best one to do term-to-term assignments ?

ad. Owning relationships
this "Concept Glossary Terms own Attribute Glossary Terms." I've some doubts 
about (see above remark for page 9, example)
I'm not saying don't go there, I just want to explore it more to understand it 
better

ad. Discussion point – maybe we should consider defining the Glossary Term 
attributes using the
type system rather than relationships - yes we should

ad. Discussion point: we could add homophones as well – if there was a need.
I don't think there is a need now to do that.

ad. Discussion point preferred-term attribute could be stored in the entity, 
AtlasObjectId or
classification. I suggest storing it in the entity.
I agree.

ad. Discussion point: We will enable collection types to be created. 
Additionally, we may want to
consider including a Collection type that has one attribute called contents 
with multiple
values of the top-level type.
Do you mean nesting Collections ?

ad. Discussion point Introduction of bidirectional relationships, could be done 
separately from
the Glossary enhancement.
We may take a step-by-step approach but I'd say we need this from nearly the 
very beginning.

ad. Discussion point: We may wish to take a more revolutionary approach and 
allow
relationships to be defined as top level artifacts, of which classifications 
are a type.
Can we explore it more ? Sounds pretty ambitious and worth doing but let's list 
consequences.


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-14 Thread Nigel Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924207#comment-15924207
 ] 

Nigel Jones commented on ATLAS-1410:


Opened ATLAS-1662 to track new API needed for ranger

A further comment on the proposal. What is the interaction with the Atlas 
ranger plugin? This is an optional component that can restrict access to 
metadata in atlas - more info is at 
https://cwiki.apache.org/confluence/display/RANGER/ATLAS+Plugin

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem. 
> There are newer, more functional versions of this capability in the atlas-intg 
> project. This Jira is changing over the glossary implementation to the newer 
> entity / type classes.  
> Instead of the instanceProperties and collectionProperties in the 
> BaseResourceDefinition we should use something in this sort of style :  
> "
>  AtlasEntityDef deptTypeDef =
>      AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, "Department"+_description, ImmutableSet.of(),
>          AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>          new AtlasAttributeDef("employees", String.format("array<%s>", "Person"), true,
>              AtlasAttributeDef.Cardinality.SINGLE, 0, 1, false, false,
>              Collections.emptyList()));
>  AtlasEntityDef personTypeDef =
>      AtlasTypeUtil.createClassTypeDef("Person", "Person"+_description, ImmutableSet.of(),
>          AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>          AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>          AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>          AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>          AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>          AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>          AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>          AtlasTypeUtil.createOptionalAttrDef("age", "float"),
> "
> For the parent child relationships with glossary categories and terms we 
> should be able to have the type system manage edge deletion. As part of this, 
> we will need to investigate whether we could get rid of the disconnect and 
> connect methods added in ATLAS-1186 
>  





[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-14 Thread Nigel Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924065#comment-15924065
 ] 

Nigel Jones commented on ATLAS-1410:


p6 - Is the intent that term12 can sit in both glossaries, i.e. it could, for 
example, also be linked to cat14 in Glossary 2? This makes sense.

p7 - How about Unicode support for term names? I think this would be valuable 
for any displayed name (and perhaps the description is not enough). Surely we 
should be able to have, for example, a Chinese term in the glossary? A more 
restrictive spec for an internal name is then OK.
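
To illustrate the split between a permissive displayed name and a restrictive internal name, here is a minimal Java sketch. The two rules are assumptions made up for this example (ASCII identifier for the internal name; any text containing at least one Unicode letter or digit for the displayed name); the proposal does not define them, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the validation rules below are assumptions, not the proposal's.
final class TermNameSketch {

    // Assumed internal-name rule: ASCII letter first, then letters, digits or underscore.
    private static final Pattern INTERNAL_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Restrictive check for an internal name. */
    static boolean isValidInternalName(String name) {
        return name != null && INTERNAL_NAME.matcher(name).matches();
    }

    /** Permissive check: any text with at least one Unicode letter or digit. */
    static boolean isValidDisplayName(String name) {
        if (name == null) {
            return false;
        }
        // codePoints() handles characters outside the Basic Multilingual Plane.
        return name.codePoints().anyMatch(Character::isLetterOrDigit);
    }
}
```

With these assumed rules, a Chinese displayed name such as "\u5BA2\u6237" would be accepted for display but rejected as an internal name, while "customer_name" would pass both checks.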

p8 - in figure 3 "hive column" is meant to be an instance - so could be worth 
using an example like "employee salary" or similar to avoid confusion with type 
definitions. Also on this page it would be worth comparing to the v1 
implementation. The association there between the column (entity) & term 
(trait) is the trait instance, which also carries additional information - 
parameters. That’s how we might capture the level of SPI, whilst I think with 
this new design that is done through the hierarchy of glossary terms. An 
example may help? or just a link to page 16. Question for other reviewers - is 
this sufficient (I think it's simpler, but do we lose additional attributes?)

p13 - Homophones. This gets more complex due to dialect? (I'm not a linguist). 
It perhaps brings another dimension - the need for translation of 
names/descriptions for display purposes

General - it's mentioned search is excluded. Perhaps improved DSL/UI support 
for glossary navigation could be the subject of an additional JIRA (revisit 
later)

p16/17 - Apache Ranger integration
 * We should consider how the existing tag support in Ranger is affected by 
these changes. Perhaps the simplest suggestion is that the current Ranger 
tagsync works only with the v1 glossary. Then I propose that for v2 we have a 
new tagsync which will navigate the hierarchy to allow Ranger (or other 
enforcement engines) to pick up a simple entity:classification map as it does 
today, but using a new "OMAS" API. This will require an additional Atlas JIRA 
(to provide the new Governance API) and a Ranger JIRA (to consume that API 
using a new tagsync process). In those JIRAs we should also update docs to 
clarify the interoperability. I'll open these shortly.
 * On a related point, it could be useful to be able to identify a term as 
"relevant" for Ranger/enforcement engines. This could come from its membership 
in a category, or some other attribute such as a flag. 
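
A rough sketch of the flattening step described above, in plain Java: walk up the term hierarchy from each term assigned to an entity and keep only the terms flagged as relevant for enforcement, producing a simple entity-to-classification map. The Term class, the enforcementRelevant flag and the flatten method are all hypothetical; neither the "OMAS" API nor the new tagsync interface is defined in this thread, so this shows only the idea, not either of those.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not the Atlas or Ranger API, only an illustration.
final class TagSyncSketch {

    /** A glossary term with an optional parent and an assumed relevance flag. */
    static final class Term {
        final String name;
        final Term parent;                  // null for a top-level term
        final boolean enforcementRelevant;  // assumed flag, see the second bullet above

        Term(String name, Term parent, boolean enforcementRelevant) {
            this.name = name;
            this.parent = parent;
            this.enforcementRelevant = enforcementRelevant;
        }
    }

    /** Flattens entity-to-term assignments into entity -> classification names. */
    static Map<String, Set<String>> flatten(Map<String, List<Term>> assignments) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Term>> entry : assignments.entrySet()) {
            Set<String> classifications = new TreeSet<>();
            for (Term assigned : entry.getValue()) {
                // Walk from the assigned term up to the root of its hierarchy.
                for (Term current = assigned; current != null; current = current.parent) {
                    if (current.enforcementRelevant) {
                        classifications.add(current.name);
                    }
                }
            }
            result.put(entry.getKey(), classifications);
        }
        return result;
    }
}
```

For example, if a column is assigned a term "EmployeeSalary" whose parent term "PII" carries the flag, the flattened map would give that column the single classification "PII", which is the shape an enforcement engine consumes today.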

Otherwise an excellent proposal and very much needed for Atlas to "step up" to 
support enterprise glossaries.


[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-10 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905259#comment-15905259
 ] 

David Radley commented on ATLAS-1410:
-

Responses to comments 

Page numbers would help to tie these comments to the document.  <>
Page 2 - Asset type - defined in terms of itself. How are they used? or is this 
not relevant to this paper?  <>
Page 2 - Why do we need to know about V1 and V2? I think it is because the 
current interface works with V1 and the new one will work with V2 - it would 
be helpful to state this explicitly. <>
Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships. 
<>
Page 4 - missing from list - ability to associate a semantic meaning to a 
classification (v2), trait (v1)?  <>
Page 4 - Missing from the list - "typed-by" relationship to associate terms 
that include meaning in context with terms that describe more pure objects. For 
example Home Address is typed by Address.<>
Page 5 - Figure 1 - I am not comfortable with terms being owned by categories. 
I think each term should be owned by a glossary and linked into 0, 1 or more 
categories as appropriate. This creates a much simpler deletion rule for the 
API/End user - particularly when you look at Figure 2 where terms are owned by 
multiple categories. I.e., delete a term from its glossary and it is deleted. In 
the proposed design, it raises such questions as "Is the term deleted when 
unlinked from all categories - or the first category it is linked to?" <>
Page 6 - Figure 3 - I need more detail to understand the "classifies" 
relationship and how it relates to a classification. It seems redundant. Would 
you not relate a term to a classification which is in itself semantically 
classified by its definition term?
Page 6 - Bullet 6) - What is the alternative to using Gremlin queries? <> 
Page 6 - Bullet 7) - is this an incomplete sentence - or is the paragraph 
that follows supposed to be a nested bullet list? Assuming it is a follow-on 
point, my confusion is that I do not understand why the term/category hierarchy 
is relevant to the enhancement of classifications. The Classification object is 
defining the type of classification and its meaning is coming from the term? 
<>  Is this suggesting that the relationships between 
classifications are coming from the term relationships in the same way we do 
this in IGC today? <> If so it may help to show an example? 
<> 
Page 7 - Figure 4 and 5 - what is the difference between "Classification" and 
"Classification Relationship"? <> 
Page 7 - Maybe strange examples - the Glossaries would be for different subject 
areas - for example, there may be a marketing glossary, a customer care 
glossary, a banking glossary. These may be used for associating meaning to data 
assets (ie data assets). There may also be glossaries for different 
regulations, or standard governance approaches, and these may include terms 
that can be used to describe classification for data that drive operational 
governance? <>  
Page 8 - I am not sure what the proposed enhancements are - it just seems to 
list the problems with the current model. All relationships in metadata are 
bi-directional. It should be the default. This mechanism seems complicated. 
Really need to define relationships independent of entities so we can define 
attributes on these relationships. The Classification is actually an example of 
an independently defined relationship that includes the GUID of the 2 entities 
it connects. This should be the common style of relationship. <> 
Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that the 
terms are placed in - I thought this was included in the proposal and we do 
need this for organising terms so that people can find them - and the category 
hierarchies (taxonomies) help to provide context to terms too. Also, the 
semantic relationships discussed would mean we could support a simple ontology. 
<> 
Page 9 - Fully-qualified name - What is a grandparent or parent term? What does a 
fully qualified name mean and when is it used? The unique name is its GUID. Its 
path name (there may be many) is the navigation to the term through the 
category hierarchies. <>  
Page 9 - why do Atlas terms need to follow the schema defined at this link - 
https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html?
 It seems to imply a lifecycle which is not included in this proposal and a very 
specific modelling of the IBM industry models that have mandatory fields that 
are not always applicable to all glossaries. I think this doc should describe 
the schema of the glossary term explicitly and explain the fields.<>
Page 10 - Figure 7 shows the navigation relationships as one-way. We need to be 
able to navigate from the hive table to its classification to support the GAF. 
<>
Page 11 - Figure 8 - Atlas entities box is hard to 

[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-02-19 Thread Mandy Chessell (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873650#comment-15873650
 ] 

Mandy Chessell commented on ATLAS-1410:
---

Comments on V1.0

- Page numbers would help to tie these comments to the document.
- Page 2 - Asset type - defined in terms of itself.  How are they used? or is 
this not relevant to this paper?
- Page 2 - Why do we need to know about V1 and V2?  I think it is because the 
current interface works with V1 and the new one will work with V2 - it would 
be helpful to state this explicitly.
- Page 4 - bullets 4-5 - has-a and is-a relationships are semantic 
relationships.
- Page 4 - missing from list - ability to associate a semantic meaning to a 
classification (v2), trait (v1)?
- Page 4 - Missing from the list - "typed-by" relationship to associate terms 
that include meaning in context with terms that describe more pure objects.  
For example Home Address is typed by Address. 
- Page 5 - Figure 1 - I am not comfortable with terms being owned by 
categories.  I think each term should be owned by a glossary and linked into 
0, 1 or more categories as appropriate.  This creates a much simpler deletion 
rule for the API/End user - particularly when you look at Figure 2 where terms 
are owned by multiple categories. I.e., delete a term from its glossary and it is 
deleted.  In the proposed design, it raises such questions as "Is the term 
deleted when unlinked from all categories - or the first category it is linked 
to?"
- Page 6 - Figure 3 - I need more detail to understand the "classifies" 
relationship and how it relates to a classification.  It seems redundant.  
Would you not relate a term to a classification which is in itself semantically 
classified by its definition term?
- Page 6 - Bullet 6) - What is the alternative to using Gremlin queries?
- Page 6 - Bullet 7) - is this an incomplete sentence - or is the paragraph 
that follows supposed to be a nested bullet list?  Assuming it is a follow-on 
point, my confusion is that I do not understand why the term/category 
hierarchy is relevant to the enhancement of classifications.  The 
Classification object is defining the type of classification and its meaning is 
coming from the term?  Is this suggesting that the relationships between 
classifications are coming from the term relationships in the same way we do 
this in IGC today?  If so it may help to show an example?
- Page 7 - Figure 4 and 5 - what is the difference between "Classification" and 
"Classification Relationship"?
- Page 7 - Maybe strange examples - the Glossaries would be for different 
subject areas - for example, there may be a marketing glossary, a customer care 
glossary, a banking glossary.  These may be used for associating meaning to 
data assets (ie data assets).  There may also be glossaries for different 
regulations, or standard governance approaches, and these may include terms 
that can be used to describe classification for data that drive operational 
governance?
- Page 8 - I am not sure what the proposed enhancements are - it just seems to 
list the problems with the current model.  All relationships in metadata are 
bi-directional.  It should be the default.  This mechanism seems complicated.  
Really need to define relationships independent of entities so we can define 
attributes on these relationships.  The Classification is actually an example 
of an independently defined relationship that includes the GUID of the 2 
entities it connects.   This should be the common style of relationship.  
- Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that 
the terms are placed in - I thought this was included in the proposal and we do 
need this for organising terms so that people can find them - and the category 
hierarchies (taxonomies) help to provide context to terms too.  Also, the 
semantic relationships discussed would mean we could support a simple ontology.
- Page 9 - Fully-qualified name - What is a grandparent or parent term?  What does 
a fully qualified name mean and when is it used?  The unique name is its GUID.  
Its path name (there may be many) is the navigation to the term through the 
category hierarchies.
- Page 9 - why do Atlas terms need to follow the schema defined at this link 
- 
https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html?
   It seems to imply a lifecycle which is not included in this proposal and a 
very specific modelling of the IBM industry models that have mandatory fields 
that are not always applicable to all glossaries.  I think this doc should 
describe the schema of the glossary term explicitly and explain the fields.
- Page 10 - Figure 7 shows the navigation relationships as one-way.  We need to 
be able to navigate from the hive table to its classification to support the 
GAF.
- Page 11 - Figure 8 - Atlas entities box is hard