[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025052#comment-16025052 ]

David Radley commented on ATLAS-1410:
-------------------------------------

[~mandy_chessell] Looks really good. Some thoughts:
- 210: I wonder if language should be a code table value or, more generally, a valid value from reference data.
- 210: I am wondering about usage. Should this also be a code table? It seems more structural than the description.
- 220: I suggest the relationship from the supercategory to the subcategory be a composition (filled-in diamond).
- 230: I think the GlossaryCategory role name should be "categories" rather than "category".
- 240: I wonder about the "to" and "from" ends of the related term, as they imply a direction; for a SYNONYM or a TRANSLATION there is no direction. It is almost as if synonyms and translations should be in a synonym group or translation group respectively. Maybe we introduce an equivalence group concept, where everything in the group is related to everything else in the group. This would help with tag propagation for these terms.

I don't think we have a way in the current Atlas model to constrain the number of classifications to 0..1.

> V2 Glossary API
> ---------------
>
>                 Key: ATLAS-1410
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1410
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: David Radley
>            Assignee: David Radley
>         Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 proposal v1.1.pdf, Atlas Glossary V2 proposal v1.2.pdf, Atlas Glossary V2 proposal v1.3.pdf, Atlas Glossary V2 proposal v1.4.pdf
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem. There are newer, more functional versions of this capability in the atlas-intg project. This Jira is changing over the glossary implementation to the newer entity / type classes.
> Instead of the instanceProperties and collectionProperties in the BaseResourceDefinitions we should use something in this sort of style:
> "
> AtlasEntityDef deptTypeDef =
>     AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, "Department"+_description, ImmutableSet.of(),
>         AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>         new AtlasAttributeDef("employees", String.format("array<%s>", "Person"), true,
>             AtlasAttributeDef.Cardinality.SINGLE, 0, 1, false, false,
>             Collections.emptyList()));
> AtlasEntityDef personTypeDef =
>     AtlasTypeUtil.createClassTypeDef("Person", "Person"+_description, ImmutableSet.of(),
>         AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>         AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>         AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>         AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>         AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>         AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>         AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>         AtlasTypeUtil.createOptionalAttrDef("age", "float"),
> "
> For the parent-child relationships with glossary categories and terms we should be able to have the type system manage edge deletion. As part of this, we will need to investigate whether we could get rid of the disconnect and connect methods added in ATLAS-1186.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
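The equivalence group idea raised under item 240 in the comment above could look roughly like the sketch below. It is only an illustration of the concept and not Atlas code: the class `EquivalenceGroup`, its methods and the sample term names are all hypothetical, and terms are represented as plain strings for brevity.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: an undirected group in which every member is related to every other member. */
final class EquivalenceGroup {
    private final String kind;                                  // e.g. "SYNONYM" or "TRANSLATION"
    private final Set<String> members = new LinkedHashSet<>();  // term names, assumed unique here

    EquivalenceGroup(String kind) {
        this.kind = kind;
    }

    String kind() {
        return kind;
    }

    void add(String termName) {
        members.add(termName);
    }

    /** All terms equivalent to the given term: the rest of the group, with no "to" or "from" end. */
    Set<String> equivalentsOf(String termName) {
        if (!members.contains(termName)) {
            return Collections.emptySet();
        }
        Set<String> others = new LinkedHashSet<>(members);
        others.remove(termName);
        return others;
    }
}
```

Because membership is symmetric, anything attached to one member (for example a tag to be propagated) can be looked up for all the others by walking `equivalentsOf`, without having to decide which end of a relationship is the source.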
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024937#comment-16024937 ]

Mandy Chessell commented on ATLAS-1410:
---------------------------------------

A proposed model for the Apache Atlas Glossary is shown on wiki page: https://cwiki.apache.org/confluence/display/ATLAS/Area+2+-+Glossary
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949427#comment-15949427 ]

David Radley commented on ATLAS-1410:
-------------------------------------

Hi [~Stefhan] and [~clyned]. Thank you for your feedback. From recent discussions, prompted by your feedback, we are thinking we need:
- relationships to be top-level relationships. I have raised this as a subtask.
- relationships to have modelling flags to indicate composition, the names of each end and of the relationship, cardinality and the like. We think that if the association ends are not named then we can default the names to has-a etc. I think this is a nice compromise: it gives some default relationship names, but also encourages custom names and the use of the modelling flags to see what the real meaning is.
- We could introduce default names like has-an. I think has-a and has-an are a bit confusing, as in English "an" is used when the noun starts with a vowel.
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947758#comment-15947758 ]

Deirdre Clyne commented on ATLAS-1410:
--------------------------------------

Hi Stefhan, I was interested in your idea of a "has-an" relationship. Using your example of a house, a house could "has an" color, occupants, construction type, lot type, power source type and so many other things. There is probably a nearly infinite list of things you could relate to a concept using this relationship.

I wonder if another way to look at this is that an occupant is a role or relationship played by the concept of person or customer. So, the underlying customer goes somewhere else and takes on a new relationship of occupier to a new house if the original house somehow "disappears". The other examples I came up with are all reference data types that would exist independently in the glossary anyway.

I'm not sure if there is an implicit desire here to stick to pre-defined relationships, or if this approach might encourage too many custom relationships.
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947703#comment-15947703 ]

Deirdre Clyne commented on ATLAS-1410:
--------------------------------------

David, I just reviewed V1.3 and the document is shaping up well. I have a couple of comments.

The first is about the idea of having two terms with the same name in the same workspace. Your example of this regarding replacement prompts the question: is this a response to not having a versioning solution, and thus adding complexity instead of handling the workflow issue around changing a definition? If we think there are other reasons for having non-unique term names, how do we understand their separate contexts or decide which to use when?

My second comment is about the use of the word taxonomy. We have the term glossary to describe a set of terms and categories around a line of business or other grouping. What would the definition of a taxonomy be to differentiate it from a glossary? We should only use the two terms if we define them differently and if they each have a different purpose.
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942395#comment-15942395 ]

Stefhan van Helvoirt commented on ATLAS-1410:
---------------------------------------------

With regard to the term relationships: I think there is a difference between conceptual relationships, such as 'Is a Type Of / Has Types' and 'Is Of / Has A', and contextual relationships. A contextual relationship could be an 'Is a'.

The relationship 'Is a' is very specific and directing, and therefore can only be used when describing a specific context or in a scenario in which the relationship is always true. An example of the relationship as described in the document is 'Customer is a Person'. In most situations this might not be universally true: for example, an Organization can also be a Customer, and not every Person is a Customer.

It should not be possible to have more than one 'Is a' of a particular type. For example, it should not be possible to have both 'Customer is a Person' and 'Customer is an Organization', as both Person and Organization are from the same taxonomy, and per instance it can be either the one or the other but not both.

Furthermore, I think it is worthwhile to add a new relationship similar to the 'Has a', namely 'Has an'. Example:
- House has a Room
- House has an Occupant

The 'Has a' is a composition, which implies that the child object cannot live without the context of its parent. Destroy the house and the rooms disappear. The 'Has an' is an aggregation, which implies that the child can exist on its own. Destroy the house and the occupant goes elsewhere. In database modelling the 'Has an' can be seen as a foreign key instead of a normal attribute.
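The difference between the two relationships in the House example can be shown in a few lines. This is a minimal sketch under stated assumptions, not part of the proposal: the class `House` and its methods are hypothetical, and rooms and occupants are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of 'Has a' (composition) versus 'Has an' (aggregation) on deletion. */
final class House {
    private final List<String> rooms = new ArrayList<>();      // 'Has a': owned by the house
    private final List<String> occupants = new ArrayList<>();  // 'Has an': only referenced

    void addRoom(String room) {
        rooms.add(room);
    }

    void addOccupant(String occupant) {
        occupants.add(occupant);
    }

    int roomCount() {
        return rooms.size();
    }

    /**
     * Destroys the house. Rooms cease to exist with it; occupants are
     * released and handed back to the caller because they exist independently.
     */
    List<String> destroy() {
        rooms.clear();
        List<String> released = new ArrayList<>(occupants);
        occupants.clear();
        return released;
    }
}
```

The only behavioural difference is what happens on `destroy()`: composed children are discarded with the parent, while aggregated children survive and can be attached to another parent.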
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936571#comment-15936571 ]

David Radley commented on ATLAS-1410:
-------------------------------------

Responding to [~eostic]

Use Case 1 (page 5), with references also to comments on page 7. "It is important that duplicate Glossary Term names can be defined in the glossary, each with their own context..". That "context" is important and, in fact, should be what makes the Term unique. It might be a "duplicate" in the highest sense of the word, for the whole "collection" of Terms in a glossary, but parentage is indeed important, and keeps things separate. Enterprises can't always agree on Term names or their categorization. One company might, in an insurance example, spend the time to clarify their terms and be sure to have "Automobile Accident Claim" along with "Homeowners Claim" in the same Glossary, but another site might just as easily have the same Term "Claim" existing many times: once in a "Personal Lines/Claims/Auto/Accident" category and again in a "Personal Lines/Claims/Homeowners" category. <> This is a complex topic, however, so it might be beneficial for the model and the APIs to allow full duplication within the higher Glossary level, without requiring parentage definition, leaving it to the "implementer" of any GUI to support (or not) some further level of identity, but it should be strongly recommended.

Use Case 9. Classifying terms rather than assets sounds very natural, but shouldn't be a "requirement". Is this implying that direct asset classification wouldn't be permitted? <> There will be times when the classifications are only applicable to assets <>, because a Term does not yet exist, or because the type of classification isn't fine-grained enough, or for other equally creative reasons.

Page 7. Kinds of Glossary Terms. It isn't clear why there is a need for different formal "types" of mutually exclusive Terms. Relationships determine the "use" of a particular Term, and if it is important for a consumer to have, for their users and model, a set of "Semantic" terms vs "Classifying" terms, it can be done in other ways, such as by putting the Terms into their own separate categories or parental structures. <>

Pages 10 and 11. This discussion is clearer now, since the removal of the idea of an "Attribute Term". On page 11 specifically, it may be beneficial to also show how "has-a" can cascade. Customer "has_a" address, and then address could also "has_a" City, State and Zip (and so forth...). <>

Page 13. Great that you brought up the need for custom relationships. We need to ensure that this capability, as "hoped for", remains intact. ; )

Page 14. Perhaps it needs more explanation, but I found the definition of Has-type and Types to be a bit confusing. "Has-types/is-a-type-of" seems more natural, and perhaps these could be combined into one. <>

Page 14. Synonyms. Perhaps needs more explanation? It is difficult for synonyms to have any kind of "owner". They are all peers in a "collection" of similar concepts. Having one owner, in the model itself, could create issues if/when that owner is deleted. <>

Page 14. Antonyms. Needs further definition so that it is explained separately from Synonyms. In this case, there could be many Terms that are opposites, but they themselves are not necessarily antonyms of each other. This one seems ok to have an "owner" concept. <>

Page 14. Homonyms. This one is more like Synonyms, where they can be peers of each other.

Page 15. Preferred Term. Great concept. Especially important for enterprises that are overloading the glossary to meet a lot of their governance objectives, but still want to retain the idea that "this term" is "the one to use" for specific alternate name or priority reference purposes. Specifically critical to scenarios where Terms are seen as a "replacement for names in retrieval requests or reporting tool interfaces". <>

Page 15. Collections. Very important concept, but is it part of the Glossary specification? <> ...or should it be reviewed from a much higher Atlas perspective? Certainly the glossary could have a set of Terms, qualified in some way as a "Collection Glossary", and then more generically use "assigned assets" [including other Terms] as a generic relationship, but it may be that this is overloading the Glossary too much.
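The point made under Use Case 1, that the category "context" is what keeps two identically named Terms apart, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class `ContextualTerm` is hypothetical, and using the category path as part of a term's identity is just one possible reading of the comment, not the proposal's design.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: a term is identified by its name together with its category path. */
final class ContextualTerm {
    private final String name;                // may repeat across paths, e.g. "Claim"
    private final List<String> categoryPath;  // e.g. ["Personal Lines", "Claims", "Homeowners"]

    ContextualTerm(String name, String... categoryPath) {
        this.name = name;
        this.categoryPath = Arrays.asList(categoryPath.clone());
    }

    /** Path-qualified display name, e.g. "Personal Lines/Claims/Homeowners/Claim". */
    String qualifiedName() {
        return categoryPath.isEmpty() ? name : String.join("/", categoryPath) + "/" + name;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ContextualTerm)) {
            return false;
        }
        ContextualTerm other = (ContextualTerm) o;
        return name.equals(other.name) && categoryPath.equals(other.categoryPath);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, categoryPath);
    }
}
```

With this shape, two "Claim" terms under different category paths compare as different, while a term with an empty path falls back to the bare name, which corresponds to allowing full duplication at the Glossary level and leaving further disambiguation to the implementer.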
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932746#comment-15932746 ]

David Radley commented on ATLAS-1410:
-------------------------------------

Thank you very much for the feedback [~Stefhan]. Some responses:

Page 5: Use case 1: Changed as discussed.

Use case 2 and Page 6: <>

Page 7: <> Unclear why there is a need to have different term types. <>

Page 8: "A Classification points to one entity and can have many associated term." I don't think I fully understand this statement. It would be wiser to have the classification point to one or more terms and have the term point to one or more entities. To be further discussed. <> Also, it should be possible to have multiple classifications pointing to the same object. <>

Page 9: "The classification associated with the term should not be automatically cascaded by Atlas to the assigned assets." Agree that Atlas does not necessarily need to do the cascading, because logic might need to be involved. <> However, the result might need to be made available in Atlas and shown in a distinct way. If Atlas is seen as the single source of truth, then it must be possible for an end user to see solely from Atlas that a classification is 'Derived from'. The derivation itself can be performed by a different service. <>

Changes will be in the next document.
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932552#comment-15932552 ]

David Radley commented on ATLAS-1410:
-------------------------------------

Responding to [~jonesn]

6 - OK, so a term belongs in a glossary, but can be categorized by a category in another. I understand this from an object perspective, but I am trying to think of an example of why that is needed. I guess I'm not clear on the meaning of having multiple glossaries that are interlinked.
<< David: this allows us to separate terms into subject areas. It also allows duplicate term names to occur by putting them in different glossaries; something Stefhan is keen on. Also, glossaries are the owning entity for terms, so there is no need for a single owning category. >>

8 - Are you saying that additional attribute values can be stored with the classification object? I'm thinking here of the example tag-based policies covered in section 8.2 of https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies where "EXPIRES_ON" is referred to.
<< David: the classification type would be subclassed and new attributes added. >>
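The ownership rule in the answer to point 6 (glossaries own terms, categories merely link to them) leads to a simple deletion rule, which the sketch below tries to make concrete. It is an illustration only: `GlossarySketch` and its methods are hypothetical, names are plain strings, and for brevity the categories are kept inside the same object even though the comment allows a term to be categorized from another glossary.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a glossary owns its terms; categories only link to them. */
final class GlossarySketch {
    private final Set<String> terms = new HashSet<>();                       // owned terms
    private final Map<String, Set<String>> categoryLinks = new HashMap<>();  // category -> linked terms

    void addTerm(String term) {
        terms.add(term);
    }

    /** Links an existing term into a category; a term may sit in 0, 1 or more categories. */
    void categorize(String category, String term) {
        if (!terms.contains(term)) {
            throw new IllegalArgumentException("Unknown term: " + term);
        }
        categoryLinks.computeIfAbsent(category, k -> new HashSet<>()).add(term);
    }

    /** Unlinking from a category never deletes the term. */
    void uncategorize(String category, String term) {
        Set<String> linked = categoryLinks.get(category);
        if (linked != null) {
            linked.remove(term);
        }
    }

    /** Deleting the term from its glossary deletes it and drops every category link. */
    void deleteTerm(String term) {
        terms.remove(term);
        for (Set<String> linked : categoryLinks.values()) {
            linked.remove(term);
        }
    }

    boolean hasTerm(String term) {
        return terms.contains(term);
    }

    boolean isCategorized(String category, String term) {
        Set<String> linked = categoryLinks.get(category);
        return linked != null && linked.contains(term);
    }
}
```

Under this rule there is never a question of whether removing the last category link should delete the term: only `deleteTerm` removes it.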
Re: [jira] [Commented] (ATLAS-1410) V2 Glossary API
These points that Mandy raises need to be addressed.

Russ

Sent from my iPhone

> On Feb 19, 2017, at 6:37 AM, Mandy Chessell (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873650#comment-15873650 ]
>
> Mandy Chessell commented on ATLAS-1410:
> ---------------------------------------
>
> Comments on V1.0
>
> - Page numbers would help to tie these comments to the document.
> - Page 2 - Asset type - defined in terms of itself. How are they used? Or is this not relevant to this paper?
> - Page 2 - Why do we need to know about V1 and V2? I think it is because the current interface works with V1 and the new one will work with V2 - it would be helpful to state this explicitly.
> - Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships.
> - Page 4 - Missing from the list - ability to associate a semantic meaning to a classification (v2), trait (v1)?
> - Page 4 - Missing from the list - "typed-by" relationship to associate terms that include meaning in context with terms that describe more pure objects. For example, Home Address is typed by Address.
> - Page 5 - Figure 1 - I am not comfortable with terms being owned by categories. I think each term should be owned by a glossary and linked into 0, 1 or more categories as appropriate. This creates a much simpler deletion rule for the API/end user - particularly when you look at Figure 2, where terms are owned by multiple categories. I.e., delete a term from its glossary and it is deleted. The proposed design raises such questions as "Is the term deleted when unlinked from all categories - or the first category it is linked to?"
> - Page 6 - Figure 3 - I need more detail to understand the "classifies" relationship and how it relates to a classification. It seems redundant. Would you not relate a term to a classification which is in itself semantically classified by its definition term?
> - Page 6 - Bullet 6) - What is the alternative to using Gremlin queries?
> - Page 6 - Bullet 7) - Is this an incomplete sentence - or is the paragraph that follows supposed to be a nested bullet list? Assuming it is a follow-on point, my confusion is that I do not understand why the term/category hierarchy is relevant to the enhancement of classifications. The Classification object is defining the type of classification and its meaning is coming from the term? Is this suggesting that the relationships between classifications are coming from the term relationships in the same way we do this in IGC today? If so, it may help to show an example.
> - Page 7 - Figure 4 and 5 - What is the difference between "Classification" and "Classification Relationship"?
> - Page 7 - Maybe strange examples - the Glossaries would be for different subject areas - for example, there may be a marketing glossary, a customer care glossary, a banking glossary. These may be used for associating meaning to data assets. There may also be glossaries for different regulations, or standard governance approaches, and these may include terms that can be used to describe classifications for data that drive operational governance?
> - Page 8 - I am not sure what the proposed enhancements are - it just seems to list the problems with the current model. All relationships in metadata are bi-directional. It should be the default. This mechanism seems complicated. We really need to define relationships independent of entities so we can define attributes on these relationships. The Classification is actually an example of an independently defined relationship that includes the GUID of the 2 entities it connects. This should be the common style of relationship.
> - Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that the terms are placed in - I thought this was included in the proposal, and we do need this for organising terms so that people can find them - and the category hierarchies (taxonomies) help to provide context to terms too. Also, the semantic relationships discussed would mean we could support a simple ontology.
> - Page 9 - Fully-qualified name - What is a grandparent or parent term? What does a fully qualified name mean and when is it used? The unique name is its GUID. Its path name (there may be many) is the navigation to the term through the category hierarchies.
> - Page 9 - Why do Atlas terms need to follow the schema defined at this link - https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html? It seems to imply a lifecycle which is not included in this proposal and a very specific modelling of the IBM industry models that have mandatory fields that are not always applicable to all glossaries. I think this doc should describe the schema of the glossary term explicitly and explain the fields.
> - page 10 - Figure 7 shows the navigation relationships and 1
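To make the deletion rule Mandy proposes for Figure 1 concrete, here is a minimal sketch. It is not the Atlas API: the class `GlossaryOwnershipSketch` and all of its methods are hypothetical names invented for illustration. It only assumes what the comment states, namely that a term is owned by one glossary, is linked into zero or more categories, and is deleted only by deleting it from its glossary.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the Atlas API: one instance stands for one glossary.
// A term is owned by the glossary and may be linked into 0..n categories.
class GlossaryOwnershipSketch {
    // term name -> names of the categories the term is linked into
    private final Map<String, Set<String>> categoriesByTerm = new HashMap<>();

    /** Creating a term in the glossary makes the glossary its owner. */
    void addTerm(String term) {
        categoriesByTerm.putIfAbsent(term, new HashSet<String>());
    }

    /** Linking only adds an edge; it does not transfer ownership. */
    void link(String term, String category) {
        Set<String> categories = categoriesByTerm.get(term);
        if (categories == null) {
            throw new IllegalArgumentException("Unknown term: " + term);
        }
        categories.add(category);
    }

    /** Unlinking never deletes the term, even when no categories remain. */
    void unlink(String term, String category) {
        Set<String> categories = categoriesByTerm.get(term);
        if (categories != null) {
            categories.remove(category);
        }
    }

    /** Deleting the term from its glossary removes it together with all its links. */
    void deleteTerm(String term) {
        categoriesByTerm.remove(term);
    }

    boolean exists(String term) {
        return categoriesByTerm.containsKey(term);
    }

    int categoryCount(String term) {
        Set<String> categories = categoriesByTerm.get(term);
        return categories == null ? 0 : categories.size();
    }

    public static void main(String[] args) {
        GlossaryOwnershipSketch glossary = new GlossaryOwnershipSketch();
        glossary.addTerm("Home Address");
        glossary.link("Home Address", "Customer Data");
        glossary.unlink("Home Address", "Customer Data");
        // The term survives with no categories because the glossary owns it.
        System.out.println(glossary.exists("Home Address"));
        glossary.deleteTerm("Home Address");
        System.out.println(glossary.exists("Home Address"));
    }
}
```

With this shape the question "is the term deleted when unlinked from all categories?" does not arise, because category links carry no ownership.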
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931905#comment-15931905 ]

Stefhan van Helvoirt commented on ATLAS-1410:
---------------------------------------------

Page 5:

Use case 1: "It is important that duplicate glossary terms names can be defined in the glossary". No. Within a single glossary, it should not be possible to have multiple terms with identical names. If the requirement arises to maintain duplicates, then this should be done across multiple glossaries. This is in alignment with the idea of having 'departmental glossaries'.

Use case 2: There might be a need to have two different types of categorization: one for scoping / context and another for adding generic characteristics to the set of contained terms. In other glossaries this is sometimes referred to as a Parent Category and Referencing Categories. Each term has only one Parent Category but could be referenced by multiple other categories. These referencing categories have a scoping purpose, while the parent category could also have tags / characteristics that will be inherited by the contained terms.

Page 6:

Regarding discussion point 'There does not appear to be a need for a Glossary Term to have a special “parent” category, as the Glossary owns the Glossary Term': if you want to manage a collection of terms in a similar way within a glossary, then some form of parent category or unique structuring method is needed. If there is no uniqueness, then multiple groupings with different characteristics can collide.

Page 7:

"Glossary Term names might not be unique in a Glossary." No, see earlier comments.

"This is a name containing a term’s inheritance and the Glossary it comes from." Glossary + Term name alone should be sufficient; there is no need to add parent terms to the fully qualified name.

It is unclear why there is a need to have different term types. From a business perspective there is only one type of term. These various types, such as Concept and Attribute, suggest something technical which is not relevant from a term perspective, as terms are written from a business view. Also, a term can be a concept in one context and an attribute in another; how is that handled with this setup? E.g. 'Email address' is an attribute of 'Customer' and a Concept in the structure 'Location' --> 'Address' --> 'Electronic address' --> 'Email address'.

Page 8:

"A Classification points to one entity and can have many associated term." I don't think I fully understand this statement. It would be wiser to have the classification point to one or more terms and have the term point to one or more entities. To be further discussed. Also, it should be possible to have multiple classifications pointing to the same object.

Page 9:

"The classification associated with the term should not be automatically cascaded by Atlas to the assigned assets." Agree that Atlas does not necessarily need to do the cascading, because logic might need to be involved. However, the result might need to be made available in Atlas and shown in a distinct way. If Atlas is seen as the single source of truth, then it must be possible for an end user to see from Atlas alone that a classification is 'Derived from'. The derivation itself can be performed by a different service.

Stopped after page 11. Will continue to review the remaining pages in the coming days.
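The naming rule argued for above (no duplicate term names inside one glossary, duplicates allowed across glossaries, and a fully qualified name built from Glossary + Term name only) can be sketched as follows. The class `QualifiedTermNames`, its methods and the "." separator are assumptions made for illustration; none of them come from the proposal or from Atlas.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Atlas API: enforces per-glossary uniqueness
// of term names and derives a qualified name from glossary + term name only.
class QualifiedTermNames {
    // glossary name -> term names already defined in that glossary
    private final Map<String, Set<String>> termsByGlossary = new HashMap<>();

    /** Qualified name for display; the "." separator is an assumption. */
    static String qualifiedName(String glossary, String term) {
        return glossary + "." + term;
    }

    /**
     * Registers a term in a glossary.
     * Returns false when that glossary already holds a term of the same name.
     */
    boolean register(String glossary, String term) {
        Set<String> terms = termsByGlossary.get(glossary);
        if (terms == null) {
            terms = new HashSet<>();
            termsByGlossary.put(glossary, terms);
        }
        return terms.add(term);
    }

    public static void main(String[] args) {
        QualifiedTermNames names = new QualifiedTermNames();
        System.out.println(names.register("Marketing", "Customer")); // first definition
        System.out.println(names.register("Marketing", "Customer")); // duplicate in same glossary
        System.out.println(names.register("Banking", "Customer"));   // same name, other glossary
        System.out.println(qualifiedName("Banking", "Customer"));
    }
}
```

Uniqueness is checked per glossary rather than on the concatenated string, so a separator character inside a glossary or term name cannot cause a false collision.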
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928488#comment-15928488 ]

Nigel Jones commented on ATLAS-1410:
------------------------------------

Opened RANGER-1464
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928475#comment-15928475 ]

Nigel Jones commented on ATLAS-1410:
------------------------------------

Note also I think we should open up an additional JIRA to add support for the v2 glossary to the ranger atlas plugin. See https://cwiki.apache.org/confluence/display/RANGER/ATLAS+Plugin#ATLASPlugin-AtlasAccessPermissions for an example of its use with taxonomy today (albeit simple).
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928458#comment-15928458 ]

Nigel Jones commented on ATLAS-1410:
------------------------------------

Thanks David

* 6 - OK, so a term belongs in a glossary, but can be categorized by a category in another. I understand this from an object perspective, but I am trying to think of an example as to why that is needed. I guess I'm not clear on the meaning of having multiple glossaries that are interlinked.
* OK to defer internationalization, though as well as display names, relationships like homonyms could be affected, since they are sound/dialect as well as country/language specific (this is somewhat peripheral for most, I agree).
* 8 - Are you saying that additional attribute values can be stored with the classification object? I'm thinking here of the example tag-based policies covered at section 8.2 of https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies where "EXPIRES_ON" is referred to.
* ranger tagsync - yes, I think we have what's needed. See the referenced ATLAS Jira I opened on a new interface to support the new glossary (including flattening the structure down to simple tags). An example of the JSON that ends up being sent to the ranger server (after extracting from atlas, for which we'll use a new API, and then going through tagsync) is https://github.com/apache/ranger/blob/master/tagsync/src/main/resources/etc/ranger/data/tags.json
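As an illustration of the question in point 8, the sketch below shows what "additional attribute values stored with the classification object" could look like for an EXPIRES_ON style tag. Everything here is hypothetical: the class `ClassificationWithAttributes`, the attribute key `expiry_date` and the ISO date format are assumptions for the example, not the Atlas or Ranger data model.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Atlas or Ranger API: a classification instance
// that carries its own attribute values alongside its name.
class ClassificationWithAttributes {
    final String name;
    final Map<String, String> attributes;

    ClassificationWithAttributes(String name, Map<String, String> attributes) {
        this.name = name;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    /**
     * True when the assumed "expiry_date" attribute (ISO yyyy-MM-dd) is present
     * and lies strictly before the given day; false when it is absent.
     */
    boolean isExpired(LocalDate today) {
        String expiry = attributes.get("expiry_date");
        return expiry != null && LocalDate.parse(expiry).isBefore(today);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("expiry_date", "2017-03-31");
        ClassificationWithAttributes tag = new ClassificationWithAttributes("EXPIRES_ON", attrs);
        System.out.println(tag.isExpired(LocalDate.of(2017, 4, 1)));
        System.out.println(tag.isExpired(LocalDate.of(2017, 3, 1)));
    }
}
```

If attribute values can only live on glossary terms and not on the classification instance, a per-asset value such as an expiry date would have nowhere to go, which is the concern the question raises.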
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928409#comment-15928409 ]

David Radley commented on ATLAS-1410:
-------------------------------------

Thank you [~zimnymc] for your feedback. It is great. Here are my responses.

ad. Use Case

use case 1: It shouldn't be possible to define two terms with exactly the same name. <> It can be possible to do it only through synonyms if the definition stays the same. <> If we have a different definition then we also must have a different name for each term. If we allow the same naming we will probably enormously stress glossary integrity. <>

use cases 2 and 3: I agree that Categories are needed to give more control over the organization of terms, but I think I need a bit more thinking on whether categories should help in creating hierarchies. It might be the case, but then we should allow terms to only be leaves, and every kind of grouping should be done via a category. This would mean that categories should also have classifiers. <>

use case 7: Do you mean collections? <>

use case 11: This sounds a bit too high level and it would probably be nice to describe it in more detail. <>

I'm explicitly missing two things:
1. ability to inherit classifiers <>
2. are there any models between terms and assets, or is it only about term to asset? We might want to include a couple of levels of models (like LDM and/or PDM for a particular technology). At least one is already there - by connecting terms to other terms we are creating concepts, which should be visualized in some way for easier navigation. <>

ad. discussion point on p. 5: Yes, that's how I also see it - Taxonomy is the name of the hierarchy of Glossary Categories. But does this mean that Taxonomy is a name of a Glossary instance?

ad. Glossary Terms and Glossary Categories: discussion point - can there be a term without a Category? If not, will there always be at least one prime category for each Glossary? If yes, what is the difference then between a Glossary and a prime category? Is there any at all? <>

point for discussion - should it be allowed that a term from one Glossary is inside a Category from another Glossary? I think we should not allow this kind of situation, as it increases the risk of losing integrity for a particular Glossary. <> I'd say that a copy of that term should be made in the other Glossary with some kind of marker "inspired by". Otherwise we will create a tight connection between two Glossaries and their maintenance will be more difficult (e.g. upgrades). <>

ad. Glossary Term identification and names: "Glossary Term names might not be unique in a Glossary. For example, there could be 2 definitions of customer." - just NO :-) <>

"we do not allow 2 Glossary Terms of the same name inheriting from a parent Glossary Term" - so do we allow it or not? Or have I missed something? I need an example for this one to properly understand it.

ad. Glossary Term context: I'd like to create a clear distinction between what is meant here by context and the term business context (being the term-to-term relations that create business context) - I just don't like using the word "context" for both. <>

ad. page 9, example: In general I do agree with the line of thinking, but I have a question: both customer and attributes are terms, right? If so, then is the "has-a" relationship the best one for term-to-term assignments? <>

ad. Owning relationships: I have some doubts about this "Concept Glossary Terms own Attribute Glossary Terms." (see the above remark for page 9, example). I'm not saying not to go there, I just want to explore it more to understand it better. <>

ad. Discussion point - maybe we should consider defining the Glossary Term attributes using the type system rather than relationships - yes we should.

ad. Discussion point: we could add homophones as well - if there was a need. I don't think there is a need now to do that. <>

ad. Discussion point: preferred-term attribute could be stored in the entity, AtlasObjectId or classification. I suggest storing it in the entity. I agree.

ad. Discussion point: We will enable collection types to be created. Additionally, we may want to consider including a Collection type that has one attribute called contents with multiple values of the top-level type. Do you mean nesting Collections? <>

ad. Discussion point: Introduction of bidirectional relationships could be done separately from the Glossary enhancement. We may take a step-by-step approach, but I'd say we need this from nearly the very beginning. <>

ad. Discussion point: We may wish to take a more revolutionary approach and allow relationships to be defined as top level artifacts, of
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928083#comment-15928083 ]

David Radley commented on ATLAS-1410:
-------------------------------------

Many thanks [~jonesn]. Responding to your comments by page:

6: The intent is that a term is owned by one glossary but can be categorized by categories from any glossary. Do you think I need to be more explicit in the text about this?

7: Very valid concerns relating to globalization, which I suggest we deal with separately, as per my exclusions at the top of the document. We have talked of display attributes on the dev list. I have not looked into Titan's encoding, whether it is affected by which store is used, whether the String data type in Titan supports Unicode or UTF-8, and how this fits with indexes.

p8 - in figure 3 "hive column" is meant to be an instance - so could be worth using an example like "employee salary" or similar to avoid confusion with type definitions. <> Also on this page it would be worth comparing to the v1 implementation. The association there between the column (entity) & term (trait) is the trait instance, which also carries additional information - parameters. That’s how we might capture the level of SPI, whilst I think with this new design that is done through the hierarchy of glossary terms <>. An example may help? Or just a link to page 16. Question for other reviewers - is this sufficient (I think it's simpler, but do we lose additional attributes?) <>

13: Yes, there is scope to add new semantic relationships. I agree with your search comment.

16/17: Agreed.

On Ranger tag sync: I am suggesting we continue to expose classifications as tags. V2 Classifications are now enhanced by
* having a guid (as the name cannot be relied upon to be unique)
* having associated Glossary Terms, including the classifying term.

I hope this is sufficient to meet the needs of tag sync; or do you think more is required?
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927857#comment-15927857 ]

Mike Nicpan commented on ATLAS-1410:
------------------------------------

Comments on v1.1

ad. Use Case

use case 1: It shouldn't be possible to define two terms with exactly the same name. It can be possible to do it only through synonyms if the definition stays the same. If we have a different definition then we also must have a different name for each term. If we allow the same naming we will probably enormously stress glossary integrity.

use cases 2 and 3: I agree that Categories are needed to give more control over the organization of terms, but I think I need a bit more thinking on whether categories should help in creating hierarchies. It might be the case, but then we should allow terms to only be leaves, and every kind of grouping should be done via a category. This would mean that categories should also have classifiers.

use case 7: Do you mean collections?

use case 11: This sounds a bit too high level and it would probably be nice to describe it in more detail.

I'm explicitly missing two things:
1. ability to inherit classifiers
2. are there any models between terms and assets, or is it only about term to asset? We might want to include a couple of levels of models (like LDM and/or PDM for a particular technology). At least one is already there - by connecting terms to other terms we are creating concepts, which should be visualized in some way for easier navigation.

ad. discussion point on p. 5: Yes, that's how I also see it - Taxonomy is the name of the hierarchy of Glossary Categories. But does this mean that Taxonomy is a name of a Glossary instance?

ad. Glossary Terms and Glossary Categories: discussion point - can there be a term without a Category? If not, will there always be at least one prime category for each Glossary? If yes, what is the difference then between a Glossary and a prime category? Is there any at all?

point for discussion - should it be allowed that a term from one Glossary is inside a Category from another Glossary? I think we should not allow this kind of situation, as it increases the risk of losing integrity for a particular Glossary. I'd say that a copy of that term should be made in the other Glossary with some kind of marker "inspired by". Otherwise we will create a tight connection between two Glossaries and their maintenance will be more difficult (e.g. upgrades).

ad. Glossary Term identification and names: "Glossary Term names might not be unique in a Glossary. For example, there could be 2 definitions of customer." - just NO :)

"we do not allow 2 Glossary Terms of the same name inheriting from a parent Glossary Term" - so do we allow it or not? Or have I missed something? I need an example for this one to properly understand it.

ad. Glossary Term context: I'd like to create a clear distinction between what is meant here by context and the term business context (being the term-to-term relations that create business context) - I just don't like using the word "context" for both.

ad. page 9, example: In general I do agree with the line of thinking, but I have a question: both customer and attributes are terms, right? If so, then is the "has-a" relationship the best one for term-to-term assignments?

ad. Owning relationships: I have some doubts about this "Concept Glossary Terms own Attribute Glossary Terms." (see the above remark for page 9, example). I'm not saying not to go there, I just want to explore it more to understand it better.

ad. Discussion point - maybe we should consider defining the Glossary Term attributes using the type system rather than relationships - yes we should.

ad. Discussion point: we could add homophones as well - if there was a need. I don't think there is a need now to do that.

ad. Discussion point: preferred-term attribute could be stored in the entity, AtlasObjectId or classification. I suggest storing it in the entity. I agree.

ad. Discussion point: We will enable collection types to be created. Additionally, we may want to consider including a Collection type that has one attribute called contents with multiple values of the top-level type. Do you mean nesting Collections?

ad. Discussion point: Introduction of bidirectional relationships could be done separately from the Glossary enhancement. We may take a step-by-step approach, but I'd say we need this from nearly the very beginning.

ad. Discussion point: We may wish to take a more revolutionary approach and allow relationships to be defined as top level artifacts, of which classifications are a type. Can we explore it more? It sounds pretty ambitious and worth doing, but let's list the consequences.
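To help explore the last two discussion points, here is one possible reading of "relationships as top level artifacts": each relationship is its own object with a guid, a type, the guids of its two ends and its own attributes, and it can be navigated from either end. This is a sketch only. `RelationshipStoreSketch`, `Relationship`, `relate` and `neighbours` are hypothetical names, not part of the proposal or of Atlas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not the Atlas API: relationships are first-class
// objects, indexed from both ends so that navigation is bidirectional.
class RelationshipStoreSketch {

    static final class Relationship {
        final String guid = UUID.randomUUID().toString();
        final String type;
        final String end1Guid;
        final String end2Guid;
        // Attributes live on the relationship itself, not on either entity.
        final Map<String, String> attributes = new HashMap<>();

        Relationship(String type, String end1Guid, String end2Guid) {
            this.type = type;
            this.end1Guid = end1Guid;
            this.end2Guid = end2Guid;
        }

        /** Returns the guid at the opposite end from the given entity. */
        String otherEnd(String entityGuid) {
            return end1Guid.equals(entityGuid) ? end2Guid : end1Guid;
        }
    }

    // entity guid -> relationships in which that entity takes part
    private final Map<String, List<Relationship>> byEntity = new HashMap<>();

    Relationship relate(String type, String end1Guid, String end2Guid) {
        Relationship relationship = new Relationship(type, end1Guid, end2Guid);
        index(end1Guid, relationship);
        if (!end1Guid.equals(end2Guid)) {
            index(end2Guid, relationship); // avoid indexing a self-loop twice
        }
        return relationship;
    }

    private void index(String entityGuid, Relationship relationship) {
        List<Relationship> list = byEntity.get(entityGuid);
        if (list == null) {
            list = new ArrayList<>();
            byEntity.put(entityGuid, list);
        }
        list.add(relationship);
    }

    /** Entities reachable from the given entity over relationships of one type. */
    List<String> neighbours(String entityGuid, String type) {
        List<String> result = new ArrayList<>();
        List<Relationship> list = byEntity.get(entityGuid);
        if (list != null) {
            for (Relationship relationship : list) {
                if (relationship.type.equals(type)) {
                    result.add(relationship.otherEnd(entityGuid));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RelationshipStoreSketch store = new RelationshipStoreSketch();
        Relationship r = store.relate("synonym", "term-guid-1", "term-guid-2");
        r.attributes.put("steward", "someone");
        System.out.println(store.neighbours("term-guid-1", "synonym"));
        System.out.println(store.neighbours("term-guid-2", "synonym"));
    }
}
```

One consequence worth listing: deleting an entity then means finding and deleting its relationship objects as well, which is the kind of edge management the issue description hopes the type system can take over.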
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924207#comment-15924207 ]

Nigel Jones commented on ATLAS-1410:
------------------------------------

Opened ATLAS-1662 to track the new API needed for ranger.

A further comment on the proposal: what is the interaction with the Atlas ranger plugin? This is an optional component that can restrict access to metadata in atlas - more info is at https://cwiki.apache.org/confluence/display/RANGER/ATLAS+Plugin
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924065#comment-15924065 ]

Nigel Jones commented on ATLAS-1410:

p6 - Is the intent that term12 can sit in both glossaries, i.e. it could for example also be linked to cat14 in Glossary 2? This makes sense.

p7 - How about Unicode support for term names? I think this would be valuable for any displayed name (and perhaps the description is not enough). Surely we should be able to have, for example, a Chinese term in the glossary? A more restrictive spec for an internal name is then OK.

p8 - In figure 3 "hive column" is meant to be an instance, so it could be worth using an example like "employee salary" or similar to avoid confusion with type definitions. Also on this page it would be worth comparing to the v1 implementation. The association there between the column (entity) and term (trait) is the trait instance, which also carries additional information: parameters. That's how we might capture the level of SPI, whilst I think with this new design that is done through the hierarchy of glossary terms. An example may help, or just a link to page 16. Question for other reviewers: is this sufficient? (I think it's simpler, but do we lose additional attributes?)

p13 - Homophones. This gets more complex due to dialect? (I'm not a linguist.) It perhaps brings another dimension: the need for translation of names/descriptions for display purposes.

General - It's mentioned that search is excluded. Perhaps improved DSL/UI support for glossary navigation could be the subject of an additional JIRA (revisit later).

p16/17 - Apache Ranger integration
* We should consider how the existing tag support in Ranger is affected by these changes. Perhaps the simplest suggestion is that the current Ranger tagsync works only with the v1 glossary. Then I propose that for v2 we have a new tagsync which will navigate the hierarchy to allow Ranger (or other enforcement engines) to pick up a simple entity:classification map as it does today, but using a new "OMAS" API. This will require an additional Atlas JIRA (to provide the new Governance API) and a Ranger JIRA (to consume that API using a new tagsync process). In those JIRAs we should also update docs to clarify the interoperability. I'll open these shortly.
* On a related point, it could be useful to be able to identify a term as "relevant" for Ranger/enforcement engines. This could come from its membership in a category, or some other attribute such as a flag.

Otherwise an excellent proposal and very much needed for Atlas to "step up" to support enterprise glossaries.

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
> Issue Type: Improvement
> Reporter: David Radley
> Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2
> proposal v1.1.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem.
> There are newer, more functional versions of this capability in the atlas-intg
> project. This Jira is changing over the glossary implementation to the newer
> entity / type classes.
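The v2 tagsync idea in the comment above (navigate the term hierarchy, then hand enforcement engines a flat entity:classification map) could be sketched roughly as follows. This is only an illustration under stated assumptions: `TagSyncSketch`, `Term`, the `relevant` flag and `flatten` are hypothetical names invented here, not Atlas, Ranger or "OMAS" API classes, and the real Governance API may look quite different.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: none of these types are Atlas or Ranger classes.
final class TagSyncSketch {

    /** A glossary term with an optional parent and an assumed "relevant for enforcement" flag. */
    static final class Term {
        final String name;
        final Term parent;       // null for a top-level term
        final boolean relevant;  // assumed flag, as floated in the comment above

        Term(String name, Term parent, boolean relevant) {
            this.name = name;
            this.parent = parent;
            this.relevant = relevant;
        }
    }

    /**
     * Flattens entity -> assigned terms into entity -> classification names
     * by walking from each assigned term up through its ancestors and
     * keeping only the terms flagged as relevant.
     */
    static Map<String, Set<String>> flatten(Map<String, List<Term>> assignments) {
        Map<String, Set<String>> flat = new TreeMap<>();
        for (Map.Entry<String, List<Term>> entry : assignments.entrySet()) {
            Set<String> names = new TreeSet<>();
            for (Term assigned : entry.getValue()) {
                for (Term t = assigned; t != null; t = t.parent) {
                    if (t.relevant) {
                        names.add(t.name);
                    }
                }
            }
            flat.put(entry.getKey(), names);
        }
        return flat;
    }

    public static void main(String[] args) {
        Term sensitive = new Term("Sensitive", null, true);
        Term compensation = new Term("Compensation", sensitive, false);
        Term salary = new Term("Employee Salary", compensation, true);

        Map<String, List<Term>> assignments = new TreeMap<>();
        assignments.put("hr.employee.salary", List.of(salary));

        // prints {hr.employee.salary=[Employee Salary, Sensitive]}
        System.out.println(flatten(assignments));
    }
}
```

The consumer sees the same simple map shape it gets today, while the hierarchy walk stays on the producing side. Whether relevance comes from a flag or from category membership is left open, as in the comment.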
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905259#comment-15905259 ]

David Radley commented on ATLAS-1410:

Responses to comments

Page numbers would help to tie these comments to the document. <>

Page 2 - Asset type - defined in terms of itself. How are they used? Or is this not relevant to this paper? <>

Page 2 - Why do we need to know about V1 and V2? I think it is because the current interface works with V1 and the new one will work with V2 - it would be helpful to state this explicitly. <>

Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships. <>

Page 4 - missing from list - ability to associate a semantic meaning to a classification (v2), trait (v1)? <>

Page 4 - Missing from the list - "typed-by" relationship to associate terms that include meaning in context with terms that describe more pure objects. For example Home Address is typed by Address. <>

Page 5 - Figure 1 - I am not comfortable with terms being owned by categories. I think each term should be owned by a glossary and linked into 0, 1 or more categories as appropriate. This creates a much simpler deletion rule for the API/end user - particularly when you look at Figure 2 where terms are owned by multiple categories, i.e. delete the term from its glossary and it is deleted. In the proposed design, it raises such questions as "Is the term deleted when unlinked from all categories - or the first category it is linked to?" <>

Page 6 - Figure 3 - I need more detail to understand the "classifies" relationship and how it relates to a classification. It seems redundant. Would you not relate a term to a classification which is in itself semantically classified by its definition term?

Page 6 - Bullet 6) - What is the alternative to using Gremlin queries? <>

Page 6 - Bullet 7) - Is this an incomplete sentence, or is the paragraph that follows supposed to be a nested bullet list? Assuming it is a follow-on point, my confusion is that I do not understand why the term/category hierarchy is relevant to the enhancement of classifications. The Classification object is defining the type of classification and its meaning is coming from the term? <> Is this suggesting that the relationships between classifications are coming from the term relationships in the same way we do this in IGC today? <> If so it may help to show an example. <>

Page 7 - Figure 4 and 5 - what is the difference between "Classification" and "Classification Relationship"? <>

Page 7 - Maybe strange examples - the glossaries would be for different subject areas - for example, there may be a marketing glossary, a customer care glossary, a banking glossary. These may be used for associating meaning to data assets (i.e. data assets). There may also be glossaries for different regulations, or standard governance approaches, and these may include terms that can be used to describe classification for data that drive operational governance? <>

Page 8 - I am not sure what the proposed enhancements are - it just seems to list the problems with the current model. All relationships in metadata are bi-directional. It should be the default. This mechanism seems complicated. We really need to define relationships independent of entities so we can define attributes on these relationships. The Classification is actually an example of an independently defined relationship that includes the GUID of the 2 entities it connects. This should be the common style of relationship. <>

Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that the terms are placed in - I thought this was included in the proposal and we do need this for organising terms so that people can find them - and the category hierarchies (taxonomies) help to provide context to terms too. Also, the semantic relationships discussed would mean we could support a simple ontology. <>

Page 9 - Fully-qualified name - What a grandparent or parent term? What does a fully qualified name mean and when is it used? The unique name is its GUID. Its path name (there may be many) is the navigation to the term through the category hierarchies. <>

Page 9 - why do Atlas terms need to follow the schema defined at this link - https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html? It seems to imply a lifecycle which is not included in this proposal and a very specific modelling of the IBM industry models that have mandatory fields that are not always applicable to all glossaries. I think this doc should describe the schema of the glossary term explicitly and explain the fields. <>

Page 10 - Figure 7 shows the navigation relationships as one-way. We need to be able to navigate from the hive table to its classification to support the GAF. <>

Page 11 - Figure 8 - Atlas entities box is hard to
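To make the Page 9 point about names concrete (the GUID is the one unique name, while a term may have several path names, one per route through the category hierarchies), here is a small sketch. All names in it (`PathNameSketch`, `Category`, `Term`, `pathNames`, the "/" separator) are hypothetical and chosen for illustration only; nothing here is taken from the proposal document or from Atlas.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch: not Atlas classes.
final class PathNameSketch {

    /** A category with an optional parent category. */
    static final class Category {
        final String name;
        final Category parent; // null for a top-level category

        Category(String name, Category parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** A term identified by its GUID and linked into zero or more categories. */
    static final class Term {
        final String guid = UUID.randomUUID().toString(); // the only unique name
        final String name;
        final List<Category> categories = new ArrayList<>();

        Term(String name) {
            this.name = name;
        }
    }

    /** Returns one path name per category the term is linked into. */
    static List<String> pathNames(Term term) {
        List<String> paths = new ArrayList<>();
        for (Category linked : term.categories) {
            Deque<String> segments = new ArrayDeque<>();
            segments.addFirst(term.name);
            // Walk up to the root, prepending each category name.
            for (Category c = linked; c != null; c = c.parent) {
                segments.addFirst(c.name);
            }
            paths.add(String.join("/", segments));
        }
        return paths;
    }
}
```

A term linked into no categories has no path name at all under this sketch, which is one reason a path name cannot stand in for the GUID.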
[jira] [Commented] (ATLAS-1410) V2 Glossary API
[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873650#comment-15873650 ]

Mandy Chessell commented on ATLAS-1410:

Comments on V1.0

- Page numbers would help to tie these comments to the document.
- Page 2 - Asset type - defined in terms of itself. How are they used? Or is this not relevant to this paper?
- Page 2 - Why do we need to know about V1 and V2? I think it is because the current interface works with V1 and the new one will work with V2 - it would be helpful to state this explicitly.
- Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships.
- Page 4 - missing from list - ability to associate a semantic meaning to a classification (v2), trait (v1)?
- Page 4 - Missing from the list - "typed-by" relationship to associate terms that include meaning in context with terms that describe more pure objects. For example Home Address is typed by Address.
- Page 5 - Figure 1 - I am not comfortable with terms being owned by categories. I think each term should be owned by a glossary and linked into 0, 1 or more categories as appropriate. This creates a much simpler deletion rule for the API/end user - particularly when you look at Figure 2 where terms are owned by multiple categories, i.e. delete the term from its glossary and it is deleted. In the proposed design, it raises such questions as "Is the term deleted when unlinked from all categories - or the first category it is linked to?"
- Page 6 - Figure 3 - I need more detail to understand the "classifies" relationship and how it relates to a classification. It seems redundant. Would you not relate a term to a classification which is in itself semantically classified by its definition term?
- Page 6 - Bullet 6) - What is the alternative to using Gremlin queries?
- Page 6 - Bullet 7) - Is this an incomplete sentence, or is the paragraph that follows supposed to be a nested bullet list? Assuming it is a follow-on point, my confusion is that I do not understand why the term/category hierarchy is relevant to the enhancement of classifications. The Classification object is defining the type of classification and its meaning is coming from the term? Is this suggesting that the relationships between classifications are coming from the term relationships in the same way we do this in IGC today? If so it may help to show an example.
- Page 7 - Figure 4 and 5 - what is the difference between "Classification" and "Classification Relationship"?
- Page 7 - Maybe strange examples - the glossaries would be for different subject areas - for example, there may be a marketing glossary, a customer care glossary, a banking glossary. These may be used for associating meaning to data assets (i.e. data assets). There may also be glossaries for different regulations, or standard governance approaches, and these may include terms that can be used to describe classification for data that drive operational governance?
- Page 8 - I am not sure what the proposed enhancements are - it just seems to list the problems with the current model. All relationships in metadata are bi-directional. It should be the default. This mechanism seems complicated. We really need to define relationships independent of entities so we can define attributes on these relationships. The Classification is actually an example of an independently defined relationship that includes the GUID of the 2 entities it connects. This should be the common style of relationship.
- Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that the terms are placed in - I thought this was included in the proposal and we do need this for organising terms so that people can find them - and the category hierarchies (taxonomies) help to provide context to terms too. Also, the semantic relationships discussed would mean we could support a simple ontology.
- Page 9 - Fully-qualified name - What a grandparent or parent term? What does a fully qualified name mean and when is it used? The unique name is its GUID. Its path name (there may be many) is the navigation to the term through the category hierarchies.
- Page 9 - why do Atlas terms need to follow the schema defined at this link - https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html? It seems to imply a lifecycle which is not included in this proposal and a very specific modelling of the IBM industry models that have mandatory fields that are not always applicable to all glossaries. I think this doc should describe the schema of the glossary term explicitly and explain the fields.
- Page 10 - Figure 7 shows the navigation relationships as one-way. We need to be able to navigate from the hive table to its classification to support the GAF.
- Page 11 - Figure 8 - Atlas entities box is hard
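The deletion rule argued for in the Page 5 comment (a term is owned by its glossary, categories only link to it, and deleting it from the glossary is the single way it goes away) might look like this in outline. This is a sketch under assumptions: `GlossaryOwnershipSketch` and its methods are hypothetical names, terms and categories are reduced to plain strings, and it is not the proposed V2 API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the glossary owns terms, categories merely link to them.
final class GlossaryOwnershipSketch {
    private final Set<String> terms = new HashSet<>();
    private final Map<String, Set<String>> categoryLinks = new HashMap<>();

    /** Adds a term owned by this glossary. */
    void addTerm(String term) {
        terms.add(term);
    }

    /** Links an existing term into a category; a term may be linked into many. */
    void link(String category, String term) {
        if (!terms.contains(term)) {
            throw new IllegalArgumentException("unknown term: " + term);
        }
        categoryLinks.computeIfAbsent(category, k -> new HashSet<>()).add(term);
    }

    /** Unlinking never deletes the term, even when it was the last link. */
    void unlink(String category, String term) {
        Set<String> linked = categoryLinks.get(category);
        if (linked != null) {
            linked.remove(term);
        }
    }

    /** Deleting from the glossary removes the term and every category link to it. */
    void deleteTerm(String term) {
        terms.remove(term);
        for (Set<String> linked : categoryLinks.values()) {
            linked.remove(term);
        }
    }

    boolean hasTerm(String term) {
        return terms.contains(term);
    }

    Set<String> termsIn(String category) {
        return categoryLinks.getOrDefault(category, Set.of());
    }
}
```

With ownership in one place, the question "is the term deleted when unlinked from all categories, or from the first?" does not arise: `unlink` only ever removes a link, and `deleteTerm` is the only operation that removes the term itself.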