Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/#review150510
---


Ship it!

- Madhan Neethiraj


On Sept. 27, 2016, 12:25 a.m., Sarath Kumar Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51939/
> ---
> 
> (Updated Sept. 27, 2016, 12:25 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1174
> https://issues.apache.org/jira/browse/ATLAS-1174
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> 1. Introduce a "version" attribute for all types in the type system. This
> helps to track changes made to the default types (hive, sqoop, falcon and
> storm types) and to user-created types. The attribute is optional: if no
> version is given when a type is created, the default version "1.0" is
> assigned.
> 2. Using the version attribute, introduce a patch framework for the type
> system. The framework applies patches to a type based on its version number
> and runs during Atlas startup, so it can be used during an upgrade, for
> example to add new attributes to an existing type.
> The sequence of steps:
> a. During Atlas startup, check the $ATLAS_HOME/models/patches directory for
> patch files (JSON files). If there are any patch files, process them.
> b. A sample patch JSON file looks like:
> {
>   "patches": [
>     {
>       "action": "ADD_ATTRIBUTE",
>       "typeName": "hive_column",
>       "applyToVersion": "1.0",
>       "updateToVersion": "2.0",
>       "actionParams": [
>         { "name": "position", "dataTypeName": "int", "multiplicity": "optional",
>           "isComposite": false, "isUnique": false, "isIndexable": false,
>           "reverseAttributeName": null }
>       ]
>     }
>   ]
> }
> c. The framework updates the type named in "typeName" if its version matches
> "applyToVersion" and, after applying the patch, sets the type's version to
> the one given in "updateToVersion".
> d. A JSON file can contain more than one action (an array of actions).
> e. The directory can contain multiple patch JSON files; they are applied in
> the sort order of their filenames, e.g.:
> 001-hive_column_add_position.json
> 002-hive_column_add_anotherattribute.json
> 
> 
> Diffs
> ---
> 
>   common/src/main/java/org/apache/atlas/AtlasConstants.java 17ffbd7
>   common/src/main/java/org/apache/atlas/repository/Constants.java d7f9c89
>   repository/src/main/java/org/apache/atlas/repository/typestore/GraphBackedTypeStore.java a94d157
>   repository/src/main/java/org/apache/atlas/services/AtlasPatchHandler.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/services/AtlasTypeAttributePatch.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/services/AtlasTypePatch.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 3550492
>   repository/src/test/java/org/apache/atlas/service/DefaultMetadataServiceTest.java 6782970
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/AbstractDataType.java fad091d
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/ClassType.java c56987a
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumType.java bdd0a13
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumTypeDefinition.java 6340615
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalType.java 7224699
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalTypeDefinition.java 9a299f0
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/IDataType.java 85ddee7
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/StructType.java 6f40c1d
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/StructTypeDefinition.java f1ce1b7
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/TraitType.java f23bf5b
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/TypeSystem.java 70ba89b
>   typesystem/src/main/java/org/apache/atlas/typesystem/types/utils/TypesUtil.java ef8448d
>   typesystem/src/main/scala/org/apache/atlas/typesystem/json/TypesSerialization.scala 5618938
> 
> Diff: https://reviews.apache.org/r/51939/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sarath Kumar Subramanian
> 
>



Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Sarath Kumar Subramanian


> On Sept. 20, 2016, 1:28 p.m., Suma Shivaprasad wrote:
> > typesystem/src/main/java/org/apache/atlas/typesystem/types/TypeSystem.java, 
> > line 314
> > 
> >
> > We would need to audit types as well to 
> > know what was the original type definition that was updated and the 
> > patch that was applied.

This will be addressed in a separate JIRA that tracks all type changes.


- Sarath Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/#review149727
---


On Sept. 26, 2016, 5:25 p.m., Sarath Kumar Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51939/
> ---
> 
> (Updated Sept. 26, 2016, 5:25 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1174
> https://issues.apache.org/jira/browse/ATLAS-1174
> 
> 
> Repository: atlas
> 
> 

Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Sarath Kumar Subramanian


> On Sept. 26, 2016, 4:43 p.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/services/AtlasTypePatch.java, 
> > line 78
> > 
> >
> > Consider using type Map for params.

Fixed in the latest diff.


- Sarath Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/#review150489
---


On Sept. 26, 2016, 5:25 p.m., Sarath Kumar Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51939/
> ---
> 
> (Updated Sept. 26, 2016, 5:25 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1174
> https://issues.apache.org/jira/browse/ATLAS-1174
> 
> 
> Repository: atlas
> 
> 



Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Sarath Kumar Subramanian

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/
---

(Updated Sept. 26, 2016, 5:25 p.m.)


Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.


Changes
---

Changed the params attribute from String to Map.


Bugs: ATLAS-1174
https://issues.apache.org/jira/browse/ATLAS-1174


Repository: atlas


Description
---

1. Introduce a "version" attribute for all types in the type system. This helps
to track changes made to the default types (hive, sqoop, falcon and storm types)
and to user-created types. The attribute is optional: if no version is given
when a type is created, the default version "1.0" is assigned.
2. Using the version attribute, introduce a patch framework for the type
system. The framework applies patches to a type based on its version number and
runs during Atlas startup, so it can be used during an upgrade, for example to
add new attributes to an existing type.
The sequence of steps:
a. During Atlas startup, check the $ATLAS_HOME/models/patches directory for
patch files (JSON files). If there are any patch files, process them.
b. A sample patch JSON file looks like:
{
  "patches": [
    {
      "action": "ADD_ATTRIBUTE",
      "typeName": "hive_column",
      "applyToVersion": "1.0",
      "updateToVersion": "2.0",
      "actionParams": [
        { "name": "position", "dataTypeName": "int", "multiplicity": "optional",
          "isComposite": false, "isUnique": false, "isIndexable": false,
          "reverseAttributeName": null }
      ]
    }
  ]
}
c. The framework updates the type named in "typeName" if its version matches
"applyToVersion" and, after applying the patch, sets the type's version to the
one given in "updateToVersion".
d. A JSON file can contain more than one action (an array of actions).
e. The directory can contain multiple patch JSON files; they are applied in the
sort order of their filenames, e.g.:
001-hive_column_add_position.json
002-hive_column_add_anotherattribute.json
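
The version check in step (c) can be sketched roughly as follows. This is only an illustration under stated assumptions: TypePatchSketch, TypeDef, Patch and apply are hypothetical names made up for this example and are not the AtlasTypePatch or AtlasPatchHandler classes from the diff; only the field names are taken from the sample JSON, and only the ADD_ATTRIBUTE action is handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these classes are hypothetical stand-ins, not the
// AtlasTypePatch / AtlasPatchHandler classes from the review.
final class TypePatchSketch {

    /** Minimal in-memory type: a name, a version and a list of attribute names. */
    static final class TypeDef {
        final String name;
        String version = "1.0"; // default version when none is given at creation
        final List<String> attributes = new ArrayList<>();

        TypeDef(String name) {
            this.name = name;
        }
    }

    /** Mirrors one entry of the "patches" array in the sample JSON. */
    static final class Patch {
        final String action;
        final String typeName;
        final String applyToVersion;
        final String updateToVersion;
        final List<Map<String, Object>> actionParams;

        Patch(String action, String typeName, String applyToVersion,
              String updateToVersion, List<Map<String, Object>> actionParams) {
            this.action = action;
            this.typeName = typeName;
            this.applyToVersion = applyToVersion;
            this.updateToVersion = updateToVersion;
            this.actionParams = actionParams;
        }
    }

    /** Applies the patch only if the type exists and its version matches; returns true if applied. */
    static boolean apply(Map<String, TypeDef> types, Patch patch) {
        TypeDef type = types.get(patch.typeName);
        if (type == null || !type.version.equals(patch.applyToVersion)) {
            return false; // unknown type or version mismatch: skip the patch
        }
        if (!"ADD_ATTRIBUTE".equals(patch.action)) {
            return false; // only the action from the sample file is sketched here
        }
        for (Map<String, Object> param : patch.actionParams) {
            type.attributes.add((String) param.get("name"));
        }
        type.version = patch.updateToVersion; // bump the version after a successful patch
        return true;
    }

    public static void main(String[] args) {
        Map<String, TypeDef> types = new HashMap<>();
        types.put("hive_column", new TypeDef("hive_column"));

        Map<String, Object> position = new HashMap<>();
        position.put("name", "position");
        position.put("dataTypeName", "int");
        position.put("multiplicity", "optional");

        Patch patch = new Patch("ADD_ATTRIBUTE", "hive_column", "1.0", "2.0",
                Collections.singletonList(position));

        System.out.println(apply(types, patch)); // true: version 1.0 matched
        System.out.println(apply(types, patch)); // false: the type is now at 2.0
    }
}
```

Because the version is bumped after a successful patch, running the same patch file on a later startup is skipped in this sketch, which is the condition-based behaviour the description relies on.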


Diffs (updated)
---

  common/src/main/java/org/apache/atlas/AtlasConstants.java 17ffbd7
  common/src/main/java/org/apache/atlas/repository/Constants.java d7f9c89
  repository/src/main/java/org/apache/atlas/repository/typestore/GraphBackedTypeStore.java a94d157
  repository/src/main/java/org/apache/atlas/services/AtlasPatchHandler.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/services/AtlasTypeAttributePatch.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/services/AtlasTypePatch.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 3550492
  repository/src/test/java/org/apache/atlas/service/DefaultMetadataServiceTest.java 6782970
  typesystem/src/main/java/org/apache/atlas/typesystem/types/AbstractDataType.java fad091d
  typesystem/src/main/java/org/apache/atlas/typesystem/types/ClassType.java c56987a
  typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumType.java bdd0a13
  typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumTypeDefinition.java 6340615
  typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalType.java 7224699
  typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalTypeDefinition.java 9a299f0
  typesystem/src/main/java/org/apache/atlas/typesystem/types/IDataType.java 85ddee7
  typesystem/src/main/java/org/apache/atlas/typesystem/types/StructType.java 6f40c1d
  typesystem/src/main/java/org/apache/atlas/typesystem/types/StructTypeDefinition.java f1ce1b7
  typesystem/src/main/java/org/apache/atlas/typesystem/types/TraitType.java f23bf5b
  typesystem/src/main/java/org/apache/atlas/typesystem/types/TypeSystem.java 70ba89b
  typesystem/src/main/java/org/apache/atlas/typesystem/types/utils/TypesUtil.java ef8448d
  typesystem/src/main/scala/org/apache/atlas/typesystem/json/TypesSerialization.scala 5618938

Diff: https://reviews.apache.org/r/51939/diff/


Testing
---


Thanks,

Sarath Kumar Subramanian



Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/#review150489
---




repository/src/main/java/org/apache/atlas/services/AtlasTypePatch.java (line 78)


Consider using type Map for params.


- Madhan Neethiraj


On Sept. 26, 2016, 10:27 p.m., Sarath Kumar Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51939/
> ---
> 
> (Updated Sept. 26, 2016, 10:27 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1174
> https://issues.apache.org/jira/browse/ATLAS-1174
> 
> 
> Repository: atlas
> 
> 



[jira] [Updated] (ATLAS-1198) Move from Guice + Spring to only Spring

2016-09-26 Thread Apoorv Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apoorv Naik updated ATLAS-1198:
---
Description: Right now we're using both Guice and Spring for dependency 
injection, which is not an ideal combination. As Atlas evolves as a product, we 
can leverage the Spring framework to avoid re-inventing the wheel for things 
like caching, transaction management, etc.  (was: Right now we're using both Guice and Spring 
and dependency injection which is not a ideal combination. As Atlas evolves as 
a product we can leverage the spring framework to re-invent the wheel like 
caching, transaction management etc.)

> Move from Guice + Spring to only Spring
> ---
>
> Key: ATLAS-1198
> URL: https://issues.apache.org/jira/browse/ATLAS-1198
> Project: Atlas
> Issue Type: Improvement
> Affects Versions: 0.7-incubating, 0.8-incubating
> Reporter: Apoorv Naik
>
> Right now we're using both Guice and Spring for dependency injection, which is 
> not an ideal combination. As Atlas evolves as a product, we can leverage the 
> Spring framework to avoid re-inventing the wheel for things like caching, 
> transaction management, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Sarath Kumar Subramanian


On Sept. 21, 2016, 7:46 a.m., Sarath Kumar Subramanian wrote:
> > Add tests
> > 
> > 1. Currently, the model files (like hive_model.json) are auto-generated from 
> > model definitions defined in Java (like HiveDataModelGenerator). The patch 
> > files in this case have to be hand-coded, which is error-prone.
> > 2. For completeness, readability and debuggability, the type update has to 
> > be done in the corresponding model definitions like HiveDataModelGenerator. 
> > So, the same data will be in two places, and the model definitions and the 
> > patch files can go out of sync.
> > 3. Since the model definitions (like HiveDataModelGenerator) will be updated 
> > anyway, if we modify ReservedTypesRegistrar to do a type update instead of a 
> > type create, the type updates will automatically be taken care of with the 
> > same model JSON. So, model update patches are not necessary then.
> > 4. This JIRA doesn't implement type versioning: it doesn't have support for 
> > storing multiple versions of a type. But it maintains the version of the 
> > latest type definition, which I think is useful for debugging, to know the 
> > version of the type that the server knows. Can we maintain this info in 
> > HiveDataModelGenerator itself, so that it becomes part of hive_model.json?
> 
> Sarath Kumar Subramanian wrote:
> 1. A patch JSON generator can be addressed in a separate JIRA.
> 2. This is only for upgrade scenarios.
> 3. Let's address this in a separate JIRA.
> 4. We don't want to maintain multiple versions of the same type in the 
> typesystem; that would confuse users about which version to use to create 
> entities.
> 
> Shwetha GS wrote:
> What benefit does this patch framework add when the type update can be 
> done with the existing functionality itself (Type versioning is useful and 
> can be done without the patch framework). The patch framework adds more 
> overhead, and I don't see the necessity
> 
> David Radley wrote:
> I agree - I think this capability would be better placed in separate 
> tooling that uses the Atlas REST APIs.
> 
> Madhan Neethiraj wrote:
> @Shwetha, @David - the rationale for the patch framework is exactly the same 
> as the need for reading the contents of the "models" directory during startup 
> and initializing the typesystem. This just makes it easier to deal with 
> updates/additions to the typesystem during a software upgrade/patch.
> 
> Sarath Kumar Subramanian wrote:
> The existing system to update types (using REST) has these limitations:
> 
> 1. Clients might add new attributes to a type, and when we add a set of 
> new attributes in model.json, the REST API model update overwrites the 
> customer changes.
> 
> 2. More granular type updates cannot be applied. For example, if you add a 
> new attribute in hive_model, Atlas startup doesn't do a type-by-type model 
> comparison and apply the diff. This patch can update a specific type based 
> on its version (a condition-based update).
> 
> David Radley wrote:
> @Madhan @Sarath It seems to me that the model files are system types 
> that we supply to aid integration with Hive and the like. As these do not 
> change, I am happy for these to be in model files. If we put changes in patch 
> files, then patch files become the master. I think we need the Atlas 
> repository to be the master of all the types in all cases. It should not be 
> a slave to patch files. I can see a place for patch files that tooling uses 
> for a one-off upgrade of types to update the repository. I agree that care 
> needs to be taken on type shape changes; in practice I would expect that 
> these shape change operations could be restricted to admin roles.
> 
> Shwetha GS wrote:
> Sarath,
> >>Clients might add new attributes to a type and when we add a set of new 
> attributes in model.json - the rest api model overwrites customer changes
> That's a valid point. 
> 
> The patch doesn't apply on trunk. Can you rebase the patch and add the 
> tests?
> 
> David,
> Registering these models/patches is part of the set-up tool required for 
> hooks to work. These use internal methods directly, which we should change to 
> make use of APIs. This set-up is required to be run before any hooks can 
> register the entities, hence it's run as part of service start itself. But I 
> agree that we should move this out of repository.

Shwetha,
I have rebased the patch and added unit tests.

Thanks,
Sarath


- Sarath Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/#review149811
---


On Sept. 26, 2016, 3:27 p.m., Sarath Kumar Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51939/
> ---
> 
> (Updated 

Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Sarath Kumar Subramanian

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/
---

(Updated Sept. 26, 2016, 3:27 p.m.)


Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.


Changes
---

Added unit tests and rebased the patch.


Bugs: ATLAS-1174
https://issues.apache.org/jira/browse/ATLAS-1174


Repository: atlas


Description
---

1. Introduce a "version" attribute for all types in the type system. This helps
to track changes made to the default types (hive, sqoop, falcon and storm types)
and to user-created types. The attribute is optional: if no version is given
when a type is created, the default version "1.0" is assigned.
2. Using the version attribute, introduce a patch framework for the type
system. The framework applies patches to a type based on its version number and
runs during Atlas startup, so it can be used during an upgrade, for example to
add new attributes to an existing type.
The sequence of steps:
a. During Atlas startup, check the $ATLAS_HOME/models/patches directory for
patch files (JSON files). If there are any patch files, process them.
b. A sample patch JSON file looks like:
{
  "patches": [
    {
      "action": "ADD_ATTRIBUTE",
      "typeName": "hive_column",
      "applyToVersion": "1.0",
      "updateToVersion": "2.0",
      "actionParams": [
        { "name": "position", "dataTypeName": "int", "multiplicity": "optional",
          "isComposite": false, "isUnique": false, "isIndexable": false,
          "reverseAttributeName": null }
      ]
    }
  ]
}
c. The framework updates the type named in "typeName" if its version matches
"applyToVersion" and, after applying the patch, sets the type's version to the
one given in "updateToVersion".
d. A JSON file can contain more than one action (an array of actions).
e. The directory can contain multiple patch JSON files; they are applied in the
sort order of their filenames, e.g.:
001-hive_column_add_position.json
002-hive_column_add_anotherattribute.json
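
Steps (a) and (e) above, finding the patch files and ordering them, might look roughly like the sketch below. It assumes that "sort order of the filename" means plain lexicographic String ordering and that only files ending in .json count as patch files; PatchFileOrderSketch and its methods are hypothetical names for this example, not code from the diff.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: hypothetical helper, not the handler from the review.
final class PatchFileOrderSketch {

    /** Keeps only *.json names and sorts them lexicographically (assumed ordering). */
    static List<String> orderPatchFileNames(Collection<String> names) {
        List<String> ordered = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith(".json")) {
                ordered.add(name);
            }
        }
        Collections.sort(ordered);
        return ordered;
    }

    /** Lists the patch file names in a directory; a missing directory yields an empty list. */
    static List<String> listPatchFileNames(Path patchDir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(patchDir)) {
            return names; // nothing to apply
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(patchDir)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return orderPatchFileNames(names);
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the current directory when ATLAS_HOME is not set.
        String atlasHome = System.getenv("ATLAS_HOME");
        Path patchDir = Paths.get(atlasHome == null ? "." : atlasHome, "models", "patches");
        for (String name : listPatchFileNames(patchDir)) {
            System.out.println(name);
        }
    }
}
```

With a numeric prefix of fixed width, as in the 001-/002- examples, lexicographic order matches the intended application order; prefixes of varying width (for example 9- and 10-) would not sort as expected under this assumption.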


Diffs (updated)
---

  common/src/main/java/org/apache/atlas/AtlasConstants.java 17ffbd7
  common/src/main/java/org/apache/atlas/repository/Constants.java d7f9c89
  repository/src/main/java/org/apache/atlas/repository/typestore/GraphBackedTypeStore.java a94d157
  repository/src/main/java/org/apache/atlas/services/AtlasPatchHandler.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/services/AtlasTypeAttributePatch.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/services/AtlasTypePatch.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 3550492
  repository/src/test/java/org/apache/atlas/service/DefaultMetadataServiceTest.java 6782970
  typesystem/src/main/java/org/apache/atlas/typesystem/types/AbstractDataType.java fad091d
  typesystem/src/main/java/org/apache/atlas/typesystem/types/ClassType.java c56987a
  typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumType.java bdd0a13
  typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumTypeDefinition.java 6340615
  typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalType.java 7224699
  typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalTypeDefinition.java 9a299f0
  typesystem/src/main/java/org/apache/atlas/typesystem/types/IDataType.java 85ddee7
  typesystem/src/main/java/org/apache/atlas/typesystem/types/StructType.java 6f40c1d
  typesystem/src/main/java/org/apache/atlas/typesystem/types/StructTypeDefinition.java f1ce1b7
  typesystem/src/main/java/org/apache/atlas/typesystem/types/TraitType.java f23bf5b
  typesystem/src/main/java/org/apache/atlas/typesystem/types/TypeSystem.java 70ba89b
  typesystem/src/main/java/org/apache/atlas/typesystem/types/utils/TypesUtil.java ef8448d
  typesystem/src/main/scala/org/apache/atlas/typesystem/json/TypesSerialization.scala 5618938

Diff: https://reviews.apache.org/r/51939/diff/


Testing
---


Thanks,

Sarath Kumar Subramanian



[jira] [Created] (ATLAS-1198) Move from Guice + Spring to only Spring

2016-09-26 Thread Apoorv Naik (JIRA)
Apoorv Naik created ATLAS-1198:
--

 Summary: Move from Guice + Spring to only Spring
 Key: ATLAS-1198
 URL: https://issues.apache.org/jira/browse/ATLAS-1198
 Project: Atlas
  Issue Type: Improvement
Affects Versions: 0.7-incubating, 0.8-incubating
Reporter: Apoorv Naik


Right now we're using both Guice and Spring for dependency injection, which is 
not an ideal combination. As Atlas evolves as a product, we can leverage the 
Spring framework for things like caching and transaction management instead of 
re-inventing the wheel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 47810: ATLAS-694: Update Atlas to use Graph DB abstraction layer

2016-09-26 Thread Jeff Hagelberg


> On Sept. 23, 2016, 8:13 p.m., Suma Shivaprasad wrote:
> > repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala, line 
> > 575
> > 
> >
> > Is this change, fixing a bug in current code? if so, can you pls 
> > explain?

This isn't really a bug fix.  It is part of a performance optimization that was 
put in for IBM Graph support.  For that implementation, it replaces all 
vertices in the query result with a structure that contains both the vertex id 
and the list of outgoing edges for the vertex.  Previous to that, only the id 
was coming back, and we had to make additional REST api calls to get the other 
information about the vertices.  For titan0/titan1, 
getOutputTransformationPredicate() returns an empty String.


- Jeff


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47810/#review150230
---


On Sept. 23, 2016, 3:29 a.m., Jeff Hagelberg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47810/
> ---
> 
> (Updated Sept. 23, 2016, 3:29 a.m.)
> 
> 
> Review request for atlas, David Kantor and Neeru Gupta.
> 
> 
> Bugs: ATLAS-694
> https://issues.apache.org/jira/browse/ATLAS-694
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> ATLAS-694: Update Atlas to use abstraction layer.  All of the Atlas code 
> (with the exception of the catalog, which was only updated minimally) has 
> been updated to use the graph database abstraction layer.  In addition, there 
> is now an optional Atlas configuration property that specifies the class of 
> the abstraction layer database to use.  I basically put all of the changes in 
> here with the exception of the actual Titan 1 implementation of code.  This 
> includes the changes to support Tinkerpop 3 syntax.  This is mostly to 
> expedite getting the changes into Atlas.  Originally the TP3 changes were 
> going to be brought in as part of the Titan 1 implementation task.
> 
> Change Summary:
> 
>- change Atlas classes to use AtlasGraph,AtlasVertex,AtlasEdge, etc 
> instead of TitanGraph/Vertex/Edge, etc
>- compile time dependency on titan 0.5.4/TP 2 removed (except in Catalog, 
> which was only changed to use AtlasGraphProvider/AtlasGraph) - see 
> repository\pom.xml, other pom.xmls
>- updated DSL translation to generate Gremlin that is compliant with TP3 
> when TP3 is being used.  See GremlinQuery.scala, 
> GraphPersistenceStrategies.scala
>- TitanGraphProvider replaced by AtlasGraphProvider.  Graph database 
> implementation is determined from a new optional configuration property
>- HiveTitanSample is no longer used by tests.  It has been replaced by 
> hive-instances.json (which uses normal Atlas json syntax).  The data is saved 
> with a new JSONImporter class.  This was needed because the graphson syntax 
> used by HiveTitanSample is not compatible with TP3.  
> 
> Last rebase: 9/22/2016
> 
> 
> Diffs
> -
> 
>   .gitignore e10adbc4457f6297600f0feb01eb54718b8ec406 
>   addons/falcon-bridge/pom.xml 1365bd05a388dc92f7a56c7f7427b5b85f97c7da 
>   addons/hdfs-model/pom.xml 492f39cea085c6e69781e17bcbdbc3a231806df3 
>   
> addons/hdfs-model/src/test/java/org/apache/atlas/fs/model/HDFSModelTest.java 
> ac60294e328835ba0340e150799ddfb348ccdb52 
>   addons/hive-bridge/pom.xml 6993bdb938a6095ca24482e290393eeeb3911bcb 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
>  ad7a4a5d09d8542a841701dfe04981f65f767c14 
>   addons/sqoop-bridge/pom.xml 8c9d278d43b5979ea1743d10845905c13249f8a6 
>   addons/storm-bridge/pom.xml 12c1208b448d456a923bd7309601174ddb561ba5 
>   catalog/pom.xml 2f58a8f0748de65ab78eab35df6abd2fe7c336af 
>   catalog/src/main/java/org/apache/atlas/catalog/query/BaseQuery.java 
> e7bb505075983371ca12d9bc1d8c6eb240c3d134 
>   distro/src/conf/atlas-application.properties 
> d334600dc5534840409b586157799ef3abf9abf2 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasEdge.java 
> dd4b7e614cdd9bf30f957fb6a839d8c60f3e1701 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasElement.java
>  1bc0fc38c0802897f32260520770a16795474d04 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraph.java 
> 995c5457ac7f807172f367cc8e3348b3a98dd6f3 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphIndex.java
>  41194d34f079842db0d95c73a8b099459f76ff2f 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphManagement.java
>  c8cd2842ca3090b6bbd384c773b4eb45aff149ce 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphQuery.java
>  

[jira] [Created] (ATLAS-1197) When Atlas is configured with atlas.notification.hook.numthreads > 1, CTAS tables are created twice .

2016-09-26 Thread Sharmadha Sainath (JIRA)
Sharmadha Sainath created ATLAS-1197:


 Summary: When Atlas is configured with 
atlas.notification.hook.numthreads > 1, CTAS tables are created twice .
 Key: ATLAS-1197
 URL: https://issues.apache.org/jira/browse/ATLAS-1197
 Project: Atlas
  Issue Type: Bug
Reporter: Sharmadha Sainath


When running CTAS queries with atlas.notification.hook.numthreads set to a 
value greater than 1 (e.g. 2 or 5), the CTAS tables are created twice in Atlas 
(not in Hive). Both tables are ACTIVE. When the value is set to 1, the table is 
created only once. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 47810: ATLAS-694: Update Atlas to use Graph DB abstraction layer

2016-09-26 Thread Jeff Hagelberg


> On Sept. 23, 2016, 7:58 p.m., Suma Shivaprasad wrote:
> > repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala, line 
> > 341
> > 
> >
> > Can you pls raise a jira for this?

I have created ATLAS-1196 for this.


- Jeff


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47810/#review150229
---


On Sept. 23, 2016, 3:29 a.m., Jeff Hagelberg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47810/
> ---
> 
> (Updated Sept. 23, 2016, 3:29 a.m.)
> 
> 
> Review request for atlas, David Kantor and Neeru Gupta.
> 
> 
> Bugs: ATLAS-694
> https://issues.apache.org/jira/browse/ATLAS-694
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> ATLAS-694: Update Atlas to use abstraction layer.  All of the Atlas code 
> (with the exception of the catalog, which was only updated minimally) has 
> been updated to use the graph database abstraction layer.  In addition, there 
> is now an optional Atlas configuration property that specifies the class of 
> the abstraction layer database to use.  I basically put all of the changes in 
> here with the exception of the actual Titan 1 implementation of code.  This 
> includes the changes to support Tinkerpop 3 syntax.  This is mostly to 
> expedite getting the changes into Atlas.  Originally the TP3 changes were 
> going to be brought in as part of the Titan 1 implementation task.
> 
> Change Summary:
> 
>- change Atlas classes to use AtlasGraph,AtlasVertex,AtlasEdge, etc 
> instead of TitanGraph/Vertex/Edge, etc
>- compile time dependency on titan 0.5.4/TP 2 removed (except in Catalog, 
> which was only changed to use AtlasGraphProvider/AtlasGraph) - see 
> repository\pom.xml, other pom.xmls
>- updated DSL translation to generate Gremlin that is compliant with TP3 
> when TP3 is being used.  See GremlinQuery.scala, 
> GraphPersistenceStrategies.scala
>- TitanGraphProvider replaced by AtlasGraphProvider.  Graph database 
> implementation is determined from a new optional configuration property
>- HiveTitanSample is no longer used by tests.  It has been replaced by 
> hive-instances.json (which uses normal Atlas json syntax).  The data is saved 
> with a new JSONImporter class.  This was needed because the graphson syntax 
> used by HiveTitanSample is not compatible with TP3.  
> 
> Last rebase: 9/22/2016
> 
> 
> Diffs
> -
> 
>   .gitignore e10adbc4457f6297600f0feb01eb54718b8ec406 
>   addons/falcon-bridge/pom.xml 1365bd05a388dc92f7a56c7f7427b5b85f97c7da 
>   addons/hdfs-model/pom.xml 492f39cea085c6e69781e17bcbdbc3a231806df3 
>   
> addons/hdfs-model/src/test/java/org/apache/atlas/fs/model/HDFSModelTest.java 
> ac60294e328835ba0340e150799ddfb348ccdb52 
>   addons/hive-bridge/pom.xml 6993bdb938a6095ca24482e290393eeeb3911bcb 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
>  ad7a4a5d09d8542a841701dfe04981f65f767c14 
>   addons/sqoop-bridge/pom.xml 8c9d278d43b5979ea1743d10845905c13249f8a6 
>   addons/storm-bridge/pom.xml 12c1208b448d456a923bd7309601174ddb561ba5 
>   catalog/pom.xml 2f58a8f0748de65ab78eab35df6abd2fe7c336af 
>   catalog/src/main/java/org/apache/atlas/catalog/query/BaseQuery.java 
> e7bb505075983371ca12d9bc1d8c6eb240c3d134 
>   distro/src/conf/atlas-application.properties 
> d334600dc5534840409b586157799ef3abf9abf2 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasEdge.java 
> dd4b7e614cdd9bf30f957fb6a839d8c60f3e1701 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasElement.java
>  1bc0fc38c0802897f32260520770a16795474d04 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraph.java 
> 995c5457ac7f807172f367cc8e3348b3a98dd6f3 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphIndex.java
>  41194d34f079842db0d95c73a8b099459f76ff2f 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphManagement.java
>  c8cd2842ca3090b6bbd384c773b4eb45aff149ce 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphQuery.java
>  93447495bcf18e9f19df9df68fd1cbe1427fc462 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasIndexQuery.java
>  e719d306ffe9f68e3ac6f7406baaf60a12390c34 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasPropertyKey.java
>  315ecddb861e1a1be6e0ab9b36fe4c0a52486ae8 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java
>  fff6fb79247e7d0615ce83c4cbbd93d1bf8cf29c 
>   
> 

[jira] [Created] (ATLAS-1196) Support circular lineage with Gremlin 3

2016-09-26 Thread Jeffrey Hagelberg (JIRA)
Jeffrey Hagelberg created ATLAS-1196:


 Summary: Support circular lineage with Gremlin 3
 Key: ATLAS-1196
 URL: https://issues.apache.org/jira/browse/ATLAS-1196
 Project: Atlas
  Issue Type: Sub-task
Reporter: Jeffrey Hagelberg


Circular lineage support is something that has been added to Atlas since the 
gremlin 3 translation logic was put in.  We need to update the DSL translator 
to support this with Gremlin 3 as well.  See the logic for translating 
LoopExpression in GremlinQuery.scala.  In Gremlin 2, the logic was changed so 
that if "times" is not specified, the loop exits if the path already contains 
the object being processed.  Something similar needs to be added to the Gremlin 
3 logic.
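
The exit condition described above can be illustrated with a small sketch. It is plain Python rather than Gremlin or the Atlas DSL translator, and the `traverse_lineage` function and the adjacency-dictionary graph are hypothetical stand-ins; the only behaviour taken from the description is that, when "times" is not specified, a branch stops as soon as the next object is already on the path.

```python
def traverse_lineage(edges, start, times=None):
    """Collect lineage paths from `start` (hypothetical illustration).

    edges: dict of node -> list of downstream nodes.
    times: optional loop bound; when None, a branch stops once the next
           node is already on the current path (cycle detection).
    """
    results = []

    def step(path, depth):
        for nxt in edges.get(path[-1], []):
            # No explicit bound: exit the loop if the path already
            # contains the object being processed.
            if times is None and nxt in path:
                continue
            new_path = path + [nxt]
            results.append(new_path)
            # With an explicit bound, stop after `times` iterations.
            if times is None or depth + 1 < times:
                step(new_path, depth + 1)

    step([start], 0)
    return results
```

On a circular graph such as t1 -> p1 -> t2 -> p2 -> t1, the unbounded call terminates after visiting each object once, whereas without the path check it would recurse forever.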



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 52077: Column level lineage in Hive

2016-09-26 Thread Vimal Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/
---

(Updated Sept. 26, 2016, 1:06 p.m.)


Review request for atlas.


Changes
---

Addressed Shwetha's review comments. I think it would make sense to address 
Type update changes in ATLAS-1184. Marked ATLAS-1184 as required for this patch.


Bugs: ATLAS-1184 and ATLAS-247
https://issues.apache.org/jira/browse/ATLAS-1184
https://issues.apache.org/jira/browse/ATLAS-247


Repository: atlas


Description
---

After a CTAS query, lineage relationship between source columns and destination 
column can be captured. This information can be used to create a column lineage 
process.


Diffs (updated)
-

  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java
 PRE-CREATION 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 
a3464a0 
  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java
 45f0bc9 
  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java 
e094cb6 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java 
a5838b4 

Diff: https://reviews.apache.org/r/52077/diff/


Testing
---


Thanks,

Vimal Sharma



Re: Rename trait to classification

2016-09-26 Thread David Radley
Hi Hemanth, 
I appreciate your feedback and openness. It was an interesting point you 
made about which roles were authoring traits and terms. I guess this is 
not something Atlas would police.

The current traits could be:
1) Locked down so only the governance team could update them; in that case 
they would be classifications that governance rules could act on, 
 or
2) Not locked down, so a wider audience (business personas) could create 
them.

I am suggesting: 
- renaming traits to classifications for use by the governance team.
- using terms as glossary terms for use more widely by business users. 

Does this work, or am I missing something? 
  all the best, David. 
 





[jira] [Comment Edited] (ATLAS-247) Hive Column level lineage

2016-09-26 Thread Vimal Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522941#comment-15522941
 ] 

Vimal Sharma edited comment on ATLAS-247 at 9/26/16 12:43 PM:
--

Updation of Hive Type system should be supported before this patch can be 
committed. Marking the dependency from ATLAS-1184. Thanks [~shwethags] for 
pointing this out.


was (Author: svimal2106):
Updation of Hive Type system should be supported before this patch can be 
committed. Marking the dependency from ATLAS-1184

> Hive Column level lineage
> -
>
> Key: ATLAS-247
> URL: https://issues.apache.org/jira/browse/ATLAS-247
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.5-incubating
>Reporter: Herman Yu
>Assignee: Harish Butani
> Fix For: 0.8-incubating
>
> Attachments: ATLAS-247-v4.patch, ATLAS-247-v5.patch, 
> ATLAS-247.2.patch, ATLAS-247.patch
>
>
> hive_column is not inherited from DataSet, thus can't be using hive_process 
> to track column level lineages
> Is there specific reason that hive_column is not inheriting from Data Set? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ATLAS-247) Hive Column level lineage

2016-09-26 Thread Vimal Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522941#comment-15522941
 ] 

Vimal Sharma commented on ATLAS-247:


Updation of Hive Type system should be supported before this patch can be 
committed. Marking the dependency from ATLAS-1184

> Hive Column level lineage
> -
>
> Key: ATLAS-247
> URL: https://issues.apache.org/jira/browse/ATLAS-247
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.5-incubating
>Reporter: Herman Yu
>Assignee: Harish Butani
> Fix For: 0.8-incubating
>
> Attachments: ATLAS-247-v4.patch, ATLAS-247-v5.patch, 
> ATLAS-247.2.patch, ATLAS-247.patch
>
>
> hive_column is not inherited from DataSet, thus can't be using hive_process 
> to track column level lineages
> Is there specific reason that hive_column is not inheriting from Data Set? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Rename trait to classification

2016-09-26 Thread Mandy Chessell
Hello Hemanth, David,
This is a great discussion.   These concepts are all related, in that they 
are linked to data descriptions (such as schemas) to characterise data. 
However, I think your probing is right: the governance classifications are 
slightly different from traits and glossary terms. 

Glossary terms are focused on the meaning of the data.  They follow the 
structure of the subject area, and link related terms together to show 
the potential objects, attributes and relationships that are typically found 
together.   Traits seem to offer a more informal means to characterise 
data.   These seem useful for characterising data specific to particular 
projects, or areas of special interest to the data lake team.

The governance classifications are a formal definition.  They are often 
defined as company-wide values that most employees are trained on.  So a 
deployment of Atlas in a new organization could well involve adding their 
existing classification schemes to the Atlas repository.   The values I 
shared in the earlier email are those we suggest for organizations that do 
not currently have any information governance. 

The values in each classification scheme are kept small (to keep them 
memorable) and then the governance program is built around them.  So, for 
example, each system has a set of rules for how it manages data for each 
of the classification values.   When new systems are brought in, new rules 
may be defined, but the employees still only have to know the standard 
classification schemes. 

As we continue to enhance the work of the governance enforcement, these 
classifications will be the key values encoded in the rules.  For a 
sophisticated organization with a company-wide data strategy, the 
classifications are often linked to the glossary terms and the glossary 
terms are linked to the data schemas.  This means the same classifications 
(and hence rules) are applied to the same type of data irrespective of the 
system it came from.  Alternatively, where system owners want to control 
how the data from their systems is classified, the governance 
classifications are linked directly to the schemas and so there may be 
variation in the way a certain type of data (e.g. credit card numbers) is 
governed.

In either case, the classifications need to be determined where data is 
accessed and so we need a fast look-up mechanism for these values.
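
One way to picture such a look-up is a precomputed index from schema element to classification values, built from both kinds of link described above. The sketch below is an assumption-laden illustration, not an Atlas API: the function name, the dictionary shapes and the sample element names are all hypothetical.

```python
def build_classification_index(term_classifications, schema_terms,
                               direct_classifications):
    """Build a schema element -> set of classifications index (hypothetical).

    term_classifications:   glossary term -> set of classification values
    schema_terms:           schema element -> list of glossary terms
    direct_classifications: schema element -> set of classification values
                            attached directly by a system owner
    """
    index = {}
    # Route 1: classification -> glossary term -> schema element.
    for element, terms in schema_terms.items():
        for term in terms:
            index.setdefault(element, set()).update(
                term_classifications.get(term, ()))
    # Route 2: classification linked directly to the schema element.
    for element, classes in direct_classifications.items():
        index.setdefault(element, set()).update(classes)
    return index
```

At access time the check is then a single dictionary look-up, e.g. `index.get("orders.cc_num", set())`. Elements that share a glossary term get the same classifications regardless of the system they came from, while directly classified elements may differ.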

All the best
Mandy
___
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
IBM Analytics Group CTO Office

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: mandy_chess...@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - jsbrook...@uk.ibm.com




Re: Rename trait to classification

2016-09-26 Thread Hemanth Yamijala
Hi David,

Reg. the point I made about sharing traits - I don't want to give the
impression that this is an agreed-upon point. Apologies if I conveyed
that sense.

It is a fact that Atlas today has two concepts that are slightly
related: Traits (aka Tags) and Business Terms. The latter was new in
0.7. IMO, it is important that the Atlas community tries to converge
on an unambiguous definition of these concepts as the product would be
driven around these.

With respect to this thread, I am trying to figure out whether
"classification" is a new concept, or whether it overlaps with one of the two
existing ones (which we are trying to rename).

I am certainly not a domain expert on this in any sense :-) - so
hoping that others who are would provide guidance (@aahn - ping?).

Thanks
hemanth

On Mon, Sep 26, 2016 at 2:59 PM, David Radley  wrote:
> Hi Hermanth and Mandy ,
> Thanks for your feedback.
>
> It does seem like these are de-facto industry terms in the governance
> industry; the reason I say this is that looking around the web I see quite
> a few uses of the words governance classification in different domains
> (including in the Atlas documentation!).
>
> I was not aware of the idea that traits and terms would be authored by
> different roles - thanks for your explanation. What is coming up for me is
> :
>
> I think business users should be able to add new business terms (maybe
> going through a workflow and a governance curator then sorting out
> inconsistencies), as they are the most expert as the language they use.
> Classifications could be authored by different teams, for example levels
> of confidentiality (in Mandy's example) would be dictated by the
> governance team. Governance rules would run on these classifications.
>
> You say "So, it is hard to use traits in a shared sense or expect to have
> conventional usage" . I notice the Atlas tutorial did not give me this
> impression, as the example of a trait/tag is PII.
> Your description of traits implies they are more like free form labels .
> If this is the intent for traits, then it does not make sense to rename
> them to classification. Maybe traits should be called labels; so their
> name is more in line with their expected usage. Though we should change
> the tutorial!
>
> A business term is a type of classification -a semantic classification. We
> could add in the concept of classification which Business term and
> Business category  (Jira 1186 ) inherit from. This would allow us to add
> in confidential classifications and classifications schemes to organize.
>
> I look forwards to your thoughts,
>   all the best, David.
>
>
>
>
> From:   Hemanth Yamijala 
> To: David Radley 
> Date:   26/09/2016 05:33
> Subject:Re: Rename trait to classification
>
>
>
> Hi,
>
> Are these de-facto industry terms in the governance industry? If yes,
> would they make more sense to explore as part of the Business Taxonomy
> feature that's currently in alpha in 0.7, rather than the trait system?
>
> One differentiation we've been trying to express is that traits (also
> referred to as tags in some places in Atlas) are free form and left to the
> user using them. So, it is hard to use traits in a shared sense or expect
> to have conventional usage. So, traits would probably be a tool for a data
> scientist to quickly annotate something for their own discovery usage
> later.
>
> Business taxonomy, on the other hand, is something we are thinking as used
> to express standard classification, even if only within an organization,
> but maybe even across industry domains etc. They would likely be created
> by data stewards with knowledge of the domain and their usage would follow
> established practices (authorization controlling who can do what).
>
> Not sure if what we're referring to as "classification" here fits the
> "traits" or "business taxonomy" side more - trying to understand...
>
> Thanks
> hemanth
> 
> From: Mandy Chessell 
> Sent: Sunday, September 25, 2016 9:56 PM
> To: David Radley
> Cc: dev@atlas.incubator.apache.org
> Subject: Re: Rename trait to classification
>
> Hello David,
> I also like the idea of using the term classification.
> Typically classifications in governance are ordered sets of values grouped
> into a classification scheme.  Is the notion of the classification scheme
> also part of the change you are thinking of?
>
> For example, the classification scheme and "unclassified" value which is
> the default classification for any data element that has no classification
> from this scheme associated with it.  The other values are defined in
> increasing levels of sensitivity.  There are also sub-classifications.  So
> for example, confidential has sub-classifications of Business
> Confidential, Partner Confidential and Personal Confidential.  If a rule
> is defined for "confidential", 

Re: Review Request 52077: Column level lineage in Hive

2016-09-26 Thread Shwetha GS

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review150383
---



1. Can you add a test with lineage query on column?
2. ReservedTypesRegistrar should change to updateTypes, so that upgrades work
3. Once you make sure the tests work, disable the tests so that tests don't 
break with apache hive 1.2


addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java
 (line 334)


rename to query/command as this class type is also process


- Shwetha GS


On Sept. 22, 2016, 11:53 a.m., Vimal Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> ---
> 
> (Updated Sept. 22, 2016, 11:53 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-247
> https://issues.apache.org/jira/browse/ATLAS-247
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> After a CTAS query, lineage relationship between source columns and 
> destination column can be captured. This information can be used to create a 
> column lineage process.
> 
> 
> Diffs
> -
> 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java
>  PRE-CREATION 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 
> a3464a0 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java
>  45f0bc9 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java
>  e094cb6 
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java 
> a5838b4 
> 
> Diff: https://reviews.apache.org/r/52077/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vimal Sharma
> 
>



Fw: Rename trait to classification

2016-09-26 Thread David Radley
Hi,
 I referred to Business term and Business category - business is a bit 
generic and not well defined. It would be more accurate to say Glossary 
term and Glossary category,
all the best, David. 


- Forwarded by David Radley/UK/IBM on 26/09/2016 11:13 -

From:   David Radley/UK/IBM
To: Hemanth Yamijala , Mandy 
Chessell/UK/IBM@IBMGB
Cc: dev@atlas.incubator.apache.org
Date:   26/09/2016 10:29
Subject:Re: Rename trait to classification


Hi Hemanth and Mandy,
Thanks for your feedback.

It does seem like these are de-facto industry terms in the governance 
industry; the reason I say this is that looking around the web I see quite 
a few uses of the words governance classification in different domains 
(including in the Atlas documentation!).

I was not aware of the idea that traits and terms would be authored by 
different roles - thanks for your explanation. What is coming up for me is:

I think business users should be able to add new business terms (maybe 
going through a workflow, with a governance curator then sorting out 
inconsistencies), as they are the most expert in the language they use. 
Classifications could be authored by different teams; for example, levels 
of confidentiality (in Mandy's example) would be dictated by the 
governance team. Governance rules would run on these classifications.

You say "So, it is hard to use traits in a shared sense or expect to have 
conventional usage". The Atlas tutorial did not give me this 
impression, as its example of a trait/tag is PII. 
Your description of traits implies they are more like free-form labels. 
If this is the intent for traits, then it does not make sense to rename 
them to classification. Maybe traits should be called labels, so that their 
name is more in line with their expected usage. We should change 
the tutorial, though!

A business term is a type of classification - a semantic classification. We 
could add in the concept of classification, which Business term and 
Business category (Jira 1186) inherit from. This would allow us to add 
in confidential classifications and classification schemes to organize them.

I look forward to your thoughts, 
  all the best, David. 




From:   Hemanth Yamijala 
To: David Radley 
Date:   26/09/2016 05:33
Subject:Re: Rename trait to classification



Hi,

Are these de-facto industry terms in the governance industry? If yes, 
would they make more sense to explore as part of the Business Taxonomy 
feature that's currently in alpha in 0.7, rather than the trait system? 

One differentiation we've been trying to express is that traits (also 
referred to as tags in some places in Atlas) are free form and left to the 
user using them. So, it is hard to use traits in a shared sense or expect 
to have conventional usage. So, traits would probably be a tool for a data 
scientist to quickly annotate something for their own discovery usage 
later.

Business taxonomy, on the other hand, is something we are thinking of as being 
used to express standard classifications, even if only within an organization, 
but maybe even across industry domains etc. They would likely be created 
by data stewards with knowledge of the domain, and their usage would follow 
established practices (authorization controlling who can do what).

Not sure if what we're referring to as "classification" here fits the 
"traits" or "business taxonomy" side more - trying to understand...

Thanks
hemanth

From: Mandy Chessell 
Sent: Sunday, September 25, 2016 9:56 PM
To: David Radley
Cc: dev@atlas.incubator.apache.org
Subject: Re: Rename trait to classification

Hello David,
I also like the idea of using the term classification.
Typically classifications in governance are ordered sets of values grouped
into a classification scheme.  Is the notion of the classification scheme
also part of the change you are thinking of?

For example, the classification scheme below has an "unclassified" value, which is
the default classification for any data element that has no classification
from this scheme associated with it.  The other values are defined in
increasing levels of sensitivity.  There are also sub-classifications.  So
for example, confidential has sub-classifications of Business
Confidential, Partner Confidential and Personal Confidential.  If a rule
is defined for "confidential", it applies to all three of the
sub-classifications.

Confidentiality Classification Scheme
Confidentiality is used to classify the impact of disclosing information
to unauthorized individuals.
• Unclassified
• Internal Use
• Confidential
    • Business Confidential
    • Partner Confidential
    • Personal Information
• Sensitive
    • Sensitive Personal
    • Sensitive Financial
    • Sensitive Operational
• Restricted
    • Restricted Financial
    • Restricted Operational
    • Trade Secret


The classification schemes 

Review Request 52257: Return system attributes in get entity definition

2016-09-26 Thread Vimal Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52257/
---

Review request for atlas.


Bugs: ATLAS-916
https://issues.apache.org/jira/browse/ATLAS-916


Repository: atlas


Description
---

Atlas should maintain the following system attributes for every entity: created 
time, last modified time, created by user and last modified by user. This 
information should be returned by get entity definition.
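
To make the request concrete, here is a minimal Python sketch of the idea, not the Atlas implementation. The attribute names (createdTime, modifiedTime, createdBy, modifiedBy) and the functions stamp and get_entity_definition are hypothetical, and an entity is modelled as a plain dictionary.

```python
import time

def stamp(entity, user, now=None):
    """Record who touched the entity and when (hypothetical attribute names)."""
    now = int(time.time() * 1000) if now is None else now
    system = entity.setdefault("system", {})
    # Creation details are written once and never overwritten.
    system.setdefault("createdTime", now)
    system.setdefault("createdBy", user)
    # Modification details are refreshed on every write.
    system["modifiedTime"] = now
    system["modifiedBy"] = user
    return entity

def get_entity_definition(entity):
    """Return the user attributes together with the system attributes."""
    definition = dict(entity.get("attributes", {}))
    definition.update(entity.get("system", {}))
    return definition
```

Calling stamp on create and again on each update keeps the created-by and created-time values stable while the last-modified values follow the most recent writer.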


Diffs
-

  common/src/main/java/org/apache/atlas/repository/Constants.java d7f9c89 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
1ce87c9 
  
repository/src/main/java/org/apache/atlas/repository/graph/GraphToTypedInstanceMapper.java
 5c7cb2e 
  
repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java
 2e0414e 
  
repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepositoryTest.java
 2541541 
  typesystem/src/main/java/org/apache/atlas/typesystem/persistence/Id.java 
42280d0 
  
typesystem/src/main/scala/org/apache/atlas/typesystem/json/InstanceSerialization.scala
 73b3526 
  
typesystem/src/main/scala/org/apache/atlas/typesystem/json/Serialization.scala 
68c47ec 
  
typesystem/src/test/scala/org/apache/atlas/typesystem/json/InstanceSerializationTest.scala
 9e656a5 

Diff: https://reviews.apache.org/r/52257/diff/


Testing
---

Testing in progress


Thanks,

Vimal Sharma



[jira] [Updated] (ATLAS-916) Return system attributes in get entity definition

2016-09-26 Thread Vimal Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vimal Sharma updated ATLAS-916:
---
Attachment: ATLAS-916.patch

> Return system attributes in get entity definition
> -
>
> Key: ATLAS-916
> URL: https://issues.apache.org/jira/browse/ATLAS-916
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Shwetha G S
>Assignee: Vimal Sharma
> Attachments: ATLAS-916.patch
>
>
> Atlas should maintain the following system attributes for every entity: created 
> time, last modified time, created by user and last modified by user. This 
> information should be returned by get entity definition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Rename trait to classification

2016-09-26 Thread David Radley
Hi Hemanth and Mandy,
Thanks for your feedback.

It does seem like these are de-facto industry terms in the governance 
industry; the reason I say this is that looking around the web I see quite 
a few uses of the words governance classification in different domains 
(including in the Atlas documentation!).

I was not aware of the idea that traits and terms would be authored by 
different roles - thanks for your explanation. What is coming up for me is:

I think business users should be able to add new business terms (maybe 
going through a workflow, with a governance curator then sorting out 
inconsistencies), as they are the most expert in the language they use. 
Classifications could be authored by different teams; for example, levels 
of confidentiality (in Mandy's example) would be dictated by the 
governance team. Governance rules would run on these classifications.

You say "So, it is hard to use traits in a shared sense or expect to have 
conventional usage". The Atlas tutorial did not give me this 
impression, as its example of a trait/tag is PII. 
Your description of traits implies they are more like free-form labels. 
If this is the intent for traits, then it does not make sense to rename 
them to classification. Maybe traits should be called labels, so that their 
name is more in line with their expected usage. We should change 
the tutorial, though!

A business term is a type of classification - a semantic classification. We 
could add in the concept of classification, which Business term and 
Business category (Jira 1186) inherit from. This would allow us to add 
in confidential classifications and classification schemes to organize them.

I look forward to your thoughts, 
  all the best, David. 




From:   Hemanth Yamijala 
To: David Radley 
Date:   26/09/2016 05:33
Subject:Re: Rename trait to classification



Hi,

Are these de-facto industry terms in the governance industry? If yes, 
would they make more sense to explore as part of the Business Taxonomy 
feature that's currently in alpha in 0.7, rather than the trait system? 

One differentiation we've been trying to express is that traits (also 
referred to as tags in some places in Atlas) are free form and left to the 
user using them. So, it is hard to use traits in a shared sense or expect 
to have conventional usage. So, traits would probably be a tool for a data 
scientist to quickly annotate something for their own discovery usage 
later.

Business taxonomy, on the other hand, is something we are thinking of as being 
used to express standard classifications, even if only within an organization, 
but maybe even across industry domains etc. They would likely be created 
by data stewards with knowledge of the domain, and their usage would follow 
established practices (authorization controlling who can do what).

Not sure if what we're referring to as "classification" here fits the 
"traits" or "business taxonomy" side more - trying to understand...

Thanks
hemanth

From: Mandy Chessell 
Sent: Sunday, September 25, 2016 9:56 PM
To: David Radley
Cc: dev@atlas.incubator.apache.org
Subject: Re: Rename trait to classification

Hello David,
I also like the idea of using the term classification.
Typically classifications in governance are ordered sets of values grouped
into a classification scheme.  Is the notion of the classification scheme
also part of the change you are thinking of?

For example, the classification scheme below has an "unclassified" value, which is
the default classification for any data element that has no classification
from this scheme associated with it.  The other values are defined in
increasing levels of sensitivity.  There are also sub-classifications.  So
for example, confidential has sub-classifications of Business
Confidential, Partner Confidential and Personal Confidential.  If a rule
is defined for "confidential", it applies to all three of the
sub-classifications.

Confidentiality Classification Scheme
Confidentiality is used to classify the impact of disclosing information
to unauthorized individuals.
• Unclassified
• Internal Use
• Confidential
    • Business Confidential
    • Partner Confidential
    • Personal Information
• Sensitive
    • Sensitive Personal
    • Sensitive Financial
    • Sensitive Operational
• Restricted
    • Restricted Financial
    • Restricted Operational
    • Trade Secret


The classification schemes create a graduated view of how sensitive data
is.  We would also expect to see classification schemes for other aspects
of governance such as retention, confidence (quality) and criticality.
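
The behaviour described above (an ordered scheme, a default value, and rules that also cover sub-classifications) could be sketched roughly as follows. This is only an illustration in Python, not anything from Atlas: the names LEVELS, PARENT, effective_level, rule_applies and at_least are hypothetical, the five top-level values are assumed to be the ordered levels, and only the three sub-classifications of Confidential named in the text are modelled.

```python
# Ordered from least to most sensitive (assumed reading of the scheme).
LEVELS = ["Unclassified", "Internal Use", "Confidential", "Sensitive", "Restricted"]

# Sub-classification -> parent level, as named in the message.
PARENT = {
    "Business Confidential": "Confidential",
    "Partner Confidential": "Confidential",
    "Personal Confidential": "Confidential",
}

DEFAULT = "Unclassified"

def effective_level(classification=None):
    """Resolve a value to its top-level level; unset data gets the default."""
    value = classification or DEFAULT
    return PARENT.get(value, value)

def rule_applies(rule_level, classification=None):
    """A rule on a level also covers every sub-classification of that level."""
    return effective_level(classification) == rule_level

def at_least(classification, threshold):
    """True if the data is at least as sensitive as the threshold level."""
    return LEVELS.index(effective_level(classification)) >= LEVELS.index(threshold)
```

With this shape, rule_applies("Confidential", "Partner Confidential") holds without the rule having to list each sub-classification, and an element with no classification is treated as "Unclassified".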


All the best
Mandy
___
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
IBM Analytics Group CTO Office

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield

Email: 

Re: Review Request 51896: ATLAS-1171: structured, high-level APIs

2016-09-26 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51896/
---

(Updated Sept. 26, 2016, 9:21 a.m.)


Review request for atlas.


Changes
---

- added method AtlasDataType.validateValue()
- added a bunch of unit tests


Bugs: ATLAS-1171
https://issues.apache.org/jira/browse/ATLAS-1171


Repository: atlas


Description
---

first-cut API for review


Diffs (updated)
-

  common/pom.xml e3b6465 
  common/src/main/java/org/apache/atlas/api/AtlasApiEntities.java PRE-CREATION 
  common/src/main/java/org/apache/atlas/api/AtlasApiTypes.java PRE-CREATION 
  common/src/main/java/org/apache/atlas/api/PList.java PRE-CREATION 
  common/src/main/java/org/apache/atlas/api/SearchFilter.java PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/instance/AtlasAsset.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/instance/AtlasClassification.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/instance/AtlasEntity.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/instance/AtlasObjectId.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/instance/AtlasProcess.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/instance/AtlasStruct.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/typedef/AtlasBaseTypeDef.java 
PRE-CREATION 
  
common/src/main/java/org/apache/atlas/model/typedef/AtlasClassificationDef.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/typedef/AtlasEntityDef.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/typedef/AtlasEnumDef.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/model/typedef/AtlasStructDef.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasArrayType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasBuiltInTypes.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasClassificationType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasDataType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasEntityType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasEnumType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasMapType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasStructType.java 
PRE-CREATION 
  common/src/main/java/org/apache/atlas/typesystem/AtlasTypeRegistry.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/model/ModelTestUtil.java PRE-CREATION 
  
common/src/test/java/org/apache/atlas/model/instance/TestAtlasClassification.java
 PRE-CREATION 
  common/src/test/java/org/apache/atlas/model/instance/TestAtlasEntity.java 
PRE-CREATION 
  
common/src/test/java/org/apache/atlas/model/typedef/TestAtlasClassificationDef.java
 PRE-CREATION 
  common/src/test/java/org/apache/atlas/model/typedef/TestAtlasEntityDef.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/model/typedef/TestAtlasEnumDef.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/model/typedef/TestAtlasStructDef.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasArrayType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasBigDecimalType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasBigIntegerType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasBooleanType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasByteType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasDateType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasDoubleType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasFloatType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasIntType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasLongType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasObjectIdType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasShortType.java 
PRE-CREATION 
  common/src/test/java/org/apache/atlas/typesystem/TestAtlasStringType.java 
PRE-CREATION 
  pom.xml ac5b042 
  webapp/src/main/java/org/apache/atlas/web/rest/TypesREST.java PRE-CREATION 

Diff: https://reviews.apache.org/r/51896/diff/


Testing
---


Thanks,

Madhan Neethiraj



Re: Review Request 52077: Column level lineage in Hive

2016-09-26 Thread Vimal Sharma


> On Sept. 21, 2016, 8:58 p.m., Suma Shivaprasad wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java,
> >  line 97
> > 
> >
> > Why is the column qualifiedName different from the convention we are using 
> > for hive_column instances which are referred to from the table? Why is 
> > the cluster name removed?
> 
> Vimal Sharma wrote:
> Cluster information is not available in the lineage information provided by 
> Hive. Further, the qualifiedName used in this patch is used only while setting 
> column lineage and is not used for communication with the rest of the Atlas 
> codebase.
> 
> Suma Shivaprasad wrote:
> If we do not provide the same qualifiedName as in the current 
> HMSB.getColumnQualifiedName(), it will result in another entity being 
> created for the columns. Cluster information is available in 
> HMSB.getClusterName().

In the function populateColumnReferenceableMap, we are setting a mapping from 
the column string identifier (named the column qualified name) to its 
corresponding column Referenceable object in Atlas. No new column Referenceable 
entity is created.

Further, in buildLineageMap, we are setting a mapping from the destination 
column qualified name to the list of source column qualified names. Now, in the 
key-value pairs of type (LineageInfo.DependencyKey, LineageInfo.Dependency) in 
LineageInfo from Hive, there is no cluster information available. So here we 
can't use the same pattern for the column qualified name as used in 
HMSB.getColumnQualifiedName.

If we set the column string identifier to HMSB.getColumnQualifiedName in the 
function populateColumnReferenceableMap, we won't be able to access the column 
Referenceable objects from the map (created in populateColumnReferenceableMap) 
in HiveHook when we are setting up the column lineage process in the function 
createColumnLineageProcessInstances (lines 803 and 812).
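
The point about the two maps sharing one key format can be illustrated with a small Python sketch. It is not the code from the patch: the helper names, the "db.table.column" key format and the "@primary" cluster suffix on qualifiedName are assumptions made for illustration, and column objects are plain dictionaries standing in for Referenceable instances.

```python
def column_key(db, table, column):
    """Cluster-free identifier, assumed here to look like 'db.table.column'."""
    return f"{db}.{table}.{column}".lower()

def populate_column_map(tables):
    """Map each cluster-free key to an existing column object (none are created)."""
    column_map = {}
    for (db, table), columns in tables.items():
        for col in columns:
            column_map[column_key(db, table, col["name"])] = col
    return column_map

def build_lineage_map(dependencies):
    """Map destination column key -> list of source column keys."""
    lineage = {}
    for dest, sources in dependencies:
        keys = [column_key(*src) for src in sources]
        lineage.setdefault(column_key(*dest), []).extend(keys)
    return lineage

def resolve_lineage(lineage, column_map):
    """Swap keys for column objects; works only if both maps use the same keys."""
    return {
        dest: [column_map[src] for src in sources if src in column_map]
        for dest, sources in lineage.items()
        if dest in column_map
    }
```

If populate_column_map keyed its entries by a cluster-qualified name instead, the lookups in resolve_lineage would miss, because the dependency data carries no cluster name to build such a key from.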


- Vimal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149896
---


On Sept. 22, 2016, 11:53 a.m., Vimal Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> ---
> 
> (Updated Sept. 22, 2016, 11:53 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-247
> https://issues.apache.org/jira/browse/ATLAS-247
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> After a CTAS query, lineage relationship between source columns and 
> destination column can be captured. This information can be used to create a 
> column lineage process.
> 
> 
> Diffs
> -
> 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java
>  PRE-CREATION 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 
> a3464a0 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java
>  45f0bc9 
>   
> addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java
>  e094cb6 
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java 
> a5838b4 
> 
> Diff: https://reviews.apache.org/r/52077/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vimal Sharma
> 
>



Re: Review Request 51939: Framework to apply updates to types in the type-system

2016-09-26 Thread Sarath Kumar Subramanian

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51939/
---

(Updated Sept. 25, 2016, 11:13 p.m.)


Review request for atlas, Madhan Neethiraj, Shwetha GS, and Suma Shivaprasad.


Changes
---

added unit tests to test add/update attributes and code refactoring in patch 
files


Bugs: ATLAS-1174
https://issues.apache.org/jira/browse/ATLAS-1174


Repository: atlas


Description
---

1. Introduce a "version" attribute for all types in the type-system; this helps 
to track changes made to the default types (hive, sqoop, falcon and storm types) 
and user-created types. The attribute is optional: if no version is specified 
when a type is created, the default version "1.0" is assigned.
2. Using the version attribute of types, introduce a patch framework for the 
type system. This framework applies patches to a type based on its version 
number and can be used during upgrade, for example to add new attributes to 
existing types. It runs during Atlas startup.
The sequence of steps:
a. During Atlas startup, check the $ATLAS_HOME/models/patches directory for any 
available patch files (json files). If there are any patch files, handle them.
b. A sample patch json file looks like:
{
  "patches": [
    {
      "action": "ADD_ATTRIBUTE",
      "typeName": "hive_column",
      "applyToVersion": "1.0",
      "updateToVersion": "2.0",
      "actionParams": [
        {
          "name": "position",
          "dataTypeName": "int",
          "multiplicity": "optional",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": false,
          "reverseAttributeName": null
        }
      ]
    }
  ]
}
c. The framework updates the type named in "typeName" if its version matches 
"applyToVersion" and, after applying the patch, updates the version to the one 
given in "updateToVersion".
d. The json file can have more than one action (array of actions).
e. There can be multiple patch json files in the directory; they are applied in 
the sort order of their filenames, e.g.:
001-hive_column_add_position.json
002-hive_column_add_anotherattribute.json
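
Steps a to e can be sketched roughly as follows. This is a minimal Python illustration, not the code in AtlasPatchHandler: the function names load_patches, apply_patch and apply_all are hypothetical, type definitions are modelled as plain dictionaries, and only the ADD_ATTRIBUTE action from the sample file is handled.

```python
import json
from pathlib import Path

def load_patches(patch_dir):
    """Collect patch entries from *.json files, in filename sort order."""
    patches = []
    for path in sorted(Path(patch_dir).glob("*.json"), key=lambda p: p.name):
        with open(path, encoding="utf-8") as fh:
            patches.extend(json.load(fh).get("patches", []))
    return patches

def apply_patch(type_defs, patch):
    """Apply one patch to in-memory type definitions; return True if applied."""
    type_def = type_defs.get(patch["typeName"])
    # Skip unknown types and types whose version is not the one targeted.
    # A type without an explicit version is treated as "1.0".
    if type_def is None or type_def.get("version", "1.0") != patch["applyToVersion"]:
        return False
    if patch["action"] != "ADD_ATTRIBUTE":
        raise ValueError("unsupported action: " + patch["action"])
    existing = {attr["name"] for attr in type_def["attributes"]}
    for attr in patch["actionParams"]:
        if attr["name"] not in existing:
            type_def["attributes"].append(attr)
    # Bump the version so the same patch is not applied again.
    type_def["version"] = patch["updateToVersion"]
    return True

def apply_all(type_defs, patch_dir):
    """Run every patch found in patch_dir; return one applied flag per patch."""
    return [apply_patch(type_defs, p) for p in load_patches(patch_dir)]
```

Because each patch moves the type to "updateToVersion", running the same directory again at the next startup leaves the type unchanged: the version no longer matches "applyToVersion", so the patch is skipped.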


Diffs (updated)
-

  common/src/main/java/org/apache/atlas/AtlasConstants.java 17ffbd7 
  common/src/main/java/org/apache/atlas/repository/Constants.java d7f9c89 
  
repository/src/main/java/org/apache/atlas/repository/typestore/GraphBackedTypeStore.java
 a94d157 
  repository/src/main/java/org/apache/atlas/services/AtlasPatchHandler.java 
PRE-CREATION 
  
repository/src/main/java/org/apache/atlas/services/AtlasTypeAttributePatch.java 
PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/AtlasTypePatch.java 
PRE-CREATION 
  
repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 
6a937f4 
  
repository/src/test/java/org/apache/atlas/service/DefaultMetadataServiceTest.java
 52dcfde 
  
typesystem/src/main/java/org/apache/atlas/typesystem/types/AbstractDataType.java
 fad091d 
  typesystem/src/main/java/org/apache/atlas/typesystem/types/ClassType.java 
c56987a 
  typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumType.java 
bdd0a13 
  
typesystem/src/main/java/org/apache/atlas/typesystem/types/EnumTypeDefinition.java
 6340615 
  
typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalType.java
 7224699 
  
typesystem/src/main/java/org/apache/atlas/typesystem/types/HierarchicalTypeDefinition.java
 9a299f0 
  typesystem/src/main/java/org/apache/atlas/typesystem/types/IDataType.java 
85ddee7 
  typesystem/src/main/java/org/apache/atlas/typesystem/types/StructType.java 
6f40c1d 
  
typesystem/src/main/java/org/apache/atlas/typesystem/types/StructTypeDefinition.java
 f1ce1b7 
  typesystem/src/main/java/org/apache/atlas/typesystem/types/TraitType.java 
f23bf5b 
  typesystem/src/main/java/org/apache/atlas/typesystem/types/TypeSystem.java 
70ba89b 
  
typesystem/src/main/java/org/apache/atlas/typesystem/types/utils/TypesUtil.java 
ef8448d 
  
typesystem/src/main/scala/org/apache/atlas/typesystem/json/TypesSerialization.scala
 5618938 

Diff: https://reviews.apache.org/r/51939/diff/


Testing
---


Thanks,

Sarath Kumar Subramanian