[jira] [Assigned] (ATLAS-3736) Atlas typedef for microservices

2020-08-18 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-3736:
-

Assignee: (was: Barbara Eckman)

> Atlas typedef for microservices
> ---
>
> Key: ATLAS-3736
> URL: https://issues.apache.org/jira/browse/ATLAS-3736
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: all_entity_types.json, all_relationship_types.json
>
>
> Microservices are an increasingly important and pervasive part of modern 
> architectures.  By definition, a microservice’s APIs provide the only access 
> to a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not 
> exposed.   Therefore, it would be nice if Atlas enabled microservices to be 
> discovered along with other dataset types.  The microservice typedef would 
> include the top-level endpoint, plus each of its API resources (GETs, POSTs, 
> PUTs, DELETEs), with their URI, parameters, JSON request and response 
> objects, and http response messages (200: Success, 404: Not Found, etc). 
>  One might ask, why put this metadata into Atlas when it can already be 
> searched in a Swagger/OpenAPI repository?  Three main reasons:  1) 
> microservice metadata can be searched along with all other enterprise 
> datasets to find a complete set of datasets of interest for, say, a 
> cross-silo data science investigation; 2) we can express lineage between 
> microservices and other datasets that either feed the db underlying the 
> microservice or serve as historical repositories for transactional 
> microservices (eg S3 datalakes); 3) we can express semantic relationships 
> between microservices, eg the marketing contact microservice contains a 
> location id as a “FK” that is the “PK” of the location microservice.
> Main Entities:
>  * microservice
>  * APIResource
>  * responseMessage 
> Main Relationships:
>  * microservice to APIResources
>  * APIResource to responseMessages
>  * APIResource to the schemas of its JSON request/response objects
>  * microservice to microservice link
> Proposed typedefs are attached.
>  
>  
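For illustration only, here is a minimal sketch of what the proposed microservice and APIResource entityDefs might look like in Atlas v2 typedef JSON. The attribute names (endpointURL, httpMethod, uri, parameters) are assumptions made for this sketch and are not taken from the attached all_entity_types.json.

{
  "entityDefs": [
    {
      "name": "microservice",
      "superTypes": ["DataSet"],
      "typeVersion": "1.0",
      "attributeDefs": [
        {"name": "endpointURL", "typeName": "string", "cardinality": "SINGLE", "isOptional": false, "isUnique": false, "isIndexable": true},
        {"name": "ownerEmail",  "typeName": "string", "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": false}
      ]
    },
    {
      "name": "APIResource",
      "superTypes": ["DataSet"],
      "typeVersion": "1.0",
      "attributeDefs": [
        {"name": "httpMethod", "typeName": "string",        "cardinality": "SINGLE", "isOptional": false, "isUnique": false, "isIndexable": true},
        {"name": "uri",        "typeName": "string",        "cardinality": "SINGLE", "isOptional": false, "isUnique": false, "isIndexable": true},
        {"name": "parameters", "typeName": "array<string>", "cardinality": "SET",    "isOptional": true,  "isUnique": false, "isIndexable": false}
      ]
    }
  ]
}

A payload like this could be registered by POSTing it to Atlas's /api/atlas/v2/types/typedefs endpoint; the actual attribute set would come from the attached typedef files.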



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ATLAS-3570) Atlas typedefs for Machine Learning Models, Feature Sets, and Feature Engineering Engines

2020-08-18 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-3570:
-

Assignee: (was: Barbara Eckman)

> Atlas typedefs for Machine Learning Models, Feature Sets, and Feature 
> Engineering Engines
> -
>
> Key: ATLAS-3570
> URL: https://issues.apache.org/jira/browse/ATLAS-3570
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: MLModel_typedefs.tar
>
>
> Currently the base types in Atlas do not include Machine Learning (ML) Model 
> tables. It would be nice to add typedefs for them, so they could be part of 
> enterprise discovery and versioning.  
> ENTITIES COULD INCLUDE:
> MLModel (overview info), with attributes:
>  * uniqueId
>  * version
>  * businessUseCase
>  * modelFramework (eg scikit-learn)
>  * modelTypes (eg random forest regressor)
>  * modelClass (eg random forest (bagging + decision trees))
>  * isEnsemble boolean
>  * outcomeTypeDescription (eg single float)
>  * dataScienceOwnerEmail
>  * githubRepoURL where the model code is found
>  * modelDeploymentDate
>  * populationScored (eg in Comcast, residential or business customers)
>  * accuracyMeasures
> MLModelExecution, with attributes:
>  * exampleInputDatasetURL (URL where a sample input dataset can be found)
>  * outputTargetDatasetURLs
>  * opsOwnerEmail
>  * executionEndpointURL
>  * dockerContainerURL
>  * MLFlowPointerURL
>  * executionNotebookURL (eg Databricks, Jupyter)
> MLModelTraining, with attributes:
>  * hyperParameters
>  * trainingDatasetURLs
>  * trainingNotebookURL (eg Databricks, Jupyter)
> FeatureSet (a set of features prepared as input to an ML model), with 
> attributes:
>  * version
>  * locationURL 
> FeatureEngineeringEngine (the engine that generates the feature set for an ML 
> model), with attributes:
>  * version
>  * ownerEmail
>  * inputSourceURL
>  * processingEngineInfoURL (docs on the processing engine)
>  * githubRepoURL 
>  * outputTargetURL
> RELATIONSHIPS could include:
>  * model to  execution
>  * model to training
>  * model execution to example input dataset (eg kafka topic)
>  * model execution to output target dataset (eg S3 prefix or object)
>  * model execution to input schema
>  * model execution to output schema
>  * model execution to input feature set objects
>  * training to input training dataset objects
>  * training to input training dataset schema
>  * feature engineering engine to output feature set object
>  * feature engineering engine to input source dataset (eg kafka topic)
>  * feature engineering engine to input source dataset's schema
>  * feature engineering engine to output target dataset (eg S3 object)
>  * feature set object to its schema
> ENUMs could include:
>  * MLModel_type (eg logistic regression, random_forest_regression)
> PROCESSES related to MLModels could include:
>  * MLPipelineDependencyEdge (dependency between two models in the ML pipeline)
>  ** inputs and outputs are both MLModels
>  * MLModelEvolutionEdge (lineage between 2 versions of an ML model)
>  ** inputs and outputs are both MLModels
>  ** only attribute is an array of strings representing changes made from one 
> version to the other.  this could be made more structured as we discover how 
> it is used.
>  
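As an illustration of the shape such typedefs could take, a minimal sketch of an MLModel entityDef and an MLModel_type enum in Atlas v2 typedef JSON follows. The attribute subset, enum values, and type choices here are assumptions for the sketch, not the contents of the attached MLModel_typedefs.tar.

{
  "enumDefs": [
    {
      "name": "MLModel_type",
      "elementDefs": [
        {"value": "logistic_regression",      "ordinal": 0},
        {"value": "random_forest_regression", "ordinal": 1}
      ]
    }
  ],
  "entityDefs": [
    {
      "name": "MLModel",
      "superTypes": ["DataSet"],
      "typeVersion": "1.0",
      "attributeDefs": [
        {"name": "version",        "typeName": "string",       "cardinality": "SINGLE", "isOptional": false, "isUnique": false, "isIndexable": true},
        {"name": "modelFramework", "typeName": "string",       "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": true},
        {"name": "modelType",      "typeName": "MLModel_type", "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": true},
        {"name": "isEnsemble",     "typeName": "boolean",      "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": false}
      ]
    }
  ]
}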



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3736) Atlas typedef for microservices

2020-04-17 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3736:
--
Attachment: all_relationship_types.json
all_entity_types.json

> Atlas typedef for microservices
> ---
>
> Key: ATLAS-3736
> URL: https://issues.apache.org/jira/browse/ATLAS-3736
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
> Attachments: all_entity_types.json, all_relationship_types.json
>
>
> Microservices are an increasingly important and pervasive part of modern 
> architectures.  By definition, a microservice’s APIs provide the only access 
> to a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not 
> exposed.   Therefore, it would be nice if Atlas enabled microservices to be 
> discovered along with other dataset types.  The microservice typedef would 
> include the top-level endpoint, plus each of its API resources (GETs, POSTs, 
> PUTs, DELETEs), with their URI, parameters, JSON request and response 
> objects, and http response messages (200: Success, 404: Not Found, etc). 
>  One might ask, why put this metadata into Atlas when it can already be 
> searched in a Swagger/OpenAPI repository?  Three main reasons:  1) 
> microservice metadata can be searched along with all other enterprise 
> datasets to find a complete set of datasets of interest for, say, a 
> cross-silo data science investigation; 2) we can express lineage between 
> microservices and other datasets that either feed the db underlying the 
> microservice or serve as historical repositories for transactional 
> microservices (eg S3 datalakes); 3) we can express semantic relationships 
> between microservices, eg the marketing contact microservice contains a 
> location id as a “FK” that is the “PK” of the location microservice.
> Main Entities:
>  * microservice
>  * APIResource
>  * responseMessage 
> Main Relationships:
>  * microservice to APIResources
>  * APIResource to responseMessages
>  * APIResource to the schemas of its JSON request/response objects
>  * microservice to microservice link
> Proposed typedefs are attached.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3736) Atlas typedef for microservices

2020-04-17 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3736:
--
Description: 
Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.

Main Entities:
 * microservice
 * APIResource
 * responseMessage 

Main Relationships:
 * microservice to APIResources
 * APIResource to responseMessages
 * APIResource to the schemas of its JSON request/response objects
 * microservice to microservice link

Proposed typedefs are attached.
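For illustration, the microservice-to-APIResources link listed above might be expressed as an Atlas relationshipDef roughly as sketched below. The relationship name, the end attribute names, and the choice of COMPOSITION are assumptions for this sketch, not taken from the attached all_relationship_types.json.

{
  "relationshipDefs": [
    {
      "name": "microservice_api_resources",
      "typeVersion": "1.0",
      "relationshipCategory": "COMPOSITION",
      "propagateTags": "NONE",
      "endDef1": {"type": "microservice", "name": "apiResources", "isContainer": true,  "cardinality": "SET"},
      "endDef2": {"type": "APIResource",  "name": "microservice", "isContainer": false, "cardinality": "SINGLE"}
    }
  ]
}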

 

 

  was:
Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.

more to come...

 

 


> Atlas typedef for microservices
> ---
>
> Key: ATLAS-3736
> URL: https://issues.apache.org/jira/browse/ATLAS-3736
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
> Attachments: all_entity_types.json, all_relationship_types.json
>
>
> Microservices are an increasingly important and pervasive part of modern 
> architectures.  By definition, a microservice’s APIs provide the only access 
> to a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not 
> exposed.   Therefore, it would be nice if Atlas enabled microservices to be 
> discovered along with other dataset types.  The microservice typedef would 
> include the top-level endpoint, plus each of its API resources (GETs, POSTs, 
> PUTs, DELETEs), with their URI, parameters, JSON request and response 
> objects, and http response messages (200: Success, 404: Not Found, etc). 
>  One might ask, why put this metadata into Atlas when it can already be 
> searched in a Swagger/OpenAPI repository?  Three main reasons:  1) 
> microservice metadata can be searched along with all other enterprise 
> datasets to find a complete set of datasets of interest for, say, a 
> cross-silo data science investigation; 2) we can express lineage between 
> microservices and other datasets that either feed the db underlying the 
> microservice or serve as historical repositories for transactional 
> microservices (eg S3 datalakes); 3) we can express semantic relationships 
> between microservices, eg the marketing contact microservice contains a 
> location id as a “FK” that is the “PK” of the location microservice.
> Main Entities:
>  * microservice
>  * APIResource
> 

[jira] [Updated] (ATLAS-3736) Atlas typedef for microservices

2020-04-17 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3736:
--
Description: 
Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.

more to come...

 

 

  was:
Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.

 

*** more to come ***

 

 


> Atlas typedef for microservices
> ---
>
> Key: ATLAS-3736
> URL: https://issues.apache.org/jira/browse/ATLAS-3736
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
>
> Microservices are an increasingly important and pervasive part of modern 
> architectures.  By definition, a microservice’s APIs provide the only access 
> to a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not 
> exposed.   Therefore, it would be nice if Atlas enabled microservices to be 
> discovered along with other dataset types.  The microservice typedef would 
> include the top-level endpoint, plus each of its API resources (GETs, POSTs, 
> PUTs, DELETEs), with their URI, parameters, JSON request and response 
> objects, and http response messages (200: Success, 404: Not Found, etc). 
>  One might ask, why put this metadata into Atlas when it can already be 
> searched in a Swagger/OpenAPI repository?  Three main reasons:  1) 
> microservice metadata can be searched along with all other enterprise 
> datasets to find a complete set of datasets of interest for, say, a 
> cross-silo data science investigation; 2) we can express lineage between 
> microservices and other datasets that either feed the db underlying the 
> microservice or serve as historical repositories for transactional 
> microservices (eg S3 datalakes); 3) we can express semantic relationships 
> between microservices, eg the marketing contact microservice contains a 
> location id as a “FK” that is the “PK” of the location microservice.
> more to come...
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3736) Atlas typedef for microservices

2020-04-17 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3736:
--
Description: 
Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.

 

*** more to come ***

 

 

  was:
Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.


> Atlas typedef for microservices
> ---
>
> Key: ATLAS-3736
> URL: https://issues.apache.org/jira/browse/ATLAS-3736
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
>
> Microservices are an increasingly important and pervasive part of modern 
> architectures.  By definition, a microservice’s APIs provide the only access 
> to a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not 
> exposed.   Therefore, it would be nice if Atlas enabled microservices to be 
> discovered along with other dataset types.  The microservice typedef would 
> include the top-level endpoint, plus each of its API resources (GETs, POSTs, 
> PUTs, DELETEs), with their URI, parameters, JSON request and response 
> objects, and http response messages (200: Success, 404: Not Found, etc). 
>  One might ask, why put this metadata into Atlas when it can already be 
> searched in a Swagger/OpenAPI repository?  Three main reasons:  1) 
> microservice metadata can be searched along with all other enterprise 
> datasets to find a complete set of datasets of interest for, say, a 
> cross-silo data science investigation; 2) we can express lineage between 
> microservices and other datasets that either feed the db underlying the 
> microservice or serve as historical repositories for transactional 
> microservices (eg S3 datalakes); 3) we can express semantic relationships 
> between microservices, eg the marketing contact microservice contains a 
> location id as a “FK” that is the “PK” of the location microservice.
>  
> *** more to come ***
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ATLAS-3736) Atlas typedef for microservices

2020-04-17 Thread Barbara Eckman (Jira)
Barbara Eckman created ATLAS-3736:
-

 Summary: Atlas typedef for microservices
 Key: ATLAS-3736
 URL: https://issues.apache.org/jira/browse/ATLAS-3736
 Project: Atlas
  Issue Type: New Feature
Reporter: Barbara Eckman
Assignee: Barbara Eckman


Microservices are an increasingly important and pervasive part of modern 
architectures.  By definition, a microservice’s APIs provide the only access to 
a dataset—the schema of the underlying RDBMS, NoSQL db, etc is not exposed.   
Therefore, it would be nice if Atlas enabled microservices to be discovered 
along with other dataset types.  The microservice typedef would include the 
top-level endpoint, plus each of its API resources (GETs, POSTs, PUTs, 
DELETEs), with their URI, parameters, JSON request and response objects, and 
http response messages (200: Success, 404: Not Found, etc). 

 One might ask, why put this metadata into Atlas when it can already be 
searched in a Swagger/OpenAPI repository?  Three main reasons:  1) microservice 
metadata can be searched along with all other enterprise datasets to find a 
complete set of datasets of interest for, say, a cross-silo data science 
investigation; 2) we can express lineage between microservices and other 
datasets that either feed the db underlying the microservice or serve as 
historical repositories for transactional microservices (eg S3 datalakes); 3) 
we can express semantic relationships between microservices, eg the marketing 
contact microservice contains a location id as a “FK” that is the “PK” of the 
location microservice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ATLAS-3570) Atlas typedefs for Machine Learning Models, Feature Sets, and Feature Engineering Engines

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-3570:
-

Assignee: Barbara Eckman

> Atlas typedefs for Machine Learning Models, Feature Sets, and Feature 
> Engineering Engines
> -
>
> Key: ATLAS-3570
> URL: https://issues.apache.org/jira/browse/ATLAS-3570
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
> Attachments: MLModel_typedefs.tar
>
>
> Currently the base types in Atlas do not include Machine Learning (ML) Model 
> tables. It would be nice to add typedefs for them, so they could be part of 
> enterprise discovery and versioning.  
> ENTITIES COULD INCLUDE:
> MLModel (overview info), with attributes:
>  * uniqueId
>  * version
>  * businessUseCase
>  * modelFramework (eg scikit-learn)
>  * modelTypes (eg random forest regressor)
>  * modelClass (eg random forest (bagging + decision trees))
>  * isEnsemble boolean
>  * outcomeTypeDescription (eg single float)
>  * dataScienceOwnerEmail
>  * githubRepoURL where the model code is found
>  * modelDeploymentDate
>  * populationScored (eg in Comcast, residential or business customers)
>  * accuracyMeasures
> MLModelExecution, with attributes:
>  * exampleInputDatasetURL (URL where a sample input dataset can be found)
>  * outputTargetDatasetURLs
>  * opsOwnerEmail
>  * executionEndpointURL
>  * dockerContainerURL
>  * MLFlowPointerURL
>  * executionNotebookURL (eg Databricks, Jupyter)
> MLModelTraining, with attributes:
>  * hyperParameters
>  * trainingDatasetURLs
>  * trainingNotebookURL (eg Databricks, Jupyter)
> FeatureSet (a set of features prepared as input to an ML model), with 
> attributes:
>  * version
>  * locationURL 
> FeatureEngineeringEngine (the engine that generates the feature set for an ML 
> model), with attributes:
>  * version
>  * ownerEmail
>  * inputSourceURL
>  * processingEngineInfoURL (docs on the processing engine)
>  * githubRepoURL 
>  * outputTargetURL
> RELATIONSHIPS could include:
>  * model to  execution
>  * model to training
>  * model execution to example input dataset (eg kafka topic)
>  * model execution to output target dataset (eg S3 prefix or object)
>  * model execution to input schema
>  * model execution to output schema
>  * model execution to input feature set objects
>  * training to input training dataset objects
>  * training to input training dataset schema
>  * feature engineering engine to output feature set object
>  * feature engineering engine to input source dataset (eg kafka topic)
>  * feature engineering engine to input source dataset's schema
>  * feature engineering engine to output target dataset (eg S3 object)
>  * feature set object to its schema
> ENUMs could include:
>  * MLModel_type (eg logistic regression, random_forest_regression)
> PROCESSES related to MLModels could include:
>  * MLPipelineDependencyEdge (dependency between two models in the ML pipeline)
>  ** inputs and outputs are both MLModels
>  * MLModelEvolutionEdge (lineage between 2 versions of an ML model)
>  ** inputs and outputs are both MLModels
>  ** only attribute is an array of strings representing changes made from one 
> version to the other.  this could be made more structured as we discover how 
> it is used.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ATLAS-3569) AWS Dynamodb type def for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-3569:
-

Assignee: Barbara Eckman

> AWS Dynamodb type def for Atlas
> ---
>
> Key: ATLAS-3569
> URL: https://issues.apache.org/jira/browse/ATLAS-3569
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
> Attachments: aws_dynamodb_typedef
>
>
> Currently the base types in Atlas do not include AWS Dynamodb tables. It 
> would be nice to add typedefs for them.  
> ENTITIES could include:
> ->dynamodb table with fields:
>  * table_arn (AWS table id)
>  * create_time
>  * aws_account that the table is created in 
>  * aws_region
>  * aws_tags associated with the table
>  * cloudwatch metrics associated with the table
> ->dynamodb attribute with fields:
>  * type
>  * optional string representing the struct definition of map or list type 
> attribute
> ->dynamodb index with field:
>  * index type (Local or Global Secondary Index)
> RELATIONSHIPS could include:
>  * table to attribute that is its primary partition key
>  * table to attribute that is its primary sort key
>  * table to other attributes 
>  * table to indexes
>  * index to partition key
>  * index to sort key
>  * attribute to schema (for highly nested attributes that are best understood 
> via JSON or avro schemas)
>  ENUMS for:
>  * index type (Local or Global Secondary Index)
>  * aws region name
>  * dynamodb attribute datatypes
> STRUCTS (already defined in ATLAS-2708 ) for:
>  * aws_cloud_watch_metric
>  * aws_tag
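A minimal sketch of what a DynamoDB table entityDef along these lines could look like in Atlas v2 typedef JSON is shown below. The entity name aws_dynamodb_table and the exact attribute typing are assumptions for the sketch (aws_tag is the struct already defined in ATLAS-2708), not the contents of the attached aws_dynamodb_typedef.

{
  "enumDefs": [
    {
      "name": "dynamodb_index_type",
      "elementDefs": [
        {"value": "LOCAL_SECONDARY_INDEX",  "ordinal": 0},
        {"value": "GLOBAL_SECONDARY_INDEX", "ordinal": 1}
      ]
    }
  ],
  "entityDefs": [
    {
      "name": "aws_dynamodb_table",
      "superTypes": ["DataSet"],
      "typeVersion": "1.0",
      "attributeDefs": [
        {"name": "table_arn",   "typeName": "string",         "cardinality": "SINGLE", "isOptional": false, "isUnique": true,  "isIndexable": true},
        {"name": "create_time", "typeName": "date",           "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": false},
        {"name": "aws_account", "typeName": "string",         "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": true},
        {"name": "aws_region",  "typeName": "string",         "cardinality": "SINGLE", "isOptional": true,  "isUnique": false, "isIndexable": true},
        {"name": "aws_tags",    "typeName": "array<aws_tag>", "cardinality": "SET",    "isOptional": true,  "isUnique": false, "isIndexable": false}
      ]
    }
  ]
}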



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3570) Atlas typedefs for Machine Learning Models, Feature Sets, and Feature Engineering Engines

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3570:
--
Attachment: MLModel_typedefs.tar

> Atlas typedefs for Machine Learning Models, Feature Sets, and Feature 
> Engineering Engines
> -
>
> Key: ATLAS-3570
> URL: https://issues.apache.org/jira/browse/ATLAS-3570
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: MLModel_typedefs.tar
>
>
> Currently the base types in Atlas do not include Machine Learning (ML) Model 
> tables. It would be nice to add typedefs for them, so they could be part of 
> enterprise discovery and versioning.  
> ENTITIES COULD INCLUDE:
> MLModel (overview info), with attributes:
>  * uniqueId
>  * version
>  * businessUseCase
>  * modelFramework (eg scikit-learn)
>  * modelTypes (eg random forest regressor)
>  * modelClass (eg random forest (bagging + decision trees))
>  * isEnsemble boolean
>  * outcomeTypeDescription (eg single float)
>  * dataScienceOwnerEmail
>  * githubRepoURL where the model code is found
>  * modelDeploymentDate
>  * populationScored (eg in Comcast, residential or business customers)
>  * accuracyMeasures
> MLModelExecution, with attributes:
>  * exampleInputDatasetURL (URL where a sample input dataset can be found)
>  * outputTargetDatasetURLs
>  * opsOwnerEmail
>  * executionEndpointURL
>  * dockerContainerURL
>  * MLFlowPointerURL
>  * executionNotebookURL (eg Databricks, Jupyter)
> MLModelTraining, with attributes:
>  * hyperParameters
>  * trainingDatasetURLs
>  * trainingNotebookURL (eg Databricks, Jupyter)
> FeatureSet (a set of features prepared as input to an ML model), with 
> attributes:
>  * version
>  * locationURL 
> FeatureEngineeringEngine (the engine that generates the feature set for an ML 
> model), with attributes:
>  * version
>  * ownerEmail
>  * inputSourceURL
>  * processingEngineInfoURL (docs on the processing engine)
>  * githubRepoURL 
>  * outputTargetURL
> RELATIONSHIPS could include:
>  * model to  execution
>  * model to training
>  * model execution to example input dataset (eg kafka topic)
>  * model execution to output target dataset (eg S3 prefix or object)
>  * model execution to input schema
>  * model execution to output schema
>  * model execution to input feature set objects
>  * training to input training dataset objects
>  * training to input training dataset schema
>  * feature engineering engine to output feature set object
>  * feature engineering engine to input source dataset (eg kafka topic)
>  * feature engineering engine to input source dataset's schema
>  * feature engineering engine to output target dataset (eg S3 object)
>  * feature set object to its schema
> ENUMs could include:
>  * MLModel_type (eg logistic regression, random_forest_regression)
> PROCESSES related to MLModels could include:
>  * MLPipelineDependencyEdge (dependency between two models in the ML pipeline)
>  ** inputs and outputs are both MLModels
>  * MLModelEvolutionEdge (lineage between 2 versions of an ML model)
>  ** inputs and outputs are both MLModels
>  ** only attribute is an array of strings representing changes made from one 
> version to the other.  this could be made more structured as we discover how 
> it is used.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3570) Atlas typedefs for Machine Learning Models, Feature Sets, and Feature Engineering Engines

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3570:
--
Description: 
Currently the base types in Atlas do not include Machine Learning (ML) Model 
tables. It would be nice to add typedefs for them, so they could be part of 
enterprise discovery and versioning.  

ENTITIES COULD INCLUDE:

MLModel (overview info), with attributes:
 * uniqueId
 * version
 * businessUseCase
 * modelFramework (eg scikit-learn)
 * modelTypes (eg random forest regressor)
 * modelClass (eg random forest (bagging + decision trees))
 * isEnsemble boolean
 * outcomeTypeDescription (eg single float)
 * dataScienceOwnerEmail
 * githubRepoURL where the model code is found
 * modelDeploymentDate
 * populationScored (eg in Comcast, residential or business customers)
 * accuracyMeasures

MLModelExecution, with attributes:
 * exampleInputDatasetURL (URL where a sample input dataset can be found)
 * outputTargetDatasetURLs
 * opsOwnerEmail
 * executionEndpointURL
 * dockerContainerURL
 * MLFlowPointerURL
 * executionNotebookURL (eg Databricks, Jupyter)

MLModelTraining, with attributes:
 * hyperParameters
 * trainingDatasetURLs
 * trainingNotebookURL (eg Databricks, Jupyter)

FeatureSet (a set of features prepared as input to an ML model), with 
attributes:
 * version
 * locationURL 

FeatureEngineeringEngine (the engine that generates the feature set for an ML 
model), with attributes:
 * version
 * ownerEmail
 * inputSourceURL
 * processingEngineInfoURL (docs on the processing engine)
 * githubRepoURL 
 * outputTargetURL

RELATIONSHIPS could include:
 * model to  execution
 * model to training
 * model execution to example input dataset (eg kafka topic)
 * model execution to output target dataset (eg S3 prefix or object)
 * model execution to input schema
 * model execution to output schema
 * model execution to input feature set objects
 * training to input training dataset objects
 * training to input training dataset schema
 * feature engineering engine to output feature set object
 * feature engineering engine to input source dataset (eg kafka topic)
 * feature engineering engine to input source dataset's schema
 * feature engineering engine to output target dataset (eg S3 object)
 * feature set object to its schema

ENUMs could include:
 * MLModel_type (eg logistic regression, random_forest_regression)

PROCESSES related to MLModels could include:
 * MLPipelineDependencyEdge (dependency between two models in the ML pipeline)
 ** inputs and outputs are both MLModels
 * MLModelEvolutionEdge (lineage between 2 versions of an ML model)
 ** inputs and outputs are both MLModels
 ** only attribute is an array of strings representing changes made from one 
version to the other.  this could be made more structured as we discover how it 
is used.

 

  was:
Currently the base types in Atlas do not include Machine Learning (ML) Model 
tables. It would be nice to add typedefs for them, so they could be part of 
enterprise discovery and versioning.  

ENTITIES COULD INCLUDE:

MLModel (overview info), with attributes:
 * uniqueId
 * version
 * businessUseCase
 * modelFramework (eg scikit-learn)
 * modelTypes (eg random forest regressor)
 * modelClass (eg random forest (bagging + decision trees))
 * isEnsemble boolean
 * outcomeTypeDescription (eg single float)
 * dataScienceOwnerEmail
 * githubRepoURL where the model code is found
 * modelDeploymentDate
 * populationScored (eg in Comcast, residential or business customers)
 * accuracyMeasures

MLModelExecution, with attributes:
 * exampleInputDatasetURL (URL where a sample input dataset can be found)
 * outputTargetDatasetURLs
 * opsOwnerEmail
 * executionEndpointURL
 * dockerContainerURL
 * MLFlowPointerURL
 * executionNotebookURL (eg Databricks, Jupyter)

MLModelTraining, with attributes:
 * hyperParameters
 * trainingDatasetURLs
 * trainingNotebookURL (eg Databricks, Jupyter)

FeatureSet (a set of features prepared as input to an ML model), with 
attributes:
 * version
 * locationURL 

FeatureEngineeringEngine (the engine that generates the feature set for an ML 
model), with attributes:
 * version
 * ownerEmail
 * inputSourceURL
 * processingEngineInfoURL (docs on the processing engine)
 * githubRepoURL 
 * outputTargetURL

 

 


> Atlas typedefs for Machine Learning Models, Feature Sets, and Feature 
> Engineering Engines
> -
>
> Key: ATLAS-3570
> URL: https://issues.apache.org/jira/browse/ATLAS-3570
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include Machine Learning (ML) Model 
> tables. It would be nice to add typedefs for them, so they could be part of 
> enterprise discovery and versioning. 

[jira] [Created] (ATLAS-3570) Atlas typedefs for Machine Learning Models, Feature Sets, and Feature Engineering Engines

2019-12-20 Thread Barbara Eckman (Jira)
Barbara Eckman created ATLAS-3570:
-

 Summary: Atlas typedefs for Machine Learning Models, Feature Sets, 
and Feature Engineering Engines
 Key: ATLAS-3570
 URL: https://issues.apache.org/jira/browse/ATLAS-3570
 Project: Atlas
  Issue Type: New Feature
Reporter: Barbara Eckman


Currently the base types in Atlas do not include Machine Learning (ML) Model 
tables. It would be nice to add typedefs for them, so they could be part of 
enterprise discovery and versioning.  

ENTITIES COULD INCLUDE:

MLModel (overview info), with attributes:
 * uniqueId
 * version
 * businessUseCase
 * modelFramework (eg scikit-learn)
 * modelTypes (eg random forest regressor)
 * modelClass (eg random forest (bagging + decision trees))
 * isEnsemble boolean
 * outcomeTypeDescription (eg single float)
 * dataScienceOwnerEmail
 * githubRepoURL where the model code is found
 * modelDeploymentDate
 * populationScored (eg in Comcast, residential or business customers)
 * accuracyMeasures

MLModelExecution, with attributes:
 * exampleInputDatasetURL (URL where a sample input dataset can be found)
 * outputTargetDatasetURLs
 * opsOwnerEmail
 * executionEndpointURL
 * dockerContainerURL
 * MLFlowPointerURL
 * executionNotebookURL (eg Databricks, Jupyter)

MLModelTraining, with attributes:
 * hyperParameters
 * trainingDatasetURLs
 * trainingNotebookURL (eg Databricks, Jupyter)

FeatureSet (a set of features prepared as input to an ML model), with 
attributes:
 * version
 * locationURL 

FeatureEngineeringEngine (the engine that generates the feature set for an ML 
model), with attributes:
 * version
 * ownerEmail
 * inputSourceURL
 * processingEngineInfoURL (docs on the processing engine)
 * githubRepoURL 
 * outputTargetURL

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3569) AWS Dynamodb type def for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3569:
--
Description: 
Currently the base types in Atlas do not include AWS Dynamodb tables. It would 
be nice to add typedefs for them.  

ENTITIES could include:

->dynamodb table with fields:
 * table_arn (AWS table id)
 * create_time
 * aws_account that the table is created in 
 * aws_region
 * aws_tags associated with the table
 * cloudwatch metrics associated with the table

->dynamodb attribute with fields:
 * type
 * optional string representing the struct definition of map or list type 
attribute

->dynamodb index with field:
 * index type (Local or Global Secondary Index)

RELATIONSHIPS could include:
 * table to attribute that is its primary partition key
 * table to attribute that is its primary sort key
 * table to other attributes 
 * table to indexes
 * index to partition key
 * index to sort key
 * attribute to schema (for highly nested attributes that are best understood 
via JSON or avro schemas)

 ENUMS for:
 * index type (Local or Global Secondary Index)
 * aws region name
 * dynamodb attribute datatypes

STRUCTS (already defined in ATLAS-2708 ) for:
 * aws_cloud_watch_metric
 * aws_tag

  was:
Currently the base types in Atlas do not include AWS Dynamodb tables. It would 
be nice to add typedefs for them.  

ENTITIES could include:

~dynamodb table with fields:
 * table_arn (AWS table id)
 * create_time
 * aws_account that the table is created in 
 * aws_region
 * aws_tags associated with the table
 * cloudwatch metrics associated with the table

~dynamodb attribute with fields:
 * type
 * optional string representing the struct definition of map or list type 
attribute

~dynamodb index with field:
 * index type (Local or Global Secondary Index)

RELATIONSHIPS could include:
 * table to attribute that is its primary partition key
 * table to attribute that is its primary sort key
 * table to other attributes 
 * table to indexes
 * index to partition key
 * index to sort key
 * attribute to schema (for highly nested attributes that are best understood 
via JSON or avro schemas)

 ENUMS for:
 * index type (Local or Global Secondary Index)
 * aws region name
 * dynamodb attribute datatypes

STRUCTS (already defined in ATLAS-2708 ) for:
 * aws_cloud_watch_metric
 * aws_tag


> AWS Dynamodb type def for Atlas
> ---
>
> Key: ATLAS-3569
> URL: https://issues.apache.org/jira/browse/ATLAS-3569
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: aws_dynamodb_typedef
>
>
> Currently the base types in Atlas do not include AWS Dynamodb tables. It 
> would be nice to add typedefs for them.  
> ENTITIES could include:
> ->dynamodb table with fields:
>  * table_arn (AWS table id)
>  * create_time
>  * aws_account that the table is created in 
>  * aws_region
>  * aws_tags associated with the table
>  * cloudwatch metrics associated with the table
> ->dynamodb attribute with fields:
>  * type
>  * optional string representing the struct definition of map or list type 
> attribute
> ->dynamodb index with field:
>  * index type (Local or Global Secondary Index)
> RELATIONSHIPS could include:
>  * table to attribute that is its primary partition key
>  * table to attribute that is its primary sort key
>  * table to other attributes 
>  * table to indexes
>  * index to partition key
>  * index to sort key
>  * attribute to schema (for highly nested attributes that are best understood 
> via JSON or avro schemas)
>  ENUMS for:
>  * index type (Local or Global Secondary Index)
>  * aws region name
>  * dynamodb attribute datatypes
> STRUCTS (already defined in ATLAS-2708 ) for:
>  * aws_cloud_watch_metric
>  * aws_tag



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3569) AWS Dynamodb type def for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3569:
--
Description: 
Currently the base types in Atlas do not include AWS Dynamodb tables. It would 
be nice to add typedefs for them.  

ENTITIES could include:

~dynamodb table with fields:
 * table_arn (AWS table id)
 * create_time
 * aws_account that the table is created in 
 * aws_region
 * aws_tags associated with the table
 * cloudwatch metrics associated with the table

~dynamodb attribute with fields:
 * type
 * optional string representing the struct definition of map or list type 
attribute

~dynamodb index with field:
 * index type (Local or Global Secondary Index)

RELATIONSHIPS could include:
 * table to attribute that is its primary partition key
 * table to attribute that is its primary sort key
 * table to other attributes 
 * table to indexes
 * index to partition key
 * index to sort key
 * attribute to schema (for highly nested attributes that are best understood 
via JSON or avro schemas)

 ENUMS for:
 * index type (Local or Global Secondary Index)
 * aws region name
 * dynamodb attribute datatypes

STRUCTS (already defined in ATLAS-2708 ) for:
 * aws_cloud_watch_metric
 * aws_tag

  was:
Currently the base types in Atlas do not include AWS Dynamodb tables. It would 
be nice to add typedefs for them.  

ENTITIES could include:
 - dynamodb table with fields:

 * table_arn (AWS table id)
 * create_time
 * aws_account that the table is created in 
 * aws_region
 * aws_tags associated with the table
 * cloudwatch metrics associated with the table

-  dynamodb attribute with fields:
 * type
 * optional string representing the struct definition of map or list type 
attribute

-dynamodb index with field:
 * index type (Local or Global Secondary Index)

RELATIONSHIPS could include:
 * table to attribute that is its primary partition key
 * table to attribute that is its primary sort key
 * table to other attributes 
 * table to indexes
 * index to partition key
 * index to sort key
 * attribute to schema (for highly nested attributes that are best understood 
via JSON or avro schemas)

 ENUMS for:
 * index type (Local or Global Secondary Index)
 * aws region name
 * dynamodb attribute datatypes

STRUCTS (already defined in ATLAS-2708 ) for:
 * aws_cloud_watch_metric
 * aws_tag


> AWS Dynamodb type def for Atlas
> ---
>
> Key: ATLAS-3569
> URL: https://issues.apache.org/jira/browse/ATLAS-3569
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: aws_dynamodb_typedef
>
>
> Currently the base types in Atlas do not include AWS Dynamodb tables. It 
> would be nice to add typedefs for them.  
> ENTITIES could include:
> ~dynamodb table with fields:
>  * table_arn (AWS table id)
>  * create_time
>  * aws_account that the table is created in 
>  * aws_region
>  * aws_tags associated with the table
>  * cloudwatch metrics associated with the table
> ~dynamodb attribute with fields:
>  * type
>  * optional string representing the struct definition of map or list type 
> attribute
> ~dynamodb index with field:
>  * index type (Local or Global Secondary Index)
> RELATIONSHIPS could include:
>  * table to attribute that is its primary partition key
>  * table to attribute that is its primary sort key
>  * table to other attributes 
>  * table to indexes
>  * index to partition key
>  * index to sort key
>  * attribute to schema (for highly nested attributes that are best understood 
> via JSON or avro schemas)
>  ENUMS for:
>  * index type (Local or Global Secondary Index)
>  * aws region name
>  * dynamodb attribute datatypes
> STRUCTS (already defined in ATLAS-2708 ) for:
>  * aws_cloud_watch_metric
>  * aws_tag



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3569) AWS Dynamodb type def for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3569:
--
Attachment: aws_dynamodb_typedef

> AWS Dynamodb type def for Atlas
> ---
>
> Key: ATLAS-3569
> URL: https://issues.apache.org/jira/browse/ATLAS-3569
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: aws_dynamodb_typedef
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3569) AWS Dynamodb type def for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-3569:
--
Description: 
Currently the base types in Atlas do not include AWS Dynamodb tables. It would 
be nice to add typedefs for them.  

ENTITIES could include:
 - dynamodb table with fields:

 * table_arn (AWS table id)
 * create_time
 * aws_account that the table is created in 
 * aws_region
 * aws_tags associated with the table
 * cloudwatch metrics associated with the table

-  dynamodb attribute with fields:
 * type
 * optional string representing the struct definition of map or list type 
attribute

-dynamodb index with field:
 * index type (Local or Global Secondary Index)

RELATIONSHIPS could include:
 * table to attribute that is its primary partition key
 * table to attribute that is its primary sort key
 * table to other attributes 
 * table to indexes
 * index to partition key
 * index to sort key
 * attribute to schema (for highly nested attributes that are best understood 
via JSON or avro schemas)

 ENUMS for:
 * index type (Local or Global Secondary Index)
 * aws region name
 * dynamodb attribute datatypes

STRUCTS (already defined in ATLAS-2708 ) for:
 * aws_cloud_watch_metric
 * aws_tag

> AWS Dynamodb type def for Atlas
> ---
>
> Key: ATLAS-3569
> URL: https://issues.apache.org/jira/browse/ATLAS-3569
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
> Attachments: aws_dynamodb_typedef
>
>
> Currently the base types in Atlas do not include AWS Dynamodb tables. It 
> would be nice to add typedefs for them.  
> ENTITIES could include:
>  - dynamodb table with fields:
>  * table_arn (AWS table id)
>  * create_time
>  * aws_account that the table is created in 
>  * aws_region
>  * aws_tags associated with the table
>  * cloudwatch metrics associated with the table
> -  dynamodb attribute with fields:
>  * type
>  * optional string representing the struct definition of map or list type 
> attribute
> -dynamodb index with field:
>  * index type (Local or Global Secondary Index)
> RELATIONSHIPS could include:
>  * table to attribute that is its primary partition key
>  * table to attribute that is its primary sort key
>  * table to other attributes 
>  * table to indexes
>  * index to partition key
>  * index to sort key
>  * attribute to schema (for highly nested attributes that are best understood 
> via JSON or avro schemas)
>  ENUMS for:
>  * index type (Local or Global Secondary Index)
>  * aws region name
>  * dynamodb attribute datatypes
> STRUCTS (already defined in ATLAS-2708 ) for:
>  * aws_cloud_watch_metric
>  * aws_tag



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ATLAS-3569) AWS Dynamodb type def for Atlas

2019-12-20 Thread Barbara Eckman (Jira)
Barbara Eckman created ATLAS-3569:
-

 Summary: AWS Dynamodb type def for Atlas
 Key: ATLAS-3569
 URL: https://issues.apache.org/jira/browse/ATLAS-3569
 Project: Atlas
  Issue Type: New Feature
Reporter: Barbara Eckman






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-2915:
-

Assignee: Barbara Eckman

> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream, inheriting from 
> DataSet.  Attributes would include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * schemas: array of schema objects associated with the kinesis stream. 
> Typically avro schemas but could be JSON schema, etc.
>  
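For illustration, a minimal sketch of a Kinesis stream entityDef along these lines in Atlas v2 typedef JSON is given below. The entity name aws_kinesis_stream, the enum name, and the attribute subset are assumptions for the sketch; the issue does not prescribe exact type names.

{
  "enumDefs": [
    {
      "name": "kinesis_stream_environment",
      "elementDefs": [
        {"value": "unknown",     "ordinal": 0},
        {"value": "production",  "ordinal": 1},
        {"value": "staging",     "ordinal": 2},
        {"value": "QA",          "ordinal": 3},
        {"value": "development", "ordinal": 4}
      ]
    }
  ],
  "entityDefs": [
    {
      "name": "aws_kinesis_stream",
      "superTypes": ["DataSet"],
      "typeVersion": "1.0",
      "attributeDefs": [
        {"name": "streamType",        "typeName": "string",                     "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": true},
        {"name": "awsRegion",         "typeName": "string",                     "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": true},
        {"name": "shardCount",        "typeName": "int",                        "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": false},
        {"name": "containsPII",       "typeName": "boolean",                    "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": true},
        {"name": "streamEnvironment", "typeName": "kinesis_stream_environment", "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": true}
      ]
    }
  ]
}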



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-2915:
-

Assignee: (was: Barbara Eckman)

> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream, inheriting from 
> DataSet.  Attributes would include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * schemas: array of schema objects associated with the kinesis stream. 
> Typically avro schemas but could be JSON schema, etc.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2019-12-20 Thread Barbara Eckman (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Description: 
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream, inheriting from DataSet.  
Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * schemas: array of schema objects associated with the kinesis stream. 
Typically avro schemas but could be JSON schema, etc.
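
For concreteness, here is a minimal sketch (Python, against the Atlas v2 typedefs REST endpoint) of how a subset of the attributes above might be registered. The type and enum names, host, and credentials are illustrative assumptions and not part of this proposal; only some of the attributes are shown.

# Hedged sketch only: illustrative registration of a kinesis-stream typedef.
# Type/enum names ("aws_kinesis_stream", "kinesis_stream_environment"), host and
# credentials are assumptions; the full model would also carry contentType,
# aggregationFormat, schemas, etc.
import json
import requests

kinesis_typedefs = {
    "enumDefs": [
        {
            "name": "kinesis_stream_environment",
            "elementDefs": [
                {"value": v, "ordinal": i}
                for i, v in enumerate(["unknown", "production", "staging", "QA", "development"])
            ],
        }
    ],
    "structDefs": [],
    "classificationDefs": [],
    "entityDefs": [
        {
            "name": "aws_kinesis_stream",
            "superTypes": ["DataSet"],  # inherits from DataSet, as proposed above
            "attributeDefs": [
                {"name": "streamType", "typeName": "string", "isOptional": True, "cardinality": "SINGLE"},
                {"name": "awsRegion", "typeName": "string", "isOptional": True, "cardinality": "SINGLE"},
                {"name": "shardCount", "typeName": "int", "isOptional": True, "cardinality": "SINGLE"},
                {"name": "streamEnvironment", "typeName": "kinesis_stream_environment",
                 "isOptional": True, "cardinality": "SINGLE"},
                {"name": "containsPII", "typeName": "boolean", "isOptional": True, "cardinality": "SINGLE"},
                {"name": "schemaURL", "typeName": "string", "isOptional": True, "cardinality": "SINGLE"},
            ],
        }
    ],
}

# Register the types against a local Atlas instance (placeholder host/credentials).
resp = requests.post(
    "http://localhost:21000/api/atlas/v2/types/typedefs",
    auth=("admin", "admin"),
    headers={"Content-Type": "application/json"},
    data=json.dumps(kinesis_typedefs),
)
resp.raise_for_status()

Defining the environment vocabulary as a separate enumDef keeps it reusable by other streaming types.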

 

  was:
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream, inheriting from DataSet.  
Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
the kinesis stream.

 


> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream, inheriting from 
> DataSet.  Attributes would include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * schemas: array of schema objects associated with the kinesis stream. 
> Typically avro schemas but could be JSON schema, etc.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-2915:
-

Assignee: Barbara Eckman

> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream, inheriting from 
> DataSet.  Attributes would include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
> the kinesis stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Description: 
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream, inheriting from DataSet.  
Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
the kinesis stream.

 

  was:
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
the kinesis stream.

 


> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream, inheriting from 
> DataSet.  Attributes would include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
> the kinesis stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Description: 
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
the kinesis stream.

 

  was:
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-) associated with 
the kinesis stream.

 


> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream.  Attributes would 
> include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * avroSchemas: array of avro schema objects (see ATLAS-2694) associated with 
> the kinesis stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Description: 
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-) associated with 
the kinesis stream.

 

  was:
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-associated with the 
kinesis stream.

 


> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Critical
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream.  Attributes would 
> include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * avroSchemas: array of avro schema objects (see ATLAS-) associated with 
> the kinesis stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Priority: Major  (was: Critical)

> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Major
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream.  Attributes would 
> include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * avroSchemas: array of avro schema objects (see ATLAS-) associated with 
> the kinesis stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Description: 
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  Attributes would include:
 * streamType string, eg "Single Region Stream".
 * awsRegion string: the AWS region in which the kinesis stream endpoint is 
deployed
 * shardCount int:  number of shards (uniquely identified sequence of data 
records) in the stream
 * streamEnvironment enum.  Valid values are "unknown", "production", 
"staging", "QA" and "development"
 * containsPII boolean: does this stream's data contain Personally Identifiable 
Information?
 * aggregationFormat enum. Indicates if/how the records are aggregated within a 
single kinesis record. Valid values are "none" or "kpl".
 * contentType enum: serialization format used by the producer of the stream.  
Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", "kryo", 
"protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style avro with 
envelope that specifies schema id surrounding the payload], "thrift", "tlv", 
"xml", "other".
 * schemaURL string: A URL to the data schema used by the producer, to 
facilitate consumption.
 * avroSchemas: array of avro schema objects (see ATLAS-associated with the 
kinesis stream.

 

  was:
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  For example:

 


> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Critical
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream.  Attributes would 
> include:
>  * streamType string, eg "Single Region Stream".
>  * awsRegion string: the AWS region in which the kinesis stream endpoint is 
> deployed
>  * shardCount int:  number of shards (uniquely identified sequence of data 
> records) in the stream
>  * streamEnvironment enum.  Valid values are "unknown", "production", 
> "staging", "QA" and "development"
>  * containsPII boolean: does this stream's data contain Personally 
> Identifiable Information?
>  * aggregationFormat enum. Indicates if/how the records are aggregated within 
> a single kinesis record. Valid values are "none" or "kpl".
>  * contentType enum: serialization format used by the producer of the stream. 
>  Valid values are "unknown", "avro", "bson", "csv", "json", "key-value", 
> "kryo", "protobuf", "raw" [ie no consistent schema], "sdp" [confluent-style 
> avro with envelope that specifies schema id surrounding the payload], 
> "thrift", "tlv", "xml", "other".
>  * schemaURL string: A URL to the data schema used by the producer, to 
> facilitate consumption.
>  * avroSchemas: array of avro schema objects (see ATLAS-associated with the 
> kinesis stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Description: 
Currently the base types in Atlas do not include AWS Kinesis Stream objects. It 
would be nice to add a typedef for a kinesis stream.  For example:

 

> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Critical
>
> Currently the base types in Atlas do not include AWS Kinesis Stream objects. 
> It would be nice to add a typedef for a kinesis stream.  For example:
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2915) AWS Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2915:
--
Summary: AWS Kinesis Stream Typedef for Atlas  (was: Kinesis Stream Typedef 
for Atlas)

> AWS Kinesis Stream Typedef for Atlas
> 
>
> Key: ATLAS-2915
> URL: https://issues.apache.org/jira/browse/ATLAS-2915
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2915) Kinesis Stream Typedef for Atlas

2018-10-08 Thread Barbara Eckman (JIRA)
Barbara Eckman created ATLAS-2915:
-

 Summary: Kinesis Stream Typedef for Atlas
 Key: ATLAS-2915
 URL: https://issues.apache.org/jira/browse/ATLAS-2915
 Project: Atlas
  Issue Type: New Feature
Reporter: Barbara Eckman






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-09-20 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622266#comment-16622266
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~ayushmnnit] Agreed.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.1.0, 2.0.0
>
> Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, 
> ATLAS-2708.patch, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
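
As a rough illustration only: the sketch below creates instances of two of the types above through Atlas' v2 bulk entity API. The type and attribute names (AWSS3Bucket, AWSS3PseudoDir, region, isEncrypted, encryptionType, dataType) follow the prose description, but the authoritative definitions are in the attached JSON files; the host, credentials, and qualifiedName convention are placeholders.

# Hedged sketch only: registering a bucket and a pseudo-directory as Atlas entities.
import requests

entities = {
    "entities": [
        {
            "guid": "-1",  # negative guid marks a new entity in a bulk create
            "typeName": "AWSS3Bucket",
            "attributes": {
                "qualifiedName": "s3://example-datalake-bucket",  # placeholder convention
                "name": "example-datalake-bucket",
                "region": "us-east-1",
                "isEncrypted": True,
                "encryptionType": "AES-256",
            },
        },
        {
            "guid": "-2",
            "typeName": "AWSS3PseudoDir",
            "attributes": {
                "qualifiedName": "s3://example-datalake-bucket/myWork/Development",
                "name": "myWork/Development",
                "dataType": "avro",
            },
        },
    ]
}

resp = requests.post(
    "http://localhost:21000/api/atlas/v2/entity/bulk",
    auth=("admin", "admin"),
    json=entities,
)
resp.raise_for_status()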



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-09-19 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621077#comment-16621077
 ] 

Barbara Eckman commented on ATLAS-2708:
---

It doesn't do it automatically through a listener like the hive hook.  We do it 
via lambda functions, triggered, say, on the creation of S3 object or 
pseudodirectory or bucket.  We package up the info into AtlasEntities and then 
publish to the ATLAS_HOOK kafka topic.
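
A minimal sketch of that flow, under stated assumptions: an S3 object-created event triggers a Lambda, which wraps the new pseudo-directory as an AtlasEntity and publishes an entity-create message to the ATLAS_HOOK Kafka topic. The broker address, type and attribute names, and the notification envelope are simplified placeholders; the exact hook message format should be taken from Atlas' notification code.

# Hedged sketch only, not the production function: S3 event -> AtlasEntity -> ATLAS_HOOK.
import json
import time
from kafka import KafkaProducer  # kafka-python, assumed to be packaged with the Lambda

producer = KafkaProducer(
    bootstrap_servers=["kafka-broker:9092"],  # placeholder broker address
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def handler(event, context):
    """Triggered on S3 object creation; emits an entity-create message to ATLAS_HOOK."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    pseudo_dir = key.rsplit("/", 1)[0] if "/" in key else ""

    entity = {
        "typeName": "AWSS3PseudoDir",  # from the typedefs attached to this ticket
        "attributes": {
            "qualifiedName": f"s3://{bucket}/{pseudo_dir}",
            "name": pseudo_dir or bucket,
        },
    }
    notification = {  # simplified ENTITY_CREATE_V2-style envelope (approximation)
        "version": {"version": "1.0.0"},
        "msgCreationTime": int(time.time() * 1000),
        "message": {
            "type": "ENTITY_CREATE_V2",
            "user": "datalake-lambda",
            "entities": {"entities": [entity]},
        },
    }
    producer.send("ATLAS_HOOK", notification)
    producer.flush()

Publishing to ATLAS_HOOK rather than calling the REST API directly keeps the Lambda decoupled from Atlas availability, since the notification is buffered in Kafka.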

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.1.0, 2.0.0
>
> Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, 
> ATLAS-2708.patch, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-09-19 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621077#comment-16621077
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 9/19/18 7:16 PM:


[~toopt4]   It doesn't do it automatically through a listener like the hive 
hook.  We do it via lambda functions, triggered, say, on the creation of S3 
object or pseudodirectory or bucket.  We package up the info into AtlasEntities 
and then publish to the ATLAS_HOOK kafka topic.


was (Author: barbara):
It doesn't do it automatically through a listener like the hive hook.  We do it 
via lambda functions, triggered, say, on the creation of S3 object or 
pseudodirectory or bucket.  We package up the info into AtlasEntities and then 
publish to the ATLAS_HOOK kafka topic.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.1.0, 2.0.0
>
> Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, 
> ATLAS-2708.patch, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-27 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593833#comment-16593833
 ] 

Barbara Eckman commented on ATLAS-2724:
---

Thanks, [~kevalbhatt]!!

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Keval Bhatt
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch, 
> ATLAS-2724-master.patch, ATLAS-2724.patch, new_table.png
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-24 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591767#comment-16591767
 ] 

Barbara Eckman commented on ATLAS-2724:
---

[~kevalbhatt] sure, let's go with [^ATLAS-2724.patch]. We can always revisit in 
the future to tweak or add new features. Thanks a lot!

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch, 
> ATLAS-2724-master.patch, ATLAS-2724.patch, new_table.png
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-20 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586309#comment-16586309
 ] 

Barbara Eckman commented on ATLAS-2724:
---

[~kevalbhatt]  Sorry the patch didn't apply on branch-0.8.  Must have been an 
oversight. 

Does your version work on maps and structs as well as JSON-valued strings?  
Looks like it would, and we would LOVE that.   

I'm not sure about applying it to arrays, though.  I think by default most 
users would want to see all columns of a hive table, for example, without 
having to scroll through. Though we do have some RDBMS tables with over 100 
columns, and collapsing them makes a lot of sense.   Perhaps for arrays the 
cutoff shouldn't be 1 but something like 10?  What are your thoughts on this? 

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch, 
> ATLAS-2724-master.patch, ATLAS-2724.patch, new_table.png
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-14 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580339#comment-16580339
 ] 

Barbara Eckman commented on ATLAS-2724:
---

[~kevalbhatt] I just uploaded the patch file, developed by [~tstefanovicz]

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-14 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580339#comment-16580339
 ] 

Barbara Eckman edited comment on ATLAS-2724 at 8/14/18 7:49 PM:


[~kevalbhatt] I just uploaded the patch file, developed by [~tstefanovicz].  


was (Author: barbara):
[~kevalbhatt] I just uploaded the patch file, developed by [~tstefanovicz]

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-14 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2724:
--
Attachment: 0001-Add-pretty-printed-json-values-in-tables.patch

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-08-14 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2724:
--
Attachment: (was: atlas0_8UIChanges.tar)

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: 0001-Add-pretty-printed-json-values-in-tables.patch
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-07-31 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564344#comment-16564344
 ] 

Barbara Eckman commented on ATLAS-2724:
---

[~kevalbhatt]  Hi, we're working on it this sprint. Sorry for the delay.  Q2 
deliverables and Q3 planning got in the way.

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.8-incubating
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: atlas0_8UIChanges.tar
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2790) Allow path expressions as attributeNames in Atlas basic search entityFilters

2018-07-17 Thread Barbara Eckman (JIRA)
Barbara Eckman created ATLAS-2790:
-

 Summary: Allow path expressions as attributeNames in Atlas basic 
search entityFilters
 Key: ATLAS-2790
 URL: https://issues.apache.org/jira/browse/ATLAS-2790
 Project: Atlas
  Issue Type: New Feature
  Components:  atlas-core
Reporter: Barbara Eckman


It would be nice if a basic search on a complex entity could also filter on 
attributes of nested sub-entities or structs, not just on its top-level 
attributes.

Here is an example of an Atlas basic search request object:
{
  "typeName": "hive_table",
  "excludeDeletedEntities": true,
  "classification": "",
  "query": "",
  "limit": 25,
  "offset": 0,
  "entityFilters": {
    "attributeName": "name",
    "operator": "contains",
    "attributeValue": "testtable"
  },
  "tagFilters": null,
  "attributes": [""]
}

Here, attributeName must be one of the top-level attributes of the hive table 
entity.

This ticket requests that path expressions to attributes of sub-entities or 
structs also be allowed as attributeNames, eg column.name  (similar to DSL 
search).   This would execute a search on the name attribute of columns that 
are associated with the hive table.
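
To make the proposal concrete, a hedged sketch of such a request is shown below; the path expression in attributeName is the requested behavior, not something the current basic search accepts, and the host, credentials, and attribute value are placeholders.

# Hedged sketch of the *proposed* request shape; not accepted by current Atlas.
import requests

search_request = {
    "typeName": "hive_table",
    "excludeDeletedEntities": True,
    "limit": 25,
    "offset": 0,
    "entityFilters": {
        "attributeName": "column.name",  # path into the referenced hive_column entities
        "operator": "contains",
        "attributeValue": "customer_id",
    },
    "attributes": [""],
}

resp = requests.post(
    "http://localhost:21000/api/atlas/v2/search/basic",
    auth=("admin", "admin"),
    json=search_request,
)
print(resp.json())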



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:31 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little earlier on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object. I have added it.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object. I have added it.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_datalake_typedefs_v2.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_AWS_common_typedefs_v2.json




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:28 PM:


[~bosco] 
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy 
is a list of Statement Structure. If we are not using it now, we should 
probably remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
 You're right, it is a list of Statement structures.  We made it a string 
because we only need to display it, and because we didn't want to bother 
parsing the JSON we got from the AWS API and putting it into a structured Atlas 
entity. (blush)  We are using it, so I created a placeholder S3AccessPolicy 
structure that consists of a single string for now, but it can be expanded into 
the full structure when someone needs it. 
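
For illustration only, a placeholder struct of that shape might look roughly 
like the snippet below, expressed as a Python dict in the v2 typedef format. 
The attribute name is an assumption; the actual definition is whatever the 
attached v2 JSONs contain.

# Hypothetical placeholder structDef: a single string holding the raw policy
# JSON for display, to be replaced by a full Statement structure if ever needed.
s3_access_policy_placeholder = {
    "structDefs": [
        {
            "name": "S3AccessPolicy",
            "typeVersion": "1.0",
            "attributeDefs": [
                {"name": "policyText", "typeName": "string",
                 "isOptional": True, "cardinality": "SINGLE"},
            ],
        }
    ]
}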


was (Author: barbara):
[~bosco] 

 bq. 

You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is a 
list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:28 PM:


[~bosco] 
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy 
is a list of Statement Structure. If we are not using it now, we should 
probably remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
 You're right, it is a list of Statement structures.  We made it a string 
because we only need to display it, and because we didn't want to bother 
parsing the JSON we got from the AWS API and putting it into a structured Atlas 
entity. (blush)  We are using it, so I created a placeholder S3AccessPolicy 
structure that consists of a single string for now, but it can be expanded into 
the full structure when someone needs it. 

My new JSON files are all_AWS_common_typedefs_v2.json and 
all_datalake_typedefs_v2.json.


was (Author: barbara):
[~bosco] 
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy 
is a list of Statement Structure. If we are not using it now, we should 
probably remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
 You're right, it is a list of statement structure.  We made it a string 
because we only need to display it, and because we didn't want to bother 
parsing the json we got from AWS API and putting it into a structured Atlas 
entity. (blush)  We are using it, so I created a placeholder S3AccessPolicy 
structure that consists of a string now, but can be expanded into the structure 
when someone needs/wants it. 


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~bosco] 

 bq. You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is 
a list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:44 PM:


[~bosco] 

 bq. 

You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is a 
list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket


was (Author: barbara):
[~bosco] 

 bq. You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is 
a list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514222#comment-16514222
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~madhan.neethiraj] I agree with all your changes except one: 
 - is avroSchema applicable for AWSS3PseudoDir?  
 ** Yes, for those that don't model objects and stop at the pseudo-directory 
level (like us).  We recommend a 1:1 relationship between pseudo-directory and 
avro_schema, but we can't enforce that, so it's an array of schemas in the 
pseudo-directory type.  I have changed it to a single schema in the object 
type, as you suggest.

I am breaking your file into two JSON files, one datalake-specific and one 
general AWS, as discussed with [~bosco]. 

Nice to see an example of relationshipDefs.  Can you please point me to a place 
where the semantics of these attributes are documented?  I'd like to start 
using them.
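
For reference, a relationshipDef in the v2 typedef format looks roughly like 
the sketch below; the relationship and attribute names are illustrative rather 
than taken from the attached model, and endDef1/endDef2 name the attribute each 
end exposes on the other type.

# Hypothetical relationshipDef: a bucket aggregates its pseudo-directories.
# endDef1 is marked as the container; each endDef names the attribute the
# opposite entity gains.
bucket_contains_pseudodir = {
    "relationshipDefs": [
        {
            "name": "aws_s3_bucket_pseudo_dirs",
            "typeVersion": "1.0",
            "relationshipCategory": "AGGREGATION",
            "propagateTags": "NONE",
            "endDef1": {"type": "AWSS3Bucket", "name": "pseudoDirectories",
                        "isContainer": True, "cardinality": "SET"},
            "endDef2": {"type": "AWSS3PseudoDir", "name": "bucket",
                        "cardinality": "SINGLE"},
        }
    ]
}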

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:31 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object. I have added it.
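
Presumably that addition is just another array-valued attribute on AWSS3Object; 
a minimal sketch under that assumption, with the attribute name invented for 
illustration:

# Assumed shape of the added attribute: user-defined AWS tags on an S3 object.
awss3object_tags_attribute = {
    "name": "awsTags",
    "typeName": "array<AWSTag>",
    "isOptional": True,
    "cardinality": "SET",
}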


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:21 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:16 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:02 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:01 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  
[Mine|[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]]
 is just a little later on the same day!




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:00 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  
[Mine|[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]]
 is just a little later on the same day!


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-14 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512944#comment-16512944
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/14/18 10:43 PM:
-

[~bosco]  good point, I should have a backpointer from AWSS3Pseudo to 
AWSS3Bucket. 

I just uploaded two jsons: 

1) all_datalake_typedefs.json, for datalake objects specifically (including 
AWSS3Object with backpointer to AWSS3Pseudo, and AWSS3Pseudo with backpointer 
to AWSS3Bucket). 

2) all_AWS_common_typedefs.json, for general AWS entities (Tags and 
CloudWatchMetrics).  LifeCycleRules are actually for buckets only, apparently, 
so I left them in the datalake json.

I didn't change avro_schema to Schema superclass yet, because I couldn't find 
the typedef for it.
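
A backpointer of that kind is usually just an attribute whose typeName is the 
owning entity type. The sketch below shows the assumed shape, with attribute 
names invented for illustration rather than copied from the uploaded files.

# Hypothetical backpointer attributes, grouped by the entity type they would
# be added to: an object points back to its pseudo-directory, and a
# pseudo-directory points back to its bucket.
backpointer_attribute_defs = {
    "AWSS3Object": [
        {"name": "pseudoDirectory", "typeName": "AWSS3PseudoDir",
         "isOptional": True, "cardinality": "SINGLE"},
    ],
    "AWSS3PseudoDir": [
        {"name": "bucket", "typeName": "AWSS3Bucket",
         "isOptional": True, "cardinality": "SINGLE"},
    ],
}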


was (Author: barbara):
[~bosco]  good point, I should have a backpointer from AWSS3Pseudo to 
AWSS3Bucket.  I will add that when I add the AWSS3Object typedef (with 
backpointer to AWSS3Pseudo).  And I take your preference to be two jsons (one 
for S3-specific and one for more general AWS entities).  I will do that too. 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-14 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_AWS_common_typedefs.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_AWS_common_typedefs.json, all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
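
To illustrate how the generic dataset2Dataset process in the quoted 
description could be used for lineage, here is a hedged sketch of a process 
entity linking a kafka topic to an S3 pseudo-directory; the qualifiedName 
convention and the placeholder guids are assumptions for illustration only.

import json

# Hedged sketch: a dataset2Dataset lineage process moving data from a kafka
# topic into an S3 pseudo-directory.  The guid values are negative placeholder
# ids of the kind used when the referenced entities are created in the same
# request; names and qualifiedName conventions are assumptions.
lineage_process = {
    "typeName": "dataset2Dataset",
    "attributes": {
        "qualifiedName": "kafka://events.customer -> s3://my-bucket/myWork/Development",
        "name": "customer events to data lake",
        "transforms": ["avro decode", "field rename: cust_id -> customerId"],
        "configurationParameters": {"flushIntervalSeconds": "300"},
        "inputs": [{"typeName": "kafka_topic", "guid": "-10"}],
        "outputs": [{"typeName": "AWSS3PseudoDir", "guid": "-20"}],
    },
}

print(json.dumps(lineage_process, indent=2))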



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-14 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: (was: all_datalake_typedefs.json)

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_AWS_common_typedefs.json, all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-14 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_datalake_typedefs.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json, all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-14 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512944#comment-16512944
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~bosco] Good point, I should have a backpointer from AWSS3PseudoDir to 
AWSS3Bucket.  I will add that when I add the AWSS3Object typedef (with a 
backpointer to AWSS3PseudoDir).  And I take your preference to be two JSON 
files (one for S3-specific entities and one for the more general AWS 
entities).  I will do that too. 

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2694) Avro schema typedef and support for Avro schema evolution in Atlas

2018-06-14 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512842#comment-16512842
 ] 

Barbara Eckman commented on ATLAS-2694:
---

Hi [~nagaraj_janardhana], I'm glad you find it useful!  We actually do have a 
parser that converts an Avro schema into the corresponding Avro schema 
AtlasEntity.  I will create a Jira for it in the next couple of days. :)
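
That parser is not attached to this issue; purely as a hypothetical 
illustration of the kind of conversion involved, a sketch might look like the 
Python below, where the avro_schema type name and its attribute names are 
assumptions rather than the actual typedef.

import json

def avro_to_atlas_entity(avro_schema: dict, version: int = 1) -> dict:
    """Hypothetical sketch: map an Avro record schema onto an Atlas entity
    of type "avro_schema".  Attribute names here are assumptions, not the
    actual typedef shipped with the patch."""
    namespace = avro_schema.get("namespace", "")
    qualified_name = f"{namespace}.{avro_schema['name']}@v{version}"
    return {
        "entity": {
            "typeName": "avro_schema",
            "attributes": {
                "qualifiedName": qualified_name,
                "name": avro_schema["name"],
                "namespace": namespace,
                "fields": [f["name"] for f in avro_schema.get("fields", [])],
                "avroNotation": json.dumps(avro_schema),  # expanded schema for serDe
                "versionId": version,
                "isLatest": True,
            },
        }
    }

example = {
    "type": "record",
    "name": "Customer",
    "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}
print(json.dumps(avro_to_atlas_entity(example), indent=2))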

> Avro schema typedef and support for Avro schema evolution  in Atlas
> ---
>
> Key: ATLAS-2694
> URL: https://issues.apache.org/jira/browse/ATLAS-2694
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Srikanth Venkat
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: ATLAS-2694-2.patch, ATLAS-2694.patch, 
> avro_atlas_types_08.json
>
>
> Currently the base types in Atlas do not include Avro schemas. It would be 
> nice to add typedef for Avro schema and any associated metadata to support 
> schema evolution.
>  * For example, Avro_schema type supports:
>  ** All avro types, both primitive and complex, including union types, as 
> fields of schema
>  ** All types have doc strings and defaults
>  ** A field of a schema can be another schema
>  ** Indefinite nesting of records, arrays.
>  ** Associated entities array attribute contains pointers to all datasets 
> that reflect the avro schema
>  ** Fully expanded avroNotation for use in serDe
>  ** Schema evolution features such as isLatest (Boolean) and version number
>  * Schema evolution Process
>  ** Input: avro schema
>  ** Output: new version of avro schema
>  ** Compatibility: FULL, BACKWARD, FORWARD, NONE
>  ** IsBreakingChange (Boolean): does the change produce an incompatible 
> schema? (ie its compatibility is not “FULL”)
>  *
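
To make the schema-evolution process in the quoted description more concrete, 
here is a hedged sketch of one such process entity; the type name 
avro_schema_evolution, the qualifiedName convention, and the placeholder guids 
are assumptions for illustration only.

import json

# Hedged sketch of a schema-evolution process entity tying an existing
# avro_schema version to its successor.  Per the description, a change is
# breaking when its compatibility is not "FULL".
evolution_process = {
    "typeName": "avro_schema_evolution",   # assumed type name
    "attributes": {
        "qualifiedName": "com.example.Customer:v1->v2",
        "name": "Customer schema evolution v1 -> v2",
        "compatibility": "FULL",            # FULL, BACKWARD, FORWARD, NONE
        "isBreakingChange": False,          # compatibility above is "FULL"
        "inputs": [{"typeName": "avro_schema", "guid": "-1"}],
        "outputs": [{"typeName": "avro_schema", "guid": "-2"}],
    },
}

print(json.dumps(evolution_process, indent=2))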



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-13 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511218#comment-16511218
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/13/18 2:35 PM:


[~davidrad] OK... can you point me to documentation or examples of typedefs 
using this other style? I found the box and arrow diagrams but no examples.


was (Author: barbara):
OK... can you point me to documentation or examples of typedefs using this 
other style?  I found the box and arrow diagrams but no examples. 

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-13 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511218#comment-16511218
 ] 

Barbara Eckman commented on ATLAS-2708:
---

OK... can you point me to documentation or examples of typedefs using this 
other style?  I found the box and arrow diagrams but no examples. 
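
Assuming the "other style" refers to Atlas relationship type definitions 
(relationshipDefs) rather than owned attribute references, a minimal example 
would look roughly like the sketch below; the relationship name, category, and 
end definitions are illustrative assumptions, not an agreed design.

import json

# Hedged sketch of a relationship-style typedef linking a bucket to its
# pseudo-directories.  Names and cardinalities are assumptions.
bucket_to_pseudodir = {
    "category": "RELATIONSHIP",
    "name": "aws_s3_bucket_pseudo_dirs",
    "typeVersion": "1.0",
    "relationshipCategory": "AGGREGATION",
    "propagateTags": "NONE",
    "endDef1": {"type": "AWSS3Bucket", "name": "pseudoDirectories",
                "isContainer": True, "cardinality": "SET"},
    "endDef2": {"type": "AWSS3PseudoDir", "name": "bucket",
                "isContainer": False, "cardinality": "SINGLE"},
}

print(json.dumps({"relationshipDefs": [bucket_to_pseudodir]}, indent=2))

One design difference between the two styles is that a relationshipDef keeps 
the bucket and pseudo-directory types independent of each other, whereas owned 
attribute references embed the link in one of the entityDefs.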

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-13 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511216#comment-16511216
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~bosco], great, I'm glad they will be useful.

1. We actually didn't model each object (file), because we don't plan to store 
metadata at the object level.  But I can whip up an object typedef for you.  
I'd imagine attributes like the pseudo-directory it's in, the compression 
format, and the creation time.  For us they'd be .avro files, so I'd include an 
avro schema.  For you and others: would you have a .csv "schema" associated 
with the object, or a JSON schema?  I could include an attribute of type Schema 
rather than avro_schema, so that any kind of schema can be associated with it.  
Any other attributes you can think of?

2. Do you mean there'd be a separate Jira ticket for common AWS entities like 
Tags, Permissions, CloudWatchMetrics, etc., instead of lumping them together 
with the S3 entities?  Then AWSS3Bucket and AWSDynamoDB could each have 
attributes of type AWSTag.  For CloudWatch metrics on DynamoDB, are metric name 
and scope sufficient?  I'd have to check.
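
Purely as a hypothetical sketch of the object typedef floated in point 1 
above, with attribute names assumed rather than final:

import json

# Hypothetical AWSS3Object entityDef reflecting the attributes floated above
# (containing pseudo-directory, compression format, creation time, and a
# generic schema reference).  Nothing here is final.
s3_object_def = {
    "category": "ENTITY",
    "name": "AWSS3Object",
    "superTypes": ["DataSet"],
    "typeVersion": "1.0",
    "attributeDefs": [
        {"name": "pseudoDirectory", "typeName": "AWSS3PseudoDir",
         "cardinality": "SINGLE", "isOptional": False, "isUnique": False,
         "isIndexable": False},
        {"name": "compressionFormat", "typeName": "string",
         "cardinality": "SINGLE", "isOptional": True, "isUnique": False,
         "isIndexable": True},
        {"name": "createTime", "typeName": "date",
         "cardinality": "SINGLE", "isOptional": True, "isUnique": False,
         "isIndexable": False},
        # Generic schema pointer so csv/json schemas fit as well as Avro.
        {"name": "schema", "typeName": "Schema",
         "cardinality": "SINGLE", "isOptional": True, "isUnique": False,
         "isIndexable": False},
    ],
}

print(json.dumps({"entityDefs": [s3_object_def]}, indent=2))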

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-10 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507532#comment-16507532
 ] 

Barbara Eckman commented on ATLAS-2708:
---

I'm sorry, that was stupid.  I'm attaching a file that reflects your 
suggestions.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-10 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_datalake_typedefs.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-10 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: (was: all_datalake_types.json)

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_datalake_types.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_datalake_types.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_datalake_types.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: (was: all_datalake_types.json)

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman reassigned ATLAS-2708:
-

Assignee: Barbara Eckman

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2724:
--

This enhancement was developed by [~tstefanovicz].

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: atlas0_8UIChanges.tar
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2724:
--
Attachment: atlas0_8UIChanges.tar

> UI enhancement for Avro schemas and other JSON-valued attributes
> 
>
> Key: ATLAS-2724
> URL: https://issues.apache.org/jira/browse/ATLAS-2724
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: atlas0_8UIChanges.tar
>
>
> Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2724) UI enhancement for Avro schemas and other JSON-valued attributes

2018-05-29 Thread Barbara Eckman (JIRA)
Barbara Eckman created ATLAS-2724:
-

 Summary: UI enhancement for Avro schemas and other JSON-valued 
attributes
 Key: ATLAS-2724
 URL: https://issues.apache.org/jira/browse/ATLAS-2724
 Project: Atlas
  Issue Type: New Feature
Reporter: Barbara Eckman
Assignee: Barbara Eckman


Currently JSON-valued attributes are fully displayed in-line with other 
attributes, not pretty-printed, cluttering the display.  To support a better 
display, we can display JSON-valued attributes in a one-line box that can be 
scrolled down, or fully expanded with a mouse click that pretty-prints the 
JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2694) Avro schema typedef and support for Avro schema evolution in Atlas

2018-05-29 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2694:
--
Description: 
Currently the base types in Atlas do not include Avro schemas. It would be nice 
to add typedef for Avro schema and any associated metadata to support schema 
evolution.
 * For example, Avro_schema type supports:
 ** All avro types, both primitive and complex, including union types, as 
fields of schema
 ** All types have doc strings and defaults
 ** A field of a schema can be another schema
 ** Indefinite nesting of records, arrays.
 ** Associated entities array attribute contains pointers to all datasets that 
reflect the avro schema
 ** Fully expanded avroNotation for use in serDe
 ** Schema evolution features such as isLatest (Boolean) and version number
 * Schema evolution Process
 ** Input: avro schema
 ** Output: new version of avro schema
 ** Compatibility: FULL, BACKWARD, FORWARD, NONE
 ** IsBreakingChange (Boolean): does the change produce an incompatible schema? 
(ie its compatibility is not “FULL”)
 *

  was:
Currently the base types in Atlas do not include Avro schemas. It would be nice 
to add typedef for Avro schema and any associated metadata to support schema 
evolution.
 * For example, Avro_schema type supports:
 ** All avro types, both primitive and complex, including union types, as 
fields of schema
 ** All types have doc strings and defaults
 ** A field of a schema can be another schema
 ** Indefinite nesting of records, arrays.
 ** Associated entities array attribute contains pointers to all datasets that 
reflect the avro schema
 ** Fully expanded avroNotation for use in serDe
 ** Schema evolution features such as isLatest (Boolean) and version number
 * Schema evolution Process
 ** Input: avro schema
 ** Output: new version of avro schema
 ** Compatibility: FULL, BACKWARD, FORWARD, NONE
 ** IsBreakingChange (Boolean): does the change produce an incompatible schema? 
(ie its compatibility is not “FULL”)
 * Atlas UI enhancement for JSON-valued attributes to support avro schema and 
avro schema evolution
 ** Currently JSON-valued attributes are fully displayed in-line with other 
attributes, not pretty-printed, cluttering the display.  To support a better 
display, we can display JSON-valued attributes in a one-line box that can be 
scrolled down, or fully expanded with a mouse click that pretty-prints the 
JSON. 


Moving UI enhancement for avro schemas to a separate Jira 
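
As a hedged rendering of the avro_schema bullets in the updated description 
above, the sketch below builds a pared-down entityDef; the authoritative 
definition is in the attached avro_atlas_types_08.json, so the attribute names 
here are illustrative assumptions.

import json

# Hedged sketch of an avro_schema entityDef covering the evolution-related
# bullets (full avroNotation, version, isLatest, associated datasets).
avro_schema_def = {
    "category": "ENTITY",
    "name": "avro_schema",
    "superTypes": ["DataSet"],
    "typeVersion": "1.0",
    "attributeDefs": [
        {"name": "avroNotation", "typeName": "string", "cardinality": "SINGLE",
         "isOptional": False, "isUnique": False, "isIndexable": False},
        {"name": "versionId", "typeName": "int", "cardinality": "SINGLE",
         "isOptional": True, "isUnique": False, "isIndexable": True},
        {"name": "isLatest", "typeName": "boolean", "cardinality": "SINGLE",
         "isOptional": True, "isUnique": False, "isIndexable": True},
        # Datasets that reflect this schema (kafka topics, S3 pseudo-dirs, ...).
        {"name": "associatedEntities", "typeName": "array<DataSet>",
         "cardinality": "SET", "isOptional": True, "isUnique": False,
         "isIndexable": False},
    ],
}

print(json.dumps({"entityDefs": [avro_schema_def]}, indent=2))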

> Avro schema typedef and support for Avro schema evolution  in Atlas
> ---
>
> Key: ATLAS-2694
> URL: https://issues.apache.org/jira/browse/ATLAS-2694
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Srikanth Venkat
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: ATLAS-2694-2.patch, ATLAS-2694.patch, 
> avro_atlas_types_08.json
>
>
> Currently the base types in Atlas do not include Avro schemas. It would be 
> nice to add typedef for Avro schema and any associated metadata to support 
> schema evolution.
>  * For example, Avro_schema type supports:
>  ** All avro types, both primitive and complex, including union types, as 
> fields of schema
>  ** All types have doc strings and defaults
>  ** A field of a schema can be another schema
>  ** Indefinite nesting of records, arrays.
>  ** Associated entities array attribute contains pointers to all datasets 
> that reflect the avro schema
>  ** Fully expanded avroNotation for use in serDe
>  ** Schema evolution features such as isLatest (Boolean) and version number
>  * Schema evolution Process
>  ** Input: avro schema
>  ** Output: new version of avro schema
>  ** Compatibility: FULL, BACKWARD, FORWARD, NONE
>  ** IsBreakingChange (Boolean): does the change produce an incompatible 
> schema? (ie its compatibility is not “FULL”)
>  *
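
For illustration only, here is a minimal hand-written sketch of how the 
schema-evolution attributes described above (version number, isLatest, fully 
expanded avroNotation) could be expressed as Atlas attributeDefs.  It follows 
the layout of the kafka_topic typedef posted on ATLAS-2696, with "#" 
annotations used as inline notes in the same way.  The attached 
avro_atlas_types_08.json is the authoritative definition; its names and types 
may differ from this sketch.

{
 "category": "ENTITY",
 "name": "avro_schema",
 "description": "Illustrative sketch only; see attached avro_atlas_types_08.json",
 "typeVersion": "1.0",
 "attributeDefs": [
 {
 "name": "versionNumber",   # schema evolution: version of this schema
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": true
 },
 {
 "name": "isLatest",        # schema evolution: is this the latest version?
 "typeName": "boolean",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": true
 },
 {
 "name": "avroNotation",    # fully expanded Avro JSON, usable directly in a SerDe
 "typeName": "string",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 }
 ]
}

Per the description, compatibility (FULL, BACKWARD, FORWARD, NONE) and 
isBreakingChange would belong to the schema evolution process rather than to 
the schema entity itself.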



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2694) Avro schema typedef and support for Avro schema evolution in Atlas

2018-05-29 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493710#comment-16493710
 ] 

Barbara Eckman commented on ATLAS-2694:
---

I want to make sure to give Vadim Vaks of Hortonworks credit as a collaborator 
on this work.  He has been a great partner on our Atlas journey in a variety of 
ways and has added immense value to our work.

> Avro schema typedef and support for Avro schema evolution  in Atlas
> ---
>
> Key: ATLAS-2694
> URL: https://issues.apache.org/jira/browse/ATLAS-2694
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Srikanth Venkat
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: ATLAS-2694-2.patch, ATLAS-2694.patch, 
> avro_atlas_types_08.json
>
>
> Currently the base types in Atlas do not include Avro schemas. It would be 
> nice to add typedef for Avro schema and any associated metadata to support 
> schema evolution.
>  * For example, Avro_schema type supports:
>  ** All avro types, both primitive and complex, including union types, as 
> fields of schema
>  ** All types have doc strings and defaults
>  ** A field of a schema can be another schema
>  ** Indefinite nesting of records, arrays.
>  ** Associated entities array attribute contains pointers to all datasets 
> that reflect the avro schema
>  ** Fully expanded avroNotation for use in serDe
>  ** Schema evolution features such as isLatest (Boolean) and version number
>  * Schema evolution Process
>  ** Input: avro schema
>  ** Output: new version of avro schema
>  ** Compatibility: FULL, BACKWARD, FORWARD, NONE
>  ** IsBreakingChange (Boolean): does the change produce an incompatible 
> schema? (ie its compatibility is not “FULL”)
>  * Atlas UI enhancement for JSON-valued attributes to support avro schema and 
> avro schema evolution
>  ** Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2709) RDBMS typedefs for Atlas

2018-05-25 Thread Barbara Eckman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2709:
--
Attachment: rdbms_typedefs.tar

> RDBMS typedefs for Atlas
> 
>
> Key: ATLAS-2709
> URL: https://issues.apache.org/jira/browse/ATLAS-2709
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Priority: Critical
> Attachments: rdbms_typedefs.tar
>
>
> Currently the base types in Atlas do not include RDBMS objects. It would be 
> nice to add generic typedefs for the basic objects found in virtually any 
> RDBMS.  From this, one can subclass types specific to Oracle, MS SQL Server, 
> etc., if desired.  For example:
>  
>  * rdbms_instance represents the host platform that the database is running 
> on. It supports:
>  ** rdbms_type (e.g., Oracle, MySQL) 
>  ** hostname
>  ** port
>  ** protocol
>  ** platform
>  ** contact_info for the instance owner
>  ** array of databases (schemas) associated with the instance
>  
>  * rdbms_db represents a database (schema) running on an rdbms_instance. It 
> supports:
>  ** inverse reference to the rdbms_instance
>  ** contact_info for the database owner
>  ** prodOrOther: a self-documenting attribute name representing whether the 
> database is production, development, staging, etc
>  ** array of tables in the database
>  
>  * rdbms_table represents a table in a database (schema). It supports:
>  ** inverse reference to the rdbms_db
>  ** time of creation
>  ** comment
>  ** type (e.g., table or view)
>  ** contact_info for the table owner
>  ** array of columns in the table
>  ** array of indexes on the table
>  ** array of foreign keys defined on the table
>  
>  * rdbms_column represents a column in a table. It supports:
>  ** data_type of the column
>  ** length
>  ** default_value
>  ** comment
>  ** inverse reference to the rdbms_table
>  ** isNullable boolean
>  ** isPrimaryKey boolean
>  * rdbms_index represents an index on a set of columns in a table. It 
> supports:
>  ** inverse reference to the rdbms_table
>  ** index_type (e.g., "NORMAL", "BITMAP", "DOMAIN")
>  ** isUnique boolean
>  ** ordered list of columns in the index
>  ** comment
>  
>  * rdbms_foreign_key represents a foreign key relationship between columns in 
> source and referenced tables.  It supports:
>  ** inverse reference to the source table
>  ** key_columns: ordered list of columns in the source table
>  ** references_table: table that the foreign key references
>  ** references_columns: ordered list of columns in the referenced table
>  ** comment
>  
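
To make this concrete, here is a rough, hand-written sketch of how one of 
these entities (rdbms_column) might look as an Atlas entityDef, following the 
attributeDefs layout of the kafka_topic typedef on ATLAS-2696.  It is 
illustrative only; the attached rdbms_typedefs.tar is the authoritative 
definition and may differ in names, types, and constraints.

{
 "category": "ENTITY",
 "name": "rdbms_column",
 "description": "Illustrative sketch of a column in an RDBMS table",
 "typeVersion": "1.0",
 "attributeDefs": [
 { "name": "data_type",     "typeName": "string",  "isOptional": false,
   "cardinality": "SINGLE", "isUnique": false, "isIndexable": true },
 { "name": "length",        "typeName": "int",     "isOptional": true,
   "cardinality": "SINGLE", "isUnique": false, "isIndexable": false },
 { "name": "default_value", "typeName": "string",  "isOptional": true,
   "cardinality": "SINGLE", "isUnique": false, "isIndexable": false },
 { "name": "comment",       "typeName": "string",  "isOptional": true,
   "cardinality": "SINGLE", "isUnique": false, "isIndexable": false },
 { "name": "isNullable",    "typeName": "boolean", "isOptional": true,
   "cardinality": "SINGLE", "isUnique": false, "isIndexable": false },
 { "name": "isPrimaryKey",  "typeName": "boolean", "isOptional": true,
   "cardinality": "SINGLE", "isUnique": false, "isIndexable": true }
 ]
}

The inverse reference to rdbms_table is omitted here; how it is modeled (an 
attribute of type rdbms_table or a relationship) follows whatever convention 
the attached typedefs use.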



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2694) Avro schema typedef and support for Avro schema evolution in Atlas

2018-05-25 Thread Barbara Eckman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2694:
--
Attachment: avro_atlas_types_08.json

> Avro schema typedef and support for Avro schema evolution  in Atlas
> ---
>
> Key: ATLAS-2694
> URL: https://issues.apache.org/jira/browse/ATLAS-2694
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Srikanth Venkat
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: avro_atlas_types_08.json
>
>
> Currently the base types in Atlas do not include Avro schemas. It would be 
> nice to add typedef for Avro schema and any associated metadata to support 
> schema evolution.
>  * For example, Avro_schema type supports:
>  ** All avro types, both primitive and complex, including union types, as 
> fields of schema
>  ** All types have doc strings and defaults
>  ** A field of a schema can be another schema
>  ** Indefinite nesting of records, arrays.
>  ** Associated entities array attribute contains pointers to all datasets 
> that reflect the avro schema
>  ** Fully expanded avroNotation for use in serDe
>  ** Schema evolution features such as isLatest (Boolean) and version number
>  * Schema evolution Process
>  ** Input: avro schema
>  ** Output: new version of avro schema
>  ** Compatibility: FULL, BACKWARD, FORWARD, NONE
>  ** IsBreakingChange (Boolean): does the change produce an incompatible 
> schema? (ie its compatibility is not “FULL”)
>  * Atlas UI enhancement for JSON-valued attributes to support avro schema and 
> avro schema evolution
>  ** Currently JSON-valued attributes are fully displayed in-line with other 
> attributes, not pretty-printed, cluttering the display.  To support a better 
> display, we can display JSON-valued attributes in a one-line box that can be 
> scrolled down, or fully expanded with a mouse click that pretty-prints the 
> JSON. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2696) Typedef extensions for Kafka in Atlas

2018-05-25 Thread Barbara Eckman (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491199#comment-16491199
 ] 

Barbara Eckman commented on ATLAS-2696:
---

Notes on the attributes:

While any of these could be specified by the topic creator, many of them are 
calculated by default from the sizing parameters entered by the user.  They 
are labeled "# user input" or "# calculated" accordingly.

{
 "category": "ENTITY",
 "guid": "c268420a-86d1-498e-ae8c-451896ec0230",
 "createdBy": "atlas",
 "updatedBy": "admin",
 "createTime": 1525358038631,
 "updateTime": 1525447432952,
 "version": 2,
 "name": "kafka_topic",
 "description": "Atlas Type representing kafka topic",
 "typeVersion": "1.0",
 "attributeDefs": [
 {
 "name": "topic",       #user input
 "typeName": "string",
 "isOptional": false,
 "cardinality": "SINGLE",
 "valuesMinCount": 1,
 "valuesMaxCount": 1,
 "isUnique": true,
 "isIndexable": true
 },
 {
 "name": "uri",   # user input
 "typeName": "string",
 "isOptional": false,
 "cardinality": "SINGLE",
 "valuesMinCount": 1,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "avroSchema",        # user input; avroSchemas associated with the 
topic. Can be multiple.
 "typeName": "array",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 2147483647,
 "isUnique": false,
 "isIndexable": true
 },
 {
 "name": "replicationFactorNational",    # calculated; We have National and 
Local kafka installations and may have different replication Factors, retention 
Bytes, partitionCounts, and segmentBytes for National vs Local.
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "replicationFactorLocal",  # calculated
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "retentionBytesNational",  # calculated
 "typeName": "long",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "segmentBytesNational",  # calculated
 "typeName": "long",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "segmentBytesLocal",  # calculated
 "typeName": "long",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "contactInfo",  #user input
 "typeName": "string",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "type",          # user input; type of data produced to the topic. 
E.g, "avro", "json", "csv", "raw"
 "typeName": "string",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "partitionCountLocal",  # calculated
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "partitionCountNational",  # calculated
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "retentionBytesLocal",  # calculated
 "typeName": "long",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "retentiontimeLocalInHrs",  # calculated
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "retentiontimeNationalInHrs",  # calculated
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "desiredRetentionInHrs",   # user input
 "typeName": "int",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "keyClassname",    # user input; for Cloudera-style avro schemas, 
which have keyClasses and avroClasses (here = avroSchema)
 "typeName": "string",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "numberOfEventsPerDay",  # user input
 "typeName": "long",
 "isOptional": true,
 "cardinality": "SINGLE",
 "valuesMinCount": 0,
 "valuesMaxCount": 1,
 "isUnique": false,
 "isIndexable": false
 },
 {
 "name": "maxThroughputPerSec",   # user input
 "typeName": "long",
 "isOptional": 

[jira] [Updated] (ATLAS-2696) Typedef extensions for Kafka in Atlas

2018-05-25 Thread Barbara Eckman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2696:
--
Attachment: kafka_topic_typedef.json

> Typedef extensions for Kafka in Atlas
> -
>
> Key: ATLAS-2696
> URL: https://issues.apache.org/jira/browse/ATLAS-2696
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Srikanth Venkat
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: kafka_topic_typedef.json
>
>
> Kafka topic typedef in Atlas can be improved to include many useful 
> operational (config related) and contextual metadata properties:
>  * config inputs such as max throughput per sec, average message size, 
> messages per day, retention time
>  * Sizing outputs such as number and size of partitions, replication factor
>  * Contact information on the data producer
>  * Array of avro schemas that are associated with the topic (based on Avro 
> schema extensions outlined in ATLAS-2694)
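
As a usage illustration only (this is not taken from the attached 
kafka_topic_typedef.json), an entity created against this typedef might carry 
the user-input and calculated attributes roughly as shown below.  The payload 
shape is roughly that of an Atlas v2 entity; all values are hypothetical 
examples, and the way avroSchema is populated (here, by qualified names of 
avro_schema entities per ATLAS-2694) is an assumption about usage, not 
something specified in this Jira.

{
 "entity": {
 "typeName": "kafka_topic",
 "attributes": {
 "topic": "customer.orders.v1",                  # user input (hypothetical)
 "uri": "kafka://national-cluster/customer.orders.v1",  # user input (hypothetical)
 "type": "avro",                                 # user input
 "contactInfo": "data-producer-team@example.com",       # user input (hypothetical)
 "numberOfEventsPerDay": 50000000,               # user input
 "desiredRetentionInHrs": 72,                    # user input
 "replicationFactorNational": 3,                 # calculated
 "partitionCountNational": 12,                   # calculated
 "avroSchema": [ "customer.orders.v1-value" ]    # references to avro_schema
                                                 # entities; exact representation
                                                 # depends on the attached typedef
 }
 }
}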



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)