Build failed in Jenkins: Atlas-1.0-AllTests #271

2018-06-15 Thread Apache Jenkins Server
See 

--
[...truncated 97.38 KB...]
[INFO] Working directory: 

[INFO] Storing buildNumber: 6d3f17df30f1eb3c320bfe51bafde7f84d2e0884 at 
timestamp: 1529115079119
[WARNING] Cannot get the branch information from the git repository: 
Detecting the current branch failed: fatal: ref HEAD is not a symbolic ref

[INFO] Executing: /bin/sh -c cd 
' && 'git' 
'rev-parse' '--verify' 'HEAD'
[INFO] Working directory: 

[INFO] Storing buildScmBranch: UNKNOWN
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (rat-check) @ atlas-notification ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: **/antlr4/**
[INFO] Exclude: **/dependency-reduced-pom.xml
[INFO] Exclude: **/javax.script.ScriptEngineFactory
[INFO] Exclude: .reviewboardrc
[INFO] Exclude: 3party-licenses/**
[INFO] Exclude: **/.cache
[INFO] Exclude: **/.cache-main
[INFO] Exclude: **/.cache-tests
[INFO] Exclude: **/.checkstyle
[INFO] Exclude: **/*.txt
[INFO] Exclude: **/*.json
[INFO] Exclude: .pc/**
[INFO] Exclude: debian/**
[INFO] Exclude: .svn/**
[INFO] Exclude: .git/**
[INFO] Exclude: .gitignore
[INFO] Exclude: **/.idea/**
[INFO] Exclude: **/*.twiki
[INFO] Exclude: **/*.iml
[INFO] Exclude: **/*.json
[INFO] Exclude: **/*.log
[INFO] Exclude: **/target/**
[INFO] Exclude: **/target*/**
[INFO] Exclude: **/build/**
[INFO] Exclude: **/*.patch
[INFO] Exclude: derby.log
[INFO] Exclude: **/logs/**
[INFO] Exclude: **/.classpath
[INFO] Exclude: **/.project
[INFO] Exclude: **/.settings/**
[INFO] Exclude: **/test-output/**
[INFO] Exclude: **/mock/**
[INFO] Exclude: **/data/**
[INFO] Exclude: **/maven-eclipse.xml
[INFO] Exclude: **/.externalToolBuilders/**
[INFO] Exclude: **/build.log
[INFO] Exclude: **/.bowerrc
[INFO] Exclude: *.json
[INFO] Exclude: **/overlays/**
[INFO] Exclude: dev-support/**
[INFO] Exclude: **/users-credentials.properties
[INFO] Exclude: **/public/css/animate.min.css
[INFO] Exclude: **/public/css/bootstrap-sidebar.css
[INFO] Exclude: **/public/js/external_lib/**
[INFO] Exclude: **/node_modules/**
[INFO] Exclude: **/public/js/libs/**
[INFO] Exclude: **/atlas.data/**
[INFO] Exclude: **/${sys:atlas.data}/**
[INFO] Exclude: **/policy-store.txt
[INFO] Exclude: **/*rebel*.xml
[INFO] Exclude: **/*rebel*.xml.bak
[INFO] Exclude: **/test/resources/**
[INFO] 36 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 36 licenses.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
atlas-notification ---
[INFO] 
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ 
atlas-notification ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] Copying 2 resources to META-INF
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ 
atlas-notification ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 21 source files to 

[INFO] 
:
 Some input files use unchecked or unsafe operations.
[INFO] 
:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) @ 
atlas-notification ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.7.0:testCompile (default-testCompile) @ 
atlas-notification ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 14 source files to 

[INFO] 
:
 Some input files use unchecked or unsafe operations.
[INFO] 
:
 Recompile with -Xlint:unchecked for details.
[INFO] 

Build failed in Jenkins: PreCommit-ATLAS-Build-Test #477-master-view-lineage-fix.patch

2018-06-15 Thread Apache Jenkins Server
See 


Changes:

[madhan] ATLAS-2744: updated Atlas website to include release-notes in Downloads

[madhan] ATLAS-2757: fix for NPE in Hive hook in handling column-rename on

--
[...truncated 738.46 KB...]
[INFO] Started ServerConnector@36df565a{HTTP/1.1,[http/1.1]}{0.0.0.0:49321}
[INFO] Started @1635828ms
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.servlet.SpringServlet getContext
INFO: Using default applicationContext
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, typesREST, of type 
org.apache.atlas.web.rest.TypesREST as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, lineageREST, of type 
org.apache.atlas.web.rest.LineageREST as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, discoveryREST, of type 
org.apache.atlas.web.rest.DiscoveryREST as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, entityREST, of type 
org.apache.atlas.web.rest.EntityREST as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, relationshipREST, of type 
org.apache.atlas.web.rest.RelationshipREST as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, glossaryREST, of type 
org.apache.atlas.web.rest.GlossaryREST as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, atlasJsonProvider, of type 
org.apache.atlas.web.util.AtlasJsonProvider as a provider class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, entityResource, of type 
org.apache.atlas.web.resources.EntityResource as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, lineageResource, of type 
org.apache.atlas.web.resources.LineageResource as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, dataSetLineageResource, of type 
org.apache.atlas.web.resources.DataSetLineageResource as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, typesResource, of type 
org.apache.atlas.web.resources.TypesResource as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, adminResource, of type 
org.apache.atlas.web.resources.AdminResource as a root resource class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, metadataDiscoveryResource, of type 
org.apache.atlas.web.resources.MetadataDiscoveryResource as a root resource 
class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, atlasBaseExceptionMapper, of type 
org.apache.atlas.web.errors.AtlasBaseExceptionMapper as a provider class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, notFoundExceptionMapper, of type 
org.apache.atlas.web.errors.NotFoundExceptionMapper as a provider class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.spi.spring.container.SpringComponentProviderFactory 
registerSpringBeans
INFO: Registering Spring bean, allExceptionMapper, of type 
org.apache.atlas.web.errors.AllExceptionMapper as a provider class
Jun 16, 2018 1:55:17 AM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
[INFO] Started 
o.e.j.m.p.JettyWebAppContext@91357fb{/,file://
[INFO] Started ServerConnector@3af844e8{HTTP/1.1,[http/1.1]}{0.0.0.0:31000}
[INFO] Started @1668244ms
[INFO] Started Jetty 

[jira] [Commented] (ATLAS-641) Lineage for a view created from a view seems to be confusing(from user's perspective).

2018-06-15 Thread Madhan Neethiraj (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514618#comment-16514618
 ] 

Madhan Neethiraj commented on ATLAS-641:


Pre-commit test run: 
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/477/

> Lineage for a view created from a view seems to be confusing(from user's 
> perspective).
> --
>
> Key: ATLAS-641
> URL: https://issues.apache.org/jira/browse/ATLAS-641
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.8.1, 0.8.2, 1.0.0
> Reporter: Ayub Pathan
> Assignee: Madhan Neethiraj
> Priority: Major
> Fix For: 0.8.3, 1.1.0, 2.0.0
>
> Attachments: ATLAS-641.patch
>
>
> Lineage for a table created from the view seems to be confusing (from the 
> user's perspective).
> Steps to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost:1/default>
> 0: jdbc:hive2://localhost:1/default> create table src (x int, y int, s string);
> No rows affected (0.384 seconds)
> 0: jdbc:hive2://localhost:1/default> create view view1 as select * from src;
> No rows affected (0.225 seconds)
> 0: jdbc:hive2://localhost:1/default> create table view_table as select * from view1;
> INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO  : number of splits:1
> INFO  : Submitting tokens for job: job_local1883260823_0021
> INFO  : The url to track the job: http://localhost:8080/
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2016-04-06 18:15:02,538 Stage-1 map = 100%,  reduce = 0%
> INFO  : Ended Job = job_local1883260823_0021
> INFO  : Stage-4 is selected by condition resolver.
> INFO  : Stage-3 is filtered out by condition resolver.
> INFO  : Stage-5 is filtered out by condition resolver.
> INFO  : Moving data to: hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2016-04-06_18-14-59_156_3909183228338665119-12/-ext-10001 from hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2016-04-06_18-14-59_156_3909183228338665119-12/-ext-10003
> INFO  : Moving data to: hdfs://localhost:9000/user/hive/warehouse/view_table from hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2016-04-06_18-14-59_156_3909183228338665119-12/-ext-10001
> INFO  : Table default.view_table stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
> No rows affected (3.8 seconds)
> 0: jdbc:hive2://localhost:1/default> alter view view1 as select * from t2;
> No rows affected (0.602 seconds)
> 0: jdbc:hive2://localhost:1/default>
> {noformat}
> Check the lineage of the resultant table (view_table).
> Link showing confusing lineage: 
> https://monosnap.com/file/qaYZcJRQnNX5BsyM12hkryv70RZK1I
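
The following is only a rough sketch of why the steps above can produce a surprising lineage. It assumes (this is not confirmed by the issue, which only links to a screenshot) that lineage is a set of directed edges between datasets and that the later `alter view` either replaces or adds to the view's input edge. The `upstream` helper and the edge lists are hypothetical and do not reflect how Atlas stores lineage.

```python
from collections import defaultdict


def upstream(edges, node):
    """Return every dataset reachable by walking (source, target) edges backwards from node."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
    seen, stack = set(), [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Edges as they would stand right after `create table view_table as select * from view1`.
at_ctas = [("src", "view1"), ("view1", "view_table")]

# Assumption 1: `alter view view1 as select * from t2` replaces the view's input edge.
after_alter_replace = [("t2", "view1"), ("view1", "view_table")]

# Assumption 2: the alter adds a new input edge and keeps the old one.
after_alter_append = at_ctas + [("t2", "view1")]

print(upstream(at_ctas, "view_table"))
print(upstream(after_alter_replace, "view_table"))
print(upstream(after_alter_append, "view_table"))
```

Under either assumption, t2 shows up as an ancestor of view_table even though view_table was populated while view1 still selected from src, which may be what looks confusing to a user.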



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ATLAS-641) Lineage for a view created from a view seems to be confusing(from user's perspective).

2018-06-15 Thread Madhan Neethiraj (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj reassigned ATLAS-641:
--

Assignee: Madhan Neethiraj

> Lineage for a view created from a view seems to be confusing(from user's 
> perspective).
> --
>
> Key: ATLAS-641
> URL: https://issues.apache.org/jira/browse/ATLAS-641
> Project: Atlas
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Ayub Pathan
> Assignee: Madhan Neethiraj
> Priority: Major
>





[jira] [Updated] (ATLAS-2759) Atlas start up fail with error Error creating bean with name 'setupSteps' defined in URL [jar:file:/data/apache-atlas/apache-atlas-1.0.0/server /webapp/atlas/WEB-INF/lib

2018-06-15 Thread Mahesh Kakol (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Kakol updated ATLAS-2759:

Description: 
Kindly help me fix this issue,
{code:java}
2018-06-15 21:51:39,259 WARN - [main:] ~ Failed startup of context o.e.j.w.WebAppContext@4e25282d{/,file:///data/apache-atlas/apache-atlas-1.0.0/server/webapp/atlas/,UNAVAILABLE}{/data/apache-atlas/apache-atlas-1.0.0/server/webapp/atlas} (WebAppContext:529)
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'setupSteps' defined in URL [jar:file:/data/apache-atlas/apache-atlas-1.0.0/server/webapp/atlas/WEB-INF/lib/atlas-webapp-1.0.0.jar!/org/apache/atlas/web/setup/SetupSteps.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type 'java.util.Set' available: expected at least 1 bean which qualifies as autowire candidate. Dependency annotations: {}
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:749)
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:189)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1201)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1103)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:513)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:483)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:312)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:761)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:867)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:543)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:443)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:325)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:107)
at org.apache.atlas.web.setup.KerberosAwareListener.contextInitialized(KerberosAwareListener.java:31)
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:843)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:533)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:816)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:345)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1404)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1366)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:520)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:422)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:389)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:98)
at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type 'java.util.Set' available: expected at least 1 bean which qualifies as autowire candidate. Dependency annotations: {}
at 

[jira] [Created] (ATLAS-2759) Atlas start up fail with error Error creating bean with name 'setupSteps' defined in URL [jar:file:/data/apache-atlas/apache-atlas-1.0.0/server /webapp/atlas/WEB-INF/lib

2018-06-15 Thread Mahesh Kakol (JIRA)
Mahesh Kakol created ATLAS-2759:
---

 Summary: Atlas start up fail with error  Error creating bean with 
name 'setupSteps' defined in URL 
[jar:file:/data/apache-atlas/apache-atlas-1.0.0/server 
/webapp/atlas/WEB-INF/lib/atlas-webapp-1.0.0.jar!/org/apache/atlas/web/setup/SetupSteps.class]:"
 Key: ATLAS-2759
 URL: https://issues.apache.org/jira/browse/ATLAS-2759
 Project: Atlas
  Issue Type: Bug
  Components:  atlas-core, atlas-webui
Reporter: Mahesh Kakol



Jenkins build is back to normal : Atlas-0.8-snapshot-publish #56

2018-06-15 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:31 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little earlier on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object. I have added it.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object. I have added it.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Reporter: Barbara Eckman
> Assignee: Barbara Eckman
> Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  ** tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> “TotalRequestLatency”, “BucketSizeBytes”
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
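
As a rough illustration of the shapes described above, here is a minimal Python sketch. It is not the attached typedef JSON and not the Atlas type system API; the class names follow the type names in the description, but every attribute name, the `pseudo_dir` helper, and the sample bucket and tag values are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def pseudo_dir(key: str) -> str:
    """Return the pseudo-directory prefix of an S3 object key ('' if the key has none)."""
    return key.rpartition("/")[0]


@dataclass
class AWSTag:
    tag: str
    value: str


@dataclass
class AWSS3BucketLifeCycleRule:
    rule_type: str                       # "transition" or "expiration"
    days: int                            # interval in days before the rule is executed
    storage_class: Optional[str] = None  # None when rule_type is "expiration"

    def __post_init__(self) -> None:
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError(f"unknown rule_type: {self.rule_type!r}")
        # A storage class is required for transitions and must be absent for expirations.
        if (self.rule_type == "expiration") != (self.storage_class is None):
            raise ValueError("storage_class must be set only for transition rules")


@dataclass
class AWSS3Bucket:
    name: str
    region: str
    is_encrypted: bool = False
    encryption_type: Optional[str] = None
    pseudo_dirs: List[str] = field(default_factory=list)
    lifecycle_rules: List[AWSS3BucketLifeCycleRule] = field(default_factory=list)
    tags: List[AWSTag] = field(default_factory=list)


bucket = AWSS3Bucket(
    name="example-bucket",  # hypothetical bucket name
    region="us-east-1",     # hypothetical region
    is_encrypted=True,
    encryption_type="AES-256",
    pseudo_dirs=[pseudo_dir("myWork/Development/Projects1.xls")],
    lifecycle_rules=[
        AWSS3BucketLifeCycleRule("transition", 60, "GLACIER"),
        AWSS3BucketLifeCycleRule("expiration", 90),
    ],
    tags=[AWSTag("team", "data-platform")],  # hypothetical tag
)
print(bucket.pseudo_dirs)
```

The two lifecycle rules mirror the example in the description (transition to GLACIER after 60 days, expire after 90 days); the validation in `__post_init__` encodes the "null if ruleType is expiration" note.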





[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_datalake_typedefs_v2.json




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated ATLAS-2708:
--
Attachment: all_AWS_common_typedefs_v2.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
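The attribute lists in the description above can be sketched as plain Python data classes. This is only an illustration under stated assumptions: the snake_case field names, the validation in the lifecycle rule, and the helper pseudo_dir_of are hypothetical and are not taken from the attached typedef JSON files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AWSTag:
    tag: str
    value: str


@dataclass
class AWSS3BucketLifeCycleRule:
    rule_type: str                       # "transition" or "expiration"
    interval_days: int                   # days before the rule is executed
    storage_class: Optional[str] = None  # None when rule_type is "expiration"

    def __post_init__(self) -> None:
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError("rule_type must be 'transition' or 'expiration'")
        if self.rule_type == "expiration" and self.storage_class is not None:
            raise ValueError("an expiration rule has no storage class")
        if self.rule_type == "transition" and self.storage_class is None:
            raise ValueError("a transition rule needs a target storage class")


@dataclass
class AWSCloudWatchMetric:
    metric_name: str                    # e.g. "AllRequests", "BucketSizeBytes"
    scope: Optional[List[str]] = None   # None means the entire bucket


@dataclass
class AWSS3PseudoDir:
    prefix: str                         # e.g. "myWork/Development"
    data_type: str                      # e.g. "avro", "json", "unstructured"
    avro_schemas: List[str] = field(default_factory=list)
    create_time: Optional[str] = None


@dataclass
class AWSS3Bucket:
    name: str
    region: str
    is_encrypted: bool = False
    encryption_type: Optional[str] = None      # e.g. "AES-256"
    s3_access_policy: Optional[str] = None     # policy JSON kept as a string
    create_time: Optional[str] = None
    pseudo_directories: List[AWSS3PseudoDir] = field(default_factory=list)
    lifecycle_rules: List[AWSS3BucketLifeCycleRule] = field(default_factory=list)
    cloudwatch_metrics: List[AWSCloudWatchMetric] = field(default_factory=list)
    tags: List[AWSTag] = field(default_factory=list)


@dataclass
class Dataset2DatasetProcess:
    inputs: List[str]                   # names of input datasets, e.g. a kafka topic
    outputs: List[str]                  # names of output datasets, e.g. a pseudo-directory
    transforms: List[str] = field(default_factory=list)
    configuration_parameters: Dict[str, str] = field(default_factory=dict)


def pseudo_dir_of(object_key: str) -> str:
    """Return the pseudo-directory prefix of an S3 object key ("" if none)."""
    prefix, _, _ = object_key.rpartition("/")
    return prefix
```

With these definitions, the two example rules from the description would be AWSS3BucketLifeCycleRule("transition", 60, "GLACIER") and AWSS3BucketLifeCycleRule("expiration", 90), and pseudo_dir_of("myWork/Development/Projects1.xls") yields the prefix "myWork/Development".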





[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:28 PM:


[~bosco] 
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy 
is a list of Statement Structure. If we are not using it now, we should 
probably remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
 You're right, it is a list of Statement structures.  We made it a string 
because we only need to display it, and because we didn't want to bother 
parsing the JSON we got from the AWS API and putting it into a structured Atlas 
entity. (blush)  We are using it, so I created a placeholder S3AccessPolicy 
structure that consists of a string for now, but can be expanded into the full 
structure when someone needs/wants it. 


was (Author: barbara):
[~bosco] 

 bq. 

You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is a 
list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:28 PM:


[~bosco] 
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy 
is a list of Statement Structure. If we are not using it now, we should 
probably remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
 You're right, it is a list of Statement structures.  We made it a string 
because we only need to display it, and because we didn't want to bother 
parsing the JSON we got from the AWS API and putting it into a structured Atlas 
entity. (blush)  We are using it, so I created a placeholder S3AccessPolicy 
structure that consists of a string for now, but can be expanded into the full 
structure when someone needs/wants it. 

My new JSON files are all_AWS_common_typedefs_v2.json and 
all_datalake_typedefs_v2.json.
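
As a rough sketch of what expanding the placeholder could look like later: the policy can stay a string on the entity, and a small helper can parse it whenever structured access is needed. The helper names and the sample policy below are hypothetical. The layout assumed here (a top-level "Statement" that is either one object or a list, with "Action" either a string or a list) follows the usual AWS policy document format.

```python
import json
from typing import Any, Dict, List


def policy_statements(policy_json: str) -> List[Dict[str, Any]]:
    """Parse a policy string and return its statements as a list."""
    doc = json.loads(policy_json)
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):   # a single statement is allowed
        statements = [statements]
    return statements


def allowed_actions(policy_json: str) -> List[str]:
    """Collect the actions named in statements whose Effect is Allow."""
    actions: List[str] = []
    for stmt in policy_statements(policy_json):
        if stmt.get("Effect") != "Allow":
            continue
        action = stmt.get("Action", [])
        if isinstance(action, str):    # a single action is allowed
            action = [action]
        actions.extend(action)
    return actions


# Hypothetical policy, stored the way the placeholder stores it: as a string.
SAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
})
```

Keeping the raw string and parsing on demand means display keeps working unchanged, and a structured S3AccessPolicy can be introduced later without re-ingesting the buckets.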


was (Author: barbara):
[~bosco] 
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy 
is a list of Statement Structure. If we are not using it now, we should 
probably remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
 You're right, it is a list of statement structure.  We made it a string 
because we only need to display it, and because we didn't want to bother 
parsing the json we got from AWS API and putting it into a structured Atlas 
entity. (blush)  We are using it, so I created a placeholder S3AccessPolicy 
structure that consists of a string now, but can be expanded into the structure 
when someone needs/wants it. 

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>

[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~bosco] 

 bq. You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is 
a list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514229#comment-16514229
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:44 PM:


[~bosco] 

 bq. 

You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is a 
list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket


was (Author: barbara):
[~bosco] 

 bq. You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is 
a list of Statement Structure. If we are not using it now, we should probably 
remove it and add it when we need to. Or we can create a placeholder 
S3BucketPolicy entity and associate that with AWSS3Bucket

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514222#comment-16514222
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~madhan.neethiraj] I agree with all your changes except one: 
 - Is avroSchema applicable for AWSS3PseudoDir?  
 ** Yes, for those who don't model objects and stop at the pseudo-directory 
level (like us).  We recommend a 1:1 relationship between pseudo-directory and 
avro_schema, but we can't enforce that, so it's an array of schemas in the 
pseudo-directory type.  I have changed it to a single schema in the object type 
as you suggest.
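
To make the cardinality difference concrete, here is a rough sketch of how the two attributes might be declared, written as Python dicts in the general shape of Atlas attributeDefs. The attribute names avroSchemas and avroSchema and both helper functions are assumptions for illustration; the actual definitions are in the attached JSON files.

```python
import json
from typing import Any, Dict


def attribute_def(name: str, type_name: str, cardinality: str) -> Dict[str, Any]:
    """Build an attributeDef-shaped dict (optional, non-unique, not indexed)."""
    return {
        "name": name,
        "typeName": type_name,
        "cardinality": cardinality,
        "isOptional": True,
        "isUnique": False,
        "isIndexable": False,
    }


# Pseudo-directory type: 1:1 is recommended but can't be enforced,
# so the attribute holds an array of schemas.
PSEUDO_DIR_SCHEMAS = attribute_def("avroSchemas", "array<avro_schema>", "SET")

# Object type: a single schema.
OBJECT_SCHEMA = attribute_def("avroSchema", "avro_schema", "SINGLE")


def is_multi_valued(attr: Dict[str, Any]) -> bool:
    """True when the attribute's type name is an array type."""
    return attr["typeName"].startswith("array<")


if __name__ == "__main__":
    print(json.dumps([PSEUDO_DIR_SCHEMAS, OBJECT_SCHEMA], indent=2))
```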

I am breaking your file into two JSON files, one datalake-specific and one 
general AWS, as discussed with [~bosco]. 

Nice to see an example of relationshipDefs.  Can you please point me to a place 
where the semantics of these attributes are documented?  I'd like to start 
using them.

 

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:31 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object. I have added it.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:21 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!  I'll 
be at your talk.  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> “TotalRequestLatency”, “BucketSizeBytes”
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSCloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
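As a rough illustration of two ideas in the description above, the sketch below derives a pseudo-directory from an object key and models a lifecycle rule whose storageClass is null for expirations. The names `pseudo_dir` and `LifeCycleRule` are hypothetical and are not taken from the attached typedef files.

```python
from dataclasses import dataclass
from typing import Optional


def pseudo_dir(key: str) -> str:
    """Return the pseudo-directory prefix of an S3 object key ('' if the key has no '/')."""
    # rpartition splits on the LAST '/', so everything before it is the prefix.
    return key.rpartition("/")[0]


@dataclass
class LifeCycleRule:
    """Hypothetical stand-in for AWSS3BucketLifeCycleRule."""
    rule_type: str                        # "transition" or "expiration"
    days: int                             # interval in days before the rule is executed
    storage_class: Optional[str] = None   # None (null) when rule_type is "expiration"

    def __post_init__(self) -> None:
        if self.rule_type == "transition" and self.storage_class is None:
            raise ValueError("a transition rule needs a target storageClass")
        if self.rule_type == "expiration" and self.storage_class is not None:
            raise ValueError("an expiration rule must not set a storageClass")


# The two examples from the description: GLACIER after 60 days, expire after 90 days.
to_glacier = LifeCycleRule("transition", 60, "GLACIER")
expire = LifeCycleRule("expiration", 90)
print(pseudo_dir("myWork/Development/Projects1.xls"), to_glacier, expire)
```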



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:16 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.



[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:02 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]

Good point about Tags in AWSS3Object.


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]



[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:01 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  Mine is just a little later on the same day!  
[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  
[Mine|[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]]
 is just a little later on the same day!



[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 6:00 PM:


[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!  I'll 
be at your talk!  
[Mine|[https://dataworkssummit.com/san-jose-2018/session/an-architecture-for-federated-data-discovery-and-lineage-over-on-prem-datasources-and-public-cloud-with-apache-atlas/]]
 is just a little later on the same day!


was (Author: barbara):
[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!



[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Barbara Eckman (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514168#comment-16514168
 ] 

Barbara Eckman commented on ATLAS-2708:
---

[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the demo!!



[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Don Bosco Durai (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513593#comment-16513593
 ] 

Don Bosco Durai commented on ATLAS-2708:


{quote}I updated the model files for above comments, except the last one, and 
uploaded in this JIRA - [^3010-aws_model.json]. Please review.
{quote}
[~madhan.neethiraj], it seems our comments crossed paths.



[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Don Bosco Durai (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513580#comment-16513580
 ] 

Don Bosco Durai commented on ATLAS-2708:


[~barbara] looks good. Having a common JSON for AWS makes sense.

Sorry, two more pieces of feedback :)
 # We should also add AWSTags to S3Object.
 # You have S3AccessPolicy in AWSS3Bucket as a string. In S3, a bucket policy is a 
list of Statement structures. If we are not using it now, we should probably 
remove it and add it back when we need it. Or we can create a placeholder 
S3BucketPolicy entity and associate it with AWSS3Bucket.
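To make the second point concrete, here is a minimal sketch of why a plain string is a loose fit: an S3 bucket policy document carries a list of statements, each with its own effect, actions and resource. The bucket name `example-bucket`, the statement ID and the helper `statements` are assumptions made up for illustration; they do not come from the attached model files.

```python
import json

# Hypothetical policy document in the usual AWS policy layout.
POLICY_TEXT = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
"""


def statements(policy_text: str) -> list:
    """Flatten a policy document into one small record per statement."""
    raw = json.loads(policy_text).get("Statement", [])
    if isinstance(raw, dict):      # tolerate a single statement object
        raw = [raw]
    records = []
    for stmt in raw:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):   # "Action" may be a single string
            actions = [actions]
        records.append({
            "effect": stmt.get("Effect"),
            "actions": actions,
            "resource": stmt.get("Resource"),
        })
    return records


for record in statements(POLICY_TEXT):
    print(record)
```

A placeholder S3BucketPolicy entity, as suggested above, could later hold records of this shape instead of an opaque string.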

 

[~madhan.neethiraj] and I are giving a talk next week in San Jose at the DataWorks 
conference 
[https://dataworkssummit.com/san-jose-2018/session/securing-data-in-hybrid-environments-using-apache-ranger/].
 Are you okay with us using your JSON for the demo?

Thanks

 



[jira] [Comment Edited] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Madhan Neethiraj (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513574#comment-16513574
 ] 

Madhan Neethiraj edited comment on ATLAS-2708 at 6/15/18 9:14 AM:
--

[~barbara] - thanks for AWS/S3 model types. Here are my comments:

- the following types look better suited to being modeled as structs rather than 
entities, since each instance is contained within an AWSS3Bucket instance and 
does not need a separate identity outside its container:
-- AWSTag
-- AWSCloudWatchMetric
-- AWSS3BucketLifeCycleRule
- is avroSchema applicable for AWSS3PseudoDir?
- consider renaming attribute AWSTag.tag to AWSTag.key, to be in sync with the 
names used in 
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html
- I would suggest using attribute names that begin with a lowercase letter, to 
be consistent with the rest of the types; AWSS3Bucket.S3AccessPolicy and 
AWSS3Bucket.AWSTags currently begin with an uppercase letter.
- AWSS3Object has an array of avro_schema associated with it. Wouldn't a single 
avro_schema be enough?

I updated the model files for the above comments, except the last one, and uploaded 
them to this JIRA: [^3010-aws_model.json]. Please review.
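For readers unfamiliar with the struct-versus-entity distinction, the sketch below builds a small typedef payload in which AWSTag is a struct (with the suggested `key` attribute) embedded in an AWSS3Bucket entity through a lowercase `awsTags` attribute. This is only an illustration and not the contents of 3010-aws_model.json: the `attr` helper is hypothetical, and the field names follow the Atlas typedef JSON layout as I understand it, so they should be checked against the attached model.

```python
import json


def attr(name, type_name, cardinality="SINGLE", optional=True):
    """Build one attribute definition (hypothetical helper, assumed field names)."""
    return {
        "name": name,
        "typeName": type_name,
        "cardinality": cardinality,
        "isOptional": optional,
        "isUnique": False,
        "isIndexable": False,
    }


# Struct: has no identity of its own and lives inside its containing entity.
aws_tag = {
    "name": "AWSTag",
    "typeVersion": "1.0",
    "attributeDefs": [
        attr("key", "string", optional=False),    # renamed from "tag"
        attr("value", "string", optional=False),
    ],
}

# Entity: has its own identity; attribute names start with a lowercase letter.
aws_s3_bucket = {
    "name": "AWSS3Bucket",
    "superTypes": ["DataSet"],
    "typeVersion": "1.0",
    "attributeDefs": [
        attr("region", "string"),
        attr("isEncrypted", "boolean"),
        attr("awsTags", "array<AWSTag>", cardinality="SET"),
    ],
}

typedefs = {
    "enumDefs": [],
    "structDefs": [aws_tag],
    "classificationDefs": [],
    "entityDefs": [aws_s3_bucket],
}

print(json.dumps(typedefs, indent=2))
```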


was (Author: madhan.neethiraj):
[~barbara] - thanks for AWS/S3 model types. Here are my comments:

- looks like following types are suitable to be modeled as a struct, instead of 
entity - given each instance will be contained within an instance of 
AWSS3Bucket; and they don't need their own separate identity outside of its 
container.
-- AWSTag
-- AWSCloudWatchMetric
-- AWSS3BucketLifeCycleRule
- is avroSchema applicable for AWSS3PseudoDir?
- consider renaming attribute AWSTag.tag to AWSTag.key - to be in sync with 
names used in 
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html
- I would suggest using attribute names that begin with a lower case letter, to 
be consistent with rest of types. AWSS3Bucket.S3AccessPolicy, 
AWSS3Bucket.AWSTags
- AWSS3Object has an array of avro_schema associated with. Wouldn't a single 
avro_schema be enough?

I updated the model files for above comments, except the last one, and uploaded 
in this JIRA - 3010-aws_model.json. Please review.

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated 

[jira] [Updated] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Madhan Neethiraj (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj updated ATLAS-2708:

Attachment: 3010-aws_model.json

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Reporter: Barbara Eckman
> Assignee: Barbara Eckman
> Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json, 
> all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
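
Two of the items in the description above, the pseudo-directory "prefix" and the
lifecycle rule, can be illustrated with a small Python sketch. This is only an
illustration of the described semantics, not the proposed typedefs: the function
name pseudo_directory, the class LifeCycleRule and its field names are
hypothetical, and the validation rules are inferred from the wording
"storageClass ... (null if ruleType is expiration)".

```python
from dataclasses import dataclass
from typing import Optional


def pseudo_directory(object_key: str) -> str:
    """Return the part of an S3 object key before the last '/'.

    Returns an empty string when the key contains no '/'.
    """
    prefix, _, _ = object_key.rpartition("/")
    return prefix


@dataclass(frozen=True)
class LifeCycleRule:
    """Hypothetical in-memory shape of an AWSS3BucketLifeCycleRule."""

    rule_type: str                       # "transition" or "expiration"
    days: int                            # interval in days before the rule runs
    storage_class: Optional[str] = None  # target class; None for expiration

    def __post_init__(self):
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError(f"unknown rule type: {self.rule_type!r}")
        if self.rule_type == "transition" and self.storage_class is None:
            raise ValueError("a transition rule needs a storage class")
        if self.rule_type == "expiration" and self.storage_class is not None:
            raise ValueError("an expiration rule must not set a storage class")


# The examples used in the issue description.
directory = pseudo_directory("myWork/Development/Projects1.xls")
to_glacier = LifeCycleRule("transition", 60, "GLACIER")
expire = LifeCycleRule("expiration", 90)
```

Deriving the prefix with rpartition keeps nested prefixes intact and treats a
key without any "/" as having no pseudo-directory.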



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-06-15 Thread Madhan Neethiraj (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513574#comment-16513574
 ] 

Madhan Neethiraj commented on ATLAS-2708:
-

[~barbara] - thanks for the AWS/S3 model types. Here are my comments:

- The following types look better suited to being modeled as structs rather 
than entities, since each instance will be contained within an instance of 
AWSS3Bucket and does not need its own identity outside that container:
-- AWSTag
-- AWSCloudWatchMetric
-- AWSS3BucketLifeCycleRule
- Is avroSchema applicable to AWSS3PseudoDir?
- Consider renaming attribute AWSTag.tag to AWSTag.key, to be in sync with the 
names used in 
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html
- I would suggest using attribute names that begin with a lowercase letter, to 
be consistent with the rest of the types, e.g. for AWSS3Bucket.S3AccessPolicy 
and AWSS3Bucket.AWSTags.
- AWSS3Object has an array of avro_schema associated with it. Wouldn't a single 
avro_schema be enough?

I updated the model files to address the above comments, except the last one, and 
uploaded the result to this JIRA as 3010-aws_model.json. Please review.
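
To make the struct-versus-entity suggestion concrete, here is a rough Python
sketch that builds a typedef document with AWSTag as a struct embedded in an
AWSS3Bucket entity. It is not the content of 3010-aws_model.json: the helper
name attribute, the chosen attribute set, the "DataSet" super type and the
cardinality values are assumptions made for illustration, laid out in the
general shape used by Atlas typedef JSON files.

```python
import json


def attribute(name, type_name, optional=True, cardinality="SINGLE"):
    """Build one attribute definition (hypothetical helper)."""
    return {
        "name": name,
        "typeName": type_name,
        "cardinality": cardinality,
        "isIndexable": False,
        "isOptional": optional,
        "isUnique": False,
    }


# AWSTag as a struct: no identity of its own, always held by a container.
aws_tag_struct = {
    "name": "AWSTag",
    "typeVersion": "1.0",
    "attributeDefs": [
        attribute("key", "string", optional=False),  # "tag" renamed to "key"
        attribute("value", "string"),
    ],
}

# AWSS3Bucket stays an entity and embeds the tags as an array attribute.
# Attribute names start with a lowercase letter, as suggested in the review.
aws_s3_bucket_entity = {
    "name": "AWSS3Bucket",
    "superTypes": ["DataSet"],  # assumption, not taken from the attachment
    "typeVersion": "1.0",
    "attributeDefs": [
        attribute("region", "string"),
        attribute("isEncrypted", "boolean"),
        attribute("s3AccessPolicy", "string"),
        attribute("awsTags", "array<AWSTag>", cardinality="SET"),
    ],
}

typedefs = {
    "enumDefs": [],
    "structDefs": [aws_tag_struct],
    "classificationDefs": [],
    "entityDefs": [aws_s3_bucket_entity],
}

print(json.dumps(typedefs, indent=2))
```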

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Attachments: all_AWS_common_typedefs.json, all_datalake_typedefs.json
>
>





[jira] [Updated] (ATLAS-2758) Property table 'Active', 'Deleted' status issue

2018-06-15 Thread Abhishek Kadam (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kadam updated ATLAS-2758:
--
Attachment: ATLAS-2758.patch

> Property table 'Active', 'Deleted' status issue
> ---
>
> Key: ATLAS-2758
> URL: https://issues.apache.org/jira/browse/ATLAS-2758
> Project: Atlas
> Issue Type: Bug
> Components: atlas-webui
> Reporter: Abhishek Kadam
> Assignee: Abhishek Kadam
> Priority: Major
> Fix For: 1.1.0, 2.0.0
>
> Attachments: ATLAS-2758.patch
>
>
> Entity status is not showing properly.





Atlas Hbase kerberos Authentication problems

2018-06-15 Thread Taher Koitawala
Hi All,

  I am trying to configure Atlas with HBase on Kerberos. However,
when Atlas tries to connect to HBase via Kerberos, I get the following error:

*couldn't setup connection for atlas/atlas-hostname@EXAMPLE to
hbase/xx.xx.xx.xx@EXAMPLE*

The error occurs because, when connecting to HBase, Atlas uses the
HBase IP in the Kerberos principal instead of the HBase hostname, even though
the property "atlas.graph.storage.hostname" has been set to the HBase
hostname and not the IP.

I am using Atlas 0.8 and HBase 1.2.

Can someone please help me resolve the issue?
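
One check that may help narrow this down, sketched below in Python: whether the
HBase hostname resolves to an IP and that IP resolves back to the same name. The
idea that the IP in the principal comes from a failing reverse DNS lookup is
only an assumption here, not something confirmed in this thread, and the
function name check_dns_round_trip is made up for illustration.

```python
import socket


def check_dns_round_trip(hostname,
                         forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP and back, and report what was found.

    The resolver functions are parameters so the check can be exercised
    without network access.
    """
    ip = forward(hostname)
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        reverse_name = reverse(ip)[0]
    except (socket.herror, socket.gaierror):
        reverse_name = None  # no reverse (PTR) record could be resolved

    def normalize(name):
        return name.lower().rstrip(".")

    return {
        "hostname": hostname,
        "ip": ip,
        "reverse_name": reverse_name,
        "round_trip_ok": reverse_name is not None
        and normalize(reverse_name) == normalize(hostname),
    }
```

Running check_dns_round_trip with the value of "atlas.graph.storage.hostname" on
the Atlas host would show whether the reverse lookup returns the expected name;
a result with round_trip_ok set to False would be consistent with the
assumption above, but does not prove it.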





Regards,
Taher Koitawala
GS Lab Pune
+91 8407979163


[jira] [Created] (ATLAS-2758) Property table 'Active', 'Deleted' status issue

2018-06-15 Thread Abhishek Kadam (JIRA)
Abhishek Kadam created ATLAS-2758:
-

 Summary: Property table 'Active', 'Deleted' status issue
 Key: ATLAS-2758
 URL: https://issues.apache.org/jira/browse/ATLAS-2758
 Project: Atlas
 Issue Type: Bug
 Components: atlas-webui
 Reporter: Abhishek Kadam
 Assignee: Abhishek Kadam
 Fix For: 1.1.0, 2.0.0


Entity status is not showing properly.


