[jira] [Commented] (ATLAS-4349) no lineage in hive views

2021-07-08 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377239#comment-17377239
 ] 

t oo commented on ATLAS-4349:
-

cc [~nixon] [~prasadpp13] [~sarath] [~mandar_va] [~nareshpr]

> no lineage in hive views
> 
>
> Key: ATLAS-4349
> URL: https://issues.apache.org/jira/browse/ATLAS-4349
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: t oo
>Priority: Major
>
> I ran import-hive.sh and it imported entities, but no lineage.
>  
> The page below says this is not supported?
> [https://community.cloudera.com/t5/Support-Questions/I-run-the-script-tool-import-hive-sh-and-i-can-search-the/td-p/175545]
>  
> How do I get lineage for hive views?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ATLAS-4349) no lineage in hive views

2021-07-08 Thread t oo (Jira)
t oo created ATLAS-4349:
---

 Summary: no lineage in hive views
 Key: ATLAS-4349
 URL: https://issues.apache.org/jira/browse/ATLAS-4349
 Project: Atlas
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: t oo


I ran import-hive.sh and it imported entities, but no lineage.

The page below says this is not supported?

[https://community.cloudera.com/t5/Support-Questions/I-run-the-script-tool-import-hive-sh-and-i-can-search-the/td-p/175545]

How do I get lineage for hive views?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3608) Hive Bridge: Hive Metastore: Alter Table Query Not Handled Correctly

2021-07-06 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376113#comment-17376113
 ] 

t oo commented on ATLAS-3608:
-

Did you solve this, [~ppanda-beta]?

> Hive Bridge: Hive Metastore: Alter Table Query Not Handled Correctly
> 
>
> Key: ATLAS-3608
> URL: https://issues.apache.org/jira/browse/ATLAS-3608
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Fix For: 2.0.0, trunk
>
> Attachments: ATLAS-3608-Incorrect-processing-of-alter-table.patch
>
>
> *Background*
> DDL queries in Impala are processed via the _Hive Metastore_ bridge.
>  
> *Steps to Duplicate*
> Keep HMS logs in view. Depending on installation, they can be found at this 
> location: _/var/log/hive/hadoop-cmf-HIVE-1-HIVEMETASTORE-.log.out_
> From Impala:
>  * Run _impala-shell_
>  * Run _create database stocks; use stocks; create table daily (dt string, 
> open string, high string); create view daily_rpt as select * from daily; 
> create external table weekly (dt string, open string, high string);_
>  * Note within Atlas that the new entities for _stocks, daily, daily_rpt_ and 
> _weekly_ have been created. Note the columns in the _weekly_ table.
>  * From _impala-shell_, run _alter table weekly add columns (newCol string);_
> _Expected results_
>  * HMS logs should not show _NullPointerException_.
>  * Atlas should show the table weekly with the newCol column.
>  
> _Observed results_:
> HMS logs show _NullPointerException_ from Atlas hook.
> New entity _newCol_ is not seen within Atlas.
> *Root cause*
> When assessing the incoming event to determine the type of alter, Atlas uses 
> table parameters. The recent build has a new timestamp parameter, 
> _last_modified_time_, alongside _transient_lastDdlTime_. This results in an 
> incorrect assessment, so the alter event is processed incorrectly, causing 
> an exception.
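The misclassification described above can be illustrated with a small sketch (not the actual Atlas code): when comparing old and new table parameters to classify an alter, timestamp-only keys have to be ignored, otherwise a pure timestamp touch looks like a real parameter change and a column add gets mis-handled.

```python
# Illustrative sketch only -- not the Atlas implementation. Timestamp-only
# parameters such as transient_lastDdlTime and last_modified_time should be
# ignored when deciding whether an ALTER TABLE event changed anything real.
IGNORED_PARAMS = {"transient_lastDdlTime", "last_modified_time"}

def meaningful_param_change(old_params, new_params):
    """True only when something other than a timestamp parameter changed."""
    strip = lambda p: {k: v for k, v in p.items() if k not in IGNORED_PARAMS}
    return strip(old_params) != strip(new_params)

# A timestamp-only update is not a meaningful alter:
print(meaningful_param_change({"transient_lastDdlTime": "1"},
                              {"transient_lastDdlTime": "2"}))  # False
```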



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-4149) Presto/Trino Hook

2021-07-06 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376112#comment-17376112
 ] 

t oo commented on ATLAS-4149:
-

Did you solve this, [~sdorgan]?

> Presto/Trino Hook
> -
>
> Key: ATLAS-4149
> URL: https://issues.apache.org/jira/browse/ATLAS-4149
> Project: Atlas
>  Issue Type: New Feature
>  Components: atlas-intg
>Reporter: Sébastien Dorgan
>Priority: Major
>
> Would it be possible to create a hook for Presto or Trino?
> Presto and Trino support EventListener plugins 
> ([https://trino.io/docs/current/develop/event-listener.html?highlight=listener]).
> Thanks to this mechanism it is possible to intercept Presto queries and send 
> them to a Kafka topic. Here is an example I found on GitHub: 
> [https://github.com/cianru/presto-kafka-emitter]
> I'm a new user of Apache Atlas, but since Atlas already offers a hook for 
> Hive, I think a model for Presto shouldn't be too far from the one for 
> Hive.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3148) Implement Hive Metastore hook for Atlas

2021-07-02 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373587#comment-17373587
 ] 

t oo commented on ATLAS-3148:
-

Did you add support for Presto views, [~ppanda-beta]?

> Implement Hive Metastore hook for Atlas
> ---
>
> Key: ATLAS-3148
> URL: https://issues.apache.org/jira/browse/ATLAS-3148
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Affects Versions: 1.1.0
>Reporter: Sarath Subramanian
>Assignee: Sarath Subramanian
>Priority: Major
>  Labels: new-feature
> Fix For: 2.0.0, trunk
>
>
> Atlas has a Hive hook ([http://atlas.apache.org/Hook-Hive.html]) which 
> registers with the HiveServer2 process to listen for create/update/delete 
> operations and updates the metadata in Atlas.
> If the hive metastore is accessed using other clients like Impala shell, Hue, or 
> other JDBC/ODBC client apps, there is no way to capture the events in Atlas.
> This Jira will create a new Atlas hook for the Hive Metastore, which stores the 
> metadata for Hive tables and partitions in a relational database and 
> provides clients (including Hive) access to this information via the 
> metastore service API. 
> This hook is registered as a post-listener (*hive.metastore.event.listeners*), 
> and DDL operations are captured and sent to the Atlas Kafka topic for processing 
> by the Atlas server.
> The following DDL operations are captured:
> The following DDL operations are captured:
>  * *CreateDatabaseEvent*
>  * *DropDatabaseEvent*
>  * *AlterDatabaseEvent*
>  * *CreateTableEvent*
>  * *DropTableEvent*
>  * *AlterTableEvent*
>  
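The registration described above amounts to a hive-site.xml fragment along these lines. Note the listener class name here is an assumption for illustration, not confirmed by this thread; verify it against the hook jar shipped with your Atlas release.

```xml
<!-- Sketch: register the Atlas metastore hook as a post-event listener.
     The class name below is an assumed example; check your Atlas release. -->
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.atlas.hive.hook.HiveMetastoreHook</value>
</property>
```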



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-2792) Apache Atlas quickstart error

2018-09-26 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628597#comment-16628597
 ] 

t oo commented on ATLAS-2792:
-

external

> Apache Atlas quickstart error
> -
>
> Key: ATLAS-2792
> URL: https://issues.apache.org/jira/browse/ATLAS-2792
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core, atlas-intg
>Affects Versions: 1.0.0
>Reporter: t oo
>Priority: Major
>
> Env: no kerberos, no ranger, no hdfs. EC2 with ssl.
> Getting this error after running $ATLAS_HOME/bin/quick_start.py 
> https://$componentPrivateDNSRecord:21443 with correct user/pass
>  
> {code:java}
>  
> Creating sample types: Created type [DB] Created type [Table] Created type 
> [StorageDesc] Created type [Column] Created type [LoadProcess] Created type 
> [View] Created type [JdbcAccess] Created type [ETL] Created type [Metric] 
> Created type [PII] Created type [Fact] Created type [Dimension] Created type 
> [Log Data] Creating sample entities: Exception in thread "main" 
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>  at 
> com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
>  at com.sun.jersey.api.client.Client.handle(Client.java:652) at 
> com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) at 
> com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at 
> com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634) at 
> org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:334)
>  at 
> org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:311)
>  at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:199) at 
> org.apache.atlas.AtlasClientV2.createEntity(AtlasClientV2.java:277) at 
> org.apache.atlas.examples.QuickStartV2.createInstance(QuickStartV2.java:339) 
> at 
> org.apache.atlas.examples.QuickStartV2.createDatabase(QuickStartV2.java:362) 
> at 
> org.apache.atlas.examples.QuickStartV2.createEntities(QuickStartV2.java:268) 
> at 
> org.apache.atlas.examples.QuickStartV2.runQuickstart(QuickStartV2.java:150) 
> at org.apache.atlas.examples.QuickStartV2.main(QuickStartV2.java:132) Caused 
> by: java.net.SocketTimeoutException: Read timed out at 
> java.net.SocketInputStream.socketRead0(Native Method) at 
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at 
> java.net.SocketInputStream.read(SocketInputStream.java:171) at 
> java.net.SocketInputStream.read(SocketInputStream.java:141) at 
> sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at 
> sun.security.ssl.InputRecord.read(InputRecord.java:503) at 
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983) at 
> sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940) at 
> sun.security.ssl.AppInputStream.read(AppInputStream.java:105) at 
> java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
> java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at 
> java.io.BufferedInputStream.read(BufferedInputStream.java:345) at 
> sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) at 
> sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>  at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at 
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
>  at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
>  at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
>  ... 14 more No sample data added to Apache Atlas Server.
> Relevant code:
> https://github.com/apache/incubator-atlas/blob/master/webapp/src/main/java/org/apache/atlas/examples/QuickStartV2.java
> # This works: quickStartV2.createTypes();
> # This errors: quickStartV2.createEntities();
> First I thought atlas->kafka connectivity was the issue, but then I see:
> [ec2-user@ip-10-160-187-181 logs]$ cat atlas_kafka_setup.log 2018-07-25 
> 00:06:14,923 INFO - [main:] ~ Looking for atlas-application.properties in 
> classpath (ApplicationProperties:78) 2018-07-25 00:06:14,926 INFO - [main:] ~ 
> Loading atlas-application.properties from 
> file:/home/ec2-user/atlas/distro/target/apache-atlas-1.0.0-SNAPSHOT-bin/apache-atlas-1.0.0-SNAPSHOT/conf/atlas-application.properties
>  (ApplicationProperties:91) 2018-07-25 00:06:16,512 WARN - [main:] ~ 
> 

[jira] [Commented] (ATLAS-2889) S3 object tag import hook

2018-09-25 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626940#comment-16626940
 ] 

t oo commented on ATLAS-2889:
-

Plus ORC and Parquet. I assume this feature will run s3api get-object-tagging 
under the hood. BTW, do you have one example of the S3 AtlasEntity JSON you 
pushed to the topic with the tags array? I got an error that objectId can't be 
null.

> S3 object tag import hook
> -
>
> Key: ATLAS-2889
> URL: https://issues.apache.org/jira/browse/ATLAS-2889
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core, atlas-intg
>Affects Versions: 1.1.0
>Reporter: t oo
>Priority: Major
>
> https://issues.apache.org/jira/browse/ATLAS-2708 introduced support for s3 
> type.
>  
> From the comments in the jira, there are some external steps needed to import 
> object tags. This Jira is about creating the E2E functionality within Atlas, 
> i.e., functionality in the UI or an API that, when supplied with an s3 
> bucket/slice argument, makes Atlas automatically import all the object 
> names/tags recursively from under that s3 bucket/slice.
>  
> "It doesn't do it automatically through a listener like the hive hook.  We do 
> it via lambda functions, triggered, say, on the creation of S3 object or 
> pseudodirectory or bucket.  We package up the info into AtlasEntities and 
> then publish to the ATLAS_HOOK kafka topic.
> You have to create your own Lambda code that creates AtlasEntities on the fly 
> as trigerred by Lambda Function(on changes made to s3 object) and then push 
> to Kafka Queue. This particular functionality is not part of Atlas tool as of 
> now."
> cc: [~barbara] [~ayushmnnit]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-09-24 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626588#comment-16626588
 ] 

t oo commented on ATLAS-2708:
-

tracking further work in ATLAS-2889

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.1.0, 2.0.0
>
> Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, 
> ATLAS-2708.patch, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.  For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in 
> an S3 bucket.  For example, in the case of an object with key 
> “myWork/Development/Projects1.xls”, “myWork/Development” is the 
> pseudo-directory.  It supports:
>  ** Array of avro schemas that are associated with the data in the 
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of 
> the data in a bucket to a storageClass after a specific time interval, or 
> expiration.  For example, transition to GLACIER after 60 days, or expire 
> (i.e. be deleted) after 90 days:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is 
> expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated 
> with an AWS object.
>  **  tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is 
> monitored by AWS CloudWatch and can be configured for a bucket
>  ** metricName, for example, “AllRequests”, “GetRequests”, 
> TotalRequestLatency, BucketSizeBytes
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or 
> limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored 
> in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, eg GetObject, 
> PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or 
> its tags or prefixes
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one 
> dataset to another.  It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and 
> S3 pseudo-directory.
>  
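For illustration, one of the simpler typedefs above (AWSTag) could be expressed as a typedef JSON fragment along these lines. The attribute layout shown is a sketch only; the authoritative definitions are in the attached all_AWS_common_typedefs.json files.

```json
{
  "structDefs": [
    {
      "name": "AWSTag",
      "description": "A user-created tag/value pair associated with an AWS object",
      "attributeDefs": [
        { "name": "tag",   "typeName": "string", "isOptional": false },
        { "name": "value", "typeName": "string", "isOptional": true }
      ]
    }
  ]
}
```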



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2889) S3 object tag import hook

2018-09-24 Thread t oo (JIRA)
t oo created ATLAS-2889:
---

 Summary: S3 object tag import hook
 Key: ATLAS-2889
 URL: https://issues.apache.org/jira/browse/ATLAS-2889
 Project: Atlas
  Issue Type: New Feature
  Components:  atlas-core, atlas-intg
Affects Versions: 1.1.0
Reporter: t oo


https://issues.apache.org/jira/browse/ATLAS-2708 introduced support for s3 type.

 

From the comments in the jira, there are some external steps needed to import 
object tags. This Jira is about creating the E2E functionality within Atlas, 
i.e., functionality in the UI or an API that, when supplied with an s3 
bucket/slice argument, makes Atlas automatically import all the object 
names/tags recursively from under that s3 bucket/slice.

 

"It doesn't do it automatically through a listener like the hive hook.  We do 
it via lambda functions, triggered, say, on the creation of S3 object or 
pseudodirectory or bucket.  We package up the info into AtlasEntities and then 
publish to the ATLAS_HOOK kafka topic.

You have to create your own Lambda code that creates AtlasEntities on the fly, 
triggered by a Lambda function (on changes made to an S3 object), and then push 
to the Kafka queue. This particular functionality is not part of the Atlas tool 
as of now."

cc: [~barbara] [~ayushmnnit]
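A minimal sketch of the Lambda-side payload construction described above. The typeName "aws_s3_object" and the attribute names here are assumptions for illustration; the real names should be taken from the typedefs introduced in ATLAS-2708.

```python
# Illustrative sketch only -- not part of Atlas. Shapes an Atlas notification
# for an S3 object from its tag set. "aws_s3_object" and the attribute names
# are assumed placeholders; take the real names from the ATLAS-2708 typedefs.
# In a real Lambda you would fetch tags with boto3's get_object_tagging and
# publish json.dumps(payload) to the ATLAS_HOOK Kafka topic.
def build_s3_object_entity(bucket, key, tags):
    return {
        "type": "ENTITY_CREATE_V2",
        "entities": {
            "entities": [{
                "typeName": "aws_s3_object",                  # assumed typedef
                "attributes": {
                    "qualifiedName": f"s3://{bucket}/{key}",  # unique id in Atlas
                    "name": key,
                },
                "classifications": [
                    {"typeName": t["Key"], "attributes": {"value": t["Value"]}}
                    for t in tags                             # S3 tag-set format
                ],
            }]
        },
    }

payload = build_s3_object_entity("mybucket", "myWork/Development/Projects1.xls",
                                 [{"Key": "PII", "Value": "true"}])
```

Note that each classification's typedef (e.g., PII) must already exist in Atlas, and qualifiedName must be non-null and unique, which matches the "objectid can't be null" error mentioned above.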



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-09-19 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621191#comment-16621191
 ] 

t oo commented on ATLAS-2708:
-

[~barbara] what config/steps should I run to enable this? or are you saying 
this is homegrown and not included in the atlas tool?

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.1.0, 2.0.0
>
> Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, 
> ATLAS-2708.patch, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2760) Update Hive hook to create AWS S3 entities for S3 path references

2018-09-19 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620569#comment-16620569
 ] 

t oo commented on ATLAS-2760:
-

Can atlas import 's3 object tags' from all s3 objects under a given s3 bucket?

> Update Hive hook to create AWS S3 entities for S3 path references 
> --
>
> Key: ATLAS-2760
> URL: https://issues.apache.org/jira/browse/ATLAS-2760
> Project: Atlas
>  Issue Type: Bug
>  Components: atlas-intg
>Affects Versions: 1.0.0
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
> Fix For: 1.1.0, 2.0.0
>
> Attachments: ATLAS-2760-2.patch
>
>
> Entity types for AWS S3 have been added recently via ATLAS-2708. Hive hook 
> should be updated to create AWS S3 entities for references to S3 paths - for 
> example in 'insert overwrite' operation. This will capture the lineage 
> between Hive tables and S3 paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas

2018-09-19 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620568#comment-16620568
 ] 

t oo commented on ATLAS-2708:
-

Can atlas import 's3 object tags' from all s3 objects under a given s3 bucket?

> AWS S3 data lake typedefs for Atlas
> ---
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core
>Reporter: Barbara Eckman
>Assignee: Barbara Eckman
>Priority: Critical
> Fix For: 1.1.0, 2.0.0
>
> Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, 
> ATLAS-2708.patch, all_AWS_common_typedefs.json, 
> all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, 
> all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It 
> would be nice to add typedefs for AWS data lake objects (buckets and 
> pseudo-directories) and lineage processes that move the data from another 
> source (e.g., kafka topic) to the data lake.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2881) Cassandra metadata ingestion to Atlas

2018-09-19 Thread t oo (JIRA)
t oo created ATLAS-2881:
---

 Summary: Cassandra metadata ingestion to Atlas
 Key: ATLAS-2881
 URL: https://issues.apache.org/jira/browse/ATLAS-2881
 Project: Atlas
  Issue Type: New Feature
Reporter: t oo
 Fix For: 1.2.0


Cassandra metadata ingestion to Atlas

 

Ability to import keyspace metadata from  Cassandra into Atlas



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ATLAS-2828) DOCUMENTATION - architecture diagram still shows Titan instead of Janus

2018-08-24 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo resolved ATLAS-2828.
-
Resolution: Duplicate

Duplicate of ATLAS-2832.

> DOCUMENTATION - architecture diagram still shows Titan instead of Janus
> ---
>
> Key: ATLAS-2828
> URL: https://issues.apache.org/jira/browse/ATLAS-2828
> Project: Atlas
>  Issue Type: Improvement
>Reporter: t oo
>Priority: Trivial
>
> Image in this link still shows 'Titan' instead of 'Janus':
> https://atlas.apache.org/Architecture.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2829) Publish atlas jars to a mirror on each release

2018-08-17 Thread t oo (JIRA)
t oo created ATLAS-2829:
---

 Summary: Publish atlas jars to a mirror on each release
 Key: ATLAS-2829
 URL: https://issues.apache.org/jira/browse/ATLAS-2829
 Project: Atlas
  Issue Type: Improvement
Reporter: t oo


Publish atlas jars to a mirror on each release. This saves all users from 
building the jars themselves on each release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ATLAS-2828) DOCUMENTATION - architecture diagram still shows Titan instead of Janus

2018-08-17 Thread t oo (JIRA)
t oo created ATLAS-2828:
---

 Summary: DOCUMENTATION - architecture diagram still shows Titan 
instead of Janus
 Key: ATLAS-2828
 URL: https://issues.apache.org/jira/browse/ATLAS-2828
 Project: Atlas
  Issue Type: Improvement
Reporter: t oo


Image in this link still shows 'Titan' instead of 'Janus':

https://atlas.apache.org/Architecture.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-2792) Apache Atlas quickstart error

2018-07-27 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559365#comment-16559365
 ] 

t oo commented on ATLAS-2792:
-

This resolved it, but why is 9027 the default in the application properties?


sed -i 
's/atlas.kafka.bootstrap.servers=localhost:9027/atlas.kafka.bootstrap.servers=localhost:9092/'
 $ATLAS_HOME/conf/atlas-application.properties
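After a fix like the sed above, a quick sanity check that something is actually listening on the configured broker address can rule out another 60-second read timeout before rerunning quick_start.py. This helper is illustrative only and not part of Atlas:

```python
import socket

# Illustrative helper (not part of Atlas): parse the broker address from
# atlas-application.properties and confirm the port accepts connections.
def kafka_reachable(props_path, timeout=3.0):
    with open(props_path) as f:
        for line in f:
            if line.startswith("atlas.kafka.bootstrap.servers="):
                host, port = line.strip().split("=", 1)[1].split(":")
                try:
                    socket.create_connection((host, int(port)), timeout).close()
                    return True
                except OSError:
                    return False
    return False  # property not found in the file
```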

> Apache Atlas quickstart error
> -
>
> Key: ATLAS-2792
> URL: https://issues.apache.org/jira/browse/ATLAS-2792
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core, atlas-intg
>Affects Versions: 1.0.0
>Reporter: t oo
>Priority: Major
>
> Env: no kerberos, no ranger, no hdfs. EC2 with ssl.
> Getting this error after running $ATLAS_HOME/bin/quick_start.py 
> https://$componentPrivateDNSRecord:21443 with correct user/pass
>  
> {code:java}
>  
> Creating sample types: Created type [DB] Created type [Table] Created type 
> [StorageDesc] Created type [Column] Created type [LoadProcess] Created type 
> [View] Created type [JdbcAccess] Created type [ETL] Created type [Metric] 
> Created type [PII] Created type [Fact] Created type [Dimension] Created type 
> [Log Data] Creating sample entities: Exception in thread "main" 
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>  at 
> com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
>  at com.sun.jersey.api.client.Client.handle(Client.java:652) at 
> com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) at 
> com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at 
> com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634) at 
> org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:334)
>  at 
> org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:311)
>  at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:199) at 
> org.apache.atlas.AtlasClientV2.createEntity(AtlasClientV2.java:277) at 
> org.apache.atlas.examples.QuickStartV2.createInstance(QuickStartV2.java:339) 
> at 
> org.apache.atlas.examples.QuickStartV2.createDatabase(QuickStartV2.java:362) 
> at 
> org.apache.atlas.examples.QuickStartV2.createEntities(QuickStartV2.java:268) 
> at 
> org.apache.atlas.examples.QuickStartV2.runQuickstart(QuickStartV2.java:150) 
> at org.apache.atlas.examples.QuickStartV2.main(QuickStartV2.java:132) Caused 
> by: java.net.SocketTimeoutException: Read timed out at 
> java.net.SocketInputStream.socketRead0(Native Method) at 
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at 
> java.net.SocketInputStream.read(SocketInputStream.java:171) at 
> java.net.SocketInputStream.read(SocketInputStream.java:141) at 
> sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at 
> sun.security.ssl.InputRecord.read(InputRecord.java:503) at 
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983) at 
> sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940) at 
> sun.security.ssl.AppInputStream.read(AppInputStream.java:105) at 
> java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
> java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at 
> java.io.BufferedInputStream.read(BufferedInputStream.java:345) at 
> sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) at 
> sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>  at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at 
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
>  at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
>  at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
>  ... 14 more
> No sample data added to Apache Atlas Server.
> Relevant code:
> https://github.com/apache/incubator-atlas/blob/master/webapp/src/main/java/org/apache/atlas/examples/QuickStartV2.java
> #This works
> quickStartV2.createTypes();
> #This errors
> quickStartV2.createEntities();
> First I thought Atlas->Kafka connectivity was the issue, but then I see:
> [ec2-user@ip-10-160-187-181 logs]$ cat atlas_kafka_setup.log 2018-07-25 
> 00:06:14,923 INFO - [main:] ~ Looking for atlas-application.properties in 
> classpath (ApplicationProperties:78) 2018-07-25 00:06:14,926 INFO - [main:] ~ 
> Loading atlas-application.properties from 
> 

[jira] [Created] (ATLAS-2792) Apache Atlas quickstart error

2018-07-24 Thread t oo (JIRA)
t oo created ATLAS-2792:
---

 Summary: Apache Atlas quickstart error
 Key: ATLAS-2792
 URL: https://issues.apache.org/jira/browse/ATLAS-2792
 Project: Atlas
  Issue Type: Bug
  Components:  atlas-core, atlas-intg
Affects Versions: 1.0.0
Reporter: t oo


Env: no Kerberos, no Ranger, no HDFS. EC2 with SSL.

Getting this error after running $ATLAS_HOME/bin/quick_start.py
https://$componentPrivateDNSRecord:21443 with the correct user/pass

 
{code:java}
 
Creating sample types: Created type [DB] Created type [Table] Created type 
[StorageDesc] Created type [Column] Created type [LoadProcess] Created type 
[View] Created type [JdbcAccess] Created type [ETL] Created type [Metric] 
Created type [PII] Created type [Fact] Created type [Dimension] Created type 
[Log Data] Creating sample entities: Exception in thread "main" 
com.sun.jersey.api.client.ClientHandlerException: 
java.net.SocketTimeoutException: Read timed out at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
 at 
com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
 at com.sun.jersey.api.client.Client.handle(Client.java:652) at 
com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) at 
com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at 
com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634) at 
org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:334) 
at 
org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:311) 
at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:199) at 
org.apache.atlas.AtlasClientV2.createEntity(AtlasClientV2.java:277) at 
org.apache.atlas.examples.QuickStartV2.createInstance(QuickStartV2.java:339) at 
org.apache.atlas.examples.QuickStartV2.createDatabase(QuickStartV2.java:362) at 
org.apache.atlas.examples.QuickStartV2.createEntities(QuickStartV2.java:268) at 
org.apache.atlas.examples.QuickStartV2.runQuickstart(QuickStartV2.java:150) at 
org.apache.atlas.examples.QuickStartV2.main(QuickStartV2.java:132) Caused by: 
java.net.SocketTimeoutException: Read timed out at 
java.net.SocketInputStream.socketRead0(Native Method) at 
java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at 
java.net.SocketInputStream.read(SocketInputStream.java:171) at 
java.net.SocketInputStream.read(SocketInputStream.java:141) at 
sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at 
sun.security.ssl.InputRecord.read(InputRecord.java:503) at 
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983) at 
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940) at 
sun.security.ssl.AppInputStream.read(AppInputStream.java:105) at 
java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at 
java.io.BufferedInputStream.read(BufferedInputStream.java:345) at 
sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) at 
sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
 at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
 at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
 at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
 at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
 ... 14 more
{code}
No sample data added to Apache Atlas Server.
Relevant code:
https://github.com/apache/incubator-atlas/blob/master/webapp/src/main/java/org/apache/atlas/examples/QuickStartV2.java
#This works
quickStartV2.createTypes();
#This errors
quickStartV2.createEntities();
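Since createTypes() succeeds but createEntities() fails on a client-side read timeout, one mitigation to try is raising the Atlas client's read timeout. This is a sketch assuming the stock client timeout properties read by AtlasBaseClient; verify the property names against the atlas-application.properties template shipped with your build:

```properties
# Client-side timeouts in milliseconds (defaults are commonly 60000).
# A larger read timeout gives slow entity-creation requests time to complete.
atlas.client.readTimeoutMSecs=300000
atlas.client.connectTimeoutMSecs=60000
```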
First I thought Atlas->Kafka connectivity was the issue, but then I see:
[ec2-user@ip-10-160-187-181 logs]$ cat atlas_kafka_setup.log 2018-07-25 
00:06:14,923 INFO - [main:] ~ Looking for atlas-application.properties in 
classpath (ApplicationProperties:78) 2018-07-25 00:06:14,926 INFO - [main:] ~ 
Loading atlas-application.properties from 
file:/home/ec2-user/atlas/distro/target/apache-atlas-1.0.0-SNAPSHOT-bin/apache-atlas-1.0.0-SNAPSHOT/conf/atlas-application.properties
 (ApplicationProperties:91) 2018-07-25 00:06:16,512 WARN - [main:] ~ Attempting 
to create topic ATLAS_HOOK (AtlasTopicCreator:72) 2018-07-25 00:06:17,004 WARN 
- [main:] ~ Created topic ATLAS_HOOK with partitions 1 and replicas 1 
(AtlasTopicCreator:119) 2018-07-25 00:06:17,004 WARN - [main:] ~ Attempting to 
create topic ATLAS_ENTITIES (AtlasTopicCreator:72) 2018-07-25 00:06:17,024 WARN 
- [main:] ~ Created topic ATLAS_ENTITIES with 

[jira] [Commented] (ATLAS-1047) Atlas startup failed with ZkTimeoutException exception

2018-07-16 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546003#comment-16546003
 ] 

t oo commented on ATLAS-1047:
-

I am facing the same issue, with the env below. Did you find the root cause?

export hbase_release_version=1.1.13
export solr_release_version=5.5.1
export zookeeper_release_version=3.4.12
export kafka_release_version=1.1.0
export kafka_scala_version=2.11
export atlas_release_version=1.0.0
export hadoop_release_version=2.7.6

> Atlas startup failed with ZkTimeoutException exception
> --
>
> Key: ATLAS-1047
> URL: https://issues.apache.org/jira/browse/ATLAS-1047
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Ayub Pathan
>Priority: Major
>
> Atlas failed to come up with the below exception, and after a restart the
> Atlas startup succeeded.
> *Please note, this issue is not reproducible as of now, so I have kept the
> priority as Major.*
> {noformat}
> 2016-07-25 07:20:46,432 WARN  - [main:] ~ Failed startup of context 
> o.e.j.w.WebAppContext@3b8837fa{/,file:/grid/0/hdp/2.5.0.0-1061/atlas/server/webapp/atlas/,STARTING}{/usr/hdp/current/atlas-server/server/webapp/atlas}
>  (WebAppContext:514)
> java.lang.RuntimeException: org.I0Itec.zkclient.exception.ZkTimeoutException: 
> Unable to connect to zookeeper server within timeout: 200
> at org.apache.atlas.service.Services.start(Services.java:48)
> at 
> org.apache.atlas.web.listeners.GuiceServletConfig.startServices(GuiceServletConfig.java:142)
> at 
> org.apache.atlas.web.listeners.GuiceServletConfig.contextInitialized(GuiceServletConfig.java:136)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:800)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:444)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:791)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:294)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1349)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1342)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:505)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
> at org.eclipse.jetty.server.Server.start(Server.java:387)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
> at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
> at org.eclipse.jetty.server.Server.doStart(Server.java:354)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at 
> org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:93)
> at org.apache.atlas.Atlas.main(Atlas.java:113)
> Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to 
> connect to zookeeper server within timeout: 200
> at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
> at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
> at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
> at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:75)
> at kafka.utils.ZkUtils$.apply(ZkUtils.scala:57)
> at 
> kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:207)
> at 
> kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:155)
> at 
> kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:66)
> at 
> kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:69)
> at 
> kafka.consumer.Consumer$.createJavaConsumerConnector(ConsumerConnector.scala:120)
> at 
> kafka.consumer.Consumer.createJavaConsumerConnector(ConsumerConnector.scala)
> at 
> org.apache.atlas.kafka.KafkaNotification.createConsumerConnector(KafkaNotification.java:264)
> at 
> org.apache.atlas.kafka.KafkaNotification.createConsumers(KafkaNotification.java:182)
> at 
> org.apache.atlas.kafka.KafkaNotification.createConsumers(KafkaNotification.java:169)
> at 
> org.apache.atlas.notification.NotificationHookConsumer.startConsumers(NotificationHookConsumer.java:89)
> at 
>