Re: Review Request 69587: ATLAS-3002: added instrumentation to collect time taken for sub-tasks during entity create/update

2018-12-18 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69587/
---

(Updated Dec. 18, 2018, 11:18 p.m.)


Review request for atlas, Abhishek Kadam, Ashutosh Mestry, keval bhatt, 
Kapildeo Nayak, Mehul Parikh, Nixon Rodrigues, and Sarath Subramanian.


Changes
---

Ashutosh, thanks for reviewing and catching the cumulative counter issue in the 
import case. This update addresses it.


Bugs: ATLAS-3002
https://issues.apache.org/jira/browse/ATLAS-3002


Repository: atlas


Description
---

- added instrumentation to collect the time taken by sub-tasks during entity 
create/update, such as walkEntityGraph, resolveReferences, preCreateOrUpdate, 
mapAttributesAndClassifications, fullTextMapping, graphCommit, 
entityNotification, entityAudit
- updated the default atlas-log4j.xml to include loggers for METRICS
- optimization: updated the Hive hook to avoid including queryText in the 
hive_column_lineage.name attribute value, as this can unnecessarily bloat the 
message size for large queries
- optimization: updated fullTextMapper to skip going through object-id 
collections; this saves cycles, for example, while processing hive_table entities 
with a large number of columns
- here are sample metrics:
```json
{
  "walkEntityGraph":                 { "count":   1, "timeTaken":    1 },
  "resolveReferences":               { "count":   1, "timeTaken":  670 },
  "findByUniqueAttributes":          { "count": 217, "timeTaken":  668 },
  "findByTypeAndPropertyName":       { "count": 217, "timeTaken":  641 },
  "findBySuperTypeAndPropertyName":  { "count":  74, "timeTaken":   27 },
  "createOrUpdate":                  { "count":   1, "timeTaken": 1637 },
  "preCreateOrUpdate":               { "count":   1, "timeTaken":  720 },
  "mapAttributesAndClassifications": { "count":   1, "timeTaken":   90 },
  "fullTextMapping":                 { "count":   1, "timeTaken":  195 },
  "notification-getReferenceable":   { "count":   1, "timeTaken":   37 },
  "entityAudit":                     { "count":   1, "timeTaken":   31 },
  "entityNotification":              { "count":   2, "timeTaken":    8 },
  "graphCommit":                     { "count":  75, "timeTaken":  315 }
}
```
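
A minimal sketch of how cumulative per-request counters like the ones above could be kept: one map from sub-task name to (count, timeTaken), where repeated invocations of the same sub-task are summed. The class and method names (`PerfMetricsSketch`, `start`, `end`, `record`) are hypothetical and this is not the `AtlasPerfMetrics` class added by the patch; times are assumed to be in milliseconds, and one instance is assumed to be created per request so that counters do not accumulate across requests.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of cumulative per-request sub-task timings.
// Not the actual AtlasPerfMetrics API; names and the millisecond unit are assumptions.
public class PerfMetricsSketch {
    /** Accumulated invocation count and total elapsed time for one sub-task. */
    public static final class Metric {
        private long count;
        private long timeTaken; // total milliseconds across all invocations

        public long getCount()     { return count; }
        public long getTimeTaken() { return timeTaken; }
    }

    /** Captures the start time of one sub-task invocation. */
    public static final class Recorder {
        private final String name;
        private final long   startMs;

        private Recorder(String name, long startMs) {
            this.name    = name;
            this.startMs = startMs;
        }
    }

    // LinkedHashMap keeps sub-tasks in first-recorded order
    private final Map<String, Metric> metrics = new LinkedHashMap<>();

    public Recorder start(String name) {
        return new Recorder(name, System.currentTimeMillis());
    }

    public void end(Recorder recorder) {
        record(recorder.name, System.currentTimeMillis() - recorder.startMs);
    }

    /** Adds one invocation; repeated calls with the same name are summed into one entry. */
    public void record(String name, long elapsedMs) {
        Metric metric = metrics.computeIfAbsent(name, k -> new Metric());
        metric.count++;
        metric.timeTaken += elapsedMs;
    }

    public Map<String, Metric> getMetrics() {
        return Collections.unmodifiableMap(metrics);
    }

    public static void main(String[] args) {
        PerfMetricsSketch perf = new PerfMetricsSketch(); // one instance per request

        Recorder recorder = perf.start("walkEntityGraph");
        // ... the instrumented sub-task would run here ...
        perf.end(recorder);

        // Two invocations of the same sub-task are folded into a single entry
        perf.record("graphCommit", 4);
        perf.record("graphCommit", 6);

        perf.getMetrics().forEach((name, m) -> System.out.println(
            name + ": count=" + m.getCount() + ", timeTaken=" + m.getTimeTaken()));
    }
}
```

Because each name maps to a single counter, a sub-task invoked 75 times in one request (as `graphCommit` is in the sample) shows up as one entry whose `timeTaken` is the sum of all 75 calls.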


Diffs (updated)
---

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java 2ccfff4d1
  common/src/main/java/org/apache/atlas/utils/AtlasPerfMetrics.java PRE-CREATION
  distro/src/conf/atlas-log4j.xml c183871eb
  notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 4bec91709
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java 4c436779e
  repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListener.java dfacb3817
  repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListenerV2.java 8ca8c9a0b
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 08ccd9c73
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java a8c3363d5
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityGraphDiscoveryV2.java 6580beecd
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java d97b74d9d
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 25770a334
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 6c830bafe
  server-api/src/main/java/org/apache/atlas/RequestContext.java 099d713f6
  webapp/src/main/java/org/apache/atlas/notification/EntityNotificationListenerV2.java e0a60a133
  webapp/src/main/java/org/apache/atlas/notification/NotificationEntityChangeListener.java b5e7ed871
  webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java b344c50e6


Diff: https://reviews.apache.org/r/69587/diff/2/

Changes: https://reviews.apache.org/r/69587/diff/1-2/


Testing
---

- verified the instrumentation output in metric.log
- pre-commit tests run: 
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/869


Thanks,

Madhan Neethiraj



[jira] [Commented] (ATLAS-3002) add instrumentation to enable troubleshooting and optimization of ingest

2018-12-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724553#comment-16724553
 ] 

ASF subversion and git services commented on ATLAS-3002:


Commit 900f99bb47278fbc1ca6d7ed4f55fd715aac6ad2 in atlas's branch 
refs/heads/branch-0.8 from Madhan Neethiraj
[ https://git-wip-us.apache.org/repos/asf?p=atlas.git;h=900f99b ]

ATLAS-3002: added instrumentation to collect time taken for sub-tasks during 
entity create/update


> add instrumentation to enable troubleshooting and optimization of ingest
> 
>
> Key: ATLAS-3002
> URL: https://issues.apache.org/jira/browse/ATLAS-3002
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core
>Affects Versions: 0.8.3, 1.1.0
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
> Fix For: 0.8.4, 2.0.0
>
> Attachments: ATLAS-3002-branch-0.8.patch, ATLAS-3002.patch
>
>
> For easier troubleshooting and optimization of data ingest into Atlas (i.e. 
> entity create/update/delete), it will help to instrument the code path to 
> measure the time taken in various phases. This will also help prioritize 
> the modules to optimize.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ATLAS-3002) add instrumentation to enable troubleshooting and optimization of ingest

2018-12-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724550#comment-16724550
 ] 

ASF subversion and git services commented on ATLAS-3002:


Commit d3feb04090c5a3398693efb6b90dcf8cc369d0b6 in atlas's branch 
refs/heads/branch-1.0 from Madhan Neethiraj
[ https://git-wip-us.apache.org/repos/asf?p=atlas.git;h=d3feb04 ]

ATLAS-3002: added instrumentation to collect time taken for sub-tasks during 
entity create/update

(cherry picked from commit beb34506a15379af4a306902da049f37b445a2f5)




[jira] [Commented] (ATLAS-3002) add instrumentation to enable troubleshooting and optimization of ingest

2018-12-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724545#comment-16724545
 ] 

ASF subversion and git services commented on ATLAS-3002:


Commit beb34506a15379af4a306902da049f37b445a2f5 in atlas's branch 
refs/heads/master from Madhan Neethiraj
[ https://git-wip-us.apache.org/repos/asf?p=atlas.git;h=beb3450 ]

ATLAS-3002: added instrumentation to collect time taken for sub-tasks during 
entity create/update




[jira] [Commented] (ATLAS-2996) Message Processing: Conditionally Prevent Message Processing

2018-12-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724522#comment-16724522
 ] 

ASF subversion and git services commented on ATLAS-2996:


Commit 1584139c24b0fce0e0ba62ca13922cd21f7bc8b5 in atlas's branch 
refs/heads/branch-0.8 from Ashutosh Mestry
[ https://git-wip-us.apache.org/repos/asf?p=atlas.git;h=1584139 ]

ATLAS-2996: Conditionally Prevent Notification Processing. With support for HA 
mode.


> Message Processing: Conditionally Prevent Message Processing
> 
>
> Key: ATLAS-2996
> URL: https://issues.apache.org/jira/browse/ATLAS-2996
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: trunk
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Fix For: trunk
>
> Attachments: 
> ATLAS-2996-Conditionally-Prevent-Notification-Proces-HA-Mode.patch, 
> ATLAS-2996-Conditionally-Prevent-Notification-Proces.patch
>
>
> *Background*
> There is a need to start Atlas with its message processing ability shut off. 
> This is necessary when ingesting data via REST calls or using a utility like 
> _import-hive.sh_. 
> Once message processing is shut off, it is possible to be certain that no 
> changes have been made during the manual import process.
> *Approach Guidance*
>  * Introduce a new property, say _atlas.notification.consumer.disabled_
>  * Within _NotificationHookConsumer_, check the value of this property and 
> decide whether to start the hook processing thread.
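
The approach guidance above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch: the property name is the one suggested in the issue, but `ConsumerStartupSketch`, `startIfEnabled` and the use of `java.util.Properties` for configuration are illustrative assumptions, not the actual `NotificationHookConsumer` code. The HA-mode handling mentioned in the commit message is not modeled.

```java
import java.util.Properties;

// Hypothetical sketch: start the hook-processing thread only when the
// consumer has not been disabled via configuration. Not Atlas code.
public class ConsumerStartupSketch {
    // Property name as suggested in the ATLAS-2996 description
    public static final String CONSUMER_DISABLED = "atlas.notification.consumer.disabled";

    public static boolean isConsumerDisabled(Properties conf) {
        // Defaults to false, so message processing stays on unless explicitly disabled
        return Boolean.parseBoolean(conf.getProperty(CONSUMER_DISABLED, "false"));
    }

    /** Returns the started thread, or null when consumption is disabled. */
    public static Thread startIfEnabled(Properties conf, Runnable hookProcessor) {
        if (isConsumerDisabled(conf)) {
            System.out.println(CONSUMER_DISABLED + " is true; not starting the hook processing thread");
            return null;
        }

        Thread thread = new Thread(hookProcessor, "hook-consumer");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(CONSUMER_DISABLED, "true"); // e.g. while running import-hive.sh

        Thread thread = startIfEnabled(conf, () -> System.out.println("processing notifications"));
        System.out.println("consumer started: " + (thread != null)); // consumer started: false
    }
}
```

Defaulting the flag to `false` keeps existing deployments unchanged; only an explicit `true` suppresses message processing.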





Re: Review Request 69587: ATLAS-3002: added instrumentation to collect time taken for sub-tasks during entity create/update

2018-12-18 Thread Ashutosh Mestry

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69587/#review211414
---


Ship it!




The current implementation provides a lot of value in getting metrics per 
request. The numbers are cumulative: if the same operation is called more than 
once, the time taken is summed. The average time per call can be calculated as 
timeTaken/count.

While this is helpful, it would be worthwhile to have per-call metrics as well. 
These would help in understanding call-to-call variations.
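
As a worked example of the timeTaken/count calculation, the sketch below derives average time per call from a few entries of the sample metrics in the review request (unit assumed to be milliseconds). The class and method names are illustrative. Note that an average hides exactly the call-to-call variation mentioned above; exposing it would need per-invocation records (for example min/max per sub-task), which this sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Derives average time per call (timeTaken / count) from cumulative metrics.
// Values are taken from the sample metrics in the review request.
public class AverageTimePerCall {
    public static double averagePerCall(long count, long timeTaken) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive: " + count);
        }
        return (double) timeTaken / count;
    }

    public static void main(String[] args) {
        // name -> {count, timeTaken}
        Map<String, long[]> sample = new LinkedHashMap<>();
        sample.put("findByUniqueAttributes", new long[] {217, 668});
        sample.put("graphCommit",            new long[] {75, 315});
        sample.put("entityNotification",     new long[] {2, 8});

        // Locale.ROOT keeps '.' as the decimal separator regardless of platform locale
        sample.forEach((name, v) -> System.out.printf(
            Locale.ROOT, "%s: %.2f per call%n", name, averagePerCall(v[0], v[1])));
        // findByUniqueAttributes: 3.08 per call
        // graphCommit: 4.20 per call
        // entityNotification: 4.00 per call
    }
}
```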

- Ashutosh Mestry


On Dec. 18, 2018, 7:49 p.m., Madhan Neethiraj wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69587/
> ---
> 
> (Updated Dec. 18, 2018, 7:49 p.m.)
> 
> 
> Review request for atlas, Abhishek Kadam, Ashutosh Mestry, keval bhatt, 
> Kapildeo Nayak, Mehul Parikh, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3002
> https://issues.apache.org/jira/browse/ATLAS-3002
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> - added instrumentation to collect the time taken by sub-tasks during entity 
> create/update, such as walkEntityGraph, resolveReferences, preCreateOrUpdate, 
> mapAttributesAndClassifications, fullTextMapping, graphCommit, 
> entityNotification, entityAudit
> - updated the default atlas-log4j.xml to include loggers for METRICS
> - optimization: updated the Hive hook to avoid including queryText in the 
> hive_column_lineage.name attribute value, as this can unnecessarily bloat the 
> message size for large queries
> - optimization: updated fullTextMapper to skip going through object-id 
> collections; this saves cycles, for example, while processing hive_table 
> entities with a large number of columns
> - here are sample metrics:
> ```json
> {
>   "walkEntityGraph":                 { "count":   1, "timeTaken":    1 },
>   "resolveReferences":               { "count":   1, "timeTaken":  670 },
>   "findByUniqueAttributes":          { "count": 217, "timeTaken":  668 },
>   "findByTypeAndPropertyName":       { "count": 217, "timeTaken":  641 },
>   "findBySuperTypeAndPropertyName":  { "count":  74, "timeTaken":   27 },
>   "createOrUpdate":                  { "count":   1, "timeTaken": 1637 },
>   "preCreateOrUpdate":               { "count":   1, "timeTaken":  720 },
>   "mapAttributesAndClassifications": { "count":   1, "timeTaken":   90 },
>   "fullTextMapping":                 { "count":   1, "timeTaken":  195 },
>   "notification-getReferenceable":   { "count":   1, "timeTaken":   37 },
>   "entityAudit":                     { "count":   1, "timeTaken":   31 },
>   "entityNotification":              { "count":   2, "timeTaken":    8 },
>   "graphCommit":                     { "count":  75, "timeTaken":  315 }
> }
> ```
> 
> 
> Diffs
> ---
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java 2ccfff4d1
>   common/src/main/java/org/apache/atlas/utils/AtlasPerfMetrics.java PRE-CREATION
>   distro/src/conf/atlas-log4j.xml c183871eb
>   notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 4bec91709
>   repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java 4c436779e
>   repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListener.java dfacb3817
>   repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListenerV2.java 8ca8c9a0b
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 08ccd9c73
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java a8c3363d5
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityGraphDiscoveryV2.java 6580beecd
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java d97b74d9d
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 25770a334
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 6c830bafe
>   server-api/src/main/java/org/apache/atlas/RequestContext.java 099d713f6
>   webapp/src/main/java/org/apache/atlas/notification/EntityNotificationListenerV2.java e0a60a133
>   webapp/src/main/java/org/apache/atlas/notification/NotificationEntityChangeListener.java b5e7ed871
>   webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java b95594831
> 
> 
> Diff: https://reviews.apache.org/r/69587/diff/1/
> 
> 
> Testing
> ---
> 
> - verified the instrumentation output in metric.log
> - pre-commit tests run: 
> 

[jira] [Commented] (ATLAS-2996) Message Processing: Conditionally Prevent Message Processing

2018-12-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724498#comment-16724498
 ] 

ASF subversion and git services commented on ATLAS-2996:


Commit bd0c5a8a8895b0ab5e47beeefaab994a1099 in atlas's branch 
refs/heads/master from Ashutosh Mestry
[ https://git-wip-us.apache.org/repos/asf?p=atlas.git;h=bd0c5a8 ]

ATLAS-2996: Conditionally Prevent Notification Processing. With support for HA 
mode.




[jira] [Updated] (ATLAS-2996) Message Processing: Conditionally Prevent Message Processing

2018-12-18 Thread Ashutosh Mestry (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-2996:
---
Attachment: 
ATLAS-2996-Conditionally-Prevent-Notification-Proces-HA-Mode.patch



[jira] [Commented] (ATLAS-2996) Message Processing: Conditionally Prevent Message Processing

2018-12-18 Thread Madhan Neethiraj (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724479#comment-16724479
 ] 

Madhan Neethiraj commented on ATLAS-2996:
-

+1 for HA-mode patch.



[jira] [Comment Edited] (ATLAS-3002) add instrumentation to enable troubleshooting and optimization of ingest

2018-12-18 Thread Madhan Neethiraj (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723587#comment-16723587
 ] 

Madhan Neethiraj edited comment on ATLAS-3002 at 12/18/18 7:52 PM:
---

- [Review board for branch-0.8 patch|https://reviews.apache.org/r/69576/]
- [Review board for master patch|https://reviews.apache.org/r/69587/]


was (Author: madhan.neethiraj):
- [Review board for branch-0.8 patch|https://reviews.apache.org/r/69576/]



[jira] [Updated] (ATLAS-3002) add instrumentation to enable troubleshooting and optimization of ingest

2018-12-18 Thread Madhan Neethiraj (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj updated ATLAS-3002:

Fix Version/s: 2.0.0



[jira] [Updated] (ATLAS-3002) add instrumentation to enable troubleshooting and optimization of ingest

2018-12-18 Thread Madhan Neethiraj (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj updated ATLAS-3002:

Attachment: ATLAS-3002.patch

> add instrumentation to enable troubleshooting and optimization of ingest
> 
>
> Key: ATLAS-3002
> URL: https://issues.apache.org/jira/browse/ATLAS-3002
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core
>Affects Versions: 0.8.3, 1.1.0
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
> Fix For: 0.8.4
>
> Attachments: ATLAS-3002-branch-0.8.patch, ATLAS-3002.patch
>
>
> For easier troubleshooting and optimization of data ingest into Atlas (i.e. 
> entity create/update/delete), it will help to instrument the code path to 
> measure the time taken in various phases. This will also help prioritize 
> the modules to optimize.





Review Request 69587: ATLAS-3002: added instrumentation to collect time taken for sub-tasks during entity create/update

2018-12-18 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69587/
---

Review request for atlas, Abhishek Kadam, Ashutosh Mestry, keval bhatt, 
Kapildeo Nayak, Mehul Parikh, Nixon Rodrigues, and Sarath Subramanian.


Bugs: ATLAS-3002
https://issues.apache.org/jira/browse/ATLAS-3002


Repository: atlas


Description
---

- added instrumentation to collect the time taken by sub-tasks during entity 
create/update, such as walkEntityGraph, resolveReferences, preCreateOrUpdate, 
mapAttributesAndClassifications, fullTextMapping, graphCommit, 
entityNotification, entityAudit
- updated the default atlas-log4j.xml to include loggers for METRICS
- optimization: updated the Hive hook to avoid including queryText in the 
hive_column_lineage.name attribute value, as this can unnecessarily bloat the 
message size for large queries
- optimization: updated fullTextMapper to skip going through object-id 
collections; this saves cycles, for example, while processing hive_table entities 
with a large number of columns
- here are sample metrics:
```json
{
  "walkEntityGraph":                 { "count":   1, "timeTaken":    1 },
  "resolveReferences":               { "count":   1, "timeTaken":  670 },
  "findByUniqueAttributes":          { "count": 217, "timeTaken":  668 },
  "findByTypeAndPropertyName":       { "count": 217, "timeTaken":  641 },
  "findBySuperTypeAndPropertyName":  { "count":  74, "timeTaken":   27 },
  "createOrUpdate":                  { "count":   1, "timeTaken": 1637 },
  "preCreateOrUpdate":               { "count":   1, "timeTaken":  720 },
  "mapAttributesAndClassifications": { "count":   1, "timeTaken":   90 },
  "fullTextMapping":                 { "count":   1, "timeTaken":  195 },
  "notification-getReferenceable":   { "count":   1, "timeTaken":   37 },
  "entityAudit":                     { "count":   1, "timeTaken":   31 },
  "entityNotification":              { "count":   2, "timeTaken":    8 },
  "graphCommit":                     { "count":  75, "timeTaken":  315 }
}
```


Diffs
---

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java 2ccfff4d1
  common/src/main/java/org/apache/atlas/utils/AtlasPerfMetrics.java PRE-CREATION
  distro/src/conf/atlas-log4j.xml c183871eb
  notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 4bec91709
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java 4c436779e
  repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListener.java dfacb3817
  repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListenerV2.java 8ca8c9a0b
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 08ccd9c73
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java a8c3363d5
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityGraphDiscoveryV2.java 6580beecd
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java d97b74d9d
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 25770a334
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 6c830bafe
  server-api/src/main/java/org/apache/atlas/RequestContext.java 099d713f6
  webapp/src/main/java/org/apache/atlas/notification/EntityNotificationListenerV2.java e0a60a133
  webapp/src/main/java/org/apache/atlas/notification/NotificationEntityChangeListener.java b5e7ed871
  webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java b95594831


Diff: https://reviews.apache.org/r/69587/diff/1/


Testing
---

- verified the instrumentation output in metric.log
- pre-commit tests run: 
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/869


Thanks,

Madhan Neethiraj



Re: Review Request 69392: ATLAS-2810 - Add notifications from RelationshipStore

2018-12-18 Thread Graham Wallis

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69392/#review211403
---



Hi Nikhil,

I downloaded and tested this, and it works pretty well.

I'm not sure whether you intended it, but when a relationship is added it 
fires two notifications. I think this is because both the relationship store's 
createRelationship() and create() methods invoke sendNotifications(). You 
probably want only one of them to do so. I also tested the update and delete 
operations, and they each generate one notification.
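
A minimal, hypothetical model of the call path described above, assuming (as the observation implies) that adding a relationship goes through both `createRelationship()` and `create()`, each of which notifies. The class, the `persist()` helper and the single-notification variant are illustrations only, not the code in `AtlasRelationshipStoreV2`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the double-notification issue; not Atlas code.
public class RelationshipNotifySketch {
    private final List<String> sent = new ArrayList<>();

    public List<String> getSent() { return sent; }

    private void sendNotifications(String guid) {
        sent.add("RELATIONSHIP_ADD " + guid);
    }

    // Persists the relationship without notifying (storage omitted in this sketch)
    private String persist(String guid) {
        return guid;
    }

    // Inner method that also notifies, as observed in the review
    public String create(String guid) {
        String created = persist(guid);
        sendNotifications(created);
        return created;
    }

    // Outer method delegates to create() and notifies again: two messages per add
    public String createRelationship(String guid) {
        String created = create(guid);
        sendNotifications(created);
        return created;
    }

    // One possible fix: only the outer entry point notifies, so each add yields one message
    public String createRelationshipOnce(String guid) {
        String created = persist(guid);
        sendNotifications(created);
        return created;
    }

    public static void main(String[] args) {
        RelationshipNotifySketch twice = new RelationshipNotifySketch();
        twice.createRelationship("e4949325");
        System.out.println("notifications sent: " + twice.getSent().size()); // 2

        RelationshipNotifySketch once = new RelationshipNotifySketch();
        once.createRelationshipOnce("e4949325");
        System.out.println("notifications sent: " + once.getSent().size()); // 1
    }
}
```

Keeping notification in exactly one layer (here the outermost entry point) is one way to guarantee one message per mutation; moving it entirely into the inner method would work equally well, as long as both do not send.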


You may want to add a full implementation of toString() so that it provides 
more information in a log. For example, I temporarily modified it like this:

@Override
public StringBuilder toString(StringBuilder sb) {
    if (sb == null) {
        sb = new StringBuilder();
    }

    sb.append("AtlasRelationshipHeader{");
    sb.append("guid='").append(guid).append('\'');
    sb.append(", status=").append(status);
    sb.append(", label=").append(label);
    sb.append(", propagateTags=").append(propagateTags);
    sb.append(", end1=").append(end1);
    sb.append(", end2=").append(end2);
    super.toString(sb);
    sb.append('}');

    return sb;
}

- Graham Wallis


On Dec. 17, 2018, 5:13 a.m., Nikhil Bonte wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69392/
> ---
> 
> (Updated Dec. 17, 2018, 5:13 a.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Nixon Rodrigues, and Sarath 
> Subramanian.
> 
> 
> Bugs: ATLAS-2810
> https://issues.apache.org/jira/browse/ATLAS-2810
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Notifications for relationship mutations.
> This adds a message to the Kafka topic when a relationship is added, updated 
> or deleted.
> 
> Sample messages:
> 
> ADD
> {"version":{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"127.0.1.1","msgCreatedBy":"","msgCreationTime":1544787083448,"message":{"type":"ENTITY_NOTIFICATION_V2","relationship":{"typeName":"hive_table_db","guid":"e4949325-562a-43bf-a980-c0c562c8759e","status":"ACTIVE","label":"__hive_table.db","end1":{"guid":"c8792b2f-b550-47c2-baf9-1ddc0f25bac1","typeName":"hive_table"},"end2":{"guid":"414b7277-06f5-4a50-869c-3af62655e799","typeName":"hive_db"}},"operationType":"RELATIONSHIP_ADD","eventTime":1544787072298}}
> 
> UPDATE
> {"version":{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"127.0.1.1","msgCreatedBy":"","msgCreationTime":1544787404692,"message":{"type":"ENTITY_NOTIFICATION_V2","relationship":{"typeName":"hive_table_db","guid":"e4949325-562a-43bf-a980-c0c562c8759e","status":"ACTIVE","label":"__hive_table.db","end1":{"guid":"c8792b2f-b550-47c2-baf9-1ddc0f25bac1","typeName":"hive_table"},"end2":{"guid":"414b7277-06f5-4a50-869c-3af62655e799","typeName":"hive_db"}},"operationType":"RELATIONSHIP_UPDATE","eventTime":1544787404679}}
> 
> DELETE
> {"version":{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"127.0.1.1","msgCreatedBy":"","msgCreationTime":1544787460081,"message":{"type":"ENTITY_NOTIFICATION_V2","relationship":{"typeName":"hive_table_db","guid":"e4949325-562a-43bf-a980-c0c562c8759e","status":"DELETED","label":"__hive_table.db","end1":{"guid":"c8792b2f-b550-47c2-baf9-1ddc0f25bac1","typeName":"hive_table"},"end2":{"guid":"414b7277-06f5-4a50-869c-3af62655e799","typeName":"hive_db"}},"operationType":"RELATIONSHIP_DELETE","eventTime":1544787459708}}
> 
> 
> Diffs
> ---
> 
>   intg/src/main/java/org/apache/atlas/listener/EntityChangeListenerV2.java cccf0d4
>   intg/src/main/java/org/apache/atlas/model/instance/AtlasRelationshipHeader.java PRE-CREATION
>   intg/src/main/java/org/apache/atlas/model/notification/EntityNotification.java 1eae100
>   repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditListenerV2.java 8ca8c9a
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java a8c3363
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java 86cc98c
>   webapp/src/main/java/org/apache/atlas/notification/EntityNotificationListenerV2.java e0a60a1
> 
> 
> Diff: https://reviews.apache.org/r/69392/diff/5/
> 
> 
> Testing
> ---
> 
> Tested from a REST client (Postman).
> 1) Added/updated/deleted a relationship by calling the REST APIs (see details below)
> 2) Verified that the message is sent to Kafka using kafka-console-consumer
>./kafka-console-consumer.sh --bootstrap-server localhost:9027 --topic 

Re: [DRAFT] Board report for Apache Atlas: December 2018

2018-12-18 Thread David Radley
Hi Madhan,
That is a shame. I hope we can work together so that, by the next report, the 
Egeria connector for Atlas will be in the Atlas code, giving us an even bigger 
good-news story.
 All the best, David. 



From:   Madhan Neethiraj 
To: "dev@atlas.apache.org" 
Date:   13/12/2018 22:42
Subject:Re: [DRAFT] Board report for Apache Atlas: December 2018



David,

Thanks for reviewing the report and for the suggestion to reference Egeria. 
This would indeed be a good item to include. However, the report was already 
sent to the board yesterday, as it was the last day. I will include a reference 
to Egeria in the next report. I hope that is alright with you.

Thanks,
Madhan


On 12/13/18, 1:23 AM, "David Radley"  wrote:

Hi Madhan,
Enhancements have gone into Atlas to enable it to be a reference 
implementation for Egeria. The Egeria Atlas connector is out for 
review. 
This is a very positive story for Atlas that we could include in the 
draft board report. All the best, David. 
 
 
 
From:   Madhan Neethiraj 
To: "dev@atlas.apache.org" 
Date:   11/12/2018 05:41
Subject:[DRAFT] Board report for Apache Atlas: December 2018
 
 
 
Atlas team,

Please review the draft board report below and send your feedback/comments.

Thanks,
Madhan


## Description:
  Apache Atlas is a scalable and extensible set of core foundational
  governance services that enables enterprises to effectively and efficiently
  meet their compliance requirements within Hadoop and allows integration with
  the complete enterprise data ecosystem

## Issues:
  There are no issues requiring board attention at this time.

## Activity:
  - released 0.8.3 on 10/31/2018
  - released 1.1.0 on 09/17/2018
  - working on 2.0.0 release, to support Hadoop 3, HBase 2, Solr 7, Kafka 2,
    Hive 3
  - updated to support Hadoop trusted-proxy authentication
  - updated lineage UI to support entity-type specific icons, customizable
    depth, option to hide process entities
  - performance related fixes in Hive hook and notification processing
  - model enhancements to support soft-ref
  - export/import enhancements to create audit entries containing summary of
    the operation

## Health report:
  - 1 new contributor added in last 3 months: Nikhil Bonte

## PMC changes:
  - Currently 33 PMC members
  - No new PMC members added in last 3 months
  - Last PMC member addition was on 6/21/2017

## Committer base changes:
  - Currently 38 committers
  - 1 new committer was added in last 3 months: Ramesh Mani
  - Last addition to committer role was on 10/15/2018

## Releases:
  2.0.0            plan to release by 12/31/2018
  0.8.3            was released on 10/31/2018
  1.1.0            was released on 09/17/2018
  1.0.0            was released on 06/02/2018
  0.8.2            was released on 02/05/2018
  1.0.0-alpha      was released on 01/25/2018
  0.8.1            was released on 08/29/2017
  0.8-incubating   was released on 03/16/2017
  0.7.1-incubating was released on 01/26/2017
  0.7-incubating   was released on 07/09/2016
  0.6-incubating   was released on 12/31/2015
  0.5-incubating   was released on 07/11/2015

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
 




