Re: Review Request 72287: Edge Creation Improvements

2020-04-02 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/
---

(Updated April 2, 2020, 6:14 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and 
Sarath Subramanian.


Changes
---

Updates include: new pre-commit build details.


Bugs: ATLAS-3706
https://issues.apache.org/jira/browse/ATLAS-3706


Repository: atlas


Description
---

**Approach**

1. Added metrics to most of the methods involved in entity creation. (The patch 
does not include the metrics added in these additional places.)
2. Started importing a large number of entities using the 
_ZipFileMigrationImporter_.
3. Observed the behavior of the import over 24 hours, tracking CPU usage, 
memory usage, and import throughput (via _metric.log_).
4. Changes were added one at a time. The impact of each change on performance 
(via metric.log) and accuracy was observed before the next change was added.
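The per-method timing described in steps 1 and 4 can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: `MethodMetrics` and its `time`/`count`/`report` methods are hypothetical names, not the Atlas metrics API, and the report format is not the actual _metric.log_ format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not Atlas's own metrics classes.
final class MethodMetrics {
    // Per-method aggregate: number of calls and total elapsed time in nanoseconds.
    private static final class Stat {
        long count;
        long totalNanos;
    }

    // Insertion-ordered so the report lists methods in first-seen order.
    private final Map<String, Stat> stats = new LinkedHashMap<>();

    // Runs op and records its elapsed wall-clock time under the given name,
    // even if op throws.
    <T> T time(String name, Supplier<T> op) {
        long start = System.nanoTime();
        try {
            return op.get();
        } finally {
            record(name, System.nanoTime() - start);
        }
    }

    // Synchronized so that concurrent import threads can share one instance.
    private synchronized void record(String name, long nanos) {
        Stat s = stats.computeIfAbsent(name, k -> new Stat());
        s.count++;
        s.totalNanos += nanos;
    }

    synchronized long count(String name) {
        Stat s = stats.get(name);
        return s == null ? 0 : s.count;
    }

    // One line per method, in the form "<name> count=<n> totalMs=<ms>".
    synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stat> e : stats.entrySet()) {
            Stat s = e.getValue();
            sb.append(e.getKey())
              .append(" count=").append(s.count)
              .append(" totalMs=").append(s.totalNanos / 1_000_000)
              .append('\n');
        }
        return sb.toString();
    }
}
```

Wrapping each step of entity creation in a call such as `metrics.time("getOrCreateEdge", () -> ...)` and comparing reports before and after a single change is one way to attribute a throughput difference to that change alone.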

**Observations**
* Relationship creation took an inordinately large amount of time under load. 
This time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This 
implementation also caused a build-up of _AtlasEdge_ objects that stayed in 
memory for a long time, which had the secondary effect of slowing down entity 
creation after about 6 hours (the exact duration varied with node configuration).
* _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is 
time-consuming.
* _GraphBackedSearchIndexer_ had no edge label index, even though the majority 
of edge creation operations included a lookup by edge label.
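The two _GraphHelper_ observations can be illustrated with a toy model. The sketch below is an illustration under stated assumptions, not the patch itself: `Vertex`, `Edge` and `EdgeLookup` are hypothetical in-memory classes rather than _AtlasVertex_/_AtlasEdge_. The lookup filters a vertex's out-edges by label lazily, without copying them into an intermediate list, and matches the target by vertex id, returning on the first hit. The relative cost of id comparison versus vertex comparison in a real graph backend is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal graph model for illustration; not Atlas's AtlasVertex/AtlasEdge.
final class Vertex {
    final String id;
    final List<Edge> outEdges = new ArrayList<>();

    Vertex(String id) {
        this.id = id;
    }
}

final class Edge {
    final String label;
    final Vertex out;
    final Vertex in;

    Edge(String label, Vertex out, Vertex in) {
        this.label = label;
        this.out = out;
        this.in = in;
    }
}

final class EdgeLookup {
    // Lazily yields out-edges carrying the label; nothing is copied into a new list.
    static Iterator<Edge> outEdgesByLabel(Vertex v, String label) {
        return v.outEdges.stream().filter(e -> e.label.equals(label)).iterator();
    }

    // Returns the existing out->in edge with this label, or creates one.
    // Matches on the in-vertex id and stops at the first hit.
    static Edge getOrCreateEdge(Vertex out, Vertex in, String label) {
        Iterator<Edge> it = outEdgesByLabel(out, label);
        while (it.hasNext()) {
            Edge e = it.next();
            if (e.in.id.equals(in.id)) {
                return e;
            }
        }
        Edge created = new Edge(label, out, in);
        out.outEdges.add(created);
        return created;
    }
}
```

Because matching is by id, a second `Vertex` object with the same id (for example, one reloaded from storage) still finds the existing edge instead of creating a duplicate.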

**Configuration**
Cluster: 3 nodes with 40 cores, 128 GB RAM, and 1.5 TB of disk space.
Atlas configuration: 32 GB RAM.


Diffs
---

  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
5ab9f4d13 


Diff: https://reviews.apache.org/r/72287/diff/2/


Testing (updated)
---

**Manual tests**
(See above.)
Accuracy verification.

**Unit tests**
Executed existing unit tests.

**Pre-commit build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1782/


Thanks,

Ashutosh Mestry



Re: Review Request 72287: Edge Creation Improvements

2020-04-02 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220193
---


Ship it!





- Madhan Neethiraj





Re: Review Request 72287: Edge Creation Improvements

2020-04-02 Thread Ashutosh Mestry via Review Board


> On April 2, 2020, 5:39 a.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
> > Lines 344 (patched)
> > 
> >
> > edgeLabel is typically used to find a subset of edges from a given 
> > vertex. Having an edge index on the label probably won't improve 
> > performance; however, we need to understand the impact of creating this 
> > index in an existing Atlas instance with a large number of edges. 1) Would 
> > the index be populated with existing edge labels? 2) If yes, how long would 
> > index creation take, say for 1 million edges? 3) If no, would search ignore 
> > edges that were not indexed?
> > 
> > I suggest measuring the performance impact of not having this index.

I did a run last night without the index, and it had no impact on 
performance. I have removed this change.
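Why a global edge label index may not help here can be sketched with a hypothetical in-memory `ToyEdgeStore` (an illustration only, not JanusGraph's indexing). A lookup that starts from a known vertex examines only that vertex's own edges, whereas a global label index returns every edge with that label in the graph, which then still has to be filtered by vertex. Both strategies find the same edges; the index simply does not narrow the search for this access pattern. Real graph backends may behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store; illustrates lookup shapes, not JanusGraph indexing.
final class ToyEdgeStore {
    static final class ToyEdge {
        final String outId;
        final String label;
        final String inId;

        ToyEdge(String outId, String label, String inId) {
            this.outId = outId;
            this.label = label;
            this.inId = inId;
        }
    }

    // Out-vertex id -> that vertex's outgoing edges (vertex-centric access).
    private final Map<String, List<ToyEdge>> adjacency = new HashMap<>();
    // Label -> every edge in the graph with that label (global label index).
    private final Map<String, List<ToyEdge>> labelIndex = new HashMap<>();

    void addEdge(String outId, String label, String inId) {
        ToyEdge e = new ToyEdge(outId, label, inId);
        adjacency.computeIfAbsent(outId, k -> new ArrayList<>()).add(e);
        labelIndex.computeIfAbsent(label, k -> new ArrayList<>()).add(e);
    }

    // Number of edges a lookup starting from the vertex must examine.
    int candidatesViaAdjacency(String outId) {
        return adjacency.getOrDefault(outId, List.of()).size();
    }

    // Number of edges a lookup driven by the global label index must examine.
    int candidatesViaLabelIndex(String label) {
        return labelIndex.getOrDefault(label, List.of()).size();
    }

    // Scan the vertex's own edges, keeping those with the label.
    List<ToyEdge> findViaAdjacency(String outId, String label) {
        List<ToyEdge> found = new ArrayList<>();
        for (ToyEdge e : adjacency.getOrDefault(outId, List.of())) {
            if (e.label.equals(label)) found.add(e);
        }
        return found;
    }

    // Scan all edges with the label, keeping those leaving the vertex.
    List<ToyEdge> findViaLabelIndex(String outId, String label) {
        List<ToyEdge> found = new ArrayList<>();
        for (ToyEdge e : labelIndex.getOrDefault(label, List.of())) {
            if (e.outId.equals(outId)) found.add(e);
        }
        return found;
    }
}
```

With 1,000 `contains` edges spread across 1,000 vertices, a lookup from one vertex touches only that vertex's few edges through adjacency, but all 1,000 through the label index.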


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220181
---


On March 30, 2020, 11:19 p.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72287/
> ---
> 
> (Updated March 30, 2020, 11:19 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
> and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3706
> https://issues.apache.org/jira/browse/ATLAS-3706
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Approach**
> 
> 1. Added metrics to most of the methods involved in entity creation. (The 
> patch does not include the metrics added in these additional places.)
> 2. Started importing a large number of entities using the 
> _ZipFileMigrationImporter_.
> 3. Observed the behavior of the import over 24 hours, tracking CPU usage, 
> memory usage, and import throughput (via _metric.log_).
> 4. Changes were added one at a time. The impact of each change on performance 
> (via metric.log) and accuracy was observed before the next change was added.
> 
> **Observations**
> * Relationship creation took an inordinately large amount of time under load. 
> This time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This 
> implementation also caused a build-up of _AtlasEdge_ objects that stayed in 
> memory for a long time, which had the secondary effect of slowing down entity 
> creation after about 6 hours (the exact duration varied with node 
> configuration).
> * _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is 
> time-consuming.
> * _GraphBackedSearchIndexer_ had no edge label index, even though the 
> majority of edge creation operations included a lookup by edge label.
> 
> **Configuration**
> Cluster: 3 nodes with 40 cores, 128 GB RAM, and 1.5 TB of disk space.
> Atlas configuration: 32 GB RAM.
> 
> 
> Diffs
> ---
> 
>   
> repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
>  647e3040c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
> 5ab9f4d13 
> 
> 
> Diff: https://reviews.apache.org/r/72287/diff/1/
> 
> 
> Testing
> ---
> 
> **Manual tests**
> (See above.)
> Accuracy verification.
> 
> **Unit tests**
> Executed existing unit tests.
> 
> **Pre-commit build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



Re: Review Request 72287: Edge Creation Improvements

2020-04-02 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/
---

(Updated April 2, 2020, 3:27 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and 
Sarath Subramanian.


Changes
---

Updates include: Addressed review comments.


Bugs: ATLAS-3706
https://issues.apache.org/jira/browse/ATLAS-3706


Repository: atlas


Description
---

**Approach**

1. Added metrics to most of the methods involved in entity creation. (The patch 
does not include the metrics added in these additional places.)
2. Started importing a large number of entities using the 
_ZipFileMigrationImporter_.
3. Observed the behavior of the import over 24 hours, tracking CPU usage, 
memory usage, and import throughput (via _metric.log_).
4. Changes were added one at a time. The impact of each change on performance 
(via metric.log) and accuracy was observed before the next change was added.

**Observations**
* Relationship creation took an inordinately large amount of time under load. 
This time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This 
implementation also caused a build-up of _AtlasEdge_ objects that stayed in 
memory for a long time, which had the secondary effect of slowing down entity 
creation after about 6 hours (the exact duration varied with node configuration).
* _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is 
time-consuming.
* _GraphBackedSearchIndexer_ had no edge label index, even though the majority 
of edge creation operations included a lookup by edge label.

**Configuration**
Cluster: 3 nodes with 40 cores, 128 GB RAM, and 1.5 TB of disk space.
Atlas configuration: 32 GB RAM.


Diffs (updated)
---

  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
5ab9f4d13 


Diff: https://reviews.apache.org/r/72287/diff/2/

Changes: https://reviews.apache.org/r/72287/diff/1-2/


Testing
---

**Manual tests**
(See above.)
Accuracy verification.

**Unit tests**
Executed existing unit tests.

**Pre-commit build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/


Thanks,

Ashutosh Mestry



Re: Review Request 72287: Edge Creation Improvements

2020-04-01 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220181
---


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
Lines 344 (patched)


edgeLabel is typically used to find a subset of edges from a given vertex. 
Having an edge index on the label probably won't improve performance; however, 
we need to understand the impact of creating this index in an existing Atlas 
instance with a large number of edges. 1) Would the index be populated with 
existing edge labels? 2) If yes, how long would index creation take, say for 
1 million edges? 3) If no, would search ignore edges that were not indexed?

I suggest measuring the performance impact of not having this index.


- Madhan Neethiraj





Re: Review Request 72287: Edge Creation Improvements

2020-04-01 Thread Nikhil Bonte

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220180
---


Ship it!





- Nikhil Bonte

