Re: Review Request 71919: ATLAS-3563: Improve tag propagation performance using in-memory traversal

2019-12-17 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71919/#review219049
---


Ship It!

- Madhan Neethiraj


On Dec. 18, 2019, 1:31 a.m., Sarath Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71919/
> ---
> 
> (Updated Dec. 18, 2019, 1:31 a.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, 
> Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer 
> Shaikh, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3563
> https://issues.apache.org/jira/browse/ATLAS-3563
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Tag propagation uses a Gremlin query to find the entities to which the tag 
> has to be propagated.
> 
> The Gremlin query doesn't scale well for entities with large lineage (many 
> levels deep). In-memory traversal seems to have improved performance 
> significantly, since it avoids the overhead of Gremlin script engine 
> initialization and query execution.
> 
> Tag propagation time was observed to improve from 3004 ms to 180 ms.
> 
> 
> Diffs
> ---
> 
>   graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 6de4dcf10 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java 71b285731 
>   intg/src/main/java/org/apache/atlas/AtlasErrorCode.java 7a2aae2e9 
>   intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java 928ac0d8b 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 1e7acf1e7 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v1/DeleteHandlerV1.java c9ed79750 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java 1c8b057ba 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java a415d3084 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 8a24fa127 
>   repository/src/main/java/org/apache/atlas/util/AtlasGremlin3QueryProvider.java 20c570f7f 
>   repository/src/main/java/org/apache/atlas/util/AtlasGremlinQueryProvider.java d201db338 
>   repository/src/test/java/org/apache/atlas/repository/tagpropagation/ClassificationPropagationTest.java 6f9c05e7a 
> 
> 
> Diff: https://reviews.apache.org/r/71919/diff/4/
> 
> 
> Testing
> ---
> 
> Manually validated tag propagation works.
> 
> * Add classification
> * Block propagation
> * Change Propagation direction
> * Remove Classification
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>



Re: Review Request 71919: ATLAS-3563: Improve tag propagation performance using in-memory traversal

2019-12-17 Thread Sarath Subramanian

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71919/
---

(Updated Dec. 17, 2019, 5:31 p.m.)


Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, 
Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, 
and Sarath Subramanian.


Bugs: ATLAS-3563
https://issues.apache.org/jira/browse/ATLAS-3563


Repository: atlas


Description
---

Tag propagation uses a Gremlin query to find the entities to which the tag has 
to be propagated.

The Gremlin query doesn't scale well for entities with large lineage (many 
levels deep). In-memory traversal seems to have improved performance 
significantly, since it avoids the overhead of Gremlin script engine 
initialization and query execution.

Tag propagation time was observed to improve from 3004 ms to 180 ms.
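
For illustration only, here is a minimal sketch of what an in-memory traversal of this kind could look like, assuming the lineage is available as a plain adjacency map. The class TagPropagationSketch, the method findImpactedEntities, and the map representation are hypothetical; they are not taken from the patch or from the Atlas API, which works on graph vertices and edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first traversal over an in-memory adjacency map.
class TagPropagationSketch {

    // propagatingEdges maps an entity id to the ids reachable over edges that
    // are assumed to allow tag propagation. Returns every entity reachable
    // from startId, excluding startId itself, in discovery order.
    static List<String> findImpactedEntities(Map<String, List<String>> propagatingEdges,
                                             String startId) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();

        visited.add(startId);
        queue.add(startId);

        while (!queue.isEmpty()) {
            String current = queue.poll();

            for (String next : propagatingEdges.getOrDefault(current, List.of())) {
                // Set.add() returns false for already-visited entities, so
                // cycles in the lineage do not cause an endless loop.
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }

        visited.remove(startId);
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
                "table1", List.of("process1"),
                "process1", List.of("table2", "table3"),
                "table3", List.of("process1")); // cycle back into the lineage

        // prints [process1, table2, table3]
        System.out.println(findImpactedEntities(edges, "table1"));
    }
}
```

The point of the sketch is only that each entity is visited once with ordinary collection operations, with no script engine involved; it says nothing about how the patch handles blocked propagation or propagation direction.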


Diffs (updated)
---

  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 6de4dcf10 
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java 71b285731 
  intg/src/main/java/org/apache/atlas/AtlasErrorCode.java 7a2aae2e9 
  intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java 928ac0d8b 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 1e7acf1e7 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/DeleteHandlerV1.java c9ed79750 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java 1c8b057ba 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java a415d3084 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 8a24fa127 
  repository/src/main/java/org/apache/atlas/util/AtlasGremlin3QueryProvider.java 20c570f7f 
  repository/src/main/java/org/apache/atlas/util/AtlasGremlinQueryProvider.java d201db338 
  repository/src/test/java/org/apache/atlas/repository/tagpropagation/ClassificationPropagationTest.java 6f9c05e7a 


Diff: https://reviews.apache.org/r/71919/diff/4/

Changes: https://reviews.apache.org/r/71919/diff/3-4/


Testing
---

Manually validated tag propagation works.

* Add classification
* Block propagation
* Change Propagation direction
* Remove Classification


Thanks,

Sarath Subramanian



Re: Review Request 71919: ATLAS-3563: Improve tag propagation performance using in-memory traversal

2019-12-17 Thread Sarath Subramanian

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71919/
---

(Updated Dec. 17, 2019, 10:05 a.m.)


Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, 
Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, 
and Sarath Subramanian.


Bugs: ATLAS-3563
https://issues.apache.org/jira/browse/ATLAS-3563


Repository: atlas


Description
---

Tag propagation uses a Gremlin query to find the entities to which the tag has 
to be propagated.

The Gremlin query doesn't scale well for entities with large lineage (many 
levels deep). In-memory traversal seems to have improved performance 
significantly, since it avoids the overhead of Gremlin script engine 
initialization and query execution.

Tag propagation time was observed to improve from 3004 ms to 180 ms.


Diffs (updated)
---

  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 6de4dcf10 
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java 71b285731 
  intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java 928ac0d8b 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 1e7acf1e7 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/DeleteHandlerV1.java c9ed79750 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java 1c8b057ba 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java a415d3084 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 8a24fa127 
  repository/src/main/java/org/apache/atlas/util/AtlasGremlin3QueryProvider.java 20c570f7f 
  repository/src/main/java/org/apache/atlas/util/AtlasGremlinQueryProvider.java d201db338 


Diff: https://reviews.apache.org/r/71919/diff/3/

Changes: https://reviews.apache.org/r/71919/diff/2-3/


Testing
---

Manually validated tag propagation works.

* Add classification
* Block propagation
* Change Propagation direction
* Remove Classification


Thanks,

Sarath Subramanian



Re: Review Request 71919: ATLAS-3563: Improve tag propagation performance using in-memory traversal

2019-12-17 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71919/#review219043
---




graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
Lines 76 (patched)


Given that the underlying vertex classes expect a string array, consider using 
"String[]" as the type for parameter "edgeLabels", instead of 
"Collection".

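To make the trade-off in this comment concrete, here is a small sketch. Everything in it is hypothetical: underlyingEdges() merely stands in for a lower-level call that takes a string array, and the Collection element type is assumed to be String. It is not the AtlasJanusVertex code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch comparing the two parameter types for "edgeLabels".
class EdgeLabelParamSketch {

    // Stand-in for an underlying call that expects a string array.
    static String underlyingEdges(String... labels) {
        return "edges" + Arrays.toString(labels);
    }

    // Collection parameter: every call has to copy the labels into an array.
    static String getEdgesFromCollection(Collection<String> edgeLabels) {
        return underlyingEdges(edgeLabels.toArray(new String[0]));
    }

    // String[] parameter: the array is passed straight through.
    static String getEdgesFromArray(String[] edgeLabels) {
        return underlyingEdges(edgeLabels);
    }

    public static void main(String[] args) {
        // Both lines print edges[owns, uses]
        System.out.println(getEdgesFromCollection(List.of("owns", "uses")));
        System.out.println(getEdgesFromArray(new String[] {"owns", "uses"}));
    }
}
```

With the array parameter the conversion step disappears from the wrapper; callers that already hold a collection would do the toArray() call themselves.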


intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java
Lines 284 (patched)


LOG.info ==> LOG.debug



intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java
Lines 407 (patched)


";;" => ";"



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
Lines 412 (patched)


impactedEntityVertices => propagatedEntities
  // entity vertices to which the classification is currently propagated

impactedEntityVerticesWithRestrictions => impactedEntities
  // entity vertices to which the classifications must be propagated



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
Lines 418 (patched)


- is 'ret' in #416 the list of propagations to be added?
- is 'ret' in #418 the list of propagations to be removed?

Consider adding a comment for this method. Looking at the caller of this 
method in AtlasRelationshipStoreV2.handleBlockedClassifications(), the list 
returned from this method seems to be used to both remove and add propagations. 
Please review and refactor/rename as necessary.
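
One possible way to make the add/remove distinction explicit is sketched below. It is an assumption about how such a split could look, not the logic in the patch: the names toAdd and toRemove are hypothetical, entities are represented by plain string ids, and propagatedEntities / impactedEntities follow the renames suggested in the earlier comment.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: separate the propagations to add from those to remove.
class PropagationDiffSketch {

    // Entities that must carry the classification but do not carry it yet.
    static Set<String> toAdd(Set<String> propagatedEntities, Set<String> impactedEntities) {
        Set<String> ret = new LinkedHashSet<>(impactedEntities);
        ret.removeAll(propagatedEntities);
        return ret;
    }

    // Entities that currently carry the classification but no longer should.
    static Set<String> toRemove(Set<String> propagatedEntities, Set<String> impactedEntities) {
        Set<String> ret = new LinkedHashSet<>(propagatedEntities);
        ret.removeAll(impactedEntities);
        return ret;
    }

    public static void main(String[] args) {
        Set<String> propagated = Set.of("table1", "table2");
        Set<String> impacted = Set.of("table2", "table3");

        System.out.println(toAdd(propagated, impacted));    // prints [table3]
        System.out.println(toRemove(propagated, impacted)); // prints [table1]
    }
}
```

Returning two separately named results would let the caller see at a glance which list drives removal and which drives addition.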



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
Lines 466 (patched)


classificationIdToExclude => classificationId
  in #466 and #474



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
Lines 517 (patched)


getAdjacentVertex() => getOtherVertex() // to be in line with 
JanusGraphEdge.otherVertex()
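
As a rough sketch of the "other vertex" semantics behind the suggested name: given an edge and one of its endpoints, return the opposite endpoint. The Edge class and string ids below are hypothetical stand-ins, not the Atlas or JanusGraph types.

```java
// Hypothetical sketch of "other vertex" semantics on a two-endpoint edge.
class OtherVertexSketch {

    // Minimal stand-in for an edge: just the ids of its two endpoints.
    static final class Edge {
        final String outVertexId;
        final String inVertexId;

        Edge(String outVertexId, String inVertexId) {
            this.outVertexId = outVertexId;
            this.inVertexId = inVertexId;
        }
    }

    // Returns the endpoint of the edge that is not vertexId.
    static String getOtherVertex(Edge edge, String vertexId) {
        if (vertexId.equals(edge.outVertexId)) {
            return edge.inVertexId;
        }
        if (vertexId.equals(edge.inVertexId)) {
            return edge.outVertexId;
        }
        throw new IllegalArgumentException(vertexId + " is not an endpoint of this edge");
    }

    public static void main(String[] args) {
        Edge edge = new Edge("process1", "table2");

        System.out.println(getOtherVertex(edge, "process1")); // prints table2
        System.out.println(getOtherVertex(edge, "table2"));   // prints process1
    }
}
```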


- Madhan Neethiraj


On Dec. 17, 2019, 8:29 a.m., Sarath Subramanian wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71919/
> ---
> 
> (Updated Dec. 17, 2019, 8:29 a.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, 
> Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer 
> Shaikh, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3563
> https://issues.apache.org/jira/browse/ATLAS-3563
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Tag propagation uses a Gremlin query to find the entities to which the tag 
> has to be propagated.
> 
> The Gremlin query doesn't scale well for entities with large lineage (many 
> levels deep). In-memory traversal seems to have improved performance 
> significantly, since it avoids the overhead of Gremlin script engine 
> initialization and query execution.
> 
> Tag propagation time was observed to improve from 3004 ms to 180 ms.
> 
> 
> Diffs
> ---
> 
>   graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 6de4dcf10 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java 71b285731 
>   intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java 928ac0d8b 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 1e7acf1e7 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v1/DeleteHandlerV1.java c9ed79750 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java 1c8b057ba 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java a415d3084 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 8a24fa127 
>   repository/src/main/java/org/apache/atlas/util/AtlasGremlin3QueryProvider.java 20c570f7f 
>   repository/src/main/java/org/apache/atlas/util/AtlasGremlinQueryProvider.java d201db338 
> 
> 
> Diff: https://reviews.apache.org/r/71919/diff/2/
> 
> 
> Testing
> ---
> 
> Manually validated tag propagation works.
> 
> * Add classification
> * Block propagation
> * Change Propagation direction
> * Remove Classification
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>