[jira] [Commented] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices

2020-04-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096104#comment-17096104
 ] 

ASF subversion and git services commented on ATLAS-3762:


Commit 25f3002e0e84927eb39cebb5708d77ef81755d79 in atlas's branch 
refs/heads/master from Ashutosh Mestry
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=25f3002 ]

ATLAS-3762: Improve Edge creator using Genuine iterator.


> Entity Creation: Improve Edges Fetch Between Vertices
> -
>
> Key: ATLAS-3762
> URL: https://issues.apache.org/jira/browse/ATLAS-3762
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Attachments: 
> ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced the vertex and edge fetch with 
> _StreamSupport.stream_. This uses _collect(toList)_, which forces all 
> contents to be fetched, so a large amount of data is loaded even when 
> only a few elements are needed.
> *Solution*
> Switch to iterators that use lazy loading.
> *Edge Fetch Refactoring*
> Change _getEdge_ to iterate over the smaller dataset. 
> Here are the scenarios:
> - _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. This means that 
> outgoing edges from _fromVertex_ will be many more than incoming edges to 
> _toVertex_.
> - _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. This 
> means that outgoing edges from _fromVertex_ will be fewer than incoming edges 
> to _hive_table_.
> Approach:
>  * Since it is a linear search, it is more efficient to iterate over 
> fewer items than more items.
>  * Fetch the edge counts for _fromVertex_ and _toVertex_. If either 
> count is 0, return NULL, since no edge can be found.
>  * If neither count is 0, take the side with fewer elements and 
> perform the search there.
> [~sidharthkmishra] Thanks for this simple but effective fix.
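
The approach described in the issue can be sketched roughly as follows. This is a minimal illustration, not the committed Atlas code: `SketchEdge` and `EdgeLookupSketch` are hypothetical names, and plain in-memory lists stand in for the edge collections that Atlas would obtain from the graph.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a graph edge; not the Atlas AtlasEdge type.
final class SketchEdge {
    final String from;
    final String to;

    SketchEdge(String from, String to) {
        this.from = from;
        this.to = to;
    }
}

final class EdgeLookupSketch {
    // Returns the edge from -> to, or null if there is none.
    // The linear scan runs over whichever side has fewer edges.
    static SketchEdge findEdge(String from, List<SketchEdge> outgoingOfFrom,
                               String to, List<SketchEdge> incomingOfTo) {
        if (outgoingOfFrom.isEmpty() || incomingOfTo.isEmpty()) {
            return null; // no edge can connect the two vertices
        }

        boolean scanOutgoing = outgoingOfFrom.size() <= incomingOfTo.size();
        Iterator<SketchEdge> it = (scanOutgoing ? outgoingOfFrom : incomingOfTo).iterator();

        while (it.hasNext()) {
            SketchEdge edge = it.next();
            if (edge.from.equals(from) && edge.to.equals(to)) {
                return edge;
            }
        }
        return null;
    }
}
```

In the _hive_table_ to _hive_column_ scenario, the incoming side of the column is the short one, so the scan would touch only a handful of edges instead of every outgoing edge of the table.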



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/
---

(Updated April 30, 2020, 4:04 a.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
Sarath Subramanian, and Sidharth Mishra.


Changes
---

Updates include: Updated with latest PC build details.


Bugs: ATLAS-3762
https://issues.apache.org/jira/browse/ATLAS-3762


Repository: atlas


Description
---

**Problem Definition**
Please refer to JIRA for details.

**Updates**
- Modified: _AtlasJanusGraph.wrapVertices_ and _AtlasJanusGraph.wrapEdges_ now 
use genuine iterators. This reduces the number of elements fetched, since the 
search is linear.
- New: _AtlasVertex.getEdgeCount_ fetches the edge count using the iterator returned 
from _JanusVertex_. Fetching the count is efficient using stream support.
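
As a rough illustration of the two updates, the sketch below wraps elements lazily instead of collecting them into a list, and counts elements through a stream without materializing them. `LazyWrapSketch`, `wrapLazily` and `count` are hypothetical names chosen for this example; the actual signatures in _AtlasJanusGraph_ and _AtlasVertex_ may differ.

```java
import java.util.Iterator;
import java.util.function.Function;
import java.util.stream.StreamSupport;

final class LazyWrapSketch {
    // Wraps each element on demand; nothing is read from the source
    // until the caller asks for the next element.
    static <S, T> Iterable<T> wrapLazily(Iterable<S> source, Function<S, T> wrapper) {
        return () -> {
            Iterator<S> it = source.iterator();
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public T next() {
                    return wrapper.apply(it.next());
                }
            };
        };
    }

    // Counts elements by streaming over the iterable. The elements are
    // still traversed, but no intermediate list is built.
    static <S> long count(Iterable<S> source) {
        return StreamSupport.stream(source.spliterator(), false).count();
    }
}
```

A caller that stops after the first match only pays for the elements it actually consumed, which is the point of avoiding _collect(toList)_.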


Diffs
-

  
graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 
9406e26ff 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
 eb0206271 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
 fdc9fd0b5 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
2b8227a7e 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
 d1c1f1255 


Diff: https://reviews.apache.org/r/72452/diff/4/


Testing (updated)
---

**Volume testing**
High-volume testing shows that edge fetching is now efficient. The cases tested had incoming 
edges in the thousands and only a handful of outgoing edges.

Memory footprint has improved, since JanusGraph caches edges and then expires 
them. Fetching fewer edges reduces the number of items in memory.

**Pre-commit Build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1860/


Thanks,

Ashutosh Mestry



Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/#review220558
---


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
Lines 786 (patched)


Consider removing 'metric1', to avoid even a small impact on this heavily 
used code path.


- Madhan Neethiraj


On April 29, 2020, 9:17 p.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72452/
> ---
> 
> (Updated April 29, 2020, 9:17 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
> Sarath Subramanian, and Sidharth Mishra.
> 
> 
> Bugs: ATLAS-3762
> https://issues.apache.org/jira/browse/ATLAS-3762
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Problem Definition**
> Please refer to JIRA for details.
> 
> **Updates**
> - Modified: _AtlasJanusGraph.wrapVertices_ and _AtlasJanusGraph.wrapEdges_ 
> now use genuine iterators. This reduces the number of elements fetched, since 
> the search is linear.
> - New: _AtlasVertex.getEdgeCount_ fetches the edge count using the iterator returned 
> from _JanusVertex_. Fetching the count is efficient using stream support.
> 
> 
> Diffs
> -
> 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java
>  9406e26ff 
>   
> graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
>  eb0206271 
>   
> graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
>  fdc9fd0b5 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
> 2b8227a7e 
>   
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
>  d1c1f1255 
> 
> 
> Diff: https://reviews.apache.org/r/72452/diff/3/
> 
> 
> Testing
> ---
> 
> **Volume testing**
> High-volume testing shows that edge fetching is now efficient. The cases tested had incoming 
> edges in the thousands and only a handful of outgoing edges.
> 
> Memory footprint has improved, since JanusGraph caches edges and then expires 
> them. Fetching fewer edges reduces the number of items in memory.
> 
> **Pre-commit Build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1858/
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Sidharth Mishra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/#review220557
---


Ship it!




Ship It!

- Sidharth Mishra





Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/
---

(Updated April 29, 2020, 9:17 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
Sarath Subramanian, and Sidharth Mishra.


Changes
---

Updates include: 
- Addressed review comments.
- Improved readability: simplified the approach to detecting whether iterating 
over edges is needed, using the _vertex.hasEdges_ method.
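
One plausible way to answer "does this vertex have any edges?" without counting them is to ask the iterator for a single element. The sketch below shows that idea only; `HasEdgesSketch.hasAny` is a hypothetical helper, and how _vertex.hasEdges_ is actually implemented in the patch is not stated here.

```java
import java.util.Iterator;

final class HasEdgesSketch {
    // True as soon as one element is available; the rest of the
    // collection is never traversed.
    static boolean hasAny(Iterable<?> edges) {
        Iterator<?> it = edges.iterator();
        return it.hasNext();
    }
}
```

Compared with fetching a full count and testing it against zero, this kind of check can stop after the first edge.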


Bugs: ATLAS-3762
https://issues.apache.org/jira/browse/ATLAS-3762


Repository: atlas


Description
---

**Problem Definition**
Please refer to JIRA for details.

**Updates**
- Modified: _AtlasJanusGraph.wrapVertices_ and _AtlasJanusGraph.wrapEdges_ now 
use genuine iterators. This reduces the number of elements fetched, since the 
search is linear.
- New: _AtlasVertex.getEdgeCount_ fetches the edge count using the iterator returned 
from _JanusVertex_. Fetching the count is efficient using stream support.


Diffs (updated)
-

  
graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 
9406e26ff 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
 eb0206271 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
 fdc9fd0b5 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
2b8227a7e 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
 d1c1f1255 


Diff: https://reviews.apache.org/r/72452/diff/2/

Changes: https://reviews.apache.org/r/72452/diff/1-2/


Testing
---

**Volume testing**
High-volume testing shows that edge fetching is now efficient. The cases tested had incoming 
edges in the thousands and only a handful of outgoing edges.

Memory footprint has improved, since JanusGraph caches edges and then expires 
them. Fetching fewer edges reduces the number of items in memory.

**Pre-commit Build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1858/


Thanks,

Ashutosh Mestry



[jira] [Comment Edited] (ATLAS-3654) Support solr in standalone (http) mode

2020-04-29 Thread Damian Warszawski (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095892#comment-17095892
 ] 

Damian Warszawski edited comment on ATLAS-3654 at 4/29/20, 9:10 PM:


[~nixon],

It is controlled by the following application property, 
`atlas.graph.index.search.solr.mode`, which is also used by JanusGraph. 

The package is built with the profile `embedded-hbase-solr`, as it used to be for 
`cloud` mode, for compatibility reasons.

Perhaps it would be useful to create another profile for `embedded-solr` only. 
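
To make the role of the property concrete, here is a small sketch of how a component might read it and choose between the two Solr modes. Only the property name and the `cloud` and `http` values come from this thread; `SolrModeSketch`, `resolveMode` and the `cloud` default are assumptions of this example, not Atlas code.

```java
import java.util.Locale;
import java.util.Properties;

final class SolrModeSketch {
    static final String MODE_KEY = "atlas.graph.index.search.solr.mode";

    // Returns "cloud" or "http". Falling back to "cloud" when the
    // property is absent is an assumption made for this sketch.
    static String resolveMode(Properties props) {
        String mode = props.getProperty(MODE_KEY, "cloud").trim().toLowerCase(Locale.ROOT);
        if (!mode.equals("cloud") && !mode.equals("http")) {
            throw new IllegalArgumentException("Unsupported Solr mode: " + mode);
        }
        return mode;
    }
}
```

With `atlas.graph.index.search.solr.mode=http` set, `resolveMode` would return `http`, and a caller could then skip any Zookeeper-specific setup.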

 


was (Author: dwarszawski):
[~nixon],

It is controlled by the following application property, 
`atlas.graph.index.search.solr.mode`, which is also used by JanusGraph. 

The package is built with the profile `embedded-hbase-solr`, as it used to be for 
`cloud` mode, for compatibility reasons.

Perhaps it would be useful to create another profile for `embedded-solr` only. 

 

> Support solr in standalone (http) mode
> --
>
> Key: ATLAS-3654
> URL: https://issues.apache.org/jira/browse/ATLAS-3654
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 3.0.0
>Reporter: Damian Warszawski
>Priority: Minor
> Attachments: ATLAS-3654.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Problem description*
> Atlas does not support running Solr in standalone (http) mode.
> *Goals*
>  Standalone mode is especially useful for testing purposes, to make setup as simple as 
> possible without Zookeeper. It also enables full integration with JanusGraph, 
> as it supports both modes of running Solr, `cloud` and `http` 
> [https://docs.janusgraph.org/index-backend/solr/]. An additional benefit is to 
> decouple hbase and solr while running in embedded mode, so that solr can be run 
> in embedded mode with external hbase.
> *Proposed solution*
>  * call solr V1 API while creating/updating request handlers in standalone 
> solr
>  * update atlas start script to enable standalone embedded solr
>  





[jira] [Commented] (ATLAS-3654) Support solr in standalone (http) mode

2020-04-29 Thread Damian Warszawski (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095892#comment-17095892
 ] 

Damian Warszawski commented on ATLAS-3654:
--

It is controlled by the following application property, 
`atlas.graph.index.search.solr.mode`, which is also used by JanusGraph. 

The package is built with the profile `embedded-hbase-solr`, as it used to be for 
`cloud` mode, for compatibility reasons.

Perhaps it would be useful to create another profile for `embedded-solr` only. 

 



[jira] [Comment Edited] (ATLAS-3654) Support solr in standalone (http) mode

2020-04-29 Thread Damian Warszawski (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095892#comment-17095892
 ] 

Damian Warszawski edited comment on ATLAS-3654 at 4/29/20, 9:09 PM:


[~nixon],

It is controlled by the following application property, 
`atlas.graph.index.search.solr.mode`, which is also used by JanusGraph. 

The package is built with the profile `embedded-hbase-solr`, as it used to be for 
`cloud` mode, for compatibility reasons.

Perhaps it would be useful to create another profile for `embedded-solr` only. 

 


was (Author: dwarszawski):
It is controlled by the following application property, 
`atlas.graph.index.search.solr.mode`, which is also used by JanusGraph. 

The package is built with the profile `embedded-hbase-solr`, as it used to be for 
`cloud` mode, for compatibility reasons.

Perhaps it would be useful to create another profile for `embedded-solr` only. 

 



Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Ashutosh Mestry via Review Board


> On April 29, 2020, 7:57 p.m., Sidharth Mishra wrote:
> > repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
> > Lines 780 (patched)
> > 
> >
> > Just thinking the below might be easier to understand for someone who is 
> > not aware of why we compute both edge counts.
> > 
> >     AtlasPerfMetrics.MetricRecorder metric = RequestContext.get().startMetricRecord("getRelationshipEdge");
> > 
> >     AtlasEdge ret = null;
> >     long toVertexIncomingEdgesCount = graphHelper.getInComingEdgesByLabelCount(toVertex, relationshipLabel);
> > 
> >     if (toVertexIncomingEdgesCount > 0) {
> >         long fromVertexOutgoingEdgesCount = graphHelper.getOutGoingEdgesByLabelCount(fromVertex, relationshipLabel);
> > 
> >         if (toVertexIncomingEdgesCount < fromVertexOutgoingEdgesCount) {
> >             Iterator edgesIterator = graphHelper.getIncomingEdgesByLabel(toVertex, relationshipLabel);
> >             ret = getActiveEdgeFromList(fromVertex.getId(), edgesIterator);
> >         } else if (fromVertexOutgoingEdgesCount > 0) {
> >             Iterator edgesIterator = graphHelper.getOutGoingEdgesByLabel(fromVertex, relationshipLabel);
> >             ret = getActiveEdgeFromList(toVertex.getId(), edgesIterator);
> >         }
> >     }
> > 
> >     RequestContext.get().endMetricRecord(metric);
> >     return ret;

I have changed my approach. Kindly review my latest patch.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/#review220553
---


On April 29, 2020, 5:51 p.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72452/
> ---
> 
> (Updated April 29, 2020, 5:51 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
> Sarath Subramanian, and Sidharth Mishra.
> 
> 
> Bugs: ATLAS-3762
> https://issues.apache.org/jira/browse/ATLAS-3762
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Problem Definition**
> Please refer to JIRA for details.
> 
> **Updates**
> - Modified: _AtlasJanusGraph.wrapVertices_ and _AtlasJanusGraph.wrapEdges_ 
> now use genuine iterators. This reduces the number of elements fetched, since 
> the search is linear.
> - New: _AtlasVertex.getEdgeCount_ fetches the edge count using the iterator returned 
> from _JanusVertex_. Fetching the count is efficient using stream support.
> 
> 
> Diffs
> -
> 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java
>  9406e26ff 
>   
> graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
>  eb0206271 
>   
> graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
>  fdc9fd0b5 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
> 2b8227a7e 
>   
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
>  d1c1f1255 
> 
> 
> Diff: https://reviews.apache.org/r/72452/diff/1/
> 
> 
> Testing
> ---
> 
> **Volume testing**
> High-volume testing shows that edge fetching is now efficient. The cases tested had incoming 
> edges in the thousands and only a handful of outgoing edges.
> 
> Memory footprint has improved, since JanusGraph caches edges and then expires 
> them. Fetching fewer edges reduces the number of items in memory.
> 
> **Pre-commit Build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1858/
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



[jira] [Commented] (ATLAS-3760) Optimize FreeTextSearchProcessor to apply exclude deleted entity filter on solr side.

2020-04-29 Thread Damian Warszawski (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095884#comment-17095884
 ] 

Damian Warszawski commented on ATLAS-3760:
--

[~madhan] thanks for getting this done so quickly.

> Optimize FreeTextSearchProcessor to apply exclude deleted entity  filter on 
> solr side.
> --
>
> Key: ATLAS-3760
> URL: https://issues.apache.org/jira/browse/ATLAS-3760
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Reporter: Damian Warszawski
>Priority: Minor
> Fix For: 2.1.0, 3.0.0
>
>
> *Problem description*
> Current implementation of FreeTextSearchProcessor applies filtering in memory 
> to exclude deleted entities.
> This introduces significant performance overhead by generating redundant 
> calls to solr index. 
> *Goals*
> Improve performance of FreeTextSearchProcessor by applying filter in solr 
> query.
> *Proposed solution*
>  * replace in-memory filtering with filter in solr query.
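
The proposed solution amounts to moving the "not deleted" condition into the query string sent to the index. A minimal sketch of that idea follows; the field name `state`, the value `DELETED` and the class `ExcludeDeletedSketch` are placeholders invented for this example, and the real index field used by FreeTextSearchProcessor may well be different.

```java
final class ExcludeDeletedSketch {
    // Hypothetical field name and value; the real index field may differ.
    static final String STATE_FIELD = "state";
    static final String DELETED = "DELETED";

    // Appends a negative clause so that the index itself drops deleted
    // entities, instead of the caller filtering results in memory.
    static String withExcludeDeleted(String userQuery) {
        return "(" + userQuery + ") AND -" + STATE_FIELD + ":" + DELETED;
    }
}
```

Because the index returns only matching, non-deleted entries, the caller no longer needs extra round trips to refill a page after discarding deleted results.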





Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Sidharth Mishra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/#review220553
---




repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
Lines 780 (patched)


Just thinking the below might be easier to understand for someone who is not 
aware of why we compute both edge counts.

    AtlasPerfMetrics.MetricRecorder metric = RequestContext.get().startMetricRecord("getRelationshipEdge");

    AtlasEdge ret = null;
    long toVertexIncomingEdgesCount = graphHelper.getInComingEdgesByLabelCount(toVertex, relationshipLabel);

    if (toVertexIncomingEdgesCount > 0) {
        long fromVertexOutgoingEdgesCount = graphHelper.getOutGoingEdgesByLabelCount(fromVertex, relationshipLabel);

        if (toVertexIncomingEdgesCount < fromVertexOutgoingEdgesCount) {
            Iterator edgesIterator = graphHelper.getIncomingEdgesByLabel(toVertex, relationshipLabel);
            ret = getActiveEdgeFromList(fromVertex.getId(), edgesIterator);
        } else if (fromVertexOutgoingEdgesCount > 0) {
            Iterator edgesIterator = graphHelper.getOutGoingEdgesByLabel(fromVertex, relationshipLabel);
            ret = getActiveEdgeFromList(toVertex.getId(), edgesIterator);
        }
    }

    RequestContext.get().endMetricRecord(metric);
    return ret;


- Sidharth Mishra





Re: Review Request 72450: ATLAS-3763 Add "serviceType" in AtlasEntityHeader

2020-04-29 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72450/#review220552
---



Mandar - the UI already caches type-def details, to render in the left-hand side 
pane. The UI should be able to find the service-type from its typedef cache. I 
suggest not introducing the AtlasEntityHeader.serviceType field to address this 
need.

- Madhan Neethiraj


On April 29, 2020, 12:24 p.m., Mandar Ambawane wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72450/
> ---
> 
> (Updated April 29, 2020, 12:24 p.m.)
> 
> 
> Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nixon Rodrigues, 
> and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3763
> https://issues.apache.org/jira/browse/ATLAS-3763
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Added "serviceType" to AtlasEntityHeader, so that 
> https://issues.apache.org/jira/browse/ATLAS-3366 can be handled from the UI.
> 
> 
> Diffs
> -
> 
>   intg/src/main/java/org/apache/atlas/model/instance/AtlasEntityHeader.java 
> 7d2476a 
>   
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
>  757fcb1 
> 
> 
> Diff: https://reviews.apache.org/r/72450/diff/1/
> 
> 
> Testing
> ---
> 
> Pre-commit: 
> https://builds.apache.org/job/PreCommit-ATLAS-Build-Test/1856/console
> 
> 
> Thanks,
> 
> Mandar Ambawane
> 
>



[jira] [Comment Edited] (ATLAS-3755) Allow system attributes to be updated when policy allows

2020-04-29 Thread Bolke de Bruin (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095795#comment-17095795
 ] 

Bolke de Bruin edited comment on ATLAS-3755 at 4/29/20, 7:09 PM:
-

[~madhan] I don't think that would work for the KafkaConsumer, or does it? 
Please note that also Glossary needs to be updated to contain all system 
attributes as it is missing at least "homeId". In addition I don't think the 
risk of system attributes being updated inadvertently is that high. The default 
authorization model denies access to these attributes. In addition, it would 
require the incoming message to include those system properties. In this case 
you could argue that the message should not be consumed at all if not allowed, as it comes from a 
badly behaving client. We would like to integrate Atlas with another metadata 
50% of the updates. Given the fact that system attributes are part of the 
vertex there is not much additional cost as far as I can see in doing this 
during entity updates.

On your second point. I'm not strongly bound to them so I can merge them. I do 
think there might be cases that you would like to allow an update but disallow 
a create.

Authorization per attribute allows end-users to edit a particular attribute 
(say description) without allowing editing of all properties of the entities. 
This is actually a very common use case as you would like users to be able to 
enrich the metadata without adjusting some core attributes or system generated 
attributes. I understand your point about performance. What I could do is to 
submit an ArrayList and create a RangerCollectionResourceMatcher that requires 
all items in the submitted array to match (as opposed to 
RangerDefaultResourceMatcher), which should resolve the issue of CPU cycles and 
audit logs.

What do you think?
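
The "all items must match" behaviour proposed above could look roughly like the sketch below. It deliberately uses plain Java collections and does not implement any Ranger interface: `AllAttributesMatchSketch` and `isMatch` are hypothetical names, and treating an empty request as a non-match is a choice made for this example only.

```java
import java.util.Collection;
import java.util.Set;

final class AllAttributesMatchSketch {
    // True only if every requested attribute is covered by the policy's
    // allowed set. An empty request is treated as a non-match here.
    static boolean isMatch(Collection<String> requested, Set<String> allowed) {
        return !requested.isEmpty() && requested.stream().allMatch(allowed::contains);
    }
}
```

With one call per entity update instead of one per attribute, the authorizer would be evaluated once and would produce a single audit entry, which is the saving the comment is after.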


was (Author: bolke):
[~madhan] I don't think that would work for the KafkaConsumer, or does it? 
Please note that also Glossary needs to be updated to contain all system 
attributes as it is missing at least "homeId". In addition I don't think the 
risk of system attributes being updated inadvertently is that high. The default 
authorization model denies access to these attributes. In addition, it would 
require the incoming message to include those system properties. In this case 
you could argue that the message should not be consumed at all if not allowed, as it comes from a 
badly behaving client. We would like to integrate Atlas with another metadata 
system, which would actually make the occurrence much more frequent in around 
50% of the updates. Given the fact that system attributes are part of the 
vertex there is not much additional cost as far as I can see in doing this 
during entity updates.

On your second point. I'm not strongly bound to them so I can merge them. I do 
think there might be cases that you would like to allow an update but disallow 
a create.

Authorization per attribute allows end-users to edit a particular attribute 
(say description) without allowing editing of all properties of the entities. 
This is actually a very common use case as you would like users to be able to 
enrich the metadata without adjusting some core attributes or system generated 
attributes. I understand your point about performance. What I could do is to 
submit an Array in string format (e.g. "attribute1;attribute2;attribute4") and 
create a RangerArrayResourceMatcher that allows matching on 1 item which should 
resolve the issue of CPU cycles and audit logs.

What do you think?

> Allow system attributes to be updated when policy allows
> 
>
> Key: ATLAS-3755
> URL: https://issues.apache.org/jira/browse/ATLAS-3755
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Critical
> Attachments: 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch
>
>
> Atlas does not operate in an isolated environment; this is one of the reasons 
> the "homeId" system attribute was introduced. Unfortunately system attributes 
> can only be updated when importing. This means any integration with other 
> services is significantly limited (Kafka, Rest API will not work). (See also 
> ATLAS-3754)
> To resolve this I propose to make it possible to update the system attributes 
> when policy allows it. This introduces new 
> AtlasPrivilege.ENTITY_UPDATE_SYSTEM_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_SYSTEM_ATTRIBUTE next to 
> AtlasPrivilege.ENTITY_UPDATE_ATTRIBUTE and 
> 

[jira] [Commented] (ATLAS-3755) Allow system attributes to be updated when policy allows

2020-04-29 Thread Bolke de Bruin (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095795#comment-17095795
 ] 

Bolke de Bruin commented on ATLAS-3755:
---

[~madhan] I don't think that would work for the KafkaConsumer, or does it? 
Please note that Glossary also needs to be updated to contain all system 
attributes, as it is missing at least "homeId". In addition, I don't think the 
risk of system attributes being updated inadvertently is that high. The default 
authorization model denies access to these attributes. Next to that, it would 
require the incoming message to include those system properties. In this case 
you could argue that it should not be consumed at all if not allowed, as it 
comes from a badly behaving client. We would like to integrate Atlas with 
another metadata system, which would actually make the occurrence much more 
frequent, in around 50% of the updates. Given the fact that system attributes 
are part of the vertex, there is not much additional cost, as far as I can see, 
in doing this during entity updates.

On your second point. I'm not strongly bound so I can merge them. I do think 
there might be cases that you would like to allow an update but disallow a 
create.

Authorization per attribute allows end-users to edit a particular attribute 
(say description) without allowing editing of all properties of the entities. 
This is actually a very common use case as you would like users to be able to 
enrich the metadata without adjusting some core attributes or system generated 
attributes. I understand your point about performance. What I could do is to 
submit an Array in string format (e.g. "attribute1;attribute2;attribute4") and 
create a RangerArrayResourceMatcher that allows matching on 1 item which should 
resolve the issue of CPU cycles and audit logs.

What do you think?

> Allow system attributes to be updated when policy allows
> 
>
> Key: ATLAS-3755
> URL: https://issues.apache.org/jira/browse/ATLAS-3755
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Critical
> Attachments: 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch
>
>
> Atlas does not operate in an isolated environment; this is one of the reasons 
> the "homeId" system attribute was introduced. Unfortunately, system attributes 
> can only be updated when importing. This means any integration with other 
> services is significantly limited (Kafka, REST API will not work). (See also 
> ATLAS-3754)
> To resolve this I propose to make it possible to update the system attributes 
> when policy allows it. This introduces new 
> AtlasPrivilege.ENTITY_UPDATE_SYSTEM_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_SYSTEM_ATTRIBUTE next to 
> AtlasPrivilege.ENTITY_UPDATE_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_ATTRIBUTE rather than just checking on the 
> entity level. In certain places we will then drop the requirement for an 
> import to be active as this can now happen through other channels as well.
> This allows operators to specify policies that allow granular controls over 
> attributes and system attributes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ATLAS-3755) Allow system attributes to be updated when policy allows

2020-04-29 Thread Bolke de Bruin (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095795#comment-17095795
 ] 

Bolke de Bruin edited comment on ATLAS-3755 at 4/29/20, 6:51 PM:
-

[~madhan] I don't think that would work for the KafkaConsumer, or does it? 
Please note that Glossary also needs to be updated to contain all system 
attributes, as it is missing at least "homeId". In addition, I don't think the 
risk of system attributes being updated inadvertently is that high. The default 
authorization model denies access to these attributes. Next to that, it would 
require the incoming message to include those system properties. In this case 
you could argue that it should not be consumed at all if not allowed, as it 
comes from a badly behaving client. We would like to integrate Atlas with 
another metadata system, which would actually make the occurrence much more 
frequent, in around 50% of the updates. Given the fact that system attributes 
are part of the vertex, there is not much additional cost, as far as I can see, 
in doing this during entity updates.

On your second point. I'm not strongly bound to them so I can merge them. I do 
think there might be cases that you would like to allow an update but disallow 
a create.

Authorization per attribute allows end-users to edit a particular attribute 
(say description) without allowing editing of all properties of the entities. 
This is actually a very common use case as you would like users to be able to 
enrich the metadata without adjusting some core attributes or system generated 
attributes. I understand your point about performance. What I could do is to 
submit an Array in string format (e.g. "attribute1;attribute2;attribute4") and 
create a RangerArrayResourceMatcher that allows matching on 1 item which should 
resolve the issue of CPU cycles and audit logs.
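
A minimal sketch of the string-encoded array idea, for discussion only. The 
class and method names below are hypothetical and are not part of Ranger or 
Atlas; the thread does not settle whether the matcher should succeed when any 
listed attribute is allowed or only when all of them are, so both readings are 
shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: NOT the proposed RangerArrayResourceMatcher itself,
// only an illustration of matching a "a;b;c" string against allowed names.
final class AttributeListMatchSketch {
    /** Splits "attribute1;attribute2" into trimmed, non-empty names. */
    static List<String> parse(String value) {
        return Arrays.stream(value.split(";"))
                     .map(String::trim)
                     .filter(item -> !item.isEmpty())
                     .collect(Collectors.toList());
    }

    /** Reading 1: true if at least one listed attribute is allowed. */
    static boolean anyAllowed(String value, Set<String> allowed) {
        return parse(value).stream().anyMatch(allowed::contains);
    }

    /** Reading 2: true only if the list is non-empty and every attribute is allowed. */
    static boolean allAllowed(String value, Set<String> allowed) {
        List<String> items = parse(value);
        return !items.isEmpty() && allowed.containsAll(items);
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("description", "owner"));

        System.out.println(anyAllowed("description;homeId", allowed));
        System.out.println(allAllowed("description;homeId", allowed));
    }
}
```

Either way, the whole attribute list is evaluated in one call rather than one 
authorization check per attribute, which is the point of the proposal.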

What do you think?


was (Author: bolke):
[~madhan] I don't think that would work for the KafkaConsumer, or does it? 
Please note that Glossary also needs to be updated to contain all system 
attributes, as it is missing at least "homeId". In addition, I don't think the 
risk of system attributes being updated inadvertently is that high. The default 
authorization model denies access to these attributes. Next to that, it would 
require the incoming message to include those system properties. In this case 
you could argue that it should not be consumed at all if not allowed, as it 
comes from a badly behaving client. We would like to integrate Atlas with 
another metadata system, which would actually make the occurrence much more 
frequent, in around 50% of the updates. Given the fact that system attributes 
are part of the vertex, there is not much additional cost, as far as I can see, 
in doing this during entity updates.

On your second point. I'm not strongly bound so I can merge them. I do think 
there might be cases that you would like to allow an update but disallow a 
create.

Authorization per attribute allows end-users to edit a particular attribute 
(say description) without allowing editing of all properties of the entities. 
This is actually a very common use case as you would like users to be able to 
enrich the metadata without adjusting some core attributes or system generated 
attributes. I understand your point about performance. What I could do is to 
submit an Array in string format (e.g. "attribute1;attribute2;attribute4") and 
create a RangerArrayResourceMatcher that allows matching on 1 item which should 
resolve the issue of CPU cycles and audit logs.

What do you think?

> Allow system attributes to be updated when policy allows
> 
>
> Key: ATLAS-3755
> URL: https://issues.apache.org/jira/browse/ATLAS-3755
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Critical
> Attachments: 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch
>
>
> Atlas does not operate in an isolated environment; this is one of the reasons 
> the "homeId" system attribute was introduced. Unfortunately, system attributes 
> can only be updated when importing. This means any integration with other 
> services is significantly limited (Kafka, REST API will not work). (See also 
> ATLAS-3754)
> To resolve this I propose to make it possible to update the system attributes 
> when policy allows it. This introduces new 
> AtlasPrivilege.ENTITY_UPDATE_SYSTEM_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_SYSTEM_ATTRIBUTE next to 
> AtlasPrivilege.ENTITY_UPDATE_ATTRIBUTE and 
> 

Re: Review Request 72453: ATLAS-3764 : Set default value for "atlas.graph.index.search.max-result-set-size" in ApplicationProperties

2020-04-29 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72453/#review220551
---


Fix it, then Ship it!





intg/src/main/java/org/apache/atlas/ApplicationProperties.java
Lines 359 (patched)


2147483647 => Integer.MAX_VALUE
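
A small sketch of what the suggestion amounts to, assuming a plain 
java.util.Properties lookup as a stand-in for the real configuration object 
(the class and method names here are hypothetical, not the ones in 
ApplicationProperties.java):

```java
import java.util.Properties;

// Hypothetical illustration: uses java.util.Properties instead of the
// configuration class that Atlas actually uses.
final class MaxResultSetSizeSketch {
    static final String KEY = "atlas.graph.index.search.max-result-set-size";

    // Named constant instead of the magic number 2147483647.
    static final int DEFAULT_MAX_RESULT_SET_SIZE = Integer.MAX_VALUE;

    /** Returns the configured value, or the default when the property is absent. */
    static int maxResultSetSize(Properties props) {
        String raw = props.getProperty(KEY);

        return raw == null ? DEFAULT_MAX_RESULT_SET_SIZE : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();

        System.out.println(maxResultSetSize(props)); // property not set: default applies

        props.setProperty(KEY, "200");
        System.out.println(maxResultSetSize(props)); // explicit value wins
    }
}
```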


- Madhan Neethiraj


On April 29, 2020, 6:38 p.m., Nixon Rodrigues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72453/
> ---
> 
> (Updated April 29, 2020, 6:38 p.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Nikhil Bonte, 
> and Sarath Subramanian.
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Inconsistency in search results is seen when 
> atlas.graph.index.search.max-result-set-size is set lower than the maximum 
> result size when the Solr replication factor is 2. 
> To overcome this issue, set the default value "2147483647" for the 
> "atlas.graph.index.search.max-result-set-size" property in Atlas 
> ApplicationProperties.
> 
> 
> Diffs
> -
> 
>   intg/src/main/java/org/apache/atlas/ApplicationProperties.java 1f1f3771b 
> 
> 
> Diff: https://reviews.apache.org/r/72453/diff/1/
> 
> 
> Testing
> ---
> 
> PC build :- 
> https://builds.apache.org/job/PreCommit-ATLAS-Build-Test/1857/console
> 
> 
> Scenario - 1
> 
> When atlas-application.properties is set with  
> atlas.graph.index.search.max-result-set-size=200 [inconsistency seen in 
> search result ]
> 2020-04-29 18:10:39,799 INFO  - [main:] ~ Loading 
> atlas-application.properties from 
> file:/usr/hdp/current/atlas-server/conf/atlas-application.properties 
> (ApplicationProperties:133)
> 2020-04-29 18:10:39,809 INFO  - [main:] ~ Using graphdb backend 'janus' 
> (ApplicationProperties:285)
> 2020-04-29 18:10:39,809 INFO  - [main:] ~ Using storage backend 'hbase2' 
> (ApplicationProperties:296)
> 2020-04-29 18:10:39,809 INFO  - [main:] ~ Using index backend 'solr' 
> (ApplicationProperties:307)
> 2020-04-29 18:10:39,809 INFO  - [main:] ~ Atlas is running in MODE: PROD. 
> (ApplicationProperties:311)
> 2020-04-29 18:10:39,810 INFO  - [main:] ~ Setting solr-wait-searcher property 
> 'true' (ApplicationProperties:317)
> 2020-04-29 18:10:39,810 INFO  - [main:] ~ Setting index.search.map-name 
> property 'false' (ApplicationProperties:321)
> 2020-04-29 18:10:39,810 INFO  - [main:] ~ Setting 
> atlas.graph.index.search.max-result-set-size = 200 
> (ApplicationProperties:331)(ApplicationProperties:343)
> 
> 
> Scenario 2 - 
> When custom property atlas.graph.index.search.max-result-set-size is removed 
> from Ambari [ search results are consistent in size]
> 
> 2020-04-29 18:17:01,883 INFO  - [main:] ~ Loading 
> atlas-application.properties from 
> file:/usr/hdp/current/atlas-server/conf/atlas-application.properties 
> (ApplicationProperties:133)
> 2020-04-29 18:17:01,892 INFO  - [main:] ~ Using graphdb backend 'janus' 
> (ApplicationProperties:285)
> 2020-04-29 18:17:01,892 INFO  - [main:] ~ Using storage backend 'hbase2' 
> (ApplicationProperties:296)
> 2020-04-29 18:17:01,892 INFO  - [main:] ~ Using index backend 'solr' 
> (ApplicationProperties:307)
> 2020-04-29 18:17:01,893 INFO  - [main:] ~ Atlas is running in MODE: PROD. 
> (ApplicationProperties:311)
> 2020-04-29 18:17:01,893 INFO  - [main:] ~ Setting solr-wait-searcher property 
> 'true' (ApplicationProperties:317)
> 2020-04-29 18:17:01,893 INFO  - [main:] ~ Setting index.search.map-name 
> property 'false' (ApplicationProperties:321)
> 2020-04-29 18:17:01,893 INFO  - [main:] ~ Setting 
> atlas.graph.index.search.max-result-set-size = 2147483647 
> (ApplicationProperties:331)
> 2020-04-29 18:17:01,893 INFO  - [main:] ~ Property (set to default) 
> atlas.graph.cache.db-cache = true (ApplicationProperties:343)
> 
> 
> Thanks,
> 
> Nixon Rodrigues
> 
>



Review Request 72453: ATLAS-3764 : Set default value for "atlas.graph.index.search.max-result-set-size" in ApplicationProperties

2020-04-29 Thread Nixon Rodrigues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72453/
---

Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Nikhil Bonte, and 
Sarath Subramanian.


Repository: atlas


Description
---

Inconsistency in search results is seen when 
atlas.graph.index.search.max-result-set-size is set lower than the maximum 
result size when the Solr replication factor is 2. 
To overcome this issue, set the default value "2147483647" for the 
"atlas.graph.index.search.max-result-set-size" property in Atlas 
ApplicationProperties.


Diffs
-

  intg/src/main/java/org/apache/atlas/ApplicationProperties.java 1f1f3771b 


Diff: https://reviews.apache.org/r/72453/diff/1/


Testing
---

PC build :- 
https://builds.apache.org/job/PreCommit-ATLAS-Build-Test/1857/console


Scenario - 1

When atlas-application.properties is set with  
atlas.graph.index.search.max-result-set-size=200 [inconsistency seen in search 
result ]
2020-04-29 18:10:39,799 INFO  - [main:] ~ Loading atlas-application.properties 
from file:/usr/hdp/current/atlas-server/conf/atlas-application.properties 
(ApplicationProperties:133)
2020-04-29 18:10:39,809 INFO  - [main:] ~ Using graphdb backend 'janus' 
(ApplicationProperties:285)
2020-04-29 18:10:39,809 INFO  - [main:] ~ Using storage backend 'hbase2' 
(ApplicationProperties:296)
2020-04-29 18:10:39,809 INFO  - [main:] ~ Using index backend 'solr' 
(ApplicationProperties:307)
2020-04-29 18:10:39,809 INFO  - [main:] ~ Atlas is running in MODE: PROD. 
(ApplicationProperties:311)
2020-04-29 18:10:39,810 INFO  - [main:] ~ Setting solr-wait-searcher property 
'true' (ApplicationProperties:317)
2020-04-29 18:10:39,810 INFO  - [main:] ~ Setting index.search.map-name 
property 'false' (ApplicationProperties:321)
2020-04-29 18:10:39,810 INFO  - [main:] ~ Setting 
atlas.graph.index.search.max-result-set-size = 200 
(ApplicationProperties:331)(ApplicationProperties:343)


Scenario 2 - 
When custom property atlas.graph.index.search.max-result-set-size is removed 
from Ambari [ search results are consistent in size]

2020-04-29 18:17:01,883 INFO  - [main:] ~ Loading atlas-application.properties 
from file:/usr/hdp/current/atlas-server/conf/atlas-application.properties 
(ApplicationProperties:133)
2020-04-29 18:17:01,892 INFO  - [main:] ~ Using graphdb backend 'janus' 
(ApplicationProperties:285)
2020-04-29 18:17:01,892 INFO  - [main:] ~ Using storage backend 'hbase2' 
(ApplicationProperties:296)
2020-04-29 18:17:01,892 INFO  - [main:] ~ Using index backend 'solr' 
(ApplicationProperties:307)
2020-04-29 18:17:01,893 INFO  - [main:] ~ Atlas is running in MODE: PROD. 
(ApplicationProperties:311)
2020-04-29 18:17:01,893 INFO  - [main:] ~ Setting solr-wait-searcher property 
'true' (ApplicationProperties:317)
2020-04-29 18:17:01,893 INFO  - [main:] ~ Setting index.search.map-name 
property 'false' (ApplicationProperties:321)
2020-04-29 18:17:01,893 INFO  - [main:] ~ Setting 
atlas.graph.index.search.max-result-set-size = 2147483647 
(ApplicationProperties:331)
2020-04-29 18:17:01,893 INFO  - [main:] ~ Property (set to default) 
atlas.graph.cache.db-cache = true (ApplicationProperties:343)


Thanks,

Nixon Rodrigues



Re: Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/#review220550
---




graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
Lines 84 (patched)


edgeLabels => edgeLabel



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java
Lines 272 (patched)


getAdjacentEdgesByLabelCount() => getAdjacentEdgesCountByLabel()



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java
Lines 455 (patched)


getOutGoingEdgesByLabelCount() => getOutGoingEdgesCountByLabel()



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java
Lines 459 (patched)


getInComingEdgesByLabelCount() => getInComingEdgesCountByLabel()



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
Lines 786 (patched)


The value assigned in #786 is overwritten by #793 or #796 or #799. It looks 
like the intent is to skip the call to 
graphHelper.getOutGoingEdgesByLabelCount() when incomingEdgeCount is 0. 
Consider rearranging the code for clarity and readability:

    AtlasEdge ret               = null;
    long      incomingEdgeCount = graphHelper.getInComingEdgesByLabelCount(toVertex, relationshipLabel);

    if (incomingEdgeCount > 0) {
        long outgoingEdgeCount = graphHelper.getOutGoingEdgesByLabelCount(fromVertex, relationshipLabel);

        if (outgoingEdgeCount > 0) {
            if (incomingEdgeCount < outgoingEdgeCount) {
                // fewer incoming edges: scan them, looking for fromVertex
                Iterator<AtlasEdge> edgesIterator = graphHelper.getIncomingEdgesByLabel(toVertex, relationshipLabel);

                ret = getActiveEdgeFromList(fromVertex.getId(), edgesIterator);
            } else {
                // fewer (or equal) outgoing edges: scan them, looking for toVertex
                Iterator<AtlasEdge> edgesIterator = graphHelper.getOutGoingEdgesByLabel(fromVertex, relationshipLabel);

                ret = getActiveEdgeFromList(toVertex.getId(), edgesIterator);
            }
        }
    }

    return ret;

Also, given that we are looking for an edge from 'fromVertex' to 'toVertex', 
isn't checking 'incomingEdgeCount > 0' enough? What is the need for 
'outgoingEdgeCount'?


- Madhan Neethiraj


On April 29, 2020, 5:51 p.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72452/
> ---
> 
> (Updated April 29, 2020, 5:51 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
> Sarath Subramanian, and Sidharth Mishra.
> 
> 
> Bugs: ATLAS-3762
> https://issues.apache.org/jira/browse/ATLAS-3762
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Problem Definition**
> Please refer to JIRA for details.
> 
> **Updates**
> - Modified: _AtlasJanusGraph.wrapVertices_ and _AtlasJanusGraph.wrapEdges_: 
> Now use genuine iterators. This reduces the number of elements fetched, since 
> the search is linear.
> - New: _AtlasVertex.getEdgeCount_ fetches the edge count using the iterator 
> returned from _JanusVertex_. Fetching the count is efficient using stream 
> support.
> 
> 
> Diffs
> -
> 
>   
> graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java
>  9406e26ff 
>   
> graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
>  eb0206271 
>   
> graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
>  fdc9fd0b5 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
> 2b8227a7e 
>   
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
>  d1c1f1255 
> 
> 
> Diff: https://reviews.apache.org/r/72452/diff/1/
> 
> 
> Testing
> ---
> 
> **Volume testing**
> High-volume testing shows that edge fetching is now efficient, including 
> cases where incoming edges were in the 1000s and outgoing edges were a handful.
> 
> Memory footprint has improved, since JanusGraph caches edges and then expires 
> them. Fetching fewer edges reduces the number of items in memory.
> 
> **Pre-commit Build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1858/
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



Review Request 72452: Efficiently Searching for Edges Between Vertices

2020-04-29 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72452/
---

Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
Sarath Subramanian, and Sidharth Mishra.


Bugs: ATLAS-3762
https://issues.apache.org/jira/browse/ATLAS-3762


Repository: atlas


Description
---

**Problem Definition**
Please refer to JIRA for details.

**Updates**
- Modified: _AtlasJanusGraph.wrapVertices_ and _AtlasJanusGraph.wrapEdges_: Now 
use genuine iterators. This reduces the number of elements fetched, since the 
search is linear.
- New: _AtlasVertex.getEdgeCount_ fetches the edge count using the iterator 
returned from _JanusVertex_. Fetching the count is efficient using stream 
support.


Diffs
-

  
graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasVertex.java 
9406e26ff 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java
 eb0206271 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusVertex.java
 fdc9fd0b5 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 
2b8227a7e 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java
 d1c1f1255 


Diff: https://reviews.apache.org/r/72452/diff/1/


Testing
---

**Volume testing**
High-volume testing shows that edge fetching is now efficient, including cases 
where incoming edges were in the 1000s and outgoing edges were a handful.

Memory footprint has improved, since JanusGraph caches edges and then expires 
them. Fetching fewer edges reduces the number of items in memory.

**Pre-commit Build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1858/


Thanks,

Ashutosh Mestry



[jira] [Updated] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices

2020-04-29 Thread Ashutosh Mestry (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-3762:
---
Description: 
*Background*

One of the earlier commits replaced the vertices and edges fetch with 
_StreamSupport.stream_. This uses _Collect(toList),_ which causes all contents 
to be fetched. 

As a result, a large amount of data is fetched.

*Solution*

Switch to iterators that will use lazy loading.
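
To make the difference concrete, here is a self-contained sketch (not the 
Atlas code). The counting source is a hypothetical stand-in for a graph query; 
it records how many elements are actually pulled by an eager 
collect-to-list search versus a lazy iterator search that stops at the first 
match.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class LazyFetchSketch {
    /** Hypothetical data source: yields 0..size-1 and counts each element pulled. */
    static Iterable<Integer> countingSource(int size, AtomicInteger fetched) {
        return () -> new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < size;
            }

            @Override
            public Integer next() {
                if (next >= size) {
                    throw new NoSuchElementException();
                }
                fetched.incrementAndGet();
                return next++;
            }
        };
    }

    /** Eager: materializes every element into a list before searching. */
    static boolean eagerContains(Iterable<Integer> source, int target) {
        List<Integer> all = StreamSupport.stream(source.spliterator(), false)
                                         .collect(Collectors.toList());

        return all.contains(target);
    }

    /** Lazy: pulls one element at a time and stops at the first match. */
    static boolean lazyContains(Iterable<Integer> source, int target) {
        for (Integer value : source) {
            if (value == target) {
                return true;
            }
        }

        return false;
    }

    public static void main(String[] args) {
        AtomicInteger eagerFetched = new AtomicInteger();
        AtomicInteger lazyFetched  = new AtomicInteger();

        eagerContains(countingSource(1000, eagerFetched), 4);
        lazyContains(countingSource(1000, lazyFetched), 4);

        System.out.println("eager fetched: " + eagerFetched.get());
        System.out.println("lazy fetched:  " + lazyFetched.get());
    }
}
```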

*Edge Fetch Refactoring*

Change _getEdge_ to iterate over the smaller dataset. 

Here are the scenarios:

- _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. This means that 
outgoing edges from _fromVertex_ will be many more than incoming edges to 
_toVertex_.

- _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. This 
means that outgoing edges from _fromVertex_ will be fewer than incoming edges 
to _toVertex_.

Approach:
 * Since it is a linear search, it is more efficient to iterate over fewer 
items than more items.
 * Fetch the edge counts for _fromVertex_ and _toVertex_. If either count is 0, 
return NULL, since the search cannot find anything.
 * If both counts are non-zero, take the side with fewer elements and perform 
the search there.
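
The approach above can be sketched as follows. This is an illustration under 
simplifying assumptions, not the patch: the Edge type and the in-memory lists 
are hypothetical, whereas in Atlas the counts and iterators would come from 
the graph.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class EdgeLookupSketch {
    /** Hypothetical minimal edge: only the ids of its two endpoints. */
    static final class Edge {
        final String fromId;
        final String toId;

        Edge(String fromId, String toId) {
            this.fromId = fromId;
            this.toId   = toId;
        }
    }

    /**
     * Finds the edge fromId -> toId by scanning whichever side has fewer edges.
     * 'outgoing' holds the edges leaving fromId, 'incoming' the edges entering toId.
     */
    static Edge findEdge(String fromId, List<Edge> outgoing, String toId, List<Edge> incoming) {
        // If either side has no edges, no edge can connect the two vertices.
        if (outgoing.isEmpty() || incoming.isEmpty()) {
            return null;
        }

        // Linear search, so walk the smaller of the two collections.
        Iterator<Edge> edges = incoming.size() < outgoing.size() ? incoming.iterator()
                                                                 : outgoing.iterator();

        while (edges.hasNext()) {
            Edge edge = edges.next();

            if (edge.fromId.equals(fromId) && edge.toId.equals(toId)) {
                return edge;
            }
        }

        return null;
    }

    public static void main(String[] args) {
        // A table with three columns: many outgoing edges, one incoming edge per column.
        List<Edge> tableOut = Arrays.asList(new Edge("table", "col1"),
                                            new Edge("table", "col2"),
                                            new Edge("table", "col3"));
        List<Edge> col2In   = Collections.singletonList(new Edge("table", "col2"));

        Edge found = findEdge("table", tableOut, "col2", col2In);

        System.out.println(found != null ? found.fromId + " -> " + found.toId : "not found");
    }
}
```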

[~sidharthkmishra] Thanks for this simple but effective fix.

  was:
*Background*

One of the earlier commits replaced vertices and edges fetch with 
_StreamSupport.stream_. This uses _Collect(toList),_ which causes all contents 
to be fetched. 

As a result, a large amount of data is fetched.

*Solution*

Switch to iterators that will use lazy loading.

*Minor Refactoring*

Change _getEdge_ to iterate over the smaller dataset.

[~sidharthkmishra] Thanks for this simple but effective fix.

Summary: Entity Creation: Improve Edges Fetch Between Vertices  (was: 
Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators)

> Entity Creation: Improve Edges Fetch Between Vertices
> -
>
> Key: ATLAS-3762
> URL: https://issues.apache.org/jira/browse/ATLAS-3762
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Attachments: 
> ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced the vertices and edges fetch with 
> _StreamSupport.stream_. This uses _Collect(toList),_ which causes all 
> contents to be fetched. 
> As a result, a large amount of data is fetched.
> *Solution*
> Switch to iterators that will use lazy loading.
> *Edge Fetch Refactoring*
> Change _getEdge_ to iterate over the smaller dataset. 
> Here are the scenarios:
> - _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. This means that 
> outgoing edges from _fromVertex_ will be many more than incoming edges to 
> _toVertex_.
> - _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. This 
> means that outgoing edges from _fromVertex_ will be fewer than incoming edges 
> to _toVertex_.
> Approach:
>  * Since it is a linear search, it is more efficient to iterate over fewer 
> items than more items.
>  * Fetch the edge counts for _fromVertex_ and _toVertex_. If either count is 
> 0, return NULL, since the search cannot find anything.
>  * If both counts are non-zero, take the side with fewer elements and perform 
> the search there.
> [~sidharthkmishra] Thanks for this simple but effective fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ATLAS-3764) Set default value for "atlas.graph.index.search.max-result-set-size" in ApplicationProperties

2020-04-29 Thread Nixon Rodrigues (Jira)
Nixon Rodrigues created ATLAS-3764:
--

 Summary: Set default value for 
"atlas.graph.index.search.max-result-set-size" in ApplicationProperties
 Key: ATLAS-3764
 URL: https://issues.apache.org/jira/browse/ATLAS-3764
 Project: Atlas
  Issue Type: Improvement
  Components: atlas-intg
Affects Versions: 2.0.0
Reporter: Nixon Rodrigues
Assignee: Nixon Rodrigues
 Fix For: 2.1.0


Set Default value "2147483647" for 
"*atlas.graph.index.search.max-result-set-size*" property in Atlas 
ApplicationProperties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3762) Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators

2020-04-29 Thread Ashutosh Mestry (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-3762:
---
Attachment: ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch

> Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators
> -
>
> Key: ATLAS-3762
> URL: https://issues.apache.org/jira/browse/ATLAS-3762
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Attachments: 
> ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced vertices and edges fetch with 
> _StreamSupport.stream_. This uses _Collect(toList),_ which causes all 
> contents to be fetched. 
> As a result, a large amount of data is fetched.
> *Solution*
> Switch to iterators that will use lazy loading.
> *Minor Refactoring*
> Change _getEdge_ to iterate over the smaller dataset.
> [~sidharthkmishra] Thanks for this simple but effective fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-3762) Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators

2020-04-29 Thread Ashutosh Mestry (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-3762:
---
Attachment: (was: ATLAS-3762-iterators-improvement.patch)

> Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators
> -
>
> Key: ATLAS-3762
> URL: https://issues.apache.org/jira/browse/ATLAS-3762
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Attachments: 
> ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced vertices and edges fetch with 
> _StreamSupport.stream_. This uses _Collect(toList),_ which causes all 
> contents to be fetched. 
> As a result, a large amount of data is fetched.
> *Solution*
> Switch to iterators that will use lazy loading.
> *Minor Refactoring*
> Change _getEdge_ to iterate over the smaller dataset.
> [~sidharthkmishra] Thanks for this simple but effective fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3654) Support solr in standalone (http) mode

2020-04-29 Thread Nixon Rodrigues (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095692#comment-17095692
 ] 

Nixon Rodrigues commented on ATLAS-3654:


[~dwarszawski],

Can you provide steps to install Atlas with solr in standalone (http) mode?

How is Atlas packaged? Which profile is selected?

> Support solr in standalone (http) mode
> --
>
> Key: ATLAS-3654
> URL: https://issues.apache.org/jira/browse/ATLAS-3654
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 3.0.0
>Reporter: Damian Warszawski
>Priority: Minor
> Attachments: ATLAS-3654.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Problem description*
> Atlas does not support running Solr in standalone (http) mode.
> *Goals*
>  It is especially useful for testing purposes to make setup as simple as 
> possible without Zookeeper. It also enables full integration with JanusGraph, 
> as it supports both modes of running Solr, `cloud` and `http` 
> [https://docs.janusgraph.org/index-backend/solr/]. An additional benefit is to 
> decouple hbase and solr while running in embedded mode, so that solr can be 
> run in embedded mode with external hbase.
> *Proposed solution*
>  * call solr V1 API  while creating/updating request handlers in standalone 
> solr
>  * update atlas start script to enable standalone embedded solr
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3755) Allow system attributes to be updated when policy allows

2020-04-29 Thread Madhan Neethiraj (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095559#comment-17095559
 ] 

Madhan Neethiraj commented on ATLAS-3755:
-

[~bolke] - I suggest introducing an API to update system attributes like 
homeId, isProxy, provenanceType - just like the APIs to change entity 
classifications, labels and business-metadata. This is to prevent inadvertent 
updates of system attributes by entity-update calls, i.e. current API users 
don't explicitly provide system attribute values, but the Atlas server will 
receive default values FALSE (for boolean) and 0 (for numbers). Also, updates 
to such system attributes should be infrequent compared to entity-updates; 
hence it will help to not take the additional cost of updating these attributes 
during entity-updates.

Is it necessary to have separate permissions for create and update, i.e. 
{{entity-create-system-attribute}} and {{entity-update-system-attribute}}? I 
suggest having only one - {{entity-update-system-attribute}}.

Also, the patch introduces authorization for each attribute update. What is the 
use case for this? This can be very expensive - both in terms of CPU cycles and 
amount of audit logs generated (in Ranger). Hence I suggest sticking to the 
{{entity-update}} permission to cover updates to any attribute of the entity.

> Allow system attributes to be updated when policy allows
> 
>
> Key: ATLAS-3755
> URL: https://issues.apache.org/jira/browse/ATLAS-3755
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Critical
> Attachments: 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch
>
>
> Atlas does not operate in an isolated environment; this is one of the reasons 
> the "homeId" system attribute was introduced. Unfortunately, system attributes 
> can only be updated when importing. This means any integration with other 
> services is significantly limited (Kafka, REST API will not work). (See also 
> ATLAS-3754)
> To resolve this I propose to make it possible to update the system attributes 
> when policy allows it. This introduces new 
> AtlasPrivilege.ENTITY_UPDATE_SYSTEM_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_SYSTEM_ATTRIBUTE next to 
> AtlasPrivilege.ENTITY_UPDATE_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_ATTRIBUTE rather than just checking on the 
> entity level. In certain places we will then drop the requirement for an 
> import to be active as this can now happen through other channels as well.
> This allows operators to specify policies that allow granular controls over 
> attributes and system attributes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ATLAS-3760) Optimize FreeTextSearchProcessor to apply exclude deleted entity filter on solr side.

2020-04-29 Thread Madhan Neethiraj (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj resolved ATLAS-3760.
-
Fix Version/s: 2.1.0
   Resolution: Fixed

[~dwarszawski] - thank you for the patch. It's now committed to the master and 
branch-2.0 branches.

> Optimize FreeTextSearchProcessor to apply exclude deleted entity  filter on 
> solr side.
> --
>
> Key: ATLAS-3760
> URL: https://issues.apache.org/jira/browse/ATLAS-3760
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Reporter: Damian Warszawski
>Priority: Minor
> Fix For: 2.1.0, 3.0.0
>
>
> *Problem description*
> Current implementation of FreeTextSearchProcessor applies filtering in memory 
> to exclude deleted entities.
> This introduces significant performance overhead by generating redundant 
> calls to solr index. 
> *Goals*
> Improve performance of FreeTextSearchProcessor by applying filter in solr 
> query.
> *Proposed solution*
>  * replace in-memory filtering with filter in solr query.





[jira] [Commented] (ATLAS-3760) Optimize FreeTextSearchProcessor to apply exclude deleted entity filter on solr side.

2020-04-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095522#comment-17095522
 ] 

ASF subversion and git services commented on ATLAS-3760:


Commit 204275cc2d7fc520e1f9ea2b0f2ad6161af706c5 in atlas's branch 
refs/heads/master from Damian Warszawski
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=204275c ]

ATLAS-3760: optimize freetext search handling of excludeDeletedEntities flag

Signed-off-by: Madhan Neethiraj 


> Optimize FreeTextSearchProcessor to apply exclude deleted entity  filter on 
> solr side.
> --
>
> Key: ATLAS-3760
> URL: https://issues.apache.org/jira/browse/ATLAS-3760
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Reporter: Damian Warszawski
>Priority: Minor
> Fix For: 3.0.0
>
>
> *Problem description*
> Current implementation of FreeTextSearchProcessor applies filtering in memory 
> to exclude deleted entities.
> This introduces significant performance overhead by generating redundant 
> calls to solr index. 
> *Goals*
> Improve performance of FreeTextSearchProcessor by applying filter in solr 
> query.
> *Proposed solution*
>  * replace in-memory filtering with filter in solr query.





[jira] [Commented] (ATLAS-3760) Optimize FreeTextSearchProcessor to apply exclude deleted entity filter on solr side.

2020-04-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095525#comment-17095525
 ] 

ASF subversion and git services commented on ATLAS-3760:


Commit 3fba80f2fc16bc87e82d2bb5a3f57e6c028a5e06 in atlas's branch 
refs/heads/branch-2.0 from Damian Warszawski
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=3fba80f ]

ATLAS-3760: optimize freetext search handling of excludeDeletedEntities flag

Signed-off-by: Madhan Neethiraj 
(cherry picked from commit 204275cc2d7fc520e1f9ea2b0f2ad6161af706c5)


> Optimize FreeTextSearchProcessor to apply exclude deleted entity  filter on 
> solr side.
> --
>
> Key: ATLAS-3760
> URL: https://issues.apache.org/jira/browse/ATLAS-3760
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Reporter: Damian Warszawski
>Priority: Minor
> Fix For: 3.0.0
>
>
> *Problem description*
> Current implementation of FreeTextSearchProcessor applies filtering in memory 
> to exclude deleted entities.
> This introduces significant performance overhead by generating redundant 
> calls to solr index. 
> *Goals*
> Improve performance of FreeTextSearchProcessor by applying filter in solr 
> query.
> *Proposed solution*
>  * replace in-memory filtering with filter in solr query.





[jira] [Updated] (ATLAS-3366) UI: Quick Search dropdown entry icon does not match the fallback icon for that entry

2020-04-29 Thread Keval Bhatt (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keval Bhatt updated ATLAS-3366:
---
Fix Version/s: 3.0.0
   2.1.0

> UI: Quick Search dropdown entry icon does not match the fallback icon for 
> that entry
> 
>
> Key: ATLAS-3366
> URL: https://issues.apache.org/jira/browse/ATLAS-3366
> Project: Atlas
>  Issue Type: Bug
>Reporter: Rahul Kurup
>Assignee: Keval Bhatt
>Priority: Minor
> Fix For: 2.1.0, 3.0.0
>
> Attachments: fallbackquicksearchissue.png
>
>
> When the fallback icon is triggered for a particular entity type, the icon 
> does not change to that fallback icon for that entity in the quick search 
> dropdown. See below screenshot.
>  !fallbackquicksearchissue.png|height=250,width=550! 





[jira] [Updated] (ATLAS-3578) UI: All types are visible again after a search is complete using a Supertype filter

2020-04-29 Thread Keval Bhatt (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keval Bhatt updated ATLAS-3578:
---
Fix Version/s: 3.0.0
   2.1.0

> UI: All types are visible again after a search is complete using a Supertype 
> filter
> ---
>
> Key: ATLAS-3578
> URL: https://issues.apache.org/jira/browse/ATLAS-3578
> Project: Atlas
>  Issue Type: Bug
>Reporter: Rahul Kurup
>Assignee: Keval Bhatt
>Priority: Minor
> Fix For: 2.1.0, 3.0.0
>
> Attachments: supertypeissue.gif
>
>
> Atlas has a super-type filter display feature as shown in the screenshot. The 
> issue with this feature is that after a search is executed with this filter, 
> the default types are visible in the search drop-down field instead of the 
> types filtered by the super-type.
> !supertypeissue.gif|width=484,height=272!





[jira] [Resolved] (ATLAS-3578) UI: All types are visible again after a search is complete using a Supertype filter

2020-04-29 Thread Keval Bhatt (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keval Bhatt resolved ATLAS-3578.

Resolution: Fixed

> UI: All types are visible again after a search is complete using a Supertype 
> filter
> ---
>
> Key: ATLAS-3578
> URL: https://issues.apache.org/jira/browse/ATLAS-3578
> Project: Atlas
>  Issue Type: Bug
>Reporter: Rahul Kurup
>Assignee: Keval Bhatt
>Priority: Minor
> Fix For: 2.1.0, 3.0.0
>
> Attachments: supertypeissue.gif
>
>
> Atlas has a super-type filter display feature as shown in the screenshot. The 
> issue with this feature is that after a search is executed with this filter, 
> the default types are visible in the search drop-down field instead of the 
> types filtered by the super-type.
> !supertypeissue.gif|width=484,height=272!





[GitHub] [atlas] HorizonNet commented on pull request #84: Fix typo in authentication docs

2020-04-29 Thread GitBox


HorizonNet commented on pull request #84:
URL: https://github.com/apache/atlas/pull/84#issuecomment-621228056


   Also noticed this one. @tartina You'll need a Jira first related to this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Review Request 72450: ATLAS-3763 Add "serviceType" in AtlasEntityHeader

2020-04-29 Thread Mandar Ambawane

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72450/
---

Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nixon Rodrigues, 
and Sarath Subramanian.


Bugs: ATLAS-3763
https://issues.apache.org/jira/browse/ATLAS-3763


Repository: atlas


Description
---

Added "serviceType" to AtlasEntityHeader so that the UI issue 
https://issues.apache.org/jira/browse/ATLAS-3366 can be handled.


Diffs
-

  intg/src/main/java/org/apache/atlas/model/instance/AtlasEntityHeader.java 
7d2476a 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
 757fcb1 


Diff: https://reviews.apache.org/r/72450/diff/1/


Testing
---

Pre-commit: 
https://builds.apache.org/job/PreCommit-ATLAS-Build-Test/1856/console


Thanks,

Mandar Ambawane



[jira] [Created] (ATLAS-3763) Add "serviceType" in AtlasEntityHeader

2020-04-29 Thread Mandar Ambawane (Jira)
Mandar Ambawane created ATLAS-3763:
--

 Summary: Add "serviceType" in AtlasEntityHeader
 Key: ATLAS-3763
 URL: https://issues.apache.org/jira/browse/ATLAS-3763
 Project: Atlas
  Issue Type: Bug
Reporter: Mandar Ambawane
Assignee: Mandar Ambawane








Re: Review Request 72441: Support solr in standalone (http) mode

2020-04-29 Thread Damian Warszawski

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72441/
---

(Updated April 29, 2020, 10:32 a.m.)


Review request for atlas, Ashutosh Mestry, Bolke de Bruin, madhan, and Sarath 
Subramanian.


Changes
---

Added reference to jira.


Repository: atlas


Description (updated)
---

Atlas does not support running Solr in standalone (http) mode.

Standalone mode is especially useful for testing, since it keeps the setup as 
simple as possible by removing the need for Zookeeper. It also enables full 
integration with JanusGraph, which supports both modes of running Solr, `cloud` 
and `http` (https://docs.janusgraph.org/index-backend/solr/). An additional 
benefit is that it decouples hbase and solr in embedded mode, so that solr can 
run in embedded mode with an external hbase.

Proposed solution

- call solr V1 API while creating/updating request handlers in standalone solr
- update atlas start script to enable standalone embedded solr
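
As a rough illustration of what switching modes means on the configuration side, the helper below renders the index properties for either mode. The property names (atlas.graph.index.search.solr.mode, .zookeeper-url, .http-urls) are an assumption based on the JanusGraph Solr options with the Atlas prefix; check them against the Atlas configuration docs before use. The function itself is hypothetical and not part of the patch.

```python
def solr_index_properties(mode: str, endpoint: str) -> str:
    """Render index properties for the given Solr mode (illustrative only)."""
    prefix = "atlas.graph.index.search"  # assumed Atlas prefix for JanusGraph index options
    props = {
        f"{prefix}.backend": "solr",
        f"{prefix}.solr.mode": mode,
    }
    if mode == "cloud":
        # SolrCloud is located through Zookeeper.
        props[f"{prefix}.solr.zookeeper-url"] = endpoint
    elif mode == "http":
        # Standalone Solr is addressed directly over HTTP, no Zookeeper needed.
        props[f"{prefix}.solr.http-urls"] = endpoint
    else:
        raise ValueError(f"unsupported solr mode: {mode}")
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

For example, solr_index_properties("http", "http://localhost:8983/solr") yields the three lines needed for a standalone setup.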

Reference to jira https://issues.apache.org/jira/browse/ATLAS-3654
Patch was applied against master branch


Diffs
-

  distro/src/bin/atlas_config.py f09026ff9 
  
graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphIndexClient.java
 ba65f3d00 
  graphdb/janus/src/main/java/org/janusgraph/diskstorage/solr/Solr6Index.java 
484c161f0 


Diff: https://reviews.apache.org/r/72441/diff/1/


Testing
---

Patch was applied and verified on our dev env with embedded solr and external 
hbase.


Thanks,

Damian Warszawski



Re: Review Request 72440: Support sort params for FreeTextSearchProcessor

2020-04-29 Thread Damian Warszawski

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72440/
---

(Updated April 29, 2020, 10:26 a.m.)


Review request for atlas, Ashutosh Mestry, Bolke de Bruin, Madhan Neethiraj, 
and Sarath Subramanian.


Changes
---

added reference to jira


Repository: atlas


Description (updated)
---

There is no way to sort results by a specified attribute while freetext search 
is enabled. In our case we would like to enforce ordering by introducing a 
custom attribute definition, e.g. the popularity score from 
https://github.com/dwarszawski/amundsen-atlas-types/blob/master/amundsenatlastypes/schema/01_2_table_schema.json
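
A sketch of what a sorted freetext search request could look like from a client. It assumes the basic search parameters accept sortBy and sortOrder fields with the values ASCENDING and DESCENDING; the helper name and the attribute name popularityScore are illustrative and not taken from the patch.

```python
import json


def build_basic_search_request(query, sort_by=None, descending=True, limit=25):
    """Build a basic-search request body (field names are assumptions)."""
    body = {
        "query": query,
        "excludeDeletedEntities": True,
        "limit": limit,
        "offset": 0,
    }
    if sort_by:
        # Sorting is optional; without it the index decides the order.
        body["sortBy"] = sort_by
        body["sortOrder"] = "DESCENDING" if descending else "ASCENDING"
    return body


if __name__ == "__main__":
    # Print the JSON a client would send in the request body.
    print(json.dumps(build_basic_search_request("sales", sort_by="popularityScore"), indent=2))
```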


Reference to jira https://issues.apache.org/jira/browse/ATLAS-3758
Patch was applied against master branch.


Diffs
-

  
repository/src/main/java/org/apache/atlas/discovery/EntitySearchProcessor.java 
1a7bf6b16 
  
repository/src/main/java/org/apache/atlas/discovery/FreeTextSearchProcessor.java
 92b5eb4d2 
  repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java 
11eb7ca49 


Diff: https://reviews.apache.org/r/72440/diff/1/


Testing
---

Patch was applied on our dev env with custom entity definitions, and we 
verified that the order is applied as specified in the search query.


Thanks,

Damian Warszawski



Re: Review Request 72446: Optimize FreeTextSearchProcessor to apply exclude deleted entity filter on solr side.

2020-04-29 Thread Damian Warszawski

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72446/
---

(Updated April 29, 2020, 10:14 a.m.)


Review request for atlas, Ashutosh Mestry, Bolke de Bruin, madhan, and Sarath 
Subramanian.


Changes
---

reference to jira in description


Repository: atlas


Description
---

Current implementation of FreeTextSearchProcessor applies filtering in memory 
to exclude deleted entities. This introduces significant performance overhead 
by generating redundant calls to solr index. The goal is to improve performance 
of FreeTextSearchProcessor by applying filter in solr query.
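
The difference can be sketched with a toy index, assuming the processor pages through results until it has enough non-deleted entities. FakeIndex, the "state" field and both search functions are hypothetical stand-ins and do not reflect the actual FreeTextSearchProcessor or Solr client code.

```python
class FakeIndex:
    """Toy stand-in for the Solr index that counts how often it is queried."""

    def __init__(self, docs):
        self.docs = docs
        self.calls = 0

    def query(self, offset, limit, filter_state=None):
        self.calls += 1
        matches = [d for d in self.docs
                   if filter_state is None or d["state"] == filter_state]
        return matches[offset:offset + limit]


def search_filter_in_memory(index, limit):
    # Fetch unfiltered pages and drop deleted entities afterwards; more
    # pages are needed whenever a page contains deleted entities.
    results, offset = [], 0
    while len(results) < limit:
        page = index.query(offset, limit)
        if not page:
            break
        results.extend(d for d in page if d["state"] == "ACTIVE")
        offset += len(page)
    return results[:limit]


def search_filter_in_index(index, limit):
    # Push the filter into the query so a single call returns a full page.
    return index.query(0, limit, filter_state="ACTIVE")
```

With two deleted entities for every active one, filling a page of five takes several index calls in the in-memory variant and exactly one when the filter is part of the query.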


Reference to jira https://issues.apache.org/jira/browse/ATLAS-3760


Diffs
-

  
repository/src/main/java/org/apache/atlas/discovery/FreeTextSearchProcessor.java
 92b5eb4d2 


Diff: https://reviews.apache.org/r/72446/diff/1/


Testing (updated)
---

Verified on our dev env and achieved a 10x faster response for a simple call to 
Atlas basic search with over 50k entities.

Patch applied on top of master branch.


Thanks,

Damian Warszawski



Re: Review Request 72446: Optimize FreeTextSearchProcessor to apply exclude deleted entity filter on solr side.

2020-04-29 Thread Damian Warszawski

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72446/
---

(Updated April 29, 2020, 10:12 a.m.)


Review request for atlas, Ashutosh Mestry, Bolke de Bruin, madhan, and Sarath 
Subramanian.


Repository: atlas


Description (updated)
---

Current implementation of FreeTextSearchProcessor applies filtering in memory 
to exclude deleted entities. This introduces significant performance overhead 
by generating redundant calls to solr index. The goal is to improve performance 
of FreeTextSearchProcessor by applying filter in solr query.


Reference to jira https://issues.apache.org/jira/browse/ATLAS-3760


Diffs
-

  
repository/src/main/java/org/apache/atlas/discovery/FreeTextSearchProcessor.java
 92b5eb4d2 


Diff: https://reviews.apache.org/r/72446/diff/1/


Testing
---

Verified on our dev env and achieved a 10x faster response for a simple call to 
Atlas basic search with over 50k entities.


Thanks,

Damian Warszawski



[jira] [Updated] (ATLAS-3755) Allow system attributes to be updated when policy allows

2020-04-29 Thread Bolke de Bruin (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated ATLAS-3755:
--
Attachment: 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch

> Allow system attributes to be updated when policy allows
> 
>
> Key: ATLAS-3755
> URL: https://issues.apache.org/jira/browse/ATLAS-3755
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Critical
> Attachments: 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch, 
> 0001-ATLAS-3755-Allow-system-attributes-to-be-updated-by-.patch
>
>
> Atlas does not operate in an isolated environment; this is one of the reasons 
> the "homeId" system attribute was introduced. Unfortunately, system attributes 
> can only be updated when importing. This means any integration with other 
> services is significantly limited (Kafka, REST API will not work). (See also 
> ATLAS-3754)
> To resolve this I propose to make it possible to update the system attributes 
> when policy allows it. This introduces new 
> AtlasPrivilege.ENTITY_UPDATE_SYSTEM_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_SYSTEM_ATTRIBUTE next to 
> AtlasPrivilege.ENTITY_UPDATE_ATTRIBUTE and 
> AtlasPrivilege.ENTITY_CREATE_ATTRIBUTE rather than just checking on the 
> entity level. In certain places we will then drop the requirement for an 
> import to be active as this can now happen through other channels as well.
> This allows operators to specify policies that allow granular controls over 
> attributes and system attributes.





Re: Review Request 72438: Allow system attributes to be updated when policy allows

2020-04-29 Thread Bolke de Bruin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72438/
---

(Updated April 29, 2020, 9:49 a.m.)


Review request for atlas, Ashutosh Mestry, Bolke de Bruin, Madhan Neethiraj, 
Nixon Rodrigues, and Sarath Subramanian.


Changes
---

Fix some minor issues


Bugs: ATLAS-3755
https://issues.apache.org/jira/browse/ATLAS-3755


Repository: atlas


Description
---

Atlas does not operate in an isolated environment; this is one of the reasons 
the "homeId" system attribute was introduced. Unfortunately, system attributes 
can only be updated when importing. This means any integration with other 
services is significantly limited (Kafka, REST API will not work). (See also 
ATLAS-3754)
To resolve this I propose to make it possible to update the system attributes 
when policy allows it. This introduces new 
AtlasPrivilege.ENTITY_UPDATE_SYSTEM_ATTRIBUTE and 
AtlasPrivilege.ENTITY_CREATE_SYSTEM_ATTRIBUTE next to 
AtlasPrivilege.ENTITY_UPDATE_ATTRIBUTE and 
AtlasPrivilege.ENTITY_CREATE_ATTRIBUTE rather than just checking on the entity 
level. In certain places we will then drop the requirement for an import to be 
active as this can now happen through other channels as well.
This allows operators to specify policies that allow granular controls over 
attributes and system attributes.


Diffs (updated)
-

  authorization/src/main/java/org/apache/atlas/authorize/AtlasPrivilege.java 
7287b3dd7 
  authorization/src/main/resources/atlas-simple-authz-policy.json 6b2001279 
  intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java 3962c3c42 
  intg/src/main/java/org/apache/atlas/type/Constants.java 3fc13056e 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java
 379150b7b 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java
 36bee301d 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityMutationContext.java
 deb743eea 
  
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IDBasedEntityResolver.java
 3b9694851 
  
repository/src/test/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2Test.java
 38228a8ec 


Diff: https://reviews.apache.org/r/72438/diff/3/

Changes: https://reviews.apache.org/r/72438/diff/2-3/


Testing
---

- Manually tested
- Unit test updated


Thanks,

Bolke de Bruin