[jira] [Commented] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices

2020-06-02, ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124289#comment-17124289 ]

ASF subversion and git services commented on ATLAS-3762:


Commit fe5fa7f56e2d9872a199dbb872b642e74ae33bef in atlas's branch 
refs/heads/branch-2.0 from Ashutosh Mestry
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=fe5fa7f ]

ATLAS-3762: Improve edge creation using genuine iterator. Part 2

(cherry picked from commit 23aea76fff6c8f530816319649218583f4dfd091)


> Entity Creation: Improve Edges Fetch Between Vertices
> ------------------------------------------------------
>
>                 Key: ATLAS-3762
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3762
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>         Attachments: ATLAS-3762-Edge-fetch-improvement-gremlin.patch, 
>                      ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced the vertex and edge fetches with 
> _StreamSupport.stream_. These streams end in _collect(toList())_, which 
> materializes every element of the result, so a large amount of data is 
> fetched even when only a few elements are actually needed.
> *Solution*
> Switch to iterators, which load elements lazily.
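The difference between the Background and the Solution above can be seen in a small, self-contained Java sketch. This is not Atlas code: `CountingIterable`, `findEager` and `findLazy` are hypothetical names, and the iterable merely stands in for a graph query result, counting how many elements are pulled from it. The eager version drains the source with `collect(Collectors.toList())` before searching; the lazy version walks the iterator and stops at the first match.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class LazyFetchDemo {
    // Hypothetical stand-in for a graph query result: an Iterable that counts
    // how many elements have been pulled from the backing store.
    static final class CountingIterable implements Iterable<Integer> {
        private final int size;
        int fetched = 0;

        CountingIterable(int size) { this.size = size; }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() { return next < size; }

                @Override
                public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    fetched++;          // one "fetch" per element handed out
                    return next++;
                }
            };
        }
    }

    // Eager: collect(toList()) drains the whole iterable before searching.
    static Integer findEager(Iterable<Integer> source, Predicate<Integer> match) {
        List<Integer> all = StreamSupport.stream(source.spliterator(), false)
                                         .collect(Collectors.toList());
        for (Integer e : all) {
            if (match.test(e)) return e;
        }
        return null;
    }

    // Lazy: walk the iterator and stop at the first match.
    static Integer findLazy(Iterable<Integer> source, Predicate<Integer> match) {
        Iterator<Integer> it = source.iterator();
        while (it.hasNext()) {
            Integer e = it.next();
            if (match.test(e)) return e;
        }
        return null;
    }

    public static void main(String[] args) {
        CountingIterable eager = new CountingIterable(1_000);
        findEager(eager, e -> e == 3);
        CountingIterable lazy = new CountingIterable(1_000);
        findLazy(lazy, e -> e == 3);
        System.out.println("eager fetched " + eager.fetched + ", lazy fetched " + lazy.fetched);
    }
}
```

Running `main` prints `eager fetched 1000, lazy fetched 4`. The cost comes from the terminal `collect`, not from streams as such: a sequential stream ending in `filter(...).findFirst()` would also stop early.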
> *Edge Fetch Refactoring*
> Change _getEdge_ to iterate over the smaller of the two candidate edge sets. 
> Here are the scenarios:
> - _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. The table has 
> many more outgoing edges than the column has incoming edges.
> - _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. The 
> process execution has fewer outgoing edges than the table has incoming 
> edges.
> Approach:
>  * Since the search is linear, it is more efficient to iterate over fewer 
> items than over more items.
>  * Fetch the edge counts for _fromVertex_ (outgoing) and _toVertex_ 
> (incoming). If either count is 0, return NULL, since no edge between the two 
> vertices can exist.
>  * If both counts are non-zero, take the side with fewer edges and perform 
> the search there.
> [~sidharthkmishra] Thanks for this simple but effective fix.
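The _getEdge_ approach described above (count both sides, return null if either is empty, otherwise scan the smaller side) can be sketched as follows. This is a hypothetical in-memory model, not Atlas's `AtlasVertex`/`AtlasEdge` API or the committed implementation: `Vertex`, `Edge`, `addEdge`, the `columns` label and the `comparisons` counter are illustrative assumptions. Here the edge counts are simply list sizes; against a real graph backend, obtaining a count may itself be a query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EdgeLookupSketch {
    // Hypothetical in-memory model; not Atlas's AtlasVertex/AtlasEdge API.
    static final class Vertex {
        final String name;
        final Map<String, List<Edge>> out = new HashMap<>();
        final Map<String, List<Edge>> in  = new HashMap<>();

        Vertex(String name) { this.name = name; }

        List<Edge> outEdges(String label) { return out.getOrDefault(label, Collections.emptyList()); }
        List<Edge> inEdges(String label)  { return in.getOrDefault(label, Collections.emptyList()); }
    }

    static final class Edge {
        final Vertex from, to;
        final String label;

        Edge(Vertex from, Vertex to, String label) { this.from = from; this.to = to; this.label = label; }
    }

    // Edges examined by the most recent getEdge call (illustration only, not thread-safe).
    static int comparisons = 0;

    static Edge addEdge(Vertex from, Vertex to, String label) {
        Edge e = new Edge(from, to, label);
        from.out.computeIfAbsent(label, k -> new ArrayList<>()).add(e);
        to.in.computeIfAbsent(label, k -> new ArrayList<>()).add(e);
        return e;
    }

    static Edge getEdge(Vertex from, Vertex to, String label) {
        comparisons = 0;
        List<Edge> outgoing = from.outEdges(label);
        List<Edge> incoming = to.inEdges(label);

        // If either side has no edges with this label, no edge can connect them.
        if (outgoing.isEmpty() || incoming.isEmpty()) return null;

        // Linear search over the smaller side, checking the opposite endpoint.
        boolean scanOutgoing = outgoing.size() <= incoming.size();
        for (Edge e : scanOutgoing ? outgoing : incoming) {
            comparisons++;
            Vertex other = scanOutgoing ? e.to : e.from;
            if (other == (scanOutgoing ? to : from)) return e;
        }
        return null;
    }

    public static void main(String[] args) {
        // hive_table -> hive_column: the table has many outgoing edges,
        // each column has a single incoming edge.
        Vertex table = new Vertex("hive_table");
        List<Vertex> columns = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            Vertex col = new Vertex("hive_column_" + i);
            addEdge(table, col, "columns");
            columns.add(col);
        }
        Vertex lastColumn = columns.get(columns.size() - 1);
        Edge found = getEdge(table, lastColumn, "columns");
        System.out.println(found != null ? "found after " + comparisons + " comparison(s)" : "not found");
    }
}
```

With 500 columns, looking up the edge to the last column scans the column's single incoming edge instead of the table's 500 outgoing edges, so `main` prints `found after 1 comparison(s)`. The _hive_process_execution_ to _hive_table_ case is the mirror image: the process side is smaller, so its outgoing edges are scanned.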



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices

2020-06-02, ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124288#comment-17124288 ]

ASF subversion and git services commented on ATLAS-3762:


Commit 43c9e32c89263a8e98c98dccc47f6b5ae7a3ceba in atlas's branch 
refs/heads/branch-2.0 from Ashutosh Mestry
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=43c9e32 ]

ATLAS-3762: Improve Edge creator using Genuine iterator.

(cherry picked from commit 25f3002e0e84927eb39cebb5708d77ef81755d79)




[jira] [Commented] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices

2020-05-05, ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100060#comment-17100060 ]

ASF subversion and git services commented on ATLAS-3762:


Commit 23aea76fff6c8f530816319649218583f4dfd091 in atlas's branch 
refs/heads/master from Ashutosh Mestry
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=23aea76 ]

ATLAS-3762: Improve edge creation using genuine iterator. Part 2




[jira] [Commented] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices

2020-04-29, ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096104#comment-17096104 ]

ASF subversion and git services commented on ATLAS-3762:


Commit 25f3002e0e84927eb39cebb5708d77ef81755d79 in atlas's branch 
refs/heads/master from Ashutosh Mestry
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=25f3002 ]

ATLAS-3762: Improve Edge creator using Genuine iterator.

