[
https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Mestry updated ATLAS-3762:
---
Description:
*Background*
An earlier commit replaced the vertex and edge fetch with
_StreamSupport.stream_ followed by _collect(Collectors.toList())_, which
materializes the entire result set even when only part of it is needed.
On vertices with many edges, this fetches a large amount of data.
*Solution*
Switch to iterators that will use lazy loading.
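The difference can be illustrated with a minimal sketch. It does not use the Atlas graph API; _CountingSource_, _findEager_ and _findLazy_ are hypothetical names introduced only to show how many elements each style pulls from the underlying source.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class LazyFetchSketch {

    /** Hypothetical stand-in for a graph query result; counts elements actually fetched. */
    static final class CountingSource implements Iterable<Integer> {
        final int size;
        int fetched = 0;

        CountingSource(int size) {
            this.size = size;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < size;
                }

                @Override
                public Integer next() {
                    if (next >= size) {
                        throw new NoSuchElementException();
                    }
                    fetched++; // one more element pulled from the source
                    return next++;
                }
            };
        }
    }

    /** Eager style: collects everything into a list, then searches the list. */
    static Integer findEager(Iterable<Integer> source, int target) {
        List<Integer> all = StreamSupport.stream(source.spliterator(), false)
                                         .collect(Collectors.toList());
        for (Integer value : all) {
            if (value == target) {
                return value;
            }
        }
        return null;
    }

    /** Lazy style: walks the iterator and stops at the first match. */
    static Integer findLazy(Iterable<Integer> source, int target) {
        Iterator<Integer> it = source.iterator();
        while (it.hasNext()) {
            Integer value = it.next();
            if (value == target) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CountingSource eager = new CountingSource(1000);
        findEager(eager, 3);
        CountingSource lazy = new CountingSource(1000);
        findLazy(lazy, 3);
        System.out.println("eager fetched: " + eager.fetched + ", lazy fetched: " + lazy.fetched);
    }
}
```

In this sketch the eager version pulls every element before the search starts, while the lazy version stops as soon as the match is found.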
*Edge Fetch Refactoring*
Change _getEdge_ to iterate over the smaller of the two candidate edge sets.
Here are the scenarios:
- _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. This means that
outgoing edges from _fromVertex_ will be many more than incoming edges to
_toVertex_.
- _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. This
means that outgoing edges from _fromVertex_ will be fewer than incoming edges
to _toVertex_.
Approach:
* Since the search is linear, it is more efficient to iterate over fewer
items than more items.
* Fetch the edge counts for _fromVertex_ (outgoing) and _toVertex_ (incoming).
If either count is 0, return null, since no edge can connect the two vertices.
* If both counts are non-zero, take the side with fewer edges and perform the
search there.
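The approach above could be sketched as follows. This is an illustration under stated assumptions, not the code in the attached patch: _Vertex_, _Edge_, _addEdge_ and the _label_ parameter are hypothetical in-memory stand-ins, and the edge counts are simply list sizes rather than graph queries.

```java
import java.util.ArrayList;
import java.util.List;

public class SmallerSideEdgeSearch {

    /** Hypothetical in-memory vertex holding its outgoing and incoming edges. */
    static final class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }
    }

    /** Hypothetical directed, labeled edge. */
    static final class Edge {
        final Vertex from;
        final Vertex to;
        final String label;

        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    /** Helper for building test graphs: registers the edge on both endpoints. */
    static Edge addEdge(Vertex from, Vertex to, String label) {
        Edge edge = new Edge(from, to, label);
        from.out.add(edge);
        to.in.add(edge);
        return edge;
    }

    /** Finds the edge from -> to with the given label by scanning the smaller side. */
    static Edge getEdge(Vertex from, Vertex to, String label) {
        int outCount = from.out.size();
        int inCount = to.in.size();

        // If either side has no edges, nothing can connect the two vertices.
        if (outCount == 0 || inCount == 0) {
            return null;
        }

        // Linear search, so scan whichever side has fewer edges.
        List<Edge> candidates = (outCount <= inCount) ? from.out : to.in;
        for (Edge edge : candidates) {
            if (edge.from == from && edge.to == to && edge.label.equals(label)) {
                return edge;
            }
        }
        return null;
    }
}
```

For a _hive_table_ with many columns, the sketch scans the single incoming edge of the _hive_column_ rather than every outgoing edge of the table.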
[~sidharthkmishra] Thanks for this simple but effective fix.
was:
*Background*
An earlier commit replaced the vertex and edge fetch with
_StreamSupport.stream_ followed by _collect(Collectors.toList())_, which
materializes the entire result set even when only part of it is needed.
On vertices with many edges, this fetches a large amount of data.
*Solution*
Switch to iterators that will use lazy loading.
*Minor Refactoring*
Change _getEdge_ to iterate over the smaller of the two candidate edge sets.
[~sidharthkmishra] Thanks for this simple but effective fix.
Summary: Entity Creation: Improve Edges Fetch Between Vertices (was:
Entity Creation: Improve Vertices and Edges Fetch Using Genuine Iterators)
> Entity Creation: Improve Edges Fetch Between Vertices
> -
>
> Key: ATLAS-3762
> URL: https://issues.apache.org/jira/browse/ATLAS-3762
> Project: Atlas
> Issue Type: Improvement
> Reporter: Ashutosh Mestry
> Assignee: Ashutosh Mestry
> Priority: Major
> Attachments:
> ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>