[ 
https://issues.apache.org/jira/browse/SPARK-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Dave resolved SPARK-1988.
-------------------------------

    Resolution: Fixed

SPARK-1991 mitigates this: the user can increase the number of edge 
partitions so that each edge partition individually fits in memory, then 
set the storage level of the edges to MEMORY_AND_DISK so that partitions 
that do not all fit in memory at once spill to disk.
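
As an illustration of the workaround, here is a minimal sketch, not code 
from the SPARK-1991 patch. It assumes a Spark release in which 
GraphLoader.edgeListFile accepts edge and vertex storage levels, and 
spark-core plus spark-graphx on the classpath. The object and method 
names (OutOfCoreEdges, loadSpillableGraph) are hypothetical, and the 
partition count is left to the caller.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, GraphLoader}
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, for illustration only.
object OutOfCoreEdges {
  // Loads a whitespace-separated edge list ("srcId dstId" per line).
  // Arguments are passed positionally because the name of the
  // partition-count parameter differs between Spark releases.
  def loadSpillableGraph(
      sc: SparkContext,
      path: String,
      numEdgePartitions: Int): Graph[Int, Int] =
    GraphLoader.edgeListFile(
      sc,
      path,
      false,                        // canonicalOrientation
      numEdgePartitions,            // enough partitions that each fits in memory
      StorageLevel.MEMORY_AND_DISK, // edges: spill what does not fit in memory
      StorageLevel.MEMORY_ONLY)     // vertices: assumed small enough for memory
}
```

A caller would pass an existing SparkContext, the path to the edge list, 
and a partition count large enough for its cluster, for example 
OutOfCoreEdges.loadSpillableGraph(sc, path, 2000). The value 2000 is 
arbitrary.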

> Enable storing edges out-of-core
> --------------------------------
>
>                 Key: SPARK-1988
>                 URL: https://issues.apache.org/jira/browse/SPARK-1988
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Minor
>
> A graph's edges are usually the largest component of the graph, and a cluster 
> may not have enough memory to hold them. For example, a graph with 20 billion 
> edges requires at least 400 GB of memory, because each edge takes 20 bytes.
> GraphX only ever accesses the edges using full table scans or cluster scans 
> using the clustered index on source vertex ID. The edges are therefore 
> amenable to being stored on disk. EdgePartition should provide the option of 
> storing edges on disk transparently and streaming through them as needed.
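
The memory estimate in the description can be checked with a short 
calculation. The sketch below takes the 20 bytes per edge from the 
description. The 256 MB per-partition budget is a hypothetical figure 
chosen only to show how a partition count might be derived, and the 
names are illustrative.

```scala
object EdgeMemoryEstimate {
  // Figure taken from the issue description: each edge takes 20 bytes.
  val BytesPerEdge: Long = 20L

  // Lower bound on the bytes needed to hold all edges.
  def edgeBytes(numEdges: Long): Long = numEdges * BytesPerEdge

  // Smallest partition count such that each partition's share of the
  // edges fits within budgetBytes (ceiling division).
  def minPartitions(numEdges: Long, budgetBytes: Long): Long =
    (edgeBytes(numEdges) + budgetBytes - 1) / budgetBytes

  def main(args: Array[String]): Unit = {
    val numEdges = 20000000000L // 20 billion edges
    val budget = 256000000L     // hypothetical 256 MB budget per partition
    println(edgeBytes(numEdges))             // 400000000000 bytes, i.e. 400 GB
    println(minPartitions(numEdges, budget)) // 1563
  }
}
```

This is a lower bound only: the description says "at least", and the 
sketch ignores any per-partition overhead or skew in how edges are 
distributed.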



--
This message was sent by Atlassian JIRA
(v6.2#6252)