[ https://issues.apache.org/jira/browse/SPARK-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankur Dave resolved SPARK-1988.
-------------------------------

Resolution: Fixed

This is mitigated by SPARK-1991, because the user can increase the number of edge partitions so that each edge partition individually fits in memory, then set the storage level of the edges to MEMORY_AND_DISK.

> Enable storing edges out-of-core
> --------------------------------
>
>                 Key: SPARK-1988
>                 URL: https://issues.apache.org/jira/browse/SPARK-1988
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Minor
>
> A graph's edges are usually the largest component of the graph, and a cluster
> may not have enough memory to hold them. For example, a graph with 20 billion
> edges requires at least 400 GB of memory, because each edge takes 20 bytes.
> GraphX only ever accesses the edges using full table scans or cluster scans
> using the clustered index on source vertex ID. The edges are therefore
> amenable to being stored on disk. EdgePartition should provide the option of
> storing edges on disk transparently and streaming through them as needed.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
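The workaround in the resolution comes down to picking enough edge partitions that each one fits in executor memory. Below is a rough sizing sketch, assuming the 20-bytes-per-edge figure from the issue description and a hypothetical per-partition memory budget. The names `EdgePartitionSizing`, `estimatedEdgeBytes`, `minEdgePartitions`, and the 512 MB budget are illustrative only and are not part of Spark; real per-edge overhead varies with the Spark version, the edge attribute type, and JVM settings.

```scala
// Back-of-the-envelope sizing for GraphX edge partitions.
// ASSUMPTION: ~20 bytes per edge, as quoted in SPARK-1988; actual overhead
// depends on the Spark version, edge attribute type, and JVM configuration.
object EdgePartitionSizing {
  // Per-edge cost assumed in the issue description, in bytes.
  val BytesPerEdge: Long = 20L

  /** Estimated total bytes needed to hold `numEdges` edges in memory. */
  def estimatedEdgeBytes(numEdges: Long): Long = numEdges * BytesPerEdge

  /**
   * Smallest partition count such that each edge partition's estimated size
   * stays within `bytesPerPartitionBudget` (ceiling division, at least 1).
   */
  def minEdgePartitions(numEdges: Long, bytesPerPartitionBudget: Long): Int = {
    require(numEdges >= 0, "numEdges must be non-negative")
    require(bytesPerPartitionBudget >= BytesPerEdge, "budget must fit at least one edge")
    val total = estimatedEdgeBytes(numEdges)
    val parts = (total + bytesPerPartitionBudget - 1) / bytesPerPartitionBudget
    math.max(1L, parts).toInt
  }

  def main(args: Array[String]): Unit = {
    val edges = 20000000000L // 20 billion edges, the example from the issue
    println(s"estimated edge memory: ${estimatedEdgeBytes(edges) / 1000000000L} GB")
    // HYPOTHETICAL budget: 512 MiB of executor memory per edge partition.
    println(s"edge partitions needed: ${minEdgePartitions(edges, 512L * 1024 * 1024)}")
  }
}
```

In Spark versions that include SPARK-1991, the resulting count could be passed as `numEdgePartitions` to `GraphLoader.edgeListFile`, together with `edgeStorageLevel = StorageLevel.MEMORY_AND_DISK`, so that edge partitions which do not fit in memory are kept on disk instead. Treat the estimate as a lower bound and leave headroom for vertex data and task working memory.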