Hello,

We have a graph with 100B edges, nearly 800 GB in gz format. We have 80 machines, each with 60 GB of memory. I have never seen the program run to completion.

Alcaid

2014-11-02 14:06 GMT+08:00 Ankur Dave <ankurd...@gmail.com>:

> How large is your graph, and how much memory does your cluster have?
>
> We don't have a good way to determine the *optimal* number of partitions
> aside from trial and error, but to get the job to at least run to
> completion, it might help to use the MEMORY_AND_DISK storage level and a
> large number of partitions.
>
> Ankur <http://www.ankurdave.com/>
>
> On Sat, Nov 1, 2014 at 10:57 PM, James <alcaid1...@gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to run Connected Component algorithm on a very big graph. In
>> practice I found that a small number of partition size would lead to OOM,
>> while a large number would cause various time out exceptions. Thus I wonder
>> how to estimate the number of partition of a graph in GraphX?
>>
>> Alcaid
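
For readers following the thread, the suggestion quoted above (MEMORY_AND_DISK plus a large partition count) might look roughly like the sketch below. It is only an illustration, not the code used by anyone in this thread. The object name, the command-line arguments, and the assumption that the input is a plain "srcId dstId" edge list are all hypothetical. The arguments to GraphLoader.edgeListFile are passed positionally, and the partition count is only a request: how many partitions actually result can depend on the input files, so the value still has to be found by trial and error.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

object ConnectedComponentsSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: an edge-list path and a partition count to try.
    val edgePath = args(0)
    val numEdgePartitions = args(1).toInt

    val sc = new SparkContext(
      new SparkConf().setAppName("ConnectedComponentsSketch"))

    // Positional arguments after the path: canonicalOrientation,
    // requested number of edge partitions, edge storage level,
    // vertex storage level. MEMORY_AND_DISK lets partitions that do
    // not fit in memory spill to disk instead of failing with OOM.
    val graph = GraphLoader.edgeListFile(
      sc,
      edgePath,
      false,
      numEdgePartitions,
      StorageLevel.MEMORY_AND_DISK,
      StorageLevel.MEMORY_AND_DISK)

    // Each vertex ends up labelled with the smallest vertex id
    // found in its connected component.
    val components = graph.connectedComponents().vertices

    // Counting distinct labels forces the computation to run.
    val numComponents = components.map(_._2).distinct().count()
    println(s"components: $numComponents")

    sc.stop()
  }
}
```

One way to use it would be to submit the job several times with increasing partition counts and keep the smallest value that avoids OOM without triggering the timeouts described above.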