How large is your graph, and how much memory does your cluster have? We don't have a good way to determine the *optimal* number of partitions aside from trial and error. To get the job to at least run to completion, though, it might help to use the MEMORY_AND_DISK storage level, so that partitions which don't fit in memory spill to disk instead of causing an OOM, together with a large number of partitions.
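As a rough sketch of what that could look like (assuming Spark 1.1 or later, where `GraphLoader.edgeListFile` accepts storage levels; the path and the partition count below are made-up placeholders, not recommendations):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

object ConnectedComponentsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ConnectedComponentsSketch"))

    // Hypothetical values: tune both by trial and error for your data and cluster.
    val edgeListPath = "hdfs:///path/to/edges.txt"
    val numEdgePartitions = 2000

    // Arguments are passed positionally because the name of the
    // partition-count parameter has differed between Spark versions.
    val graph = GraphLoader.edgeListFile(
      sc,
      edgeListPath,
      false,                        // canonicalOrientation
      numEdgePartitions,            // number of edge partitions
      StorageLevel.MEMORY_AND_DISK, // edge storage level
      StorageLevel.MEMORY_AND_DISK) // vertex storage level

    // Each vertex is labeled with the lowest vertex id in its component.
    val components = graph.connectedComponents().vertices
    val numComponents = components.map(_._2).distinct().count()
    println(s"Distinct components: $numComponents")

    sc.stop()
  }
}
```

If the job still fails, try changing the partition count in steps and watch whether the failures move from OOM toward timeouts or the other way around.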

Ankur <http://www.ankurdave.com/>

On Sat, Nov 1, 2014 at 10:57 PM, James <alcaid1...@gmail.com> wrote:
> Hello,
>
> I am trying to run Connected Component algorithm on a very big graph. In
> practice I found that a small number of partition size would lead to OOM,
> while a large number would cause various time out exceptions. Thus I wonder
> how to estimate the number of partition of a graph in GraphX?
>
> Alcaid