Geoff Thompson <geoff.thomp...@redpoint.net>
Subject: Re: Question about GraphX connected-components
Let's start with some basics: you may need to split your data into more
partitions.
Spilling depends on the configuration you use when you create the graph (look
for the storage-level parameters) and on your global Spark configuration.
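A minimal sketch of what that looks like when loading an edge list. The input path and partition count are placeholders, not values from the question; `StorageLevel.MEMORY_AND_DISK` lets partitions spill to disk instead of being dropped or failing with OOM:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
import org.apache.spark.storage.StorageLevel

// Sketch only: `sc` is an existing SparkContext; path and partition
// count are hypothetical and should be sized for your cluster.
def loadGraph(sc: SparkContext) = {
  val graph = GraphLoader
    .edgeListFile(
      sc,
      "hdfs:///data/edges.txt",                          // placeholder path
      numEdgePartitions  = 2048,                         // more, smaller partitions
      edgeStorageLevel   = StorageLevel.MEMORY_AND_DISK, // allow spilling
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
    .partitionBy(PartitionStrategy.EdgePartition2D)

  // Returns (vertexId, componentId) pairs.
  graph.connectedComponents().vertices
}
```

The same `edgeStorageLevel`/`vertexStorageLevel` parameters exist on the `Graph` constructors if you build the graph from RDDs instead of an edge-list file.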
In addition, your assumption of 64GB for ~100M vertices is probably optimistic,
since Spark divides executor memory between execution and storage, so only a
fraction of the heap actually holds your cached data.
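To make that arithmetic concrete, here is a rough sketch using the defaults of Spark's unified memory model (`spark.memory.fraction = 0.6`, `spark.memory.storageFraction = 0.5`, ~300 MB reserved); the 64 GB heap figure is from the question below:

```scala
// Rough estimate of memory actually usable for cached graph data
// under Spark's unified memory model defaults (Spark >= 1.6).
val heapBytes = 64L * 1024 * 1024 * 1024  // 64 GB executor heap
val reserved  = 300L * 1024 * 1024        // fixed reserved memory (~300 MB)

val memoryFraction  = 0.6                 // spark.memory.fraction default
val storageFraction = 0.5                 // spark.memory.storageFraction default

// Space shared by execution and storage combined.
val unified = ((heapBytes - reserved) * memoryFraction).toLong
// Portion of that which storage is guaranteed against eviction.
val storage = (unified * storageFraction).toLong

println(f"unified: ${unified / 1e9}%.1f GB, protected storage: ${storage / 1e9}%.1f GB")
```

So of a nominal 64 GB heap, only about 41 GB is shared execution-plus-storage space, and roughly half of that is eviction-protected storage, before accounting for serialization and JVM object overhead.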
Greetings,
We are looking into using the GraphX connected-components algorithm on Hadoop
for grouping operations. Our typical data is on the order of 50-200M vertices
with an edge:vertex ratio between 2 and 30. While there are pathological cases
that produce very large groups, the groups tend to be small. I