RE: Question about GraphX connected-components

2015-10-12 Thread John Lilley
Geoff Thompson <geoff.thomp...@redpoint.net> Subject: Re: Question about GraphX connected-components Let's start with some basics: you might need to split your data into more partitions. Spilling depends on the configuration you use when you create the graph (look for the storage level parameter) and

Re: Question about GraphX connected-components

2015-10-10 Thread Igor Berman
Let's start with some basics: you might need to split your data into more partitions. Spilling depends on the configuration you use when you create the graph (look for the storage level parameter) and on your global configuration. In addition, your assumption of 64GB/100M is probably wrong, since Spark divides memory

Question about GraphX connected-components

2015-10-09 Thread John Lilley
Greetings, We are looking into using the GraphX connected-components algorithm on Hadoop for grouping operations. Our typical data is on the order of 50-200M vertices with an edge:vertex ratio between 2 and 30. While there are pathological cases of very large groups, the groups tend to be small. I
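
For readers unfamiliar with the grouping operation being asked about, here is a minimal single-machine sketch of what connected components computes. This is not GraphX's implementation (GraphX runs a distributed, iterative computation); it swaps in a plain union-find to show the result only. The function names `connected_components` and `group_sizes` and the sample edge list are hypothetical, made up for illustration. Like GraphX, it labels every vertex with the lowest vertex id in its component.

```python
from collections import Counter


def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component.

    Illustrative union-find sketch, not the GraphX algorithm.
    `edges` is an iterable of (src, dst) pairs of comparable ids.
    """
    parent = {}

    def find(v):
        # Register unseen vertices as their own root.
        parent.setdefault(v, v)
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Always keep the smaller id as the root, so the final
            # label of a component is its minimum vertex id.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    return {v: find(v) for v in parent}


def group_sizes(labels):
    """Count how many vertices carry each component label."""
    return Counter(labels.values())


# Hypothetical sample data: two small groups and one self-loop.
sample_edges = [(1, 2), (2, 3), (4, 5), (7, 7)]
labels = connected_components(sample_edges)
sizes = group_sizes(labels)
```

On data shaped like the question describes (mostly small groups with an occasional very large one), `group_sizes` is the quantity to look at for spotting the pathological cases.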