Re: Distributing a FlatMap across a Spark Cluster

2021-06-09 Thread Chris Martin
One thing I would check is this line:

val fetchedRdd = rdd.map(r => (r.getGroup, r))

How many distinct groups do you end up with? If there's just one, then I think you might see the behaviour you observe.

Chris

On Wed, Jun 9, 2021 at 4:17 PM Tom Barber wrote:
> Also just to follow up on
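To illustrate the point about group counts, here is a minimal sketch in plain Scala (no Spark dependency) of why a single distinct key tends to leave all the work in one partition. It assumes hash-style partitioning, where the partition is the non-negative remainder of the key's hashCode divided by the partition count. The Record type, the "only-group" key, and the partition count of 8 are hypothetical stand-ins, not taken from the thread. On a real RDD, something like rdd.map(r => r.getGroup).distinct().count() should report the number of groups directly.

```scala
// Hypothetical record type standing in for the records in the thread.
final case class Record(id: String, group: String)

object GroupSkewSketch {
  // Hash-style partitioning: non-negative remainder of hashCode by partition count.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // How many distinct partitions a collection of keys would occupy.
  def partitionsUsed(keys: Seq[String], numPartitions: Int): Int =
    keys.map(partitionFor(_, numPartitions)).distinct.size

  def main(args: Array[String]): Unit = {
    val numPartitions = 8 // assumed value for illustration
    // 4000 records that all share one group key (the suspected problem case).
    val records = (1 to 4000).map { _ =>
      Record(java.util.UUID.randomUUID().toString, "only-group")
    }
    val byGroup = partitionsUsed(records.map(_.group), numPartitions)
    val byId = partitionsUsed(records.map(_.id), numPartitions)
    println(s"keyed by group: $byGroup partition(s) used")
    println(s"keyed by id:    $byId partition(s) used")
  }
}
```

With one group, every record maps to the same partition, so any per-group work runs on a single task no matter how many executors are available. Keying by a unique ID spreads the records across partitions instead.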

Re: Distributing a FlatMap across a Spark Cluster

2021-06-09 Thread Chris Martin
ote:
> Yeah, to test that I just set the group key to the ID in the record, which
> is a Solr-supplied UUID, which means effectively you end up with 4000
> groups now.
>
> On Wed, Jun 9, 2021 at 5:13 PM Chris Martin wrote:
>
>> One thing I would check is this line:
>>