I'm not sure what would slow it down: the repartition completes equally
fast on all nodes, which implies that the data is available on all of
them. After that there are a few computation steps, none of them local
to the master.
On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen wrote:
> What machines are HDFS data nodes --
other 2 nodes
-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, April 20, 2015 12:57 PM
To: jamborta
Cc: user@spark.apache.org
Subject: Re: writing to hdfs on master node much faster
What machines are HDFS data nodes -- just your master? That would
explain it. Otherwise, is it actually the write that's slow, or is
something else you're doing much faster on the master for other
reasons? For example, are you actually shipping data via the master
first in some local computation? so th