I am trying to implement an application that requires the output to be
aggregated and stored in HDFS as a single txt file (instead of, for
instance, having 4 different txt files coming from my 4 workers).
The solution I used does the trick, but I can't tell if it's ok to
regularly stress one of
Hi,
thanks for answering.
With the *coalesce()* transformation a single worker is in charge of
writing to HDFS, but I noticed that the single write operation usually
takes too much time, slowing down the whole computation (this is
particularly true when 'unified' is made of several partitions).
Having the driver write the data instead of a worker probably won't speed
it up; you still need to copy all of the data to a single node. Is there
something that forces you to write from a single node only?
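
If nothing forces a single writer, one option (an assumption here, not something measured on this job) is to let every worker write its own part file in parallel with saveAsTextFile() and concatenate the part files afterwards, which is roughly what `hdfs dfs -getmerge` does. Below is a minimal Python sketch of that merge step on a local directory. The function name `merge_part_files` and the paths are hypothetical, and it only illustrates the concatenation, not the HDFS side.

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate all part-* files in output_dir into merged_path.

    Files are appended in sorted name order (part-00000, part-00001, ...),
    so the merged file keeps the partition order. Returns the number of
    part files merged. Hypothetical helper, for illustration only.
    """
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Stream each part into the merged file without loading
                # it fully into memory.
                shutil.copyfileobj(part, merged)
    return len(part_paths)
```

The merge is still sequential, but it is a plain byte copy that runs after the job has finished, so it does not hold up the workers the way a single-partition write inside the job does.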
On Friday, September 11, 2015, Luca wrote:
> Hi,
> thanks for