was harder to match to each
file in one big PCollection.
Reinier
From: Aljoscha Krettek <aljos...@apache.org>
Sent: 13 March 2018 18:29:52
To: user@beam.apache.org
Subject: Re: HDFS data locality and distribution, Flink
Hi,
There should be no data-locality awa
Mar 2018, at 05:50, Reinier Kip <r...@bol.com> wrote:
>
> Relevant versions: Beam 2.1, Flink 1.3.
> From: Reinier Kip <r...@bol.com>
> Sent: 12 March 2018 13:46:24
> To: user@beam.apache.org
> Subject: HDFS data locality and distribution, Flink
>
> Hey all,
Hey all,
I'm trying to batch-process 30-ish files from HDFS, but I see that data is
distributed very badly across slots. 4 out of 32 slots get 4/5ths of the data,
another 3 slots get about 1/5th and a last slot just a few records. This
probably triggers disk spillover on these slots and slows