From: Chesnay Schepler <ches...@apache.org>
Sent: 13 March 2018 12:40:02
To: user@flink.apache.org
Subject: Re: HDFS data locality and distribution
Hello,
You said that "data is distributed very badly across slots"; do you mean that
only a small number of subtasks is reading fro
9, Reinier Kip wrote:
Relevant versions: Beam 2.1, Flink 1.3.
*From:* Reinier Kip <r...@bol.com>
*Sent:* 12 March 2018 13:45:47
*To:* user@flink.apache.org
*Subject:* HDFS data locality and distribution
Hey all,
I'm trying to batch-process 30-ish files from HDFS, but I see that data is
distributed very badly across slots. 4 out of 32 slots get 4/5ths of the data,
another 3 slots get about 1/5th, and one last slot gets just a few records. This
probably triggers disk spillover on these slots and slows