o that with Beam.
On 29.09.2017 16:03, Reinier Kip wrote:
Hi all,
I'm running a Beam pipeline on Flink and sending metrics via the Graphite
reporter. I get repeated exceptions on the slaves, which try to register the
same metric multiple times. Jobmanager and taskmanager data is fine: I can see
JVM stuff, but only one datapoint here and there for
Hi all,
I’ve been running a Beam pipeline on Flink. Depending on the dataset size and
the heap memory configuration of the jobmanager and taskmanager, I may run into
an EOFException, which causes the job to fail. You will find the stacktrace
near the bottom of this post (data censored).
I
Relevant versions: Beam 2.1, Flink 1.3.
From: Reinier Kip <r...@bol.com>
Sent: 12 March 2018 13:45:47
To: user@flink.apache.org
Subject: HDFS data locality and distribution
Hey all,
I'm trying to batch-process 30-ish files from HDFS, but I see that data is
distributed very badly across slots. 4 out of 32 slots get 4/5ths of the data,
another 3 slots get about 1/5th and a last slot just a few records. This
probably triggers disk spillover on these slots and slows
a is
distributed to each node, i.e. if 80% of your data has the same key (or rather
hash), they will all end up on the same node.
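To illustrate the point about keys and hashes, here is a minimal sketch in Python. It is not Flink's or Beam's actual partitioner: the function names, the CRC32 hash, and the 1000-record dataset are all assumptions made for illustration. It only shows why records sharing one key land in one slot no matter how many slots exist.

```python
from collections import Counter
import zlib


def partition_for(key: str, num_slots: int) -> int:
    # Hypothetical stand-in for a hash partitioner: a deterministic hash
    # of the key, reduced modulo the number of slots.
    return zlib.crc32(key.encode("utf-8")) % num_slots


def distribute(keys, num_slots):
    # Count how many records each slot receives.
    counts = Counter(partition_for(k, num_slots) for k in keys)
    return [counts.get(slot, 0) for slot in range(num_slots)]


# Assumed dataset: 80% of the records share a single key.
records = ["hot-key"] * 800 + [f"key-{i}" for i in range(200)]
load = distribute(records, 32)

busiest = max(load)
print(f"busiest slot holds {busiest / len(records):.0%} of the records")
print(f"slots that received no records: {load.count(0)}")
```

Because every "hot-key" record hashes to the same value, the busiest slot receives at least 80% of the data regardless of parallelism; adding slots does not help unless the keys themselves are changed or the data is redistributed before the keyed step.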
On 12.03.2018 13:49, Reinier Kip wrote:
Relevant versions: Beam 2.1, Flink 1.3.
____________
From: Reinier Kip <r...@bol.com>