Hi Maximilian,Slots per TM is = 512. For 4 nodes, that adds up to be 2048 in
the cluster.I set Parallelism(2048) in the code.Like shown below, all have
been consumed and 0 is remained.Before I start my Beam application, the
Available Task Slots = 2048 as well.Once started, it goes down to 0 implying
all 4 nodes are now consumed and are processing data.Its my understanding at
least.Cheers
From: Maximilian Michels <[email protected]>
To: [email protected]
Sent: Wednesday, October 5, 2016 7:43 AM
Subject: Re: Regarding TextIO()
You have probably configured a parallelism of 1. Try using `-p 2048` to run
with a parallelism of 2048.
On Wed, Oct 5, 2016 at 7:43 AM, amir bahmanyari <[email protected]> wrote:
Thanks JB,I copied the data file in all nodes. Applied TextIo() in src code.
Started the Flink cluster & deployed the beam fat file to it. Interesting
enough, the dashboard shows thatout of 2048 slots I hav configured for
parallelism, only ONE slot is being used in Flink cluster.This is not true when
i use KafkaIO(). All 2040 get consumed like below.Have a great eve.
From: Jean-Baptiste Onofré <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Tuesday, October 4, 2016 10:29 PM
Subject: Re: Regarding TextIO()
HiAll depends of the filesystem you are using. If you want all nodes share the
same files then you need a shared filesystem or distributed filesystem like
hdfs (not yet supported by textio) or gs (supported by textio). If you use file
(local filesystem) then the files will be local to each node.Regards
JBOn Oct 5, 2016, at 02:12, amir bahmanyari <[email protected]> wrote:
Hi Colleagues,When you are using TextIO() in a Beam (java) app executing in a
Flink Cluster, do you have to have a copy of the file in every Flink cluster
node?Also, if you want to read the file from one node (probably remote node)
only, how would the TextIO.Read.from("....") look like?What are the best
TextIO() practices in situations like this?Thanks+regards,Amir