Another option, if the file is small enough, is to load it in the driver (the client program that builds and submits the job) and directly initialize an in-memory source (env.fromElements).
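The in-memory option can be sketched as below. This is a minimal plain-Java sketch of the driver-side step only. It assumes UTF-8 text and a hypothetical size cap (MAX_BYTES). The class and method names are illustrative and are not part of Flink.

The file is read once, on the submitting machine, so the task managers never need access to the local path. The data then travels with the job itself, which is why this only suits small files. The Flink call is left as a comment. For a List, the collection variant env.fromCollection(lines) fits better than env.fromElements(...), which takes varargs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SmallFileLoader {

    // Hypothetical cut-off: pick a value that comfortably fits in the
    // client JVM and in the job that gets shipped to the cluster.
    static final long MAX_BYTES = 10L * 1024 * 1024;

    /**
     * Reads the whole file on the client (driver) side, so only the
     * submitting machine needs the local path. Refuses files above maxBytes.
     */
    static List<String> loadSmallFile(Path path, long maxBytes) throws IOException {
        long size = Files.size(path);
        if (size > maxBytes) {
            throw new IllegalArgumentException(
                "File is " + size + " bytes, too large to inline into the job: " + path);
        }
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real input file on the client machine.
        Path tmp = Files.createTempFile("input", ".txt");
        Files.write(tmp, List.of("a,1", "b,2", "c,3"), StandardCharsets.UTF_8);

        List<String> lines = loadSmallFile(tmp, MAX_BYTES);
        // In the Flink job this list would become the source, e.g.
        //   DataStream<String> stream = env.fromCollection(lines);
        System.out.println(lines.size() + " lines loaded");

        Files.delete(tmp);
    }
}
```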
On Tue, Jun 23, 2020 at 9:57 PM Vishwas Siravara <[email protected]> wrote:
> Thanks, that makes sense.
>
> On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <[email protected]> wrote:
>
>> Hi Nick,
>>
>> On a project I worked on, we simply made the file accessible on a shared
>> NFS drive.
>> Our source was custom, and we forced it to parallelism 1 inside the job,
>> so the file wouldn't be read multiple times. The rest of the job was
>> distributed.
>> This was also on a standalone cluster. On a resource-managed cluster, I
>> guess the resource manager could take care of copying the file for us.
>>
>> Hope this helps. If there is a better solution, I'd also be happy to
>> hear it :).
>>
>> Regards,
>>
>> Laurent.
>>
>> On Tue, Jun 23, 2020, 20:51 Nick Bendtner <[email protected]> wrote:
>>
>>> Hi guys,
>>> What is the best way to process a file from a Unix file system, given
>>> that there is no guarantee as to which task manager will be assigned to
>>> process the file? We run Flink in standalone mode. We currently follow
>>> the brute-force approach of copying the file to every task manager. Is
>>> there a better way to do this?
>>>
>>> Best,
>>> Nick.

--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>
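Laurent's approach uses one custom reader forced to parallelism 1 over a file on a shared NFS mount, with the rest of the job distributed. That is essentially a single-producer, multi-consumer pattern.

Below is a plain-Java analogue of that pattern, not Flink code. One thread opens the file exactly once, and a bounded queue feeds a pool of worker threads. The class name and the worker logic are hypothetical.

In Flink's DataStream API the same shape might look like env.addSource(new NfsLineSource(path)).setParallelism(1), followed by rebalance() and the parallel operators. Here NfsLineSource is a hypothetical custom SourceFunction, and that line should be treated as an assumption to check against your Flink version.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SingleReaderFanOut {

    // Sentinel compared by identity, so a real "<end>" line in the file
    // is never mistaken for it.
    private static final String END = new String("<end>");

    /**
     * Opens the file exactly once (the "parallelism 1" reader) and hands its
     * lines to `workers` consumer threads (the distributed rest of the job).
     * Returns the number of lines the workers processed.
     */
    static long process(Path path, int workers) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Object>> results = new ArrayList<>();
        try {
            for (int i = 0; i < workers; i++) {
                results.add(pool.submit(() -> {
                    while (true) {
                        String line = queue.take();
                        if (line == END) {
                            return null;
                        }
                        // Placeholder for the real per-record work.
                        processed.incrementAndGet();
                    }
                }));
            }
            // The single reader: the only code path that touches the file.
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    queue.put(line);
                }
            } finally {
                // Release every worker, even if reading failed.
                for (int i = 0; i < workers; i++) {
                    queue.put(END);
                }
            }
            for (Future<Object> f : results) {
                f.get(); // rethrows any worker failure
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return processed.get();
    }
}
```

The bounded queue makes the reader block when the workers fall behind, loosely mirroring how backpressure throttles a single source feeding parallel operators.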
