Re: Non parallel file sources

Arvid Heise Wed, 24 Jun 2020 13:02:24 -0700

Another option, if the file is small enough, is to load it in the driver
program and directly initialize an in-memory source (env.fromElements).
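
A minimal sketch of the driver-side half of that idea, written in plain Java
so it runs without a Flink dependency. The class name DriverSideFileLoader,
the method loadSmallFile, and the maxBytes limit are hypothetical names
chosen for illustration, not part of Flink. The returned list is what would
be handed to the in-memory source, for example through env.fromCollection,
which (like env.fromElements) creates a source with parallelism 1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: meant to run in the driver (client) program,
// not on a task manager.
class DriverSideFileLoader {

    // Reads the whole file into memory, refusing files larger than maxBytes
    // so that only "small enough" inputs end up embedded in the job.
    static List<String> loadSmallFile(Path file, long maxBytes) throws IOException {
        long size = Files.size(file);
        if (size > maxBytes) {
            throw new IllegalArgumentException(
                "File is " + size + " bytes, above the limit of " + maxBytes);
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}

// In the job's main method (assumes Flink is on the classpath and that
// `env` is a StreamExecutionEnvironment and `path` points at the file):
//   List<String> lines = DriverSideFileLoader.loadSmallFile(path, 1_000_000L);
//   DataStream<String> stream = env.fromCollection(lines);
```

Because the records travel with the job itself, the file only has to be
readable on the machine that submits the job. The size limit is there
because this only makes sense for small files.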


On Tue, Jun 23, 2020 at 9:57 PM Vishwas Siravara <[email protected]>
wrote:

> Thanks, that makes sense.
>
> On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <
> [email protected]> wrote:
>
>> Hi Nick,
>>
>> On a project I worked on, we simply made the file accessible on a shared
>> NFS drive.
>> Our source was custom, and we forced it to parallelism 1 inside the job,
>> so the file wouldn't be read multiple times. The rest of the job was
>> distributed.
>> This was also on a standalone cluster. On a resource-managed cluster, I
>> guess the resource manager could take care of copying the file for us.
>>
>> Hope this helps. If there is a better solution, I'm also
>> happy to hear it :).
>>
>> Regards,
>>
>> Laurent.
>>
>>
>> On Tue, Jun 23, 2020, 20:51 Nick Bendtner <[email protected]> wrote:
>>
>>> Hi guys,
>>> What is the best way to process a file from a Unix file system, given that
>>> there is no guarantee as to which task manager will be assigned to process
>>> the file? We run Flink in standalone mode. We currently take the
>>> brute-force approach of copying the file to every task manager. Is there a
>>> better way to do this?
>>>
>>>
>>> Best,
>>> Nick.
>>>
>>
>
>

-- 

Arvid Heise | Senior Java Developer

