It might be simpler to discuss if you replicate the question here.

Are your CSV files splittable? If so, the Flink and Dataflow runners should
not load the entire file into memory. This is a streaming application, right?
MatchAll in FileIO.java is used in TextIO, AvroIO, etc. to read files
continuously in streaming applications. It is built on SDF (Splittable DoFn)
and allows reading smaller chunks of the file (as long as the file is
splittable).
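
To illustrate what "splittable" means here, below is a minimal plain-Java
sketch of reading one byte range of a newline-delimited file without loading
the rest. It is not Beam's implementation; the class and method names
(SplitReadSketch, readRange, writeTemp) are made up for illustration. It
assumes every record ends with '\n' and that no field contains a quoted
newline. A CSV file with embedded newlines, or a gzip-compressed file, could
not be split safely this way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache Beam.
class SplitReadSketch {

    /** Returns the lines whose first byte lies in [start, end). */
    static List<String> readRange(Path file, long start, long end) {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long pos = start;
            if (start > 0) {
                // Back up one byte and skip through the next newline, so a
                // line that began before 'start' is left to the previous range.
                raf.seek(start - 1);
                pos = start - 1;
                int b;
                while ((b = raf.read()) != -1) {
                    pos++;
                    if (b == '\n') {
                        break;
                    }
                }
            }
            // 'pos' is now the first byte of a line (or end of file).
            // Unbuffered byte-at-a-time reads keep the sketch short; real
            // code would buffer.
            while (pos < end) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                boolean sawByte = false;
                int b;
                while ((b = raf.read()) != -1) {
                    pos++;
                    sawByte = true;
                    if (b == '\n') {
                        break;
                    }
                    buf.write(b);
                }
                if (!sawByte) {
                    break; // end of file
                }
                lines.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Demo helper: writes the given content to a temporary file. */
    static Path writeTemp(String content) {
        try {
            Path p = Files.createTempFile("split-sketch", ".csv");
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeTemp("a,1\nb,2\nc,3\n"); // 12 bytes, lines start at 0, 4, 8
        System.out.println(readRange(p, 0, 6));  // prints [a,1, b,2]
        System.out.println(readRange(p, 6, 12)); // prints [c,3]
    }
}
```

Each range can be handed to a different worker, and every line is read
exactly once because a line belongs to the range that contains its first
byte.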

Raghu.


On Mon, Jul 23, 2018 at 7:16 AM Andrew Pilloud <[email protected]> wrote:

> Hi Kelsey,
>
> I posted a reply on stackoverflow. It sounds like you might be using the
> DirectRunner, which isn't meant to handle datasets that are too big to fit
> into memory. If that is the case, have you tried the Flink local runner or
> the Dataflow runner?
>
> Andrew
>
> On Mon, Jul 23, 2018 at 4:06 AM Kelsey RIDER <
> [email protected]> wrote:
>
>> Hello,
>>
>>
>>
>> SO question here:
>> https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam
>>
>> Anybody have any ideas? Am I missing something?
>>
>>
>>
>> Thanks
>> Following changes to working-time regulations, if you receive this email
>> before 7:00 a.m., in the evening, during the weekend or while on leave,
>> please do not act on it or reply to it immediately, except in cases of
>> exceptional urgency.
>>
>
