It might be simpler to discuss this if you repeat the question here. Are your CSV files splittable? If they are, the Flink and Dataflow runners should not load the entire file into memory. This is a streaming application, right? MatchAll in FileIO.java is what TextIO, AvroIO, etc. use to read files continuously in streaming applications. It is built on SDF (splittable DoFn) and allows a file to be read in smaller chunks, as long as the file is splittable.
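To make "splittable" concrete, here is a minimal sketch in plain Java. It is not Beam's implementation, only the general idea that line-based sources commonly rely on. The class LineSplitReader and the method readSplit are hypothetical names made up for this illustration. It assumes newline-terminated records, so a CSV with quoted fields that contain line breaks would not be safely splittable this way, and it reads one byte at a time for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Beam.
class LineSplitReader {

  /** Returns the lines whose first byte lies in the byte range [start, end). */
  static List<String> readSplit(Path file, long start, long end) throws IOException {
    List<String> lines = new ArrayList<>();
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      long pos = start;
      if (start > 0) {
        // Back up one byte and skip through the next newline. If the byte at
        // start - 1 is itself a newline, a line begins exactly at start and
        // belongs to this split; otherwise the partial line belongs to the
        // previous split and is skipped.
        pos = start - 1;
        raf.seek(pos);
        int b;
        while ((b = raf.read()) != -1) {
          pos++;
          if (b == '\n') {
            break;
          }
        }
      }
      // Read whole lines as long as the line starts before end. The last line
      // may extend past end; it is still owned by this split.
      while (pos < end) {
        long lineStart = pos;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = raf.read()) != -1) {
          pos++;
          if (b == '\n') {
            break;
          }
          buf.write(b);
        }
        if (pos == lineStart) {
          break; // end of file, nothing left to read
        }
        lines.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("split-demo", ".csv");
    Files.write(file, "a,1\nb,2\nc,3\n".getBytes(StandardCharsets.UTF_8));
    long size = Files.size(file);
    long mid = size / 2;
    // Two workers could process these ranges independently; every line
    // is read exactly once and neither worker holds the whole file.
    System.out.println(readSplit(file, 0, mid));
    System.out.println(readSplit(file, mid, size));
    Files.delete(file);
  }
}
```

Because each range can be read on its own, a runner can hand different ranges to different workers. A file that can only be read from the beginning (a gzip-compressed one, for example) cannot be divided like this.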
Raghu.

On Mon, Jul 23, 2018 at 7:16 AM Andrew Pilloud <[email protected]> wrote:

> Hi Kelsey,
>
> I posted a reply on stackoverflow. It sounds like you might be using the
> DirectRunner, which isn't meant to handle datasets that are too big to fit
> into memory. If that is the case, have you tried the Flink local runner or
> the Dataflow runner?
>
> Andrew
>
> On Mon, Jul 23, 2018 at 4:06 AM Kelsey RIDER <[email protected]> wrote:
>
>> Hello,
>>
>> SO question here:
>> https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam
>>
>> Anybody have any ideas? Am I missing something?
>>
>> Thanks
>>
>> Following changes to working-time regulations, if you receive this email
>> before 7:00 am, in the evening, over the weekend or during your holidays,
>> please do not act on it or reply to it immediately, except in cases of
>> exceptional urgency.
