Just wanted to chime in here: the builders and the CSV reader are currently explicitly NOT goroutine safe; there are no locks or checks in them to prevent race conditions.

In the sample code you have above, Gus, you need to add a call to record.Retain() before the `go func()` call. Every call to `Next()` on the CSV reader releases the previous record, so you need to retain it before calling `Next()` again so that it still exists for the goroutine. Otherwise that should be fine.
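A minimal sketch of that corrected loop (process here stands in for whatever per-record work you're doing, and I'm writing against the v13 module path; adjust to whatever version you're on):

import (
    "sync"

    "github.com/apache/arrow/go/v13/arrow"
    "github.com/apache/arrow/go/v13/arrow/csv"
)

func processAll(rdr *csv.Reader, process func(arrow.Record)) {
    var wg sync.WaitGroup
    for rdr.Next() {
        rec := rdr.Record()
        rec.Retain() // Next() will release this record, so keep it alive for the goroutine
        wg.Add(1)
        go func(rec arrow.Record) {
            defer wg.Done()
            defer rec.Release() // balance the Retain above
            process(rec)
        }(rec) // pass the record as an argument so each goroutine holds its own reference
    }
    wg.Wait()
}

I've deliberately left builder.Append out of the goroutine; appending to a shared builder concurrently is exactly the kind of thing that is not safe (more on that at the bottom of this mail).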
For the rest, I agree with Bryce: the CSV reader can already take a WithChunk option to read in batches rather than line by line. That said, there is definitely a desire to add a way to better parallelize the CSV reader, and I would welcome a PR to that end. You could look at the C++ CSV reader for an example of how to potentially do that. (I've appended rough sketches of the chunked-reading and mutex-guarded-builder approaches below the quoted thread.)

--Matt

On Thu, Jul 6, 2023 at 9:02 PM Bryce Mecum <[email protected]> wrote:
> Hi Gus, did you ever get an answer to your questions?
>
> From a look at the source code, neither the CSV reader nor the builders
> look goroutine safe. However, your usage of the CSV reader above looks
> safe to me because 'record' gets copied into each goroutine
> invocation. Importantly, the builder would need to be guarded with
> something like sync.Mutex [1] to be goroutine safe.
>
> As for approach, do you really need to process your CSV file
> line by line? If not, the CSV reader can take a WithChunk(n int)
> argument to read in batches of lines, which might be preferable. More
> details about what kind of processing you're doing might be the most
> helpful thing here, though.
>
> [1] https://pkg.go.dev/sync#Mutex
>
>
> On Thu, Jun 22, 2023 at 1:39 AM Gus Minto-Cowcher <[email protected]>
> wrote:
> >
> > Hi,
> >
> > I am trying to read a CSV file and then concurrently process each line
> > before building it into a different schema (along with some metadata)
> > which I can output as a Parquet file. Are builders goroutine safe? In
> > very loose Go code below is what I am trying to do. Is this possible,
> > does it make sense, and are there better ways of doing it?
> >
> > The aim here is essentially to improve the performance of reading and
> > processing these files and marshaling them into a different schema.
> >
> > All feedback is appreciated, thank you.
> >
> > for csvReader.Next() {
> >     record := csvReader.Record()
> >     go func() {
> >         process(record)
> >         builder.Append(record.Column(1).somedataetcetc)
> >     }()
> > }
> >
> > Thanks,
> > Gus
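Sketch of the chunked approach Bryce mentioned (the schema, file name, and chunk size are all placeholders):

import (
    "os"

    "github.com/apache/arrow/go/v13/arrow"
    "github.com/apache/arrow/go/v13/arrow/csv"
)

func readChunked(schema *arrow.Schema, process func(arrow.Record)) error {
    f, err := os.Open("data.csv")
    if err != nil {
        return err
    }
    defer f.Close()

    // With WithChunk(1024), each call to Next() yields a record of up to
    // 1024 rows instead of one row, amortizing the per-record overhead.
    rdr := csv.NewReader(f, schema, csv.WithChunk(1024))
    defer rdr.Release()

    for rdr.Next() {
        // The record is only valid until the next call to Next();
        // Retain() it if it needs to outlive this iteration.
        process(rdr.Record())
    }
    return rdr.Err()
}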

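And on Bryce's sync.Mutex point: if a builder really does have to be shared across goroutines, wrap it and guard every call, along these lines (Float64Builder is just an arbitrary example; any builder type works the same way):

import (
    "sync"

    "github.com/apache/arrow/go/v13/arrow/array"
    "github.com/apache/arrow/go/v13/arrow/memory"
)

type lockedBuilder struct {
    mu sync.Mutex
    b  *array.Float64Builder
}

func newLockedBuilder() *lockedBuilder {
    return &lockedBuilder{b: array.NewFloat64Builder(memory.DefaultAllocator)}
}

// Append is safe to call from multiple goroutines; the mutex
// serializes access to the underlying builder.
func (lb *lockedBuilder) Append(v float64) {
    lb.mu.Lock()
    defer lb.mu.Unlock()
    lb.b.Append(v)
}

Keep in mind the lock serializes every append and the row order becomes nondeterministic, so building a separate builder per goroutine and merging the resulting arrays afterwards usually scales better.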