Hi Gus, did you ever get an answer to your questions? From a look at the source code, neither the CSV reader nor the builders look goroutine safe. However, your usage of the CSV reader above looks safe to me because 'record' gets copied into each goroutine invocation. Importantly, the builder would need to be guarded with something like sync.Mutex [1] to be goroutine safe.
As for approach, do you really need to process your CSV file line by line? If not, the CSV reader can take a WithChunk(n int) option to read in batches of lines, which might be preferable. More detail about what kind of processing you're doing might be the most helpful thing here, though.

[1] https://pkg.go.dev/sync#Mutex

On Thu, Jun 22, 2023 at 1:39 AM Gus Minto-Cowcher <[email protected]> wrote:
>
> Hi,
>
> I am trying to read a CSV file and then concurrently process each line before
> building it into a different schema (along with some metadata) which I can
> output as a parquet file. Are builders goroutine safe? In very loose Go code
> below is what I am trying to do; is this possible, does it make sense, are
> there better ways of doing it?
>
> The aim here is essentially to try and improve the performance of reading and
> processing these files and marshaling them into a different schema.
>
> All feedback is appreciated, thank you.
>
>     for csvReader.Next() {
>         record := csvReader.Record()
>         go func() {
>             process(record)
>             builder.Append(record.Column(1).somedataetcetc)
>         }()
>     }
>
> Thanks,
> Gus
