Hi Gus, did you ever get an answer to your questions?

From a look at the source code, neither the CSV reader nor the
builders look goroutine safe. However, your usage of the CSV reader
above looks safe to me because 'record' gets copied into each
goroutine invocation. Importantly, the builder would need to be
guarded with something like sync.Mutex [1] to be goroutine safe.

As for approach, do you really need to process your CSV file
line-by-line? If not, the CSV reader can take a WithChunk(n int)
option to read in batches of lines, which might be preferable. More
details about what kind of processing you're doing would be the most
helpful thing here, though.

[1] https://pkg.go.dev/sync#Mutex


On Thu, Jun 22, 2023 at 1:39 AM Gus Minto-Cowcher <[email protected]> wrote:
>
> Hi,
>
> I am trying to read a CSV file and then concurrently process each line before 
> building it into a different schema (along with some metadata), which I can 
> output as a parquet file. Are builders goroutine safe? The very loose Go code 
> below is what I am trying to do. Is this possible, does it make sense, and are 
> there better ways of doing it?
>
> The aim here is essentially to improve the performance of reading and 
> processing these files and marshaling them into a different schema.
>
> All feedback is appreciated thank you.
>
> for csvReader.Next() {
>   record := csvReader.Record()
>   go func() {
>     process(record)
>     builder.Append(record.Column(1).somedataetcetc)
>   }()
> }
>
> Thanks,
> Gus
