Hi,

On 2020-02-19 11:38:45 +0100, Tomas Vondra wrote:
> I generally agree with the impression that parsing CSV is tricky and
> unlikely to benefit from parallelism in general. There may be cases with
> restrictions making it easier (e.g. restrictions on the format) but that
> might be a bit too complex to start with.
>
> For example, I had an idea to parallelise the planning by splitting it
> into two phases:

FWIW, I think we ought to rewrite our COPY parsers before we go for
complex schemes. They're way slower than a decent green-field
CSV/... parser.

> The one piece of information I'm missing here is at least a very rough
> quantification of the individual steps of CSV processing - for example
> if parsing takes only 10% of the time, it's pretty pointless to start by
> parallelising this part and we should focus on the rest. If it's 50% it
> might be a different story. Has anyone done any measurements?

Not recently, but I'm pretty sure that I've observed CSV parsing to be
way more than 10%.

Greetings,

Andres Freund