Re: Speed up COPY FROM text/CSV parsing using SIMD

Andrew Dunstan Fri, 21 Nov 2025 06:49:20 -0800


On 2025-11-20 Th 7:55 AM, Nazir Bilal Yavuz wrote:

Hi,

Thank you for looking into this!

On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]> wrote:

IMHO we should be looking for ways to simplify this should-we-use-SIMD
code.  For example, perhaps we could just disable the SIMD path for 10K or
100K lines any time a special character is found.  I'm dubious that a lot
of complexity is warranted.

I think this is a bit too harsh, since SIMD is still worth it if it can
advance more than ~5 characters on average. I am trying to use SIMD as
much as possible when it is worth it, but what you said can remove the
regression completely; perhaps that is the correct way.

Perhaps a very small regression (say under 1%) in the worst case would
be OK. But the closer you can get that to zero the more acceptable this
will be. Very large loads of sparse data, which will often have lots of
special characters AIUI, are very common, so we should not dismiss the
worst case as an outlier. I still like the idea of testing, say, a
thousand lines every million, or something like that.
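
To make the sampling idea concrete, here is a rough standalone sketch, not
taken from the patch. All names (SimdSampler, simd_sampler_note_line, the
three constants) are hypothetical, and the decision rule is an assumption:
it counts special characters over the first thousand lines of every million
and keeps SIMD enabled only if the average run between them exceeds the ~5
characters mentioned upthread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning knobs; the values only mirror numbers from the thread. */
#define SAMPLE_PERIOD   1000000		/* lines per sampling cycle */
#define SAMPLE_LINES    1000		/* lines measured at the start of a cycle */
#define MIN_AVG_ADVANCE 5			/* "~5 characters on average" */

typedef struct SimdSampler
{
	uint64_t	line_in_period;		/* position within the current cycle */
	uint64_t	sample_bytes;		/* bytes seen in the sampled lines */
	uint64_t	sample_specials;	/* special characters seen in them */
	bool		use_simd;			/* decision for the rest of the cycle */
} SimdSampler;

static void
simd_sampler_init(SimdSampler *s)
{
	s->line_in_period = 0;
	s->sample_bytes = 0;
	s->sample_specials = 0;
	s->use_simd = true;			/* assumption: start optimistic */
}

/*
 * Call once per parsed line.  Returns whether the next line should take
 * the SIMD path.  During the sample window the previous decision stays
 * in force; only the counters are updated.
 */
static bool
simd_sampler_note_line(SimdSampler *s, uint64_t line_bytes,
					   uint64_t special_chars)
{
	if (s->line_in_period < SAMPLE_LINES)
	{
		s->sample_bytes += line_bytes;
		s->sample_specials += special_chars;
	}

	s->line_in_period++;

	if (s->line_in_period == SAMPLE_LINES)
	{
		/* average run length = bytes / (specials + 1), without dividing */
		s->use_simd = s->sample_bytes >
			(uint64_t) MIN_AVG_ADVANCE * (s->sample_specials + 1);
	}

	if (s->line_in_period == SAMPLE_PERIOD)
	{
		/* start a new cycle; the decision carries over until re-sampled */
		s->line_in_period = 0;
		s->sample_bytes = 0;
		s->sample_specials = 0;
	}

	return s->use_simd;
}
```

With something of this shape the per-line cost outside the sample window is
one increment and two comparisons, which is the kind of overhead that should
keep the worst case close to zero. Whether the real patch can count special
characters this cheaply is a separate question.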



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com
