On Wed, Nov 12, 2025 at 8:44 AM KAZAR Ayoub <[email protected]> wrote:
> On Tue, Nov 11, 2025 at 11:23 PM Manni Wood <[email protected]>
> wrote:
>
>> Hello!
>>
>> I wanted to reproduce the results using files attached by Shinya Kato and
>> Ayoub Kazar. I installed a postgres compiled from master, and then I
>> installed a postgres built from master plus Nazir Bilal Yavuz's v3 patches
>> applied.
>>
>> The master+v3patches postgres naturally performed better on copying into
>> the database: anywhere from 11% better for the t.csv file produced by
>> Shinya's test.sql, to 35% better copying in the t_4096_none.csv file
>> created by Ayoub Kazar's simd-copy-from-bench.sql.
>>
>> But here's where it gets weird. The two files created by Ayoub Kazar's
>> simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.txt,
>> and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5%
>> respectively.
>>
>> This seems impossible.
>>
>> A few things I should note:
>>
>> I timed the commands using the Unix time command, like so:
>>
>> time psql -X -U mwood -h localhost -d postgres -c '\copy t from
>> /tmp/t_4096_escape.txt'
>>
>> For each file, I timed the copy 6 times and took the average.
>>
>> This was done on my work Linux machine while also running Chrome and an
>> Open Office spreadsheet; not a dedicated machine only running postgres.
>>
> Hello,
> I think if you do a perf benchmark (if it still reproduces) it would
> probably be possible to explain why it's performing like that by looking at
> the CPI and other metrics and comparing it to my findings.
> What I also suggest is to make the data even closer to the worst
> case, i.e. more special characters where it hurts the switching between SIMD
> and scalar processing (in the simd-copy-from-bench.sql file); if it still
> does a good job then there's something to look at.
>
>> All of the copy results took between 4.5 seconds (Shinya's t.csv copied
>> into postgres compiled from master) to 2 seconds (Ayoub
>> Kazar's t_4096_none.csv copied into postgres compiled from master plus
>> Nazir's v3 patches).
>>
>> Perhaps I need to fiddle with the provided SQL to produce larger files to
>> get longer run times? Maybe sub-second differences won't tell as
>> interesting a story as minutes-long copy commands?
>>
> I did try it on some GBs (around 2-5GB only), the differences were not
> that much, but if you can run this on more GBs (at least 10GB) it would be
> good to look at, although I don't suspect anything interesting since the
> shape of data is the same for the totality of the COPY.
>
>>
>> Thanks for reading this.
>> --
>> -- Manni Wood EDB: https://www.enterprisedb.com
>>
> Thanks for the info.
>
>
> Regards,
> Ayoub Kazar.
>

Hello again!

It looks like using 10 times the data removed the apparent speedup in the
simd code when the simd code has to deal with t_4096_escape.txt and
t_4096_quote.csv. When both files contain 1,000,000 lines each, postgres
master+v3patch imports 0.63% slower and 0.54% slower respectively.

For 1,000,000 lines of t_4096_none.txt, the v3 patch yields a 30% speedup.

For 1,000,000 lines of t_4096_none.csv, the v3 patch yields a 33% speedup.

I got these numbers just via simple timing, though this time I used psql's
\timing feature. I left psql running rather than launching it each time as
I did when I used the unix "time" command. I ran the copy command 5 times
for each file and averaged the results. Again, this happened on a Linux
machine that also happened to be running Chrome and Open Office's
spreadsheet.

I should probably try to construct some .txt or .csv files that would trip
up the simd on/off heuristic in the v3 patch.

If data "in the wild" tend to be roughly the same "shape" from row to row,
as Andrew's experience has shown, I imagine these million row results bode
well for the v3 patch...
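
For anyone repeating the comparison, here is a minimal sketch of the
averaging step, assuming the percentages are taken relative to master's
mean run time (other definitions, such as throughput ratios, would give
different figures). The function name and the timings are hypothetical
placeholders, not measured results.

```python
from statistics import mean


def percent_change(baseline_ms, patched_ms):
    """Return how much faster (positive) or slower (negative) the patched
    build is, as a percentage of the baseline's mean run time."""
    base = mean(baseline_ms)
    patched = mean(patched_ms)
    return (base - patched) / base * 100.0


# Hypothetical placeholder timings in milliseconds (five runs each),
# e.g. copied by hand from psql's \timing output. NOT measured results.
master_runs = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
patched_runs = [700.0, 705.0, 695.0, 702.0, 698.0]

print(f"{percent_change(master_runs, patched_runs):+.2f}%")
```

A negative result would correspond to the "slower" cases above. With only
five runs on a non-dedicated machine, differences well under 1% are
probably within noise.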
--
-- Manni Wood EDB: https://www.enterprisedb.com
