Hello,

On Tue, Jan 20, 2026 at 9:49 PM Manni Wood <[email protected]> wrote:
> Hello, all I have more benchmarks.
>
> These benchmarks are from a Raspberry Pi 5 that I bought. It has an Arm
> Cortex A76 processor.
>
> (I was so impressed with the stability of the results I got on my
> standalone Intel tower PC that I figured I needed a standalone Arm-based
> machine that was not a laptop and not a VM at a cloud service provider. The
> run-to-run results were indeed more stable, just like with my standalone
> tower PC.)
>
> COPY FROM
>
> master: (852558b9)
>
> text, no special: 9111
> text, 1/3 special: 10302
> csv, no special: 11147
> csv, 1/3 special: 13375
>
> v3
>
> text, no special: 7351 (19.3% speedup)
> text, 1/3 special: 10397 (0.9% regression)
> csv, no special: 7272 (34.7% speedup)
> csv, 1/3 special: 13472 (0.7% regression)
>
> v4.2
>
> text, no special: 7300 (19.6% speedup)
> text, 1/3 special: 10537 (2.3% regression)
> csv, no special: 7260 (34.8% speedup)
> csv, 1/3 special: 13881 (3.8% regression)
>
> COPY TO
>
> master: (852558b9)
>
> text, no special: 2446
> text, 1/3 special: 6988
> csv, no special: 2822
> csv, 1/3 special: 6967
>
> v4 (copy to)
>
> text, no special: 1533 (37.3% speedup)
> text, 1/3 special: 5949 (14.8% speedup)
> csv, no special: 1560 (44.7% speedup)
> csv, 1/3 special: 6006 (13.8% speedup)
>
> I find these results particularly exciting because with the COPY FROM v3
> patch, the worst-case scenarios are just under 1% regression. The v4 COPY
> TO patch is a win across the board.
>
> Note that I ran these benchmarks with everything in RAM disk and using the
> cpupower instructions that Nazir suggested.
>
> So on Arm, the v3 COPY FROM patch is almost all upside, and the v4 COPY TO
> patch is all upside. The same is almost true for Intel, but the CSV COPY
> FROM regression, even from the V3 COPY FROM patch, is about 5%. The v4.2
> COPY FROM patch always performs worse than the v3 COPY FROM patch in
> worst-case scenarios.
>
> Does it seem reasonable to stop performance testing the v4.2 COPY FROM
> patch?
> Have we collected enough benchmark data to be confident that the v3
> COPY FROM patch is the one we should be moving forward with?
>

In the 1/3 specials benchmark, v4.2 will always decide after sampling not
to use SIMD. The 3%-4% regression is then the combination of the small
overhead of counting special characters, the 2-4 extra branches, and their
effect on the general code layout, branch prediction, the pipeline, etc.
I don't think it is more complex than v3, but this is the only explanation
I can think of.

v4.2 also assumes that special characters are distributed uniformly
across lines, so yes, IMHO v3 is generally better.

Regards,
Ayoub
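
P.S. To make the sampling point concrete, here is a minimal sketch of the
kind of check I mean. It is not the v4.2 code: the function names, the set
of "special" bytes, and the 10% threshold are all assumptions made up for
illustration. It only shows where the extra counting and branching come
from, and why a decision based on a sample relies on the rest of the input
looking like that sample.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: skip SIMD if more than 10% of sampled bytes are special. */
#define SPECIAL_PERCENT_LIMIT 10

/* Hypothetical definition of "special" for text-format input (an assumption). */
static bool
is_special(unsigned char c)
{
	return c == '\\' || c == '\n' || c == '\r' || c == '\t';
}

/*
 * Count special bytes in the first sample_len bytes of buf and report
 * whether the SIMD path looks worthwhile.  The counting loop and the final
 * comparison are the extra work paid even when SIMD ends up disabled.
 */
static bool
sample_prefers_simd(const char *buf, size_t len, size_t sample_len)
{
	size_t		n = (len < sample_len) ? len : sample_len;
	size_t		specials = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (is_special((unsigned char) buf[i]))
			specials++;
	}

	/* An empty sample gives no evidence against SIMD. */
	if (n == 0)
		return true;

	return specials * 100 <= n * SPECIAL_PERCENT_LIMIT;
}
```

With input where about a third of the bytes are special, a check like this
returns false every time, so the caller pays for the sampling and then
runs the scalar path anyway. If the sampled prefix is clean but later lines
are full of specials (or the reverse), the decision is simply wrong for the
rest of the data.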
