Hello,

On Tue, Jan 20, 2026 at 9:49 PM Manni Wood <[email protected]>
wrote:

> Hello, all. I have more benchmarks.
>
> These benchmarks are from a Raspberry Pi 5 that I bought. It has an Arm
> Cortex A76 processor.
>
> (I was so impressed with the stability of the results I got on my
> standalone Intel tower PC that I figured I needed a standalone Arm-based
> machine that was not a laptop and not a VM at a cloud service provider. The
> run-to-run results were indeed more stable, just like with my standalone
> tower PC.)
>
> COPY FROM
>
> master: (852558b9)
>
> text, no special: 9111
> text, 1/3 special: 10302
> csv, no special: 11147
> csv, 1/3 special: 13375
>
> v3
>
> text, no special: 7351 (19.3% speedup)
> text, 1/3 special: 10397 (0.9% regression)
> csv, no special: 7272 (34.7% speedup)
> csv, 1/3 special: 13472 (0.7% regression)
>
> v4.2
>
> text, no special: 7300 (19.6% speedup)
> text, 1/3 special: 10537 (2.3% regression)
> csv, no special: 7260 (34.8% speedup)
> csv, 1/3 special: 13881 (3.8% regression)
>
> COPY TO
>
> master: (852558b9)
>
> text, no special: 2446
> text, 1/3 special: 6988
> csv, no special: 2822
> csv, 1/3 special: 6967
>
> v4 (copy to)
>
> text, no special: 1533 (37.3% speedup)
> text, 1/3 special: 5949 (14.8% speedup)
> csv, no special: 1560 (44.7% speedup)
> csv, 1/3 special: 6006 (13.8% speedup)
>
> I find these results particularly exciting because with the COPY FROM v3
> patch, the worst-case scenarios are just under 1% regression. The v4 COPY
> TO patch is a win across the board.
>
> Note that I ran these benchmarks with everything in RAM disk and using the
> cpupower instructions that Nazir suggested.
>
> So on Arm, the v3 COPY FROM patch is almost all upside, and the v4 COPY TO
> patch is all upside. The same is almost true for Intel, but the CSV COPY
> FROM regression, even from the v3 COPY FROM patch, is about 5%. The v4.2
> COPY FROM patch always performs worse than the v3 COPY FROM patch in
> worst-case scenarios.
>
> Does it seem reasonable to stop performance testing the v4.2 COPY FROM
> patch? Have we collected enough benchmark data to be confident that the v3
> COPY FROM patch is the one we should be moving forward with?
>
In the 1/3 specials benchmark, v4.2 will always decide not to use SIMD
after sampling, so that 3%-4% regression is the combined cost of the small
overhead of counting special characters, the 2-4 extra branches, and their
effect on the general code layout, branch prediction, the pipeline, etc.
I don't think it's more complex than v3, but this is the only explanation
I can think of.
And since v4.2 assumes that special characters are distributed uniformly
across lines, yes, IMHO v3 is generally better.
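
To make the kind of sampling decision I mean concrete, here is a rough
sketch. It is not the v4.2 patch code: the function names, the set of
special characters and the 1-in-16 threshold are all made up for
illustration. It only shows where the extra counting and branching come
from.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: give up on SIMD when more than 1 in 16 sampled
 * bytes is special. The real patch may use a different rule. */
#define SPECIAL_RATIO_DIVISOR 16

/* Hypothetical set of special characters for text format. */
static bool
is_special(char c)
{
	return c == '\\' || c == '\n' || c == '\r' || c == '\t';
}

/*
 * Count the special characters in a sample and decide whether the SIMD
 * path looks worthwhile. The counting loop and the final comparison are
 * the overhead that is still paid when the answer is "no".
 */
static bool
should_use_simd(const char *sample, size_t len)
{
	size_t		specials = 0;

	for (size_t i = 0; i < len; i++)
	{
		if (is_special(sample[i]))
			specials++;
	}

	return specials * SPECIAL_RATIO_DIVISOR <= len;
}
```

With input where about 1/3 of the bytes are special, a check like this
says "no" every time, so only its cost remains. And because the decision
is made from a sample, it is only right if the rest of the lines look
like the sampled ones.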

Regards,
Ayoub
