> On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <[email protected]> wrote:
> >> Thanks for running that benchmark! Would you mind sharing a reproducer
> >> for the regression you observed?
> >
> > Of course, I attached the SQL to generate the text and CSV test files.
> > If having special characters make up 1/3 of the line length is an
> > exaggeration, a lower ratio might still reproduce some regressions for
> > the same reason.
>
> Thank you so much!
>
> I am able to reproduce the regression you mentioned, but both
> regressions are 20% on my end. I found (by experimenting) that SIMD
> causes a regression when it advances fewer than 5 characters.
>
> So, I implemented a small heuristic. It works like this:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).
> - Each time advance < 5, n is doubled.
> - Each time advance ≥ 5, n is halved.
>
> I am sharing a POC patch to show the heuristic; it can be applied on top
> of v1-0001. The heuristic version has the same performance improvements
> as v1-0001, but the regression is 5% instead of 20% compared to
> master.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Yes, this is good. I'm also getting only about a 5% regression now.

Regards,
Ayoub Kazar
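
A minimal C sketch of the back-off heuristic described in the quoted message, under stated assumptions. It is not the POC patch: `SimdBackoff`, `find_next_special()`, `chunk_probe()`, `count_specials()`, the set of special characters, the 16-byte chunk width and the 1..64 bounds on n are all hypothetical. It also assumes that "a sleep penalty (n cycles)" means running a plain byte-at-a-time scan for the next n searches. The 16-byte probe is written as an ordinary loop so the sketch stays portable; only the threshold of 5 and the double/halve rule come from the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* SIMD_MIN_ADVANCE is the threshold from the quoted mail; the other
 * constants are assumptions made for this sketch. */
#define SIMD_MIN_ADVANCE 5   /* shorter advances count as a loss */
#define CHUNK_WIDTH      16  /* assumed vector width in bytes */
#define MIN_PENALTY      1   /* assumed floor for n */
#define MAX_PENALTY      64  /* assumed cap for n */

/* Hypothetical back-off state, not a PostgreSQL struct. */
typedef struct SimdBackoff
{
    int penalty;  /* n: searches to sleep after the next short advance */
    int sleeping; /* searches left before the chunked path is retried */
} SimdBackoff;

static void
simd_backoff_init(SimdBackoff *st)
{
    st->penalty = MIN_PENALTY;
    st->sleeping = 0;
}

/* Assumed special characters for a CSV-like line. */
static bool
is_special(char c)
{
    return c == ',' || c == '"' || c == '\\' || c == '\n';
}

/* Portable stand-in for one vector compare: offset of the first special
 * character in the next CHUNK_WIDTH bytes, else the number of bytes seen. */
static size_t
chunk_probe(const char *buf, size_t len)
{
    size_t limit = len < CHUNK_WIDTH ? len : CHUNK_WIDTH;
    for (size_t i = 0; i < limit; i++)
        if (is_special(buf[i]))
            return i;
    return limit;
}

/* Returns the index of the next special character at or after pos, or len. */
static size_t
find_next_special(SimdBackoff *st, const char *buf, size_t len, size_t pos)
{
    size_t start = pos;

    if (st->sleeping > 0)
    {
        /* Penalty active: byte-at-a-time scan; n is left unchanged. */
        st->sleeping--;
        while (pos < len && !is_special(buf[pos]))
            pos++;
        return pos;
    }

    /* Chunked path: keep probing while whole chunks hold no special char. */
    for (;;)
    {
        size_t step = chunk_probe(buf + pos, len - pos);
        pos += step;
        if (step < CHUNK_WIDTH)
            break;
    }

    if (pos - start < SIMD_MIN_ADVANCE)
    {
        /* Short advance (end of buf included): sleep n searches, double n. */
        st->sleeping = st->penalty;
        if (st->penalty < MAX_PENALTY)
            st->penalty *= 2;
    }
    else if (st->penalty > MIN_PENALTY)
        st->penalty /= 2;   /* long advance: halve n */

    assert(st->penalty >= MIN_PENALTY && st->penalty <= MAX_PENALTY);
    return pos;
}

/* Usage helper: counts the special characters in a NUL-terminated line. */
static int
count_specials(SimdBackoff *st, const char *line)
{
    size_t len = strlen(line);
    size_t pos = 0;
    int count = 0;
    while ((pos = find_next_special(st, line, len, pos)) < len)
    {
        count++;
        pos++;   /* step past the special character */
    }
    return count;
}
```

In this sketch n is floored at 1 because halving an integer penalty would otherwise reach 0, after which doubling could never raise it again. Both paths return the same position, so the back-off only changes how a search is done, never its result.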
