Hi Frederic, thank you for measuring PCRE on PPC. The results are quite interesting.
It seems to me that the slower patterns are those that require heavy backtracking, that is, those where fast-forward (skipping) algorithms cannot be used (or where they match too frequently). /[a-zA-Z]+ing/ is a good example of that. Backtracking engines (PCRE, Oniguruma) suffer much more on PPC than engines that read the input once (TRE, RE2). I suspect branch prediction is better on x86, but only a statistical profiler can prove that. OProfile is available everywhere and can profile JIT code. That part is developed by IBM :)

http://oprofile.sourceforge.net/doc/devel/index.html

It needs some extra coding though. If you are interested in working on that, I can help.

Btw the Tom.{10,25}river|river.{10,25}Tom pattern is twice as fast on PPC with JIT, if I understand the numbers correctly.

Regards,
Zoltan

Frederic Bonnard <[email protected]> wrote:
> Thanks Zoltan for the quick reply.
> - Ok I think I got it for SSE2.
> - For SIMD instructions, I fear I don't currently have the knowledge for that,
>   but I would be willing to learn/help.
> - A good start would be that 3rd point, about current code and performance
>   status on PPC vs x86.
>   I reused http://sljit.sourceforge.net/regex_perf.html, I hope it is relevant.
>   The pcre directory has been updated to use the latest 8.37 instead of 8.32.
>   My VMs were:
>   * x86-64 4x2.3GHz 4G memory on a x86-64 host
>   * ppc64el 4x3GHz 4G memory on a P8 host
>   * ppc64 4x3GHz 4G memory on a P8 host
>   All were installed with Ubuntu 14.04 LTS.
>   Note that on Ubuntu for ppc64, the default is to run 32-bit binaries on a
>   64-bit kernel, so the 'runtest' binary is 32-bit. Maybe I need to try with
>   a 64-bit binary.
>   Attached are the results for those 3 environments. The goal is not to
>   find who's the best but rather to find any odd behaviour. Also let's focus
>   on pcre/pcre-jit.
>   Any comments from expert eyes are welcome.
>   On my side, I see very comparable results between ppc64/ppc64el so no major
>   issue on ppc64el.
>   Now, between x86 and ppc64el, the results for the latter seem weaker
>   overall, all the more so since the x86 VM has a lower frequency.
>   The results may need more repetitions, and percentages to compare, but I
>   already see some results that are 2x or 3x slower for pcre-jit:
>   .{0,3}(Tom|Sawyer|Huckleberry|Finn)
>   [a-zA-Z]+ing
>   ^[a-zA-Z]{0,4}ing[^a-zA-Z]
>   [a-zA-Z]+ing$
>   ^[a-zA-Z ]{5,}$
>   ^.{16,20}$
>   "[^"]{0,30}[?!\.]"
>   Tom.{10,25}river|river.{10,25}Tom
>
> Is there any special treatment for these that could make the code generated
> on Power weaker?
>
> Fred
>
> --
> ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
