Hi, do you use JIT?
The engine has special single character optimizations to make /.*/ fast. The generic /(?:anything)*/ is much slower. The numbers below are reasonable. E.g. using capturing bracket is the slowest: /(.)*/, since you need to store extra data. Yes, you can make pattern matching fast, but the solution depends on what you do. Backtracking engines are similar to algorithmic languages, they provide a lot of tools, but the user need to choose the right one. E.g. a quicksort in python is faster than a bubble sort in C even though C is much faster than python. Regards, Zoltan -------- Eredeti levél -------- Feladó: Magnus Rex < mag...@rexikan.se (Link -> mailto:mag...@rexikan.se) > Dátum: 2017 december 4 17:22:20 Tárgy: [pcre-dev] Non-capturing group overhead (100x) Címzett: pcre-dev@exim.org (Link -> mailto:pcre-dev@exim.org) Hi, When optimizing a regular expression in Elixir I was surprised to see that there is a massive overhead to using non-capture groups. I find that /(?:.)*/ takes 100 times longer to execute than /.*/. Why is that, and is there anything I can do to speed it up? Elixir uses Erlang and PCRE 8.41 for regular expressions. Investigating further using the Elixir console (see expanation below): $ iex Erlang/OTP 20 [erts-9.1.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false] [dtrace] Interactive Elixir (1.5.2) - press Ctrl+C to exit (type h() ENTER for help) iex(1)> s = String.duplicate("a",1000000);nil nil iex(2)> time = fn f -> t=:os.system_time(:milli_seconds); f.(); :os.system_time(:milli_seconds)-t end #Function<6.99386804/1 in :erl_eval.expr/5> iex(3)> time.(fn -> :re.run(s, ".*") end ) 8 iex(4)> time.(fn -> :re.run(s, "(?:.)*") end ) 669 iex(5)> time.(fn -> :re.run(s, "(?:..)*") end ) 281 iex(6)> time.(fn -> :re.run(s, "(?:....)*") end ) 140 iex(7)> time.(fn -> :re.run(s, "(?>.)*") end ) 678 iex(8)> time.(fn -> :re.run(s, "(.)*") end ) 1558 iex(9)> time.(fn -> :re.run(s, "(..)*") end ) 607 iex(10)> time.(fn -> :re.run(s, "(....)*") end ) 270 iex(11)> time.(fn -> :re.run(s, ".((?:.)*)") end ) 602 iex(12)> time.(fn -> :re.run(s, ".((?:..)*)") end ) 285 iex(13)> time.(fn -> :re.run(s, ".((?:....)*)") end ) 147 #1 above creates a string with one million characters. #2 sets up a function to measure execution time in milliseconds. #3-4 gives that it is 100 times more expensive with a non-capturing group. #4-6 gives that the execution time is linear to the number of groups captured. #7 gives that the same is true for atomic groups. #8-10 shows the extra overhead with actually capturing the groups. It takes double time compared with non-capturing. #11-13 shows that it is really the number of groups that set the execution time and not the length of what is captured. I have seen the same effect when trying other languages that used PCRE, but not when using JavaScript of Googles RE2. I would be really grateful if someone could help me understand this unexpected result. / Magnus -- Magnus Rex https://twitter.com/rexikan -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev