Since SunSpider time now is about 10% inside the regular expression matcher, it's time to reconsider regular expression optimizations. In particular, we should see if we can find a subset of regular expressions where we can implement a more efficient matching algorithm.
I wrote code to dump a histogram of time spent matching regular expressions: <https://bugs.webkit.org/show_bug.cgi?id=19801>. Here's the result from running SunSpider on my computer. The column on the left is the number of seconds spend matching an expression, and the column on the right is the expression: 0.080214 - \b\w+\b 0.079685 - agggtaa[cgt]|[acg]ttaccct (ignore case) 0.077910 - [cgt]gggtaaa|tttaccc[acg] (ignore case) 0.073445 - agggta[cgt]a|t[acg]taccct (ignore case) 0.073439 - a[act]ggtaaa|tttacc[agt]t (ignore case) 0.073387 - agggt[cgt]aa|tt[acg]accct (ignore case) 0.072898 - ag[act]gtaaa|tttac[agt]ct (ignore case) 0.072552 - aggg[acg]aaa|ttt[cgt]ccct (ignore case) 0.072440 - agg[act]taaa|ttta[agt]cct (ignore case) 0.072167 - agggtaaa|tttaccct (ignore case) 0.057528 - >.*\n|\n 0.045544 - "[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(:?[eE][+ \-]?\d+)? 0.024901 - (?:^|:|,)(?:\s*\[)+ 0.008831 - [EMAIL PROTECTED](\.?[a-zA-Z0-9\-_]*) \.[a-zA-Z]{2,3}$ 0.001060 - ^[\],:{}\s]*$ 0.000517 - \\. 0.000264 - [\0\t\n\v\f\r\xa0'"!-] 0.000158 - !\d\d?\d?! 0.000062 - -.* 0.000026 - ^ 0.000024 - ('|\\) 0.000010 - ^[^-]*- I wish we had a test with some better regular expression coverage! Here's one other case with some significant regexp time in it, I think perhaps from JSON validtion? Loading 280slides.com and looking at the test presentation: 0.028335 - \/\/.*(\r|\n)?|\/\*(?:.|\n|\r)*?\*\/|\w+\b|[+-]?\d+ (([.]\d+)*([eE][+-]?\d+))?|"[^"\\]*(\\.[^"\\]*)*"|'[^'\\]*(\\.[^'\ \]*)*'|\s+|. 0.006968 - \S 0.001098 - "[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+ \-]?\d+)? 0.000422 - \\. 0.000263 - (?:^|:|,)(?:\s*\[)+ 0.000259 - [\:\+\-\*\/\=\<\>\&\|\!\.\%] 0.000245 - [^\s] 0.000193 - [\+\-\*\/\=\<\>\&\|\!\.\[\^\(] 0.000119 - \s 0.000109 - ^[0-9a-zA-Z_\-+~!$\^{}|.%'`#&*]+/[0-9a-zA-Z_\-+~!$ \^{}|.%'`#&*]+\+xml$ (ignore case) 0.000042 - ^\w 0.000023 - ^[\],:{}\s]*$ I had a hard time finding pages and tests where regular expressions took enough time to be worth looking at. -- Darin _______________________________________________ webkit-dev mailing list [email protected] http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev

