On Thu, 24 Apr 2014, swati upadhyaya wrote:

> Thanks for your reply, it will be great if you can
> sort out my problem... I have tried many patterns and found that PCRE
> takes less time than any other regex lib, that's why I want to use PCRE, but
> there are some patterns like the one above which it is unable to match.

Is this pattern generated by some process? It contains really silly sequences like

\s*(?:(?:(?:\s+)))\s*

and similar.

I had a further look. I found it was failing at the \t in the sequence

\s*\s*(?:(?:(?:[\t]+)))\s*\s*

(another crazy sequence) because there were no tab characters in the data string. So I changed \t to \s (to match a space). The match then failed with Error -8 (match limit exceeded). In other words, the pattern makes a very large search tree, which takes a long time to scan. Sequences such as (?:(?:\w+\s?)+))) are dangerous because they contain nested unlimited repeats.

This is such a crazy pattern that I really can't mess with it any more. Can you not find a way of creating a clean pattern without all the redundancy? It might then be easier to see why it runs for so long.

I'm suspicious of all the .*? items: each of those is going to try the rest of the pattern after swallowing 0, 1, 2, 3, ... characters. The use of atomic groups (?>.....) would also stop a lot of the backtracking.

Aha!

I changed

(?:(?:\w+\s?)+)))

to

(?:(?>\w+\s?)+)))

that is, made it into an atomic group, and lo and behold, when I ran pcretest:

PCRE version 8.35 2014-04-04

"MSWinEventLog\s*(?:(?:(?:\s+)))\s*(?:\s*(?:(?:(?:\d\s+)))\s*)?\s*(?:(?P<event_log__string>(?:\S+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:\s+)))\s*\s*(?:(?P<event_id__0>(?:4610|4614|4622)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?P<event_source__all>(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?P<event_category__all>(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:(A|An).*?)))\s*\s*(?:(?P<object__words>(?:(?>\w+\s?)+)))\s*\s*(?:(?:(?:has been)))\s*\s*(?:(?P<action__0>(?:loaded)))\s*\s*(?:(?:(?: by the)))\s*\s*(?:(?:(?:.*?)))\s*Package Name\:\s*(?:(?P<package__0>(?:\S+)))\s*"
<14>Mar 2 11:34:38 89.237.143.23 MSWinEventLog 1 Security 6500 Fri Mar 02 11:34:37 2012 4610 Microsoft-Windows-Security-Auditing N/A N/A Success Audit prabhat.ImmuneAps.com User Logoff A authentication package has been loaded by the Local Security Authority. This authentication package will be used to authenticate logon attempts. Authentication Package Name: C:\\Windows\\system32\\msv1_0.dll : MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
 0: MSWinEventLog 1 Security 6500 Fri Mar 02 11:34:37 2012 4610 Microsoft-Windows-Security-Auditing N/A N/A Success Audit prabhat.ImmuneAps.com User Logoff A authentication package has been loaded by the Local Security Authority. This authentication package will be used to authenticate logon attempts. Authentication Package Name: C:\Windows\system32\msv1_0.dll
 1: Security
 2: 4610
 3: Microsoft-Windows-Security-Auditing
 4: prabhat.ImmuneAps.com User Logoff
 5: A
 6: authentication package
 7: loaded
 8: C:\Windows\system32\msv1_0.dll

... and this was pretty well instantaneous.

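For anyone who wants to see the effect of that one change in isolation, here is a minimal sketch. It makes several assumptions: it uses Python's `re` module rather than PCRE, the pattern is cut down to just the object/action part, and the names `NAIVE`, `ATOMIC`, `SAMPLE` and `seconds_to_fail` are made up for illustration. The atomic group is emulated with a lookahead plus backreference, `(?=(\w+\s?))\2`, which should behave like `(?>\w+\s?)`; PCRE (and Python's `re` from 3.11) accept the `(?>...)` form directly.

```python
import re
import time

# Nested unlimited repeats: \w+ inside (...)+ lets the engine split a run of
# word characters in very many ways before it can report failure.
NAIVE = re.compile(r"(?P<object>(?:\w+\s?)+)\s*has been\s*(?P<action>loaded)")

# Atomic variant. PCRE spells this (?>\w+\s?); here it is emulated as
# (?=(\w+\s?))\2, because a lookahead is not re-entered once it has matched.
ATOMIC = re.compile(
    r"(?P<object>(?:(?=(\w+\s?))\2)+)\s*has been\s*(?P<action>loaded)"
)

# Shortened stand-in for the log line in the thread (illustrative only).
SAMPLE = "authentication package has been loaded by the Local Security Authority."


def seconds_to_fail(pattern, subject):
    """Time pattern.match() on a subject that is expected not to match."""
    start = time.perf_counter()
    assert pattern.match(subject) is None
    return time.perf_counter() - start


if __name__ == "__main__":
    # SAMPLE is deliberately not run against NAIVE: the words after
    # "package" can be split in a huge number of ways before the engine
    # backs off far enough to find the match.
    m = ATOMIC.match(SAMPLE)
    print(repr(m.group("object")), repr(m.group("action")))

    # Short subjects with no "has been", so every attempt must fail.
    for n in (12, 14, 16, 18):
        subject = "a" * n
        print(n, seconds_to_fail(NAIVE, subject), seconds_to_fail(ATOMIC, subject))
```

The time for `NAIVE` should grow sharply as `n` increases, while `ATOMIC` should stay flat. The `+` outside the atomic part can still give back whole words, which is why the match still ends up at "authentication package".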
Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
