Re: Very Stupid Regex question
On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
> Obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found. Are there any other better ways?

You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:

    void main()
    {
        import std.regex;
        auto re = regex("ab(cd)?");
        assert("PREabcdPOST".matchFirst(re).hit == "abcd");
        assert("PREabPOST".matchFirst(re).hit == "ab");
    }
Re: Very Stupid Regex question
On Thursday, 7 August 2014 at 16:05:17 UTC, seany wrote:
> Consider, please, the following:
>
>     string s1 = "PREabcdPOST";
>     string s2 = "PREabPOST";
>     string[] srar = ["ab", "abcd"];
>     // this can not be constructed with a particular order
>     foreach (sr; srar)
>     {
>         auto r = regex(sr, "g");
>         auto m = matchFirst(s1, r);
>         break;
>         // this one matches "ab",
>         // but I want this to match "abcd",
>         // and for s2 I want to match "ab"
>     }
>
> Obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found. Are there any other better ways?

It's not clear to me exactly what you want, but: are the regexes in `srar` related? That is, does each regex always include the previous one as a prefix? Then you can use optional matches:

    /ab(cd)?/

This will match "abcd" if it is there, but will also match "ab" otherwise.
Re: Very Stupid Regex question
On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
> You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:
> [...]
>         auto re = regex("ab(cd)?");
> [...]

Thing is, "abcd" is read from a file, and at compile time I don't know whether "cd" will be there at all, or whether it should be ab(ef) instead.
Re: Very Stupid Regex question
On Thu, Aug 07, 2014 at 04:49:05PM +0000, seany via Digitalmars-d-learn wrote:
> On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
> [...]
> Thing is, "abcd" is read from a file, and at compile time I don't know whether "cd" will be there at all, or whether it should be ab(ef) instead.

So basically you have a file containing regex patterns, and you want to find the longest match among them? One way to do this is to combine them into a single regex at runtime:

    import std.algorithm : sort;
    import std.format : format;
    import std.regex;

    string[] patterns = ... /* read from file, etc. */;

    // Longer patterns match first
    patterns.sort!((a, b) => a.length > b.length);

    // Build regex of the form "(p1)|(p2)|..."
    string regexStr = "%((%(%c%))%||%)".format(patterns);
    auto re = regex(regexStr);
    ...
    // Run matches against input
    char[] input = ...;
    auto m = input.match(re);
    auto matchedString = m.captures[0];

T

--
When solving a problem, take care that you do not become part of the problem.
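A minimal, self-contained sketch of the combine-at-runtime idea from the post above, assuming the patterns are already in a `string[]` (reading the file is left out). The helper name `combineLongestFirst` is hypothetical, and it builds the `(p1)|(p2)|...` string with `map` and `join` instead of the nested `%(...%)` format string, purely for readability.

```d
import std.algorithm : map, sort;
import std.array : join;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: wrap each pattern in a group and join them with '|',
// longest pattern text first. Since std.regex alternation is first-match,
// this only favours longer matches when the patterns are plain literals.
string combineLongestFirst(string[] patterns)
{
    auto sorted = patterns.dup;                        // leave the caller's array untouched
    sorted.sort!((a, b) => a.length > b.length);       // longest pattern text first
    return sorted.map!(p => "(" ~ p ~ ")").join("|");  // e.g. "(abcd)|(ab)"
}

void main()
{
    string[] patterns = ["ab", "abcd"];  // stand-in for patterns read from a file
    auto re = regex(combineLongestFirst(patterns));

    writeln("PREabcdPOST".matchFirst(re).hit);  // abcd
    writeln("PREabPOST".matchFirst(re).hit);    // ab
}
```

Copying with `dup` before sorting is a design choice of this sketch: the original snippet sorts `patterns` in place, which is fine if nothing else depends on the file order.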
Re: Very Stupid Regex question
On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn wrote:
> So basically you have a file containing regex patterns, and you want to find the longest match among them?
> [...]
>     // Longer patterns match first
>     patterns.sort!((a, b) => a.length > b.length);
>
>     // Build regex
>     string regexStr = "%((%(%c%))%||%)".format(patterns);
>     auto re = regex(regexStr);

This only works if the patterns are simple literals. E.g. the pattern 'a+' might match a longer sequence than 'aaa', even though its pattern text is shorter. If you're out for the longest possible match, iteratively testing each pattern is probably the way to go.
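A minimal sketch of "iteratively testing each pattern", which is also the length-counting idea from the original question. The helper `longestMatch` is hypothetical. It compares each pattern's leftmost match, so two winning hits may start at different positions; if the match must start at one specific index, you would need to compare positions as well.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: run every pattern on its own and keep the longest hit.
// Each pattern contributes its leftmost match, so hits from different
// patterns are not guaranteed to start at the same index.
string longestMatch(string input, string[] patterns)
{
    string best;  // stays empty until some pattern yields a longer hit
    foreach (p; patterns)
    {
        auto m = input.matchFirst(regex(p));
        if (!m.empty && m.hit.length > best.length)
            best = m.hit;
    }
    return best;
}

void main()
{
    writeln(longestMatch("PREabcdPOST", ["ab", "abcd"]));  // abcd
    writeln(longestMatch("PREabPOST", ["ab", "abcd"]));    // ab

    // A combined "(aaa)|(a+)" regex would stop at "aaa"; testing each
    // pattern separately lets the greedy a+ win.
    writeln(longestMatch("xaaaay", ["aaa", "a+"]));        // aaaa
}
```

The cost is one regex compilation and one scan per pattern; caching the compiled regexes would help if the same pattern list is matched against many inputs.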
Re: Very Stupid Regex question
On Thu, Aug 07, 2014 at 05:33:42PM +0000, Justin Whear via Digitalmars-d-learn wrote:
> On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn wrote:
> > So basically you have a file containing regex patterns, and you want to find the longest match among them?
> [...]
> This only works if the patterns are simple literals. E.g. the pattern 'a+' might match a longer sequence than 'aaa'. If you're out for the longest possible match, iteratively testing each pattern is probably the way to go.

Hmm, you're right. I was a bit disappointed to find out that the | operator in std.regex (and also in Perl's regex) doesn't do longest-match but first-match. :-( I had always thought it did longest-match, like in lex/flex. I wish we could extend std.regex to allow longest-match for alternations... but there may be performance consequences.

T

--
There's light at the end of the tunnel. It's the oncoming train.
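A small check of the first-match behaviour described above, using only `regex` and `matchFirst` from std.regex: with `ab|abcd`, the first alternative wins at the starting position even though the second one would match more text. Listing the longer literal first is one way to get the longest result for literal patterns, which is what the length sort in the combined-regex approach relies on.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Alternation is first-match (Perl-style), not longest-match (lex/flex-style):
    // at index 0 the higher-priority alternative "ab" succeeds, so its match wins.
    writeln("abcd".matchFirst(regex("ab|abcd")).hit);  // ab

    // With the longer literal listed first, the longer match wins.
    writeln("abcd".matchFirst(regex("abcd|ab")).hit);  // abcd
}
```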
Re: Very Stupid Regex question
On Thu, Aug 07, 2014 at 10:42:13AM -0700, H. S. Teoh via Digitalmars-d-learn wrote:
[...]
> Hmm, you're right. I was a bit disappointed to find out that the | operator in std.regex (and also in Perl's regex) doesn't do longest-match but first-match. :-( I had always thought it did longest-match, like in lex/flex. I wish we could extend std.regex to allow longest-match for alternations... but there may be performance consequences.

https://issues.dlang.org/show_bug.cgi?id=13268

T

--
Valentine's Day: an occasion for florists to reach into the wallets of nominal lovers in dire need of being reminded to profess their hypothetical love for their long-forgotten.
Re: Very Stupid Regex question
On Thursday, 7 August 2014 at 18:16:11 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> https://issues.dlang.org/show_bug.cgi?id=13268

Thank you so much!!