Re: Very Stupid Regex question

2014-08-07 Thread Justin Whear via Digitalmars-d-learn
On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:

> obviously there are ways like counting the match length, and then using
> the maximum length, instead of breaking as soon as a match is found.
>
> Are there any other better ways?

You're not really using regexes properly.  You want to greedily match as 
much as possible in this case, e.g.:

void main()
{
    import std.regex;
    auto re = regex("ab(cd)?");
    assert("PREabcdPOST".matchFirst(re).hit == "abcd");
    assert("PREabPOST".matchFirst(re).hit == "ab");
}


Re: Very Stupid Regex question

2014-08-07 Thread via Digitalmars-d-learn

On Thursday, 7 August 2014 at 16:05:17 UTC, seany wrote:

> Consider please the following:
>
> string s1 = "PREabcdPOST";
> string s2 = "PREabPOST";
>
> string[] srar = ["ab", "abcd"];
> // this can not be constructed with a particular order
>
> foreach (sr; srar)
> {
>     auto r = regex(sr, "g");
>     auto m = matchFirst(s1, r);
>     break;
>     // this one matches ab
>     // but I want this to match abcd
>     // and for s2 I want to match ab
> }
>
> obviously there are ways like counting the match length, and
> then using the maximum length, instead of breaking as soon as a
> match is found.
>
> Are there any other better ways?


It's not clear to me what exactly you want, but:

Are the regexes in `srar` related? That is, does one regex always 
include the previous one as a prefix? Then you can use optional 
matches:


/ab(cd)?/

This will match "abcd" if it is there, but will also match "ab" 
otherwise.


Re: Very Stupid Regex question

2014-08-07 Thread seany via Digitalmars-d-learn

On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:

> On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
>
>> obviously there are ways like counting the match length, and
>> then using the maximum length, instead of breaking as soon as
>> a match is found.
>>
>> Are there any other better ways?
>
> You're not really using regexes properly.  You want to greedily
> match as much as possible in this case, e.g.:
>
> void main()
> {
>     import std.regex;
>     auto re = regex("ab(cd)?");
>     assert("PREabcdPOST".matchFirst(re).hit == "abcd");
>     assert("PREabPOST".matchFirst(re).hit == "ab");
> }


The thing is, "abcd" is read from a file, and at compile time I 
don't know whether "cd" will be there at all, or whether the 
pattern should be ab(ef) instead.


Re: Very Stupid Regex question

2014-08-07 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Aug 07, 2014 at 04:49:05PM +0000, seany via Digitalmars-d-learn wrote:
> On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
> > On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
> >
> > > obviously there are ways like counting the match length, and then
> > > using the maximum length, instead of breaking as soon as a match is
> > > found.
> > >
> > > Are there any other better ways?
> >
> > You're not really using regexes properly.  You want to greedily match
> > as much as possible in this case, e.g.:
> >
> > void main()
> > {
> >     import std.regex;
> >     auto re = regex("ab(cd)?");
> >     assert("PREabcdPOST".matchFirst(re).hit == "abcd");
> >     assert("PREabPOST".matchFirst(re).hit == "ab");
> > }
>
> thing is, abcd is read from a file, and at compile time, I don't
> know if cd may at all be there or not, or if it should be ab(ef)

So basically you have a file containing regex patterns, and you want to
find the longest match among them?

One way to do this is to combine them at runtime:

import std.algorithm, std.format, std.regex;

string[] patterns = ... /* read from file, etc. */;

// Longer patterns match first
patterns.sort!((a,b) => a.length > b.length);

// Build regex
string regexStr = "%((%(%c%))%||%)".format(patterns);
auto re = regex(regexStr);

...

// Run matches against input
char[] input = ...;
auto m = input.match(re);
auto matchedString = m.captures[0];
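
For illustration, here is a self-contained sketch of the same idea. It assumes the file held just the two literals from the original question; the hard-coded array and the helper name `combine` are hypothetical stand-ins, not part of the snippet above. It shows what the format string expands to and that the longer literal wins once it is listed first.

```d
import std.algorithm : sort;
import std.format : format;
import std.regex;

// Hypothetical helper: join the patterns into one alternation,
// with the longest pattern text first.
string combine(string[] patterns)
{
    auto sorted = patterns.dup;
    sorted.sort!((a, b) => a.length > b.length);
    // Each pattern is wrapped in parentheses; "|" is the separator.
    return "%((%(%c%))%||%)".format(sorted);
}

void main()
{
    // Stand-in for the patterns read from the file.
    auto regexStr = combine(["ab", "abcd"]);
    assert(regexStr == "(abcd)|(ab)");

    auto re = regex(regexStr);
    assert("PREabcdPOST".matchFirst(re).hit == "abcd");
    assert("PREabPOST".matchFirst(re).hit == "ab");
}
```

The sort compares the length of the pattern text, not the length of what each pattern can match, so this ordering is only meaningful when every pattern is a plain literal.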


T

-- 
When solving a problem, take care that you do not become part of the problem.


Re: Very Stupid Regex question

2014-08-07 Thread Justin Whear via Digitalmars-d-learn
On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn
wrote:

 
> So basically you have a file containing regex patterns, and you want to
> find the longest match among them?
>
>    // Longer patterns match first
>    patterns.sort!((a,b) => a.length > b.length);
>
>    // Build regex
>    string regexStr = "%((%(%c%))%||%)".format(patterns);
>    auto re = regex(regexStr);

This only works if the patterns are simple literals.  E.g. the pattern
'a+' might match a longer sequence than 'aaa'.  If you're out for the
longest possible match, iteratively testing each pattern is probably the
way to go.
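
A minimal sketch of that iterative approach, assuming the patterns arrive as a plain string[] at runtime. The helper name `longestHit` is hypothetical; it simply tries every pattern and keeps the longest hit.

```d
import std.regex;

// Hypothetical helper: run each pattern against the input and
// return the longest matched text (empty if nothing matches).
string longestHit(string input, string[] patterns)
{
    string best;
    foreach (p; patterns)
    {
        auto m = matchFirst(input, regex(p));
        if (!m.empty && m.hit.length > best.length)
            best = m.hit;
    }
    return best;
}

void main()
{
    assert(longestHit("PREabcdPOST", ["ab", "abcd"]) == "abcd");
    assert(longestHit("PREabPOST", ["ab", "abcd"]) == "ab");
    // A short non-literal pattern can out-match a longer literal one.
    assert(longestHit("aaaaa", ["aaa", "a+"]) == "aaaaa");
}
```

Note that matchFirst only reports the first match of each pattern in the input, so this compares first hits rather than every possible match. The sketch also recompiles each regex on every call; caching the compiled regexes would be the obvious refinement if the pattern list is reused.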


Re: Very Stupid Regex question

2014-08-07 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Aug 07, 2014 at 05:33:42PM +0000, Justin Whear via Digitalmars-d-learn 
wrote:
> On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn
> wrote:
>
> > So basically you have a file containing regex patterns, and you want
> > to find the longest match among them?
> >
> >    // Longer patterns match first
> >    patterns.sort!((a,b) => a.length > b.length);
> >
> >    // Build regex
> >    string regexStr = "%((%(%c%))%||%)".format(patterns);
> >    auto re = regex(regexStr);
>
> This only works if the patterns are simple literals.  E.g. the pattern
> 'a+' might match a longer sequence than 'aaa'.  If you're out for the
> longest possible match, iteratively testing each pattern is probably
> the way to go.

Hmm, you're right. I was a bit disappointed to find out that the |
operator in std.regex (and also in Perl's regex) doesn't do
longest-match but first-match. :-( I had always thought it did
longest-match, like in lex/flex.

I wish we could extend std.regex to allow longest-match for
alternations... but there may be performance consequences.
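
A small sketch of the first-match behaviour described here, using the two literals from the original question. The expected hits follow from the first-match semantics stated above, not from separate documentation.

```d
import std.regex;

void main()
{
    // Alternatives are tried left to right; the first one that
    // succeeds wins, even if a later one would match more text.
    assert("PREabcdPOST".matchFirst(regex("ab|abcd")).hit == "ab");

    // Listing the longer literal first yields the longer match.
    assert("PREabcdPOST".matchFirst(regex("abcd|ab")).hit == "abcd");
    assert("PREabPOST".matchFirst(regex("abcd|ab")).hit == "ab");
}
```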


T

-- 
There's light at the end of the tunnel. It's the oncoming train.


Re: Very Stupid Regex question

2014-08-07 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Aug 07, 2014 at 10:42:13AM -0700, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> Hmm, you're right. I was a bit disappointed to find out that the |
> operator in std.regex (and also in Perl's regex) doesn't do
> longest-match but first-match. :-( I had always thought it did
> longest-match, like in lex/flex.
>
> I wish we could extend std.regex to allow longest-match for
> alternations... but there may be performance consequences.

https://issues.dlang.org/show_bug.cgi?id=13268


T

-- 
Valentine's Day: an occasion for florists to reach into the wallets of
nominal lovers in dire need of being reminded to profess their
hypothetical love for their long-forgotten.


Re: Very Stupid Regex question

2014-08-07 Thread seany via Digitalmars-d-learn
On Thursday, 7 August 2014 at 18:16:11 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:

> https://issues.dlang.org/show_bug.cgi?id=13268
>
> T

Thank you so much!!