Michael Holzt wrote:

I'm still trying to decode that regexp (will have a look in the camel book
later).


Does this help?

$ perl -Mre=debug -e "qr'(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';"
Freeing REx: `","'
Compiling REx `(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)'
size 39 Got 316 bytes for offset annotations.
first at 1
   1: ANYOF[0-9A-Za-z](12)
  12: CURLYX[0] {0,1}(38)
  14:   STAR(26)
  15:     ANYOF[\-0-9A-Za-z](0)
  26:   ANYOF[0-9A-Za-z](37)
  37:   WHILEM(0)
  38: NOTHING(39)
  39: END(0)
stclass `ANYOF[0-9A-Za-z]' minlen 1


How about this ?

my $subdomain = qr'
    (?:                    # group but no backreferences
      [a-zA-Z0-9]          # match a single ALPHA / DIGIT
        (?:                # group but no backreferences
            [-a-zA-Z0-9]*  # match zero or more ALPHA / DIGIT / HYPHEN
            [a-zA-Z0-9]    # followed by a single ALPHA / DIGIT
        )?                 # but only optionally match this group
    )'x;
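
To try the commented pattern out, here is a minimal sketch. The helper name is_subdomain, the \A and \z anchors, and the sample labels are assumptions of this sketch, not part of the original pattern:

```perl
use strict;
use warnings;

# Same pattern as above, comments dropped for brevity
my $subdomain = qr'
    (?:
      [a-zA-Z0-9]
        (?:
            [-a-zA-Z0-9]*
            [a-zA-Z0-9]
        )?
    )'x;

# Hypothetical helper: true if the whole label matches the pattern.
# \A and \z anchor the match to the start and end of the string.
sub is_subdomain {
    my ($label) = @_;
    return $label =~ /\A$subdomain\z/ ? 1 : 0;
}

# Sample labels are made up for illustration
for my $label (qw(u u-u a--b -u u-)) {
    printf "%-5s %s\n", $label, is_subdomain($label) ? 'match' : 'no match';
}
```

Anchored this way, the labels with a leading or trailing hyphen should be rejected and the other three accepted.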

However, now that I diagram this out, I think this is still too limiting. Here is the BNF notation for the subdomain term:

#   sub-domain = Let-dig [Ldh-str]
#   Let-dig = ALPHA / DIGIT
#   Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig

so we have to match

        u.domain.edu
        u-u.domain.edu
        university-something.domain.edu

which I don't think that regex manages.  Isn't this better?

my $subdomain = qr'
    (?:                    # group but no backreferences
      [a-zA-Z0-9]          # match a single ALPHA / DIGIT
        (?:                # group but no backreferences
          -(?=[a-zA-Z0-9]) # match HYPHEN when followed by ALPHA / DIGIT
        )?                 # but only optionally match this group
      [a-zA-Z0-9]*         # followed by zero or more ALPHA / DIGIT
    )'x;

Right???
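
One way to settle the question is to run both patterns against the three labels listed above. This is only a sketch: the names $first, $second and matches_label are made up here, and anchoring each label with \A and \z is an assumption about how the pattern would be used.

```perl
use strict;
use warnings;

# First pattern from this message: Let-dig, optionally followed by Ldh-str
my $first = qr'[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?';

# Second pattern from this message: optional hyphen directly after the first character
my $second = qr'[a-zA-Z0-9](?:-(?=[a-zA-Z0-9]))?[a-zA-Z0-9]*';

# Hypothetical helper: 1 if the whole label matches $re, else 0
sub matches_label {
    my ($re, $label) = @_;
    return $label =~ /\A$re\z/ ? 1 : 0;
}

for my $label (qw(u u-u university-something)) {
    printf "%-22s first=%d second=%d\n",
        $label, matches_label($first, $label), matches_label($second, $label);
}
```

Run as written, this should report the first pattern matching all three labels, and the second matching u and u-u but not university-something, since its hyphen is only allowed immediately after the first character.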

John
