Monday, June 05, 2000, 10:12:02 AM, Tom wrote:
> The regexp would be "([a-zA-Z]*)@" and the match would end up in "\1".
> I don't have any idea how to set this up in TB!.

    Erm, no.  This is the classic problem of trying to define what is a
"legal" email address.  [a-zA-Z]* doesn't match, for example,
<[EMAIL PROTECTED]> or <[EMAIL PROTECTED]>.

    Secondly, since these are Perl-compatible regexes it would be easier to
use \w* instead of [a-zA-Z]*.  \w also matches digits and the underscore, so
it would pick up the latter of the above examples.
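
    A rough sketch of the difference, written in Python rather than Perl
(Python's re module treats these particular constructs the same way).  The
sample addresses are made up for illustration and are not the ones from the
message above; local_part() is a hypothetical helper.

```python
import re

letters_only = re.compile(r"([a-zA-Z]*)@")
word_chars = re.compile(r"(\w*)@")

# Hypothetical sample addresses, chosen only to show the difference.
samples = ["tom@example.com", "tom_99@example.com", "tom.smith@example.com"]

def local_part(pattern, text):
    """Return the captured text if the pattern matches from the start of text."""
    m = pattern.match(text)
    return m.group(1) if m else None

for s in samples:
    print(s, local_part(letters_only, s), local_part(word_chars, s))
```

    Neither pattern gets past the dot in the third sample, which is where the
rest of this message is heading.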

    Also, I doubt that it would deposit into \1.  If it is truly Perl
compatible regex it would deposit into $1.
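
    For comparison, in Perl \1 is the backreference used inside the pattern
and $1 is where the captured text lands after the match.  A sketch of the same
split in Python, where m.group(1) plays the role of $1; whether TB! follows
either convention is an assumption that would need testing.  The addresses are
hypothetical.

```python
import re

# After a successful match, group(1) holds the captured text (Perl: $1).
m = re.search(r"([a-zA-Z]*)@", "tom@example.com")
captured = m.group(1)

# Inside the pattern, \1 means "the same text that group 1 just matched".
doubled = re.search(r"(\w+)@\1\.", "tom@tom.example.com")
not_doubled = re.search(r"(\w+)@\1\.", "tom@example.com")

print(captured, doubled is not None, not_doubled is not None)
```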

    So, as an exercise in pontification I decided to see what I could dig up.
First, from RFC822 the definition of an email address:
     addr-spec   =  local-part "@" domain        ; global address

    And then the local-part:
     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved

    And then word:
     word        =  atom / quoted-string

    Then atom:
     atom        =  1*<any CHAR except specials, SPACE and CTLs>

    Then special:
     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.

    Sooooo, from all of that we get the following in English.  An email
address is made up of two parts, a local part and a domain.  The local part is
a single word or multiple words separated by a dot (.).  A word is defined as
either a single atom or a quoted string, and an atom is defined as one or more
characters other than specials (defined in the spec), space and control
characters.
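
    The grammar above can be turned into a character class more or less
mechanically.  A minimal sketch in Python, assuming CHAR means 7-bit ASCII and
covering only the atom form of a word (quoted strings are left out, since
their definition is not quoted above).  The names SPECIALS, ATOM_CHARS and
is_local_part are made up for this example.

```python
import re

# The specials exactly as listed in the RFC 822 excerpt above.
SPECIALS = set('()<>@,;:\\".[]')

# Dropping CTLs (0-31 and 127) and SPACE (32) from ASCII leaves codes 33..126.
ATOM_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in SPECIALS)

# atom = one or more allowed characters; re.escape keeps -, ^ and friends literal.
ATOM = "[" + re.escape(ATOM_CHARS) + "]+"

# local-part = word *("." word), with word restricted to atom here.
LOCAL_PART = re.compile(ATOM + r"(?:\." + ATOM + r")*")

def is_local_part(text):
    """True if text is one or more atoms separated by single dots."""
    return LOCAL_PART.fullmatch(text) is not None

for candidate in ["tom", "tom.smith", "tom..smith", "tom smith"]:
    print(repr(candidate), is_local_part(candidate))
```

    Building the class from the specials list, instead of typing the allowed
punctuation by hand, avoids accidentally letting a special slip in.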

    Hmmmm, so now to define a regex that matches that, in the context of
extracting the email address.  Warning, Perl syntax from here on out: I've not
worked with TB!'s implementation, since it is 100% undocumented outside of the
what's new (AFAIK).

    m/([\w\.]+)@/ would catch things like [EMAIL PROTECTED]  But
here is the problem.  ! isn't listed as a special and I don't think it is a
control.  Neither are #, $, %, ^, &, *, -, _, +, =, ', /, ?, |, `, ~.  So,
technically, those can be used in an email address.  Sooooo...

    m/([\w\!\.\#\$\%\^\&\*\_\-\=\+\'\/\?\|\`\~]+)@/ *should*, in theory, match
all "legal" email addresses.  That is, of course, assuming that you could
extract the email address in the first place or that TB! presents only the
email address to the regex.  If not, then you need to figure out a way to
parse out the name, find the innermost set of <>'s, which are traditionally
used to denote email addresses, and /then/ apply the above regex to that
portion.
:)
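
    The two-step approach (find the innermost <>'s, then match the local
part) could be sketched like this in Python.  The character class is \w plus
the punctuation RFC 822 does not list as special, restricted to ASCII; the
header strings and the helper name extract_local_part are hypothetical, and
none of this says anything about what TB! actually hands to its regexes.

```python
import re

# Innermost <...>: angle brackets with no other angle bracket inside them.
INNERMOST_ANGLE = re.compile(r"<([^<>]*)>")

# Local part followed by "@": word characters plus non-special punctuation.
LOCAL_PART_AT = re.compile(r"([\w!.#$%^&*\-=+'/?|`~]+)@", re.ASCII)

def extract_local_part(header):
    """Return the local part of the address in a From-style header, or None."""
    angle = INNERMOST_ANGLE.search(header)
    # Fall back to the whole header when there are no angle brackets at all.
    candidate = angle.group(1) if angle else header.strip()
    m = LOCAL_PART_AT.match(candidate)
    return m.group(1) if m else None

for header in ['Tom <tom.smith@example.com>',
               '"Smith, Tom" <t_s+list@example.com>',
               "o'brien@example.com"]:
    print(header, "->", extract_local_part(header))
```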


-- 
         Steve C. Lamb         | I'm your priest, I'm your shrink, I'm your
         ICQ: 5107343          | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
