Re: Implementation of :w in regexes and other regex questions

Luke Palmer Tue, 14 Feb 2006 00:34:59 -0800

On 2/14/06, David Romano <[EMAIL PROTECTED]> wrote:
> ==Question 1==
> macro rxmodinternal:<x> { ... } # define your own /:x() stuff/
> macro rxmodexternal:<x> { ... } # define your own m:x()/stuff/
> With this, I can make my own adverbs then? Like :without, or :skip, and
> describe what each does?


Yes, although exactly how is completely unspecified.

> For instance, say I'm trying to pull out date from
> (html) text:
> ...
> Jan had a great birthday on <B>F e b</B> 5, 2<B>00</B>3.
> Her older sister, May, turned 23 on <B>Ma r</B> 5, 19<b>98</b>
> Their younger sister, June, will be going home on <B >Apr</B> 5,
> 2<B>006</B>
> April is their mother, and she's buying a car on <B>Feb< / B > 7,
> 2<B>0</B>06
> I don't know when Roger, their father, is going to buy his guitar.
> ...
>
> The grammar becomes messy when I have to account for things that the rules
> don't allow me to just easily skip:
> grammar Date {
>     rule tag_B_beg:w:i { \<B\> }
>     rule tag_B_end:w:i { \<\/B\> }
>     rule tag_B:w:i { <tag_B_beg>|<tag_B_end> }
>
>     rule month_english:w:i {
>           J<sp>*a<sp>*n       | F<sp>*e<sp>*b | M<sp>*a<sp>*r
>         | A<sp>*p<sp>*r       | M<sp>*a<sp>*y | J<sp>*u<sp>*n<sp>*e
>         | J<sp>*u<sp>*l<sp>*y | A<sp>*u<sp>*g | S<sp>*e<sp>*p
>         | O<sp>*c<sp>*t       | N<sp>*o<sp>*v | D<sp>*e<sp>*c
>     }
>
>     rule year:w:i { (\d<tag_B>?\d<tag_B>?\d<tag_B>?\d) }
>     rule month:w:i { <after <tag_B_beg> > (<month_english>) <before <tag_B> > }
>
>     rule day {
>         <after <month> >
>         ( <after <[1..2]> >? <[1..9]> | 3<[0..1]> ) <sp>+
>         <before <year> >
>     }
>
>     rule date { <month> <day> <year> }
>
> }
>
> I don't want to just skip <B> tags wholly, because they do serve a purpose,
> but only in a particular context. (Can <?ws> be changed back to a "default" if
> changed to include html tags?)

Brackets serve as a kind of scoping for modifiers.  We're also
considering letting :w take an argument telling it what to consider to
be whitespace.  So you could do:

    rule Month :w {
        [ :w(&my_ws) J a n ] # not sure about the &
        # out here we still have the default :w
    }
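
To make the scoping idea concrete, here is a rough analogy in Python's
re module rather than Perl 6, since the :w(...) syntax above is not
settled. Everything named here (DEFAULT_WS, TAG_WS, spaced) is
hypothetical; the point is only that one part of a pattern can use a
custom notion of whitespace (here, whitespace or a <B> tag) while the
rest keeps the default.

```python
import re

DEFAULT_WS = r"\s*"
# Hypothetical custom "whitespace": plain whitespace or a <B> / </B> tag.
TAG_WS = r"(?:\s|<\s*/?\s*b\s*>)*"

def spaced(literal, ws=DEFAULT_WS):
    """Join the characters of `literal` with the given whitespace pattern."""
    return ws.join(re.escape(ch) for ch in literal)

# Default whitespace between letters, as in the outer scope of the rule.
default_jan = re.compile(spaced("Jan"), re.IGNORECASE)
# Custom whitespace, as in the bracketed inner scope.
tagged_year = re.compile(spaced("2003", TAG_WS), re.IGNORECASE)
```

With these, default_jan accepts "J a n" but not "J<B>an", while
tagged_year accepts "2<B>00</B>3".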


> ==Question 3==
> I'm also curious about exclusions. Right now, to do a general exclusion, I'm
> thinking I would probably do something like:
> rule text_no_date {
>     {$/ !~ /<date>/ }
>     ^ [.*] $
> }
>
> Would something like below be easier to decode for a human reader?
> text:without(<date>) {
>     ^ [.*] $
> }

Well, if you could define exactly what it means, then perhaps.  Does
that mean that date appears nowhere within the matched text, or that
it just doesn't appear at the beginning?  In either case, you can make
these rules grammar-friendly by including your test at the end:

    rule text_no_date {
        (.*)
        { $1 !~ /<date>/ }
    }
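
The same "match first, then test what was captured" shape can be
sketched in Python, purely as an analogy. DATE below is a hypothetical,
much simplified stand-in for the <date> rule (it only knows the form
"Feb 5, 2003"), and text_no_date is an illustrative helper, not part of
any grammar.

```python
import re

# Hypothetical, simplified stand-in for <date>: e.g. "Feb 5, 2003".
DATE = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4}")

def text_no_date(text):
    """Match the whole text, then fail if a date occurs anywhere in it."""
    m = re.fullmatch(r"(.*)", text, re.DOTALL)
    if m is None or DATE.search(m.group(1)):
        return None  # the trailing test rejects the match
    return m
```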

Or if you just don't want one at the beginning:

    rule text_no_date {
        <!before <date>> .*
    }
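
For comparison, the "not at the beginning" reading corresponds to a
negative lookahead in most regex engines. A minimal Python sketch,
again assuming a hypothetical simplified DATE pattern in place of the
real <date> rule:

```python
import re

# Hypothetical, simplified stand-in for <date>: e.g. "Feb 5, 2003".
DATE = r"[A-Z][a-z]{2} \d{1,2}, \d{4}"

# Fail if a date starts right here; otherwise take the rest of the text.
NO_LEADING_DATE = re.compile(rf"(?!{DATE}).*", re.DOTALL)
```

Using NO_LEADING_DATE.match(text) only rejects text that begins with a
date; a date later in the text is still allowed.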

>
> If that adverb were available, then I could have a rule that doesn't include
> two other rules:
> line:without(<date>&&<name>) {
>     ^^ [.*] $$
> }
>
>
> The rule above would match a line with a <date> or <name>, but not a line with
> both.

Huh.  That kind of test really wants a closure.  You can't use the
regex & because that requires that they match at the same place.  You
can't use the logical &&, because <date> isn't an expression.  Of
course, that is unless you include your own parsing rule, but that
isn't recommended.
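
What such a closure would compute can be sketched in Python as two
independent searches combined with ordinary boolean logic. This is an
analogy only: DATE and NAME are hypothetical, simplified stand-ins for
the <date> and <name> rules, and the sketch reads the condition
literally as "reject the line only when both occur somewhere in it".

```python
import re

# Hypothetical, simplified stand-ins for the <date> and <name> rules.
DATE = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4}")
NAME = re.compile(r"\b(?:Jan|May|June|April|Roger)\b")

def without_both(line):
    """True unless the line contains both a date and a name."""
    has_date = DATE.search(line) is not None
    has_name = NAME.search(line) is not None
    # Each pattern is searched on its own, so the two may match at
    # different positions, unlike the regex & conjunction.
    return not (has_date and has_name)
```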

Luke
