Nathan Wiger wrote:
> 
> > It would be useful (and increasingly more common) to be able to match
> > qr|<\s*(\w+)([^>]*)>| to qr|<\s*/\1\s*>|, and handle the case where those
> > can nest as well.  Something like
> >
> > <list>    match this with
> >    <list>
> >    </list>   not this but
> > </list>   this.
> 
> I suspect this is going to need a ?[ and ?] of its own. I've been
> thinking about this since your email on the subject yesterday, and I
> don't see how either RFC 145 or this alternative method could support
> it, since there are two tags - > and </ - which are paired
> asymmetrically, and neither approach gives any credence to what's
> contained inside the tag. So <tag> would be matched itself as "< matches
> >".

Actually, in one of my responses I did outline a syntax which would
handle this with reasonable ease, I think.  If the contents of (?[) are
considered a pattern, then you can define a matching closing pattern.

Consider either of these.

m:(?[<list>]).*?(?]</list>): 

or

m:(?['<list>' => '</list>').*(?]):    # really ought to include (?i:) in there, but left out for readability

or more generically

m:(?['<\w+>' => '</\1>').*(?]):


I'll grant you it's not the simplest syntax, but it's a lot simpler than
using the 5.6 method... :)
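
To make the intended semantics of the generic '<\w+>' => '</\1>' form
concrete, here is a minimal sketch in Python rather than in the proposed
syntax (which does not exist yet).  It assumes the closing tag is chosen by
counting nested opens and closes of the same tag name; the names TAG and
match_balanced are made up for illustration, and this is not how any regex
engine would implement (?[ and (?].

```python
import re

# Matches an opening or closing tag; group 1 is "/" for a closing tag,
# group 2 is the tag name, group 3 is whatever attributes follow.
TAG = re.compile(r"<\s*(/?)\s*(\w+)([^>]*)>")

def match_balanced(text, pos=0):
    """Return the (start, end) span running from the first tag at or after
    pos to its matching closing tag, honouring nesting.  Returns None if
    that first tag is a closing tag or if no balanced close is found."""
    first = TAG.search(text, pos)
    if first is None or first.group(1):
        return None
    name = first.group(2)
    depth = 0
    for m in TAG.finditer(text, first.start()):
        if m.group(2) != name:
            continue            # other tags (e.g. <br>) do not affect depth
        if m.group(1):          # closing tag with the same name
            depth -= 1
            if depth == 0:
                return first.start(), m.end()
        else:                   # nested opening tag with the same name
            depth += 1
    return None

text = "<list> a <list> b </list> c </list> d"
start, end = match_balanced(text)
print(text[start:end])
```

On the nested <list> example from earlier in the thread, the span ends at the
outer </list>, not the inner one, which is the behaviour the (?[ ... (?])
pairing is meant to give.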
> 
> What if we added special XML/HTML-parsing ?< and ?> operators?
> Unfortunately, as Richard notes, ?> is already taken, but I will use it
> for the examples to make things symmetrical.
> 
>    ?<  =  opening tag (with name specified)
>    ?>  =  closing tag (matches based on nesting)
> 
> Your example would simply be:
> 
>    /(?<list)[\s\w]*(?<list)[\s\w]*(?>)[\s\w]*(?>)/;
> 
> What makes me nervous about this is that ?< and ?> seem special-case.
> They are, but then again XML and HTML are also pervasive. So a
> special-case for something like this might not be any stranger than
> having a special-case for sin() and cos() - they're extremely important
> operations.
> 
> The other thing that this doesn't handle is tags with no closing
> counterpart, like:
> 
>    <br>
> 
> Perhaps for these the easiest thing is to tell people not to use ?< and
> ?>:
> 
>    /(?<p)[\s*\w](?:<br>)(?>)/;
> 
> Would match
> 
>    <p>
>       Some stuff<br>
>    </p>
> 
> Finally, tags which take arguments:
> 
>    <div align="center">Stuff</div>
> 
> Would require some type of "this is optional" syntax:
> 
>    /(?<div\s*\w*)Stuff(?>)/
> 
> Perhaps only the first word specified is taken as the tag name? This is
> the XML/HTML spec anyways.
> 
> -Nate

-- 
David Corbin            
Mach Turtle Technologies, Inc.
http://www.machturtle.com
[EMAIL PROTECTED]
