Re: RFC 145 (alternate approach)
I'd also suggest that (?[) (with no specified brackets) have the default meaning of the "four standard brackets": (?['('=')','{'='}','['=']','<'='>')

Note also the subtle syntax change. We are either dealing with strings or with patterns. The consensus seems to be against patterns (I can understand that). Given that, I think we need to quote the right-hand side of the = operator. The quotes on the left side would be optional, I think.

Richard Proctor wrote:

On Tue 05 Sep, Nathan Wiger wrote:

Eric Roode wrote:

Now *that* sounds cool, I like it!

What if the RFC only suggested the addition of two new constructs, (?[) and (?]), which did nested matches? The rest would be bound by standard regex constructs and your imagination! That is, the ?] simply takes whatever the closest ?[ matched and reverses it, verbatim, including ordering, case, and number of characters. The only trick would be getting the meaning of "reverses it" right.

No ?] should match the closest ?[; instead, it should nest the ?[s bound by any brackets in the regex and act accordingly. Also, this does not work as a definition of simple bracket matching, as you need ( to match ), not ( to match (. A ?[ list should perhaps specify, for each element, what the matching element is: (?[( = ),{ = }, 01 = 10), sort of hashish in style. Perhaps the brackets could be defined as a hash, allowing (?[%Hash).

Richard -- [EMAIL PROTECTED]

-- David Corbin Mach Turtle Technologies, Inc. http://www.machturtle.com [EMAIL PROTECTED]
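A minimal Python sketch of the pair-table semantics discussed above, purely to illustrate the idea; nothing like this exists in Perl's regex engine, and the names `balanced` and `DEFAULT_PAIRS` are hypothetical. Each closer must match the most recently opened, still-unclosed opener (the nesting Richard asks for), and the table may map multi-character tokens, as in his `01 = 10` example. It only checks balance rather than returning a match, and it assumes no token is both an opener and a closer.

```python
# Hypothetical default pair table, mirroring the suggested default for a
# bare (?[): the "four standard brackets".
DEFAULT_PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}


def balanced(text, pairs=DEFAULT_PAIRS):
    """Return True if every closer in `text` matches the most recently
    opened, still-unclosed opener, using the opener -> closer table `pairs`.

    Tokens may be multi-character (e.g. {"01": "10"}); at each position the
    longest matching token wins. Assumes no token is both opener and closer.
    """
    # Reverse lookup: closer -> the opener it is allowed to close.
    closers = {close: open_ for open_, close in pairs.items()}
    # Try longer tokens first so multi-character tokens are not split.
    tokens = sorted(set(pairs) | set(closers), key=len, reverse=True)
    stack = []
    i = 0
    while i < len(text):
        tok = next((t for t in tokens if text.startswith(t, i)), None)
        if tok is None:
            i += 1  # ordinary character: not part of any pair
            continue
        if tok in pairs:
            stack.append(tok)  # opener: remember it until it is closed
        elif not stack or stack.pop() != closers[tok]:
            return False  # closer with no opener, or the wrong opener
        i += len(tok)
    return not stack  # any opener left on the stack was never closed
```

For example, `balanced("f(a[1]{x})")` is `True`, `balanced("(]")` is `False` (a `]` cannot close a `(`), and `balanced("0110", {"01": "10"})` is `True`.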
Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))
Nathan Wiger wrote:

It would be useful (and increasingly more common) to be able to match qr|<\s*(\w+)([^>]*)>| to qr|<\s*/\1\s*>|, and handle the case where those can nest as well. Something like <list>match this with <list> </list> not this but </list> this. I suspect this is going to need a ?[ and ?] of its own.

I've been thinking about this since your email on the subject yesterday, and I don't see how either RFC 145 or this alternative method could support it, since there are two tags - < and </ - which are paired asymmetrically, and neither approach gives any credence to what's contained inside the tag. So <tag> would be matched by itself, since "<" matches ">".

Actually, in one of my responses I did outline a syntax which would handle this with reasonable ease, I think. If the contents of (?[) are considered a pattern, then you can define a matching pattern. Consider either of these:

m:(?[<list>]).*?(?]</list>):

or

m:(?['<list>' = '</list>').*(?]): # really ought to include (?i:) in there, but left out for readability

or, more generically:

m:(?['<\w+>' = '</\1>').*(?]):

I'll grant you it's not the simplest syntax, but it's a lot simpler than using the 5.6 method... :)

What if we added special XML/HTML-parsing ?< and ?> operators? Unfortunately, as Richard notes, ?> is already taken, but I will use it for the examples to make things symmetrical.

?< = opening tag (with name specified)
?> = closing tag (matches based on nesting)

Your example would simply be:

/(?<list>)[\s\w]*(?<list>)[\s\w]*(?>)[\s\w]*(?>)/;

What makes me nervous about this is that ?< and ?> seem special-case. They are, but then again XML and HTML are also pervasive. So a special case for something like this might not be any stranger than having a special case for sin() and cos() - they're extremely important operations.

The other thing that this doesn't handle is tags with no closing counterpart, like <br>. Perhaps for these the easiest thing is to tell people not to use ?< and ?>:

/(?<p>)[\s\w]*(?:<br>)(?>)/;

This would match <p>Some stuff<br></p>.

Finally, tags which take arguments, such as <div align="center">Stuff</div>, would require some type of "this is optional" syntax:

/(?<div\s*\w*>)Stuff(?>)/

Perhaps only the first word specified is taken as the tag name? That's how the XML/HTML spec works anyway.

-Nate
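The nesting behaviour proposed for ?< and ?> amounts to keeping a stack of open tag names. What follows is a rough Python sketch under stated assumptions: the names `tags_nest`, `TAG_RE` and `VOID_TAGS` are hypothetical, and the void-tag set is only an illustrative subset. Only the first word after `<` is taken as the tag name, attributes are ignored, and tags like `<br>` are skipped rather than expected to close. Python's standard `re` module has no recursive patterns, so the stack lives in ordinary code. Self-closing forms such as `<x/>` are not handled.

```python
import re

# An opening or closing tag: group 1 is "/" for a closing tag, group 2 is
# the tag name (only the first word), and attributes are skipped by [^>]*.
TAG_RE = re.compile(r"<\s*(/?)\s*(\w+)[^>]*>")

# Hypothetical set of tags with no closing counterpart, like <br>.
VOID_TAGS = {"br", "hr", "img"}


def tags_nest(html, void_tags=VOID_TAGS):
    """Return True if the opening and closing tags in `html` nest properly."""
    stack = []
    for m in TAG_RE.finditer(html):
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name in void_tags:
            continue  # no closing tag expected, so do not track it
        if not closing:
            stack.append(name)  # roughly the proposed (?<name>)
        elif not stack or stack.pop() != name:
            return False  # roughly (?>): must close the innermost open tag
    return not stack  # every opened tag must have been closed
```

With this sketch, `tags_nest("<list>match this with <list> </list> not this but </list> this")` is `True`, while `tags_nest("<b><i></b></i>")` is `False` because `</b>` arrives while `<i>` is still open.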
Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))
Jonathan Scott Duff wrote:

On Wed, Sep 06, 2000 at 08:40:37AM -0700, Nathan Wiger wrote:

What if we added special XML/HTML-parsing ?< and ?> operators?

What if we just provided deep enough hooks into the RE engine that specialized parsing constructs like these could easily be added by those who need them? In principle, that's a very Perlish thing to do...

-Scott -- Jonathan Scott Duff [EMAIL PROTECTED]
Re: RFC 145 (alternate approach)
Nathan Wiger wrote:

I think it's cool too; I don't like the @^g and ^@G either. But I worry about the double meaning of the []'s in your solution, and the fact that these:

/\m[...]...\M/;
/\d[...]...\D/;

Well, it's not really a double meaning. It's a set of characters, just like '[]' always means. Granted, the relationship between the upper- and lower-case forms is not the same here, but I don't think it is always the same currently either (positive/negative).

Will work so differently. Maybe another character like ()'s that takes a list:

/\m(<,[).*?\M(>,])/;

If you don't want to use [] (which limits it to single-character "para-brace-ets"), then I'd suggest using {}, as that is already established for use with \? type escapes. Maybe:

m/\m{(<)|(\[)}.*?\M{(>)|(])}/;

Essentially everything inside the {} is in fact another pattern, and the back-references within match "1-for-1". Of course, with this syntax you'd have to escape literal braces (m{\{}), which I don't much care for...

That solves the multiple characters problem at least. However, we still have a \M and \m, which isn't consistent if they're going to take arguments.

I'm not sure I understand your point here.

But, how about a new ?m operator?

/(?m<|[).*?(?M>|])/;

Let's combine your operator with my example from above, where everything inside the (?m) or the (?M) follows RE syntax:

/(?m(<)|\[).*?(?M(>)|(\]))/;

Then the ?M matches pairs with the previous ?m, if there was one that was matched. The | character separates or'ed sets consistent with other regex patterns.

You can do that, or you can say it's done with backreferences (as noted above).

-Nate

David Corbin wrote:

I never saw one comment on this, and the more I think about it, the more I like it. So, I thought I'd throw it back out one more time... (If I get no comments this time, I'll be quiet :)

David Corbin wrote:

I haven't given this a WHOLE lot of thought, so please, shoot it full of holes. I certainly like the goal of this RFC, but I dislike the idea that the characters to be matched are specified outside of the RE.
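The "1-for-1" pairing in the (?m...|...) / (?M...|...) examples can be approximated in Python. Each opener alternative gets its own capturing group, and `Match.lastindex` picks the closer with the same index. This is a rough sketch, not an implementation of any proposal. `paired_match` is a hypothetical name, and the fragments must not contain capturing groups of their own. Nesting is not handled: the shortest span up to the paired closer wins, like the `.*?` in the examples.

```python
import re


def paired_match(text, openers, closers):
    """Find opener_k ... closer_k, where k is the index of the opener
    alternative that matched. `openers` and `closers` are parallel lists
    of regex fragments without capturing groups. Nesting is NOT handled.
    """
    if len(openers) != len(closers):
        raise ValueError("openers and closers must be parallel lists")
    # One capturing group per alternative, so m.lastindex identifies
    # which alternative matched.
    opener_re = re.compile("|".join(f"({o})" for o in openers))
    for m in opener_re.finditer(text):
        k = m.lastindex - 1  # group numbers start at 1
        close = re.compile(closers[k]).search(text, m.end())
        if close:
            return text[m.start():close.end()]
    return None  # no opener was followed by its paired closer
```

With `openers=[r"<", r"\["]` and `closers=[r">", r"\]"]`, the text `"[a>b]"` yields `"[a>b]"`: the `>` is skipped because it is not the closer paired with `[`.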