Re: \z vs \Z vs $
In 12839.969393548@chthon, Tom Christiansen writes:
:What can be done to make $ work "better", so we don't have to
:make people use /foo\z/ to mean /foo$/? They'll keep writing
:the $ for things that probably oughtn't abide optional newlines.
:
:Remember that /$/ really means /(?=\n?\z)/. And likewise with \Z.

It might be reasonable to redefine $ to mean the same as \z whenever
the /s flag is supplied. Another possibility would be to have a scoped
"use re qw/simple_anchor/" pragma to achieve the same. And another would
be simply to switch the meaning of $ and \z.

None of these feel particularly satisfactory, however, and I think any
change to the current semantics would be difficult for existing perl
programmers.

Perhaps '$$' to mean 'match at end of string (without /m) or at end of
any line (with /m)'? The p52p6 translator can easily replace references
to $$ with ${$}. I can't think of a usefully different meaning for ^^,
but as currently defined it will already do the right thing.

I don't know what proposals have come out of the other wgs, but if we
know when a variable has been read from a line-oriented input medium,
then we could turn on the special meaning of $ only in such cases and
define it as $$ above in all other cases. I think this would be more
confusing, though.

We could also consider changing the base definition to (?=($/)?\z),
particularly if $/ is to be seen as a regexp.

I think I like $$ the best.

Hugo
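For readers following the thread, the current behaviour of the three anchors can be checked directly. The sketch below is an editorial illustration, not code from the thread; the helper name anchors_matching is made up for the example. It tests the literal "foo" against each anchor, plus the (?=\n?\z) lookahead Tom quotes as the expansion of $.

```perl
use strict;
use warnings;

# Hypothetical helper: report which end anchors let /foo<anchor>/ match $str.
sub anchors_matching {
    my ($str) = @_;
    my %pattern = (
        '$'         => qr/foo$/,
        '\Z'        => qr/foo\Z/,
        '\z'        => qr/foo\z/,
        '(?=\n?\z)' => qr/foo(?=\n?\z)/,
    );
    return { map { $_ => ( $str =~ $pattern{$_} ? 1 : 0 ) } keys %pattern };
}

my $bare    = anchors_matching("foo");        # no trailing newline
my $newline = anchors_matching("foo\n");      # one trailing newline
my $two     = anchors_matching("foo\n\n");    # two trailing newlines

for my $case ( [ 'foo', $bare ], [ 'foo\n', $newline ], [ 'foo\n\n', $two ] ) {
    my ( $label, $result ) = @$case;
    printf "%-8s %s\n", $label,
        join ' ', map { "$_=$result->{$_}" } sort keys %$result;
}
```

With no trailing newline all four agree. With exactly one trailing newline only \z refuses to match, and with two trailing newlines none of them match, which is the "optional newline" behaviour under discussion.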
Re: \z vs \Z vs $
"TC" == Tom Christiansen [EMAIL PROTECTED] writes: Could you explain what the problem is? TC /$/ does not only match at the end of the string. TC It also matches one character fewer. This makes TC code like $path =~ /etc$/ "wrong". Sorry, I'm missing it. $_ = "etc\n"; /etc$/; # true $_ = "etc"; /etc$/; # true In what way is this _wrong_? Is it under /m? But then wouldn't longest match cover the situation? And doesn't it only trigger at the end of a string? Within the string it eats the "\n". chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: \z vs \Z vs $
"TC" == Tom Christiansen [EMAIL PROTECTED] writes: Could you explain what the problem is? TC /$/ does not only match at the end of the string. TC It also matches one character fewer. This makes TC code like $path =~ /etc$/ "wrong". Sorry, I'm missing it. I know. On your "longest match", you are committing the classic error of thinking green more important than eagerness. It's not. This is unrelated to /m. Go back and read all the insanities we (mostly gbacon and your truly) went through to fix the 5.6 release's modules. People coded them *WRONG*. Wrong means incorrect behaviour. Sometimes this even leads to security foo. BOTTOM LINE: You cannot use /foo$/ to say "does the string end in `foo'?". You can't do that. You can't even use /s to fix it. It doesn't fix it. This is an annoying gotcha. Larry once said that he wished he had made \Z do what \z now does. One would like $ to (be able to) mean "ONLY AT END OF STRING". --tom EXAMPLE 1: --- /usr/local/lib/perl5/5.00554/File/Basename.pm Mon Jan 4 13:00:53 1999 +++ /usr/local/lib/perl5/5.6.0/File/Basename.pm Sun Mar 12 22:24:29 2000 @@ -37,10 +37,10 @@ "VMS", "MSDOS", "MacOS", "AmigaOS" or "MSWin32", the file specification syntax of that operating system is used in future calls to fileparse(), basename(), and dirname(). If it contains none of -these substrings, UNIX syntax is used. This pattern matching is +these substrings, Unix syntax is used. This pattern matching is case-insensitive. If you've selected VMS syntax, and the file specification you pass to one of these routines contains a "/", -they assume you are using UNIX emulation and apply the UNIX syntax +they assume you are using Unix emulation and apply the Unix syntax rules instead, for that function call only. 
If the argument passed to it contains one of the substrings "VMS", @@ -73,7 +73,7 @@ =head1 EXAMPLES -Using UNIX file syntax: +Using Unix file syntax: ($base,$path,$type) = fileparse('/virgil/aeneid/draft.book7', '\.book\d+'); @@ -102,7 +102,7 @@ The basename() routine returns the first element of the list produced by calling fileparse() with the same arguments, except that it always quotes metacharacters in the given suffixes. It is provided for -programmer compatibility with the UNIX shell command basename(1). +programmer compatibility with the Unix shell command basename(1). =item Cdirname @@ -111,8 +111,8 @@ second element of the list produced by calling fileparse() with the same input file specification. (Under VMS, if there is no directory information in the input file specification, then the current default device and -directory are returned.) When using UNIX or MSDOS syntax, the return -value conforms to the behavior of the UNIX shell command dirname(1). This +directory are returned.) When using Unix or MSDOS syntax, the return +value conforms to the behavior of the Unix shell command dirname(1). This is usually the same as the behavior of fileparse(), but differs in some cases. For example, for the input file specification Flib/, fileparse() considers the directory name to be Flib/, while dirname() considers the @@ -124,12 +124,22 @@ ## use strict; -use re 'taint'; +# A bit of juggling to insure that Cuse re 'taint'; always works, since +# File::Basename is used during the Perl build, when the re extension may +# not be available. 
+BEGIN { + unless (eval { require re; }) +{ eval ' sub re::import { $^H |= 0x0010; } ' } + import re 'taint'; +} + + +use 5.005_64; +our(@ISA, @EXPORT, $VERSION, $Fileparse_fstype, $Fileparse_igncase); require Exporter; @ISA = qw(Exporter); @EXPORT = qw(fileparse fileparse_set_fstype basename dirname); -use vars qw($VERSION $Fileparse_fstype $Fileparse_igncase); $VERSION = "2.6"; @@ -162,23 +172,23 @@ if ($fstype =~ /^VMS/i) { if ($fullname =~ m#/#) { $fstype = '' } # We're doing Unix emulation else { - ($dirpath,$basename) = ($fullname =~ /^(.*[:\]])?(.*)/); + ($dirpath,$basename) = ($fullname =~ /^(.*[:\]])?(.*)/s); $dirpath ||= ''; # should always be defined } } if ($fstype =~ /^MS(DOS|Win32)/i) { -($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/); -$dirpath .= '.\\' unless $dirpath =~ /[\\\/]$/; +($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/s); +$dirpath .= '.\\' unless $dirpath =~ /[\\\/]\z/; } - elsif ($fstype =~ /^MacOS/i) { -($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/); + elsif ($fstype =~ /^MacOS/si) { +($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/s); } elsif ($fstype =~ /^AmigaOS/i) { -($dirpath,$basename) = ($fullname =~ /(.*[:\/])?(.*)/); +($dirpath,$basename) = ($fullname =~ /(.*[:\/])?(.*)/s); $dirpath = './' unless $dirpath; } elsif ($fstype !~ /^VMS/i) { # default to Unix -($dirpath,$basename) = ($fullname =~ m#^(.*/)?(.*)#); +($dirpath,$basename) =
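The gotcha described in the message above (/foo$/ does not mean "the string ends in foo", and /s does not change that) can be reproduced in a few lines. This is an editorial sketch under stated assumptions, not part of the patch: the sample path and the helper name ends_with are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when $str ends with the literal $suffix.
# \Q...\E quotes metacharacters in the suffix; \z anchors at the very end.
sub ends_with {
    my ( $str, $suffix ) = @_;
    return $str =~ /\Q$suffix\E\z/ ? 1 : 0;
}

my $path = "/etc\n";    # assumed input: a path with a stray trailing newline

my $dollar   = $path =~ /etc$/  ? 1 : 0;    # $ tolerates the trailing newline
my $dollar_s = $path =~ /etc$/s ? 1 : 0;    # /s only changes what . matches
my $small_z  = $path =~ /etc\z/ ? 1 : 0;    # \z insists on the true end

print "/etc\$/  matches: $dollar\n";
print "/etc\$/s matches: $dollar_s\n";
print "/etc\\z/ matches: $small_z\n";
print "ends_with: ", ends_with( $path, "etc" ), "\n";
```

The first two tests succeed and the last two fail, which is why the patch replaces a trailing $ with \z in the MSDOS/MSWin32 branch, where the intent is "ends with a slash or backslash".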
Re: \z vs \Z vs $
On Wed, 20 Sep 2000 10:03:08 +0100, Hugo wrote:

>In 12839.969393548@chthon, Tom Christiansen writes:
>:What can be done to make $ work "better", so we don't have to
>:make people use /foo\z/ to mean /foo$/? They'll keep writing
>:the $ for things that probably oughtn't abide optional newlines.

Gee you just beat me to this one.

My first thought was: add a new modifier.

>It might be reasonable to redefine $ to mean the same as \z whenever
>the /s flag is supplied.

That was my second thought. I kinda like it, because //s would have two
effects:

 + let . match a newline too (current)
 + let /$/ NOT accept a trailing newline (new)

This combines into:

 = treat "\n" as an ordinary character

That's why I like it.

--
Bart.
Re: \z vs \Z vs $
>That was my second thought. I kinda like it, because //s would have two
>effects:
>
> + let . match a newline too (current)
> + let /$/ NOT accept a trailing newline (new)

Don't forget /s's other meaning.

--tom
Re: \z vs \Z vs $
Tom Christiansen wrote:
> Don't forget /s's other meaning.

Do you enjoy making people ask what you're talking about? What other
meaning did you have in mind, overriding $*?

--
Robert Mathews
Software Engineer
Excite@Home
perl6-language-regex summary for 20000920
perl6-language-regex

Summary report 20000920

Mark-Jason Dominus has relinquished the wg chair due to the pressure
of other commitments; I'll be taking over the chair for the short
time remaining. Thanks to Mark-Jason for all his hard work.

I'll be contacting the authors of all outstanding RFCs shortly to
encourage them to work towards freezing them as soon as practical.

Hugo

RFC 72: The regexp engine should go backward as well as forward. (Peter Heslin)

Peter says (edited):
:If the regexp code is unlikely to be rewritten from the ground up, then
:there may be little chance of this feature being implemented. I'll make
:a pitch for it anyway at the end of my talk at YAPC::Europe, and then
:I'll freeze the RFC.

RFC 93: Regex: Support for incremental pattern matching (Damian Conway)

Now frozen at v3 with no changes; I don't think there was a v2.

RFC 110: counting matches (Richard Proctor)

Richard added my suggestions about the interaction between /t, /g
and \G, and froze the RFC soon after.

RFC 112: Assignment within a regex (Richard Proctor)

No discussion.

RFC 138: Eliminate =~ operator. (Steve Fink)

Withdrawn.

RFC 144: Behavior of empty regex should be simple (Mark Dominus)

Frozen.

RFC 145: Brace-matching for Perl Regular Expressions (Eric Roode)

No discussion directly about this RFC. The discussion of
XML/HTML-specific extensions continued for a short while, but has not
resulted in an RFC. The closest we have to an emerging consensus
appears to be that it is very difficult to pin down a precise
problem to solve - the areas in which we want to match pairs of
delimiters (such as numeric expressions, C code, perl code, HTML
and XML) each seem to require a variety of special cases, each
different from the other.

RFC 150: Extend regex syntax to provide for return of a hash of matched subpatterns (Kevin Walker)

One suggestion from me of (?\%key) for backreferencing, but no
substantive discussion.

RFC 158: Regular Expression Special Variables (Uri Guttman)

No discussion.
RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade() (Nathan Wiger)

This RFC has now been frozen; the frozen version included some
rewording and a couple of additional explanatory notes, as well
as introducing a typo ('$gotis') in an example.

RFC 165: Allow variables in tr/// (Richard Proctor)

Surprisingly, no discussion.

RFC 166: Alternative lists and quoting of things (Richard Proctor)

New version, with a new name (was 'Additions to regexs'). This RFC
is not currently available from the archive due to a misfiling,
but you'll find it here:
  http://www.mail-archive.com/perl6-language-regex@perl.org/msg00350.html
This removes two of the three original suggestions, and expands on
the remaining one. Mark-Jason pointed out that the (new) extension
to (?\Q$foo) is not needed.

RFC 170: Generalize =~ to a special-purpose assignment operator (Nathan Wiger)

Now frozen, with some modifications.

RFC 197: Numeric Value Ranges In Regular Expressions (David Nichol)

No discussion.

RFC 198: Boolean Regexes (Richard Proctor)

No discussion.

New RFCs

Of the other discussions that may still spawn a new RFC, most have
been mentioned previously. One new one: Tom Christiansen has asked
'[w]hat can be done to make $ work "better", so we don't have to make
people use /foo\z/ to mean /foo$/'.
RFC 110 (v6) counting matches
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

counting matches

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 16 Aug 2000
  Last Modified: 20 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 110
  Version: 6
  Status: Frozen

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern? s///g
returns the number of matches it finds. m//g just returns 1 for matching.
Counts can be made using s//$&/g but this is wasteful, or by putting some
counting loop round a m//g. But this all seems rather messy.

TomC (and a couple of others) have said that it can also be done as:

  $count = () = $string =~ /pattern/g;

However many people do not like this construct; here are a couple of quotes:

jhi: Which I find cute as a demonstration of the Perl's context concept,
but ugly as hell from usability viewpoint.

Bart Lateur: '()=' is not perfect. It is also butt ugly. It is a "dirty hack".

This construct is also likely to be inefficient as perl will have to
build up a list of all the matches, store them somewhere, count them,
then throw them away.

Therefore I would like a way of counting matches.

=head2 Proposal

m//gt (or m//t see below) would be defined to do the match, and return the
count of matches; this leaves all existing uses consistent and unaffected.
/t is suggested for "counT", as /c is already taken.

Relationship of m//t and m//g - there are three possibilities. My original:

m//gt, where /t adds counting to a group match (/t without /g would just
return 0 or 1). However \G loses its meaning.

The Alternative By Uri:

m//t and m//g are mutually exclusive and m//gt should be regarded as an error.

Hugo:

I like this too. I'd suggest /t should mean a) return a scalar of
the number of matches and b) don't set any special variables.
Then /t without /g would return 0 or 1, but be faster since no extra
information need be captured (except internally for (.)\1 type
matching - compile time checks could determine if these are needed,
though (?{..}) and (??{..}) patterns would require disabling of that
optimisation). /tg would give a scalar count of the total number of
matches. \G would retain its meaning.

I think Hugo's wording about the relationship makes the best sense, and
this is the suggested way forward.

=head1 CHANGES

RFC110 V1 - Original posting to perl6-language

RFC110 V2 - Reposted to perl6-language-regex

RFC110 V3 - Added Uri's alternative m//t

RFC110 V4 - Added notes about $count = () = $string =~ /pattern/g

RFC110 V5 - Added Hugo's wording about /g and /t relationship, suggested
this is the way forward.

RFC110 V6 - Frozen

=head1 IMPLEMENTATION

Hugo:

Implementation should be fairly straightforward, though ensuring that
optimisations occurred precisely when they are safe would probably
involve a few bug-chasing cycles.

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the
noise...
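For comparison, the counting techniques the RFC describes as existing workarounds can be run side by side in current Perl. This is an editorial sketch, not part of the RFC: the sample string and variable names are assumptions, and the proposed /t modifier is not used because it does not exist in Perl 5.

```perl
use strict;
use warnings;

my $string = "abracadabra";    # assumed sample input

# The "()=" idiom quoted in the RFC: a list assignment evaluated in
# scalar context yields the number of elements on its right-hand side.
my $count_list = () = $string =~ /a/g;

# A counting loop around a scalar-context m//g.
my $count_loop = 0;
$count_loop++ while $string =~ /a/g;

# s///g returns the number of substitutions made; replacing each match
# with itself (on a copy) gives a count at the cost of rewriting the string.
my $copy        = $string;
my $count_subst = ( $copy =~ s/(a)/$1/g );

print "list assignment: $count_list\n";
print "while loop:      $count_loop\n";
print "s///g:           $count_subst\n";
```

All three report the same count for this input. The list-assignment form builds a temporary list of matches, and the substitution form rewrites the string, which are the costs the RFC cites as motivation for a dedicated counting modifier.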