Re: \z vs \Z vs $

2000-09-20 Thread Hugo

In 12839.969393548@chthon, Tom Christiansen writes:
:What can be done to make $ work "better", so we don't have to
:make people use /foo\z/ to mean /foo$/?  They'll keep writing
:the $ for things that probably oughtn't abide optional newlines.
:
:Remember that /$/ really means /(?=\n?\z)/. And likewise with \Z.
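
A quick Perl 5 check of the equivalence Tom states. This is an illustration added here, not part of the thread, and `anchor_results` is a hypothetical helper name. Without /m, `$`, `(?=\n?\z)` and `\Z` should agree on every sample. Only `\z` rejects the single trailing newline:

```perl
use strict;
use warnings;

# Report which end-anchors let "foo" match at the end of $s (no /m flag).
sub anchor_results {
    my ($s) = @_;
    return (
        ($s =~ /foo$/)         ? 1 : 0,   # plain $
        ($s =~ /foo(?=\n?\z)/) ? 1 : 0,   # the expansion Tom gives for $
        ($s =~ /foo\Z/)        ? 1 : 0,   # \Z: end, or just before a final \n
        ($s =~ /foo\z/)        ? 1 : 0,   # \z: absolute end of string only
    );
}

for my $s ("foo", "foo\n", "foo\n\n", "foo\nbar") {
    (my $shown = $s) =~ s/\n/\\n/g;       # make newlines visible
    print qq{"$shown": }, join(" ", anchor_results($s)), "\n";
}
```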

It might be reasonable to redefine $ to mean the same as \z whenever
the /s flag is supplied. Another possibility would be to have a
scoped 'use re qw/simple_anchor/' pragma to achieve the same. And
another would be simply to switch the meaning of $ and \z.

None of these feel particularly satisfactory, however, and I think
any change to the current semantics would be difficult for existing
perl programmers.

Perhaps '$$' to mean 'match at end of string (without /m) or at end
of any line (with /m)'? The p52p6 translator can easily replace
references to $$ with ${$}. I can't think of a usefully different
meaning for ^^, but as currently defined it will already do the
right thing.

I don't know what proposals have come out of the other wgs, but if
we know when a variable has been read from a line-oriented input
medium, then we could turn on the special meaning of $ only in such
cases and define it as $$ above in all other cases. I think this
would be more confusing, though.

We could also consider changing the base definition to (?=($/)?\z),
particularly if $/ is to be seen as a regexp.
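
One way to approximate a `(?=($/)?\z)` anchor in current Perl 5 is to build the lookahead from `quotemeta($/)` at run time. This is a sketch under assumptions: `ends_with_record` is a hypothetical name, and an undefined or empty `$/` is simply treated as "no separator", which glosses over paragraph mode.

```perl
use strict;
use warnings;

# Hypothetical helper: does $str end in $word, optionally followed by
# exactly one copy of the current input record separator $/ ?
# Assumption: undefined or empty $/ means "no separator" (plain \z).
sub ends_with_record {
    my ($str, $word) = @_;
    my $tail = (defined $/ && length $/)
        ? '(?:' . quotemeta($/) . ')?\z'   # optional separator, then true end
        : '\z';
    return $str =~ /\Q$word\E(?=$tail)/ ? 1 : 0;
}

{
    local $/ = "\r\n";                            # e.g. DOS-style input
    print ends_with_record("etc\r\n", "etc"), "\n";   # 1
    print ends_with_record("etc\n",   "etc"), "\n";   # 0: "\n" alone is not $/
}
print ends_with_record("etc\n", "etc"), "\n";         # 1 with the default $/
```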

I think I like $$ the best.

Hugo



Re: \z vs \Z vs $

2000-09-20 Thread Chaim Frenkel

 "TC" == Tom Christiansen [EMAIL PROTECTED] writes:

 Could you explain what the problem is?

TC /$/ does not only match at the end of the string.
TC It also matches one character fewer.  This makes
TC code like $path =~ /etc$/ "wrong".

Sorry, I'm missing it.

$_ = "etc\n";   /etc$/; # true
$_ = "etc"; /etc$/; # true

In what way is this _wrong_?

Is it under /m? But then wouldn't longest match cover the situation?

And doesn't it only trigger at the end of a string? Within the string
it eats the "\n".

chaim
-- 
Chaim Frenkel					Nonlinear Knowledge, Inc.
[EMAIL PROTECTED]   +1-718-236-0183




Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen

 "TC" == Tom Christiansen [EMAIL PROTECTED] writes:

 Could you explain what the problem is?

TC /$/ does not only match at the end of the string.
TC It also matches one character fewer.  This makes
TC code like $path =~ /etc$/ "wrong".

Sorry, I'm missing it.

I know.  

On your "longest match", you are committing the classic error of thinking
greed more important than eagerness.  It's not.

This is unrelated to /m.

Go back and read all the insanities we (mostly gbacon and your
truly) went through to fix the 5.6 release's modules.  People coded
them *WRONG*.  Wrong means incorrect behaviour.  Sometimes this
even leads to security foo.

BOTTOM LINE: You cannot use /foo$/ to say "does the string end in `foo'?".
You can't do that.  You can't even use /s to fix it.  It doesn't fix it.

This is an annoying gotcha.  Larry once said that he wished he had made  \Z
do what \z now does.  One would like $ to (be able to) mean "ONLY AT END OF
STRING".
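
A minimal reproduction of the gotcha, assuming a path that arrived with an unchomped trailing newline (the variable names are illustrative):

```perl
use strict;
use warnings;

# A path that arrived with a stray trailing newline, e.g. an unchomped
# line read from a file or socket.
my $path = "/etc\n";

my $ends_dollar   = ($path =~ /etc$/)  ? 1 : 0;  # 1: $ tolerates a final "\n"
my $ends_dollar_s = ($path =~ /etc$/s) ? 1 : 0;  # 1: /s changes '.', not $
my $ends_z        = ($path =~ /etc\z/) ? 1 : 0;  # 0: \z is the true end

print "\$: $ends_dollar, \$ with /s: $ends_dollar_s, \\z: $ends_z\n";
```

Only the `\z` test rejects the string, which matches the `$` to `\z` change made in the Basename diff below.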

--tom

EXAMPLE 1:

--- /usr/local/lib/perl5/5.00554/File/Basename.pm   Mon Jan  4 13:00:53 1999
+++ /usr/local/lib/perl5/5.6.0/File/Basename.pm Sun Mar 12 22:24:29 2000
@@ -37,10 +37,10 @@
 "VMS", "MSDOS", "MacOS", "AmigaOS" or "MSWin32", the file specification 
 syntax of that operating system is used in future calls to 
 fileparse(), basename(), and dirname().  If it contains none of
-these substrings, UNIX syntax is used.  This pattern matching is
+these substrings, Unix syntax is used.  This pattern matching is
 case-insensitive.  If you've selected VMS syntax, and the file
 specification you pass to one of these routines contains a "/",
-they assume you are using UNIX emulation and apply the UNIX syntax
+they assume you are using Unix emulation and apply the Unix syntax
 rules instead, for that function call only.
 
 If the argument passed to it contains one of the substrings "VMS",
@@ -73,7 +73,7 @@
 
 =head1 EXAMPLES
 
-Using UNIX file syntax:
+Using Unix file syntax:
 
 ($base,$path,$type) = fileparse('/virgil/aeneid/draft.book7',
'\.book\d+');
@@ -102,7 +102,7 @@
 The basename() routine returns the first element of the list produced
 by calling fileparse() with the same arguments, except that it always
 quotes metacharacters in the given suffixes.  It is provided for
-programmer compatibility with the UNIX shell command basename(1).
+programmer compatibility with the Unix shell command basename(1).
 
 =item C<dirname>
 
@@ -111,8 +111,8 @@
 second element of the list produced by calling fileparse() with the same
 input file specification.  (Under VMS, if there is no directory information
 in the input file specification, then the current default device and
-directory are returned.)  When using UNIX or MSDOS syntax, the return
-value conforms to the behavior of the UNIX shell command dirname(1).  This
+directory are returned.)  When using Unix or MSDOS syntax, the return
+value conforms to the behavior of the Unix shell command dirname(1).  This
 is usually the same as the behavior of fileparse(), but differs in some
 cases.  For example, for the input file specification F<lib/>, fileparse()
 considers the directory name to be F<lib/>, while dirname() considers the
@@ -124,12 +124,22 @@
 
 
 ## use strict;
-use re 'taint';
+# A bit of juggling to insure that C<use re 'taint';> always works, since
+# File::Basename is used during the Perl build, when the re extension may
+# not be available.
+BEGIN {
+  unless (eval { require re; })
+{ eval ' sub re::import { $^H |= 0x0010; } ' }
+  import re 'taint';
+}
+
+
 
+use 5.005_64;
+our(@ISA, @EXPORT, $VERSION, $Fileparse_fstype, $Fileparse_igncase);
 require Exporter;
 @ISA = qw(Exporter);
 @EXPORT = qw(fileparse fileparse_set_fstype basename dirname);
-use vars qw($VERSION $Fileparse_fstype $Fileparse_igncase);
 $VERSION = "2.6";
 
 
@@ -162,23 +172,23 @@
   if ($fstype =~ /^VMS/i) {
 if ($fullname =~ m#/#) { $fstype = '' }  # We're doing Unix emulation
 else {
-  ($dirpath,$basename) = ($fullname =~ /^(.*[:\]])?(.*)/);
+  ($dirpath,$basename) = ($fullname =~ /^(.*[:\]])?(.*)/s);
   $dirpath ||= '';  # should always be defined
 }
   }
   if ($fstype =~ /^MS(DOS|Win32)/i) {
-($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/);
-$dirpath .= '.\\' unless $dirpath =~ /[\\\/]$/;
+($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/s);
+$dirpath .= '.\\' unless $dirpath =~ /[\\\/]\z/;
   }
-  elsif ($fstype =~ /^MacOS/i) {
-($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/);
+  elsif ($fstype =~ /^MacOS/si) {
+($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/s);
   }
   elsif ($fstype =~ /^AmigaOS/i) {
-($dirpath,$basename) = ($fullname =~ /(.*[:\/])?(.*)/);
+($dirpath,$basename) = ($fullname =~ /(.*[:\/])?(.*)/s);
 $dirpath = './' unless $dirpath;
   }
   elsif ($fstype !~ /^VMS/i) {  # default to Unix
-($dirpath,$basename) = ($fullname =~ m#^(.*/)?(.*)#);
+($dirpath,$basename) = 
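
The diff above also adds /s to the splitting patterns. A small sketch of why, assuming Unix-style splitting with the same `m#^(.*/)?(.*)#` pattern. `split_path` is a hypothetical helper, not File::Basename's API. Without /s, `.` stops at an embedded newline, so a legal but odd Unix filename is split in the wrong place:

```perl
use strict;
use warnings;

# Hypothetical helper modelled on fileparse()'s Unix branch; $dotall
# chooses whether the pattern carries /s. (// needs Perl 5.10 or later.)
sub split_path {
    my ($fullname, $dotall) = @_;
    my ($dirpath, $basename) = $dotall
        ? ($fullname =~ m#^(.*/)?(.*)#s)
        : ($fullname =~ m#^(.*/)?(.*)#);
    return ($dirpath // '', $basename);
}

my $name = "dir/file\nname";              # Unix permits "\n" in a filename
my ($d1, $b1) = split_path($name, 0);     # basename stops at the newline
my ($d2, $b2) = split_path($name, 1);     # basename keeps "file\nname"
```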

Re: \z vs \Z vs $

2000-09-20 Thread Bart Lateur

On Wed, 20 Sep 2000 10:03:08 +0100, Hugo wrote:

In 12839.969393548@chthon, Tom Christiansen writes:
:What can be done to make $ work "better", so we don't have to
:make people use /foo\z/ to mean /foo$/?  They'll keep writing
:the $ for things that probably oughtn't abide optional newlines.

Gee, you just beat me to this one.

My first thought was: add a new modifier.

It might be reasonable to redefine $ to mean the same as \z whenever
the /s flag is supplied.

That was my second thought. I kinda like it, because //s would have two
effects:

 + let . match a newline too (current)

 + let /$/ NOT accept a trailing newline (new)

This combines into:

 = treat "\n" as an ordinary character


That's why I like it.
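
In today's Perl 5, the "treat \n as an ordinary character" behaviour described above has to be spelled out by pairing /s with \z. A short illustration, not from the thread:

```perl
use strict;
use warnings;

my $text = "a\nb\n";   # one newline inside, one at the end

# /s lets '.' match the inner "\n", but $ still tolerates the final "\n".
my $text_dollar = ($text =~ /\Aa.b$/s) ? 1 : 0;   # 1

# Pairing /s with \z makes the trailing "\n" count as a real character.
my $text_z      = ($text =~ /\Aa.b\z/s) ? 1 : 0;  # 0

# Without /s, '.' refuses the inner newline, so the anchor never matters.
my $no_s        = ($text =~ /\Aa.b$/) ? 1 : 0;    # 0

# The /s + \z combination does accept the string with no trailing newline.
my $exact       = ("a\nb" =~ /\Aa.b\z/s) ? 1 : 0; # 1
```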

-- 
Bart.



Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen

That was my second thought. I kinda like it, because //s would have two
effects:

 + let . match a newline too (current)

 + let /$/ NOT accept a trailing newline (new)

Don't forget /s's other meaning.

--tom



Re: \z vs \Z vs $

2000-09-20 Thread Robert Mathews

Tom Christiansen wrote:
 Don't forget /s's other meaning.

Do you enjoy making people ask what you're talking about?  What other
meaning did you have in mind, overriding $*?

-- 
Robert Mathews
Software Engineer
Excite@Home



perl6-language-regex summary for 20000920

2000-09-20 Thread Hugo

perl6-language-regex

Summary report 2920

Mark-Jason Dominus has relinquished the wg chair due to the pressure
of other commitments; I'll be taking over the chair for the short
time remaining. Thanks to Mark-Jason for all his hard work.

I'll be contacting the authors of all outstanding RFCs shortly to
encourage them to work towards freezing them as soon as practical.

Hugo


RFC 72: The regexp engine should go backward as well as
forward. (Peter Heslin)

Peter says (edited):
:If the regexp code is unlikely to be rewritten from the ground up, then
:there may be little chance of this feature being implemented. I'll make
:a pitch for it anyway at the end of my talk at YAPC::Europe, and then
:I'll freeze the RFC.

RFC 93: Regex: Support for incremental pattern matching  (Damian Conway)

Now frozen at v3 with no changes; I don't think there was a v2.

RFC 110: counting matches  (Richard Proctor)

Richard added my suggestions about the interaction between /t, /g
and \G, and froze the RFC soon after.

RFC 112: Assignment within a regex  (Richard Proctor)

No discussion.

RFC 138: Eliminate =~ operator.  (Steve Fink)

Withdrawn.

RFC 144: Behavior of empty regex should be simple  (Mark Dominus)

Frozen.

RFC 145: Brace-matching for Perl Regular Expressions  (Eric Roode)

No discussion directly about this RFC. The discussion of XML/HTML-specific
extensions continued for a short while, but has not
resulted in an RFC.

The closest we have to an emerging consensus appears to be that
it is very difficult to pin down a precise problem to solve: the
areas in which we want to match pairs of delimiters (such as
numeric expressions, C code, perl code, HTML and XML) each seem
to require a variety of special cases, each different from the
others.

RFC 150: Extend regex syntax to provide for return of a hash of
 matched subpatterns  (Kevin Walker)

One suggestion from me of (?\%key) for backreferencing, but no
substantive discussion.

RFC 158: Regular Expression Special Variables  (Uri Guttman)

No discussion.

RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(),
 and trade()  (Nathan Wiger)

This RFC has now been frozen; the frozen version included some
rewording and a couple of additional explanatory notes, as well
as introducing a typo ('$gotis') in an example.

RFC 165: Allow variables in tr///  (Richard Proctor)

Surprisingly, no discussion.

RFC 166: Alternative lists and quoting of things  (Richard Proctor)

New version, with a new name (was 'Additions to regexs'). This RFC
is not currently available from the archive due to a misfiling, but
you'll find it here:
  http://www.mail-archive.com/perl6-language-regex@perl.org/msg00350.html

This removes two of the three original suggestions, and expands on
the remaining one. Mark-Jason pointed out that the (new) extension
to (?\Q$foo) is not needed.

RFC 170: Generalize =~ to a special-purpose assignment operator
 (Nathan Wiger)

Now frozen, with some modifications.

RFC 197: Numberic Value Ranges In Regular Expressions (David Nichol)
  
No discussion.

RFC 198: Boolean Regexes (Richard Proctor)

No discussion.

New RFCs

Of the other discussions that may still spawn a new RFC, most have been
mentioned previously. One new one: Tom Christiansen has asked '[w]hat
can be done to make $ work "better", so we don't have to make people
use /foo\z/ to mean /foo$/'.



RFC 110 (v6) counting matches

2000-09-20 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

counting matches

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 16 Aug 2000
  Last Modified: 20 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 110
  Version: 6
  Status: Frozen

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern?  s///g
returns the number of matches it finds.  m//g just returns 1 for matching.
Counts can be made using s//$&/g, but this is wasteful, or by putting some
counting loop round a m//g.  But this all seems rather messy.

TomC (and a couple of others) have said that it can also be done as:
$count = () = $string =~ /pattern/g;
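
For reference, the three existing ways of counting mentioned above, sketched in Perl 5 (the sample string and pattern are arbitrary); all three should agree:

```perl
use strict;
use warnings;

my $string = "abcabcab";

# 1. s///g returns the number of substitutions made; replacing each
#    match with itself ($&) leaves the text unchanged but rewrites it.
my $copy     = $string;
my $by_subst = ($copy =~ s/ab/$&/g);

# 2. The "() =" idiom: list-assign the matches to an empty list and
#    use the scalar value of that assignment (the number of elements).
my $by_list = () = $string =~ /ab/g;

# 3. A counting loop round m//g in scalar context; no list is built.
my $by_loop = 0;
$by_loop++ while $string =~ /ab/g;

print "$by_subst $by_list $by_loop\n";
```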

However, many people do not like this construct; here are a couple of quotes:

jhi: Which I find cute as a demonstration of the Perl's context concept,
but ugly as hell from usability viewpoint.  

Bart Lateur: '()=' is not perfect. It is also butt ugly. It is a "dirty hack".

This construct is also likely to be inefficient, as perl will have to
build up a list of all the matches, store them somewhere, count them, then
throw them away.

Therefore I would like a way of counting matches.

=head2 Proposal

m//gt (or m//t, see below) would be defined to do the match and return the
count of matches; this leaves all existing uses consistent and unaffected.
/t is suggested for "counT", as /c is already taken.

Relationship of m//t and m//g: there are three possibilities. My original:

m//gt, where /t adds counting to a group match (/t without /g would just
return 0 or 1).  However \G loses its meaning.

The alternative by Uri:

m//t and m//g are mutually exclusive and m//gt should be regarded as an error.

Hugo:

 I like this too. I'd suggest /t should mean a) return a scalar of
 the number of matches and b) don't set any special variables. Then
 /t without /g would return 0 or 1, but be faster since no extra
 information need be captured (except internally for (.)\1 type
 matching - compile time checks could determine if these are needed,
 though (?{..}) and (??{..}) patterns would require disabling of
 that optimisation). /tg would give a scalar count of the total
 number of matches. \G would retain its meaning.

I think Hugo's wording about the relationship makes the best sense, and
this is the suggested way forward.
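
A rough Perl 5 emulation of the semantics in Hugo's wording, useful for comparing behaviour before any /t exists. `count_matches` is a hypothetical helper; it does not model "don't set any special variables", nor any interaction with \G, since it works on a private copy of the string.

```perl
use strict;
use warnings;

# Hypothetical emulation of the proposed modifier:
#   count_matches($str, $re, 0)  ~  $str =~ m/$re/t    -> 0 or 1
#   count_matches($str, $re, 1)  ~  $str =~ m/$re/tg   -> total count
sub count_matches {
    my ($str, $re, $global) = @_;
    unless ($global) {
        return $str =~ $re ? 1 : 0;      # /t alone: did it match at all?
    }
    my $n = 0;
    $n++ while $str =~ /$re/g;           # /tg: scalar //g loop, no list built
    return $n;
}

print count_matches("abcabcab", qr/ab/, 1), "\n";
print count_matches("abcabcab", qr/ab/, 0), "\n";
```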

=head1 CHANGES

RFC110 V1 - Original posting to perl6-language

RFC110 V2 - Reposted to perl6-language-regex

RFC110 V3 - Added Uri's alternative m//t

RFC110 V4 - Added notes about $count = () = $string =~ /pattern/g

RFC110 V5 - Added Hugo's wording about /g and /t relationship, suggested this
is the way forward.

RFC110 V6 - Frozen

=head1 IMPLEMENTATION

Hugo:
 Implementation should be fairly straightforward,
 though ensuring that optimisations occurred precisely when they
 are safe would probably involve a few bug-chasing cycles.


=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...