RFC 166 (disambiguator)

2000-08-29 Thread Mark-Jason Dominus


Richard Proctor suggests that (?) will match the empty string. 
Then it can be inserted into regexes to separate elements that need to
be separated.  For example, /$foo(?)bar/ interpolates the value of
$foo and then looks for that pattern followed by 'bar'.   You cannot
simply write /$foobar/ because then Perl tries to interpolate $foobar,
which is not what you wanted.

1. You can already write /${foo}bar/ to get what you wanted.  This
   solution already works inside of double-quoted strings.  (?) would
   not work inside of double-quoted strings.

2. You can already write /$foo(?:)bar/ to get what you wanted.  This
   is almost identical to what Richard proposed anyway.

It is really not clear to me that this problem needs to be solved any
better than it is already.

I suggest that this section be removed from the RFC.

Mark-Jason Dominus   [EMAIL PROTECTED]
I am boycotting Amazon. See http://www.plover.com/~mjd/amazon.html for details.
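
The two existing workarounds named in the message above can be tried side by side. This is only a minimal sketch; the sample pattern and the names $text, $with_braces, $with_group and $string are assumptions made for illustration.

```perl
use strict;
use warnings;

my $foo  = 'ab+';        # a pattern fragment held in a variable
my $text = 'xabbbbar';

# Braces end the variable name, so this is /ab+bar/ and not /$foobar/.
my $with_braces = $text =~ /${foo}bar/ ? 1 : 0;

# An empty non-capturing group also separates $foo from 'bar'.
my $with_group = $text =~ /$foo(?:)bar/ ? 1 : 0;

# The brace form works in double-quoted strings as well.
my $string = "${foo}bar";

print "$with_braces $with_group $string\n";
```

Only the brace form carries over to ordinary double-quoted strings, which is the point made in item 1.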




Re: RFC 110 (v3) counting matches

2000-08-29 Thread Mark-Jason Dominus


> On Mon, 28 Aug 2000, Mark-Jason Dominus wrote:
>
> > But there is no convenient way to run the loop once for each date and
> > split the dates into pieces:
> >
> >     # WRONG
> >     while (($mo, $dy, $yr) = ($string =~ /(\d\d)-(\d\d)-(\d\d)/g)) {
> >         ...
> >     }
>
> What I use in a script of mine is:
>
>     while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
>         ($mo, $dy, $yr) = ($1, $2, $3);
>     }
>
> Although this, of course, also requires that you know the number of
> backreferences.

The real problem I was trying to discuss was not this particular
application.  I was trying to point out a larger problem, which is
that there are several regex features that are enabled or disabled
depending on what context the match is in, so that if you want one
scalar-context feature and one list-context feature at the same time,
there is no direct way to do it.

> Nicer would be to be able to assign from @matchdata or something
> like that :)

I agree.  There are many operations that would be simpler if there was
a magic array that contained ($1, $2, $3, ...).  If anyone wants to
write an RFC on this, I will help.




Re: RFC 110 (v2) counting matches

2000-08-29 Thread Mark-Jason Dominus


> On Tue, 29 Aug 2000 08:47:25 -0400, Mark-Jason Dominus wrote:
>
> > m/.../Count,Insensitive   (instead of m/.../ti)
> >
> > That would escape the problem that we are running out of letters and
> > also the problem that the current letters are hard to remember.
>
> Yes, but wouldn't this give us backward compatibility problems? For
> example, code like
>
>     $result = m/(.)/Insensitive, ord $1;

No, because that is presently a syntax error.  The one you have to
watch out for is:

$result = m/(.)/s,Insensitive, ord $1;

> And, I don't really see the need for the comma.
>
>     m/.../CountInsensitive   (instead of m/.../ti)

I guess, but to me CountInsensitive looks like one option, not two.




Re: RFC 110 (v3) counting matches

2000-08-29 Thread Bart Lateur

On Tue, 29 Aug 2000 08:51:29 -0400, Mark-Jason Dominus wrote:

> There are many operations that would be simpler if there was
> a magic array that contained ($1, $2, $3, ...).  If anyone wants to
> write an RFC on this, I will help.

Heh. I once complained about the lack of such an array, in
comp.lang.perl.misc, *years* ago.

My practical problem was something like this, in a translation program.
$phrase is one of many patterns in a table, to look for English phrases;
%translate contains the French translations; interpolate() is a sub that
fills in the parameters (the numbers in the string):

    $_ = "It is 5 past 10.";
    $phrase = 'it is (\d+) past (\d+)';
    s/^$phrase/interpolate($translate{$phrase}, $1, $2)/ie;


The problem is that with variable patterns, you *don't know* how many
paren groups there are.

The solution they came up with was @+ and @-. I still can't work with
those. An array of matches (e.g. @&) would be a lot easier. It could
also be a lot slower; see the discussion on $& for this. (Mystery: how
can filling in $& be a lot slower than filling in $1?)

-- 
Bart.
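
For the case described above, where the number of paren groups is not known in advance, the captures can be rebuilt from the @- and @+ offset arrays. The following is a rough sketch and not the interpolate() from the message: the sprintf-style template in %translate, the loop, and the names $text and @groups are assumptions for illustration, and groups that did not take part in the match are not handled.

```perl
use strict;
use warnings;

# Hypothetical table entry; the template uses sprintf's explicit-index
# format so the two numbers can swap places in the translation.
my %translate = (
    'it is (\d+) past (\d+)' => 'il est %2$d heures %1$d',
);

my $text = 'It is 5 past 10.';

for my $phrase (keys %translate) {
    if ($text =~ /^$phrase/i) {
        # $#+ is the number of groups in the last successful match;
        # $-[$n] and $+[$n] are the start and end offsets of group $n.
        my @groups = map { substr($text, $-[$_], $+[$_] - $-[$_]) } 1 .. $#+;

        # Replace the whole match (offsets $-[0] .. $+[0]) in place.
        substr($text, $-[0], $+[0] - $-[0],
               sprintf($translate{$phrase}, @groups));
    }
}

print "$text\n";
```

A plain match in list context also returns the captures directly; the offset arrays matter mostly when the match is not in list context.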



Re: RFC 110 (v2) counting matches

2000-08-29 Thread Bart Lateur

On Tue, 29 Aug 2000 09:00:43 -0400, Mark-Jason Dominus wrote:

> > And, I don't really see the need for the comma.
> >
> >     m/.../CountInsensitive   (instead of m/.../ti)
>
> I guess, but to me CountInsensitive looks like one option, not two.

That goes for this too.

:   m/.../iCount  (instead of m/.../it)

-- 
Bart.



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

> That empty list to force the proper context irks me.  How about a
> modifier to the RE that forces it (this would solve the "counting matches"
> problem too).
>
>    $string =~ m{
>        (\d\d) - (\d\d) - (\d\d)
>        (?{ push @dates, makedate($1,$2,$3) })
>    }gxl;
>
>    $count = $string =~ m/foo/gl;   # always list context

The reason why not is that you're adding a special-case hack to
one particular place, rather than promoting a general mechanism
that can be used everywhere.

Tell me: which is better and why.

1) A regex switch to specify scalar context, as in a mythical /r:

push(@got, /bar/r)

2) A general mechanism, say for example, "scalar":

push(@got, scalar /bar/)

Obviously the "scalar" is better, because it does not require that
a new switch be learnt, nor is its use restricted to pattern matching.
Furthermore, it's inarguably more mnemonic for the sense of "match this
scalarishly".

Likewise, to force list context (a far less common operation, mind
you), it is a bad idea to have what amounts to a special argument
to just one function to do this.  What happens to the next function you
want to do this to?  How about if I want to force getpwnam() into list
context and get back a scalar result?

$count = getpwnam("tchrist")/l;
$count = getpwnam("tchrist", LIST);
$count = getpwnam("tchrist")->as_list;

All of those, frankly, suck.  This is much better:

$count = () = getpwnam("tchrist");

It's better because 

  * You don't have to invent anything new, whether syntactically
    or mnemonically.  The sucky solutions all require modification
    of Perl's very syntax.  With the list assignment, you just need
    to learn how to use what you *already have*.  I could say as
    much for (?{...}).  Think how many of the suggestions on these
    lists can be dealt with simply through using existing features
    that the suggesting party was unaware of.

  * It's a general mechanism that isn't tailored for this particular
    function call.  Special-purpose solutions are often inferior
    to general-purpose ones, because the latter are more likely to
    be creatively usable in a fashion unforeseen by the author.

  * What could possibly be more intuitive for the action of acting
    as though one were assigning to a list than doing that very
    thing itself?  Since () is the canonical list (it's empty, after
    all), this follows directly and requires no special knowledge
    whatsoever.

--tom
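
Applied to the counting-matches problem this thread started from, the empty-list assignment might look like the sketch below. The sample string and the names $count and $found are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = '08-29-00, 09-01-00, 12-25-99';

# A list assignment evaluated in scalar context yields the number of
# elements on its right-hand side: here, the number of matches.
my $count = () = $string =~ /\d\d-\d\d-\d\d/g;

# Without the empty list the match runs in scalar context and only
# reports whether it succeeded.
my $found = $string =~ /\d\d-\d\d-\d\d/;

print "$count $found\n";
```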



Re: RFC 110 (v2) counting matches

2000-08-29 Thread Tom Christiansen

> If we want to use uppercase, make these unique as well. That gives us
> many more combinations, and is not necessarily confusing:
>
>    m//f  -  fast match
>    m//F  -  first match
>    m//i  -  case-insensitive
>    m//I  -  ignore whitespace
>
> And so on. This seems like a much more productive use, otherwise we're
> just wasting characters.

Larry's on record as preferring not to have us going down the road
of using distinct upper and lower case regex switches.  The distance
between //c and //C, say, is far too narrow.

--tom



Overlapping RFCs 135 138 164

2000-08-29 Thread Mark-Jason Dominus


RFC135: Require explicit m on matches, even with ?? and // as delimiters.

C<?...?> and C</.../> are what makes Perl hard to tokenize.
Requiring them to be written C<m?...?> and C<m/.../> would
solve this.

(Nathan Torkington)

RFC138: Eliminate =~ operator.

Replace EXPR =~ m/.../ with m/.../ EXPR, and similarly for
s/// and tr///. Force an explicit dereference when using
qr/.../. Disallow the implicit treatment of a string as a
regular expression to match against.

(Steve Fink)

RFC164: Replace =~, !~, m//, and s/// with match() and subst()

Several people (including Larry) have expressed a desire to
get rid of C<=~> and C<!~>. This RFC proposes a way to replace
C<m//> and C<s///> with two new builtins, C<match()> and
C<subst()>.

(Nathan Wiger)


I would like to see these three RFCs merged into one if this is
appropriate.  I am calling on the three authors to discuss in private
email how this may be done.  I hope that the discussion will result in
the withdrawal of at least two of the three RFCs, and that this private
discussion produces a new RFC.  The new RFC should discuss the points
raised by all three existing RFCs, should investigate several
solutions in parallel, and should compare them with one another and
contrast the benefits and drawbacks of each one.





Mark-Jason Dominus   [EMAIL PROTECTED]




Re: Overlapping RFCs 135 138 164

2000-08-29 Thread Nathan Wiger

Mark-Jason Dominus wrote:
 
> RFC135: Require explicit m on matches, even with ?? and // as delimiters.

This one is along a different line from these two:

> RFC138: Eliminate =~ operator.
>
> RFC164: Replace =~, !~, m//, and s/// with match() and subst()

Which I could see unifying. I'd ask people to wait until v2 of RFC 164
comes up. It may well include everything from RFC 138 already.

-Nate



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Nathan Wiger

Mark-Jason Dominus wrote:

> I think the reason this hasn't been done before is because it's *not*
> quite straightforward.

Before everyone gets tunnel vision, let me point out one thing:
Accepting variables in tr// makes no sense. It defeats the purpose of
tr/// - extremely fast, known transliterations.

tr///e is the same as s///g:

tr/$foo/$bar/e  ==  s/$foo/$bar/g

I don't think this RFC accomplishes anything, personally.

-Nate



Re: RFC 110 (v2) counting matches

2000-08-29 Thread David L. Nicol

Mark-Jason Dominus wrote:
 
 It occurs to me that since none of the capital letters are taken, we
 could adopt the convention that a capital letter as a regex modifier
 will introduce a *word* which continues up to the next comma. 


Excelsior!


-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
   Yum, sidewalk eggs!



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Tom Christiansen

> tr///e is the same as s///g:
>
> tr/$foo/$bar/e  ==  s/$foo/$bar/g

I suggest you read up on tr///, sir.  You are completely wrong.

--tom
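
To make the difference concrete: tr/// maps characters one for one, while s///g replaces whole occurrences of a pattern. The sketch below uses assumed sample data and illustrative variable names; the string eval is the usual present-day way to get a run-time table into tr///, not something proposed in this thread.

```perl
use strict;
use warnings;

my $text = 'cab abc';

# tr/// transliterates character by character: a->x, b->y, c->z.
(my $translit = $text) =~ tr/abc/xyz/;

# s///g replaces each occurrence of the substring 'abc' as a unit.
(my $subst = $text) =~ s/abc/xyz/g;

# tr/// does not interpolate variables; a table known only at run
# time has to go through a string eval.
my ($from, $to) = ('abc', 'xyz');
my $dynamic = $text;
eval "\$dynamic =~ tr/$from/$to/; 1" or die $@;

print "$translit | $subst | $dynamic\n";
```

The first and third results agree with each other and differ from the s///g result, since 'cab' contains the characters a, b and c but not the substring 'abc'.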



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Nathan Wiger

Tom Christiansen wrote:
 
> > tr///e is the same as s///g:
> >
> > tr/$foo/$bar/e  ==  s/$foo/$bar/g
>
> I suggest you read up on tr///, sir.  You are completely wrong.

Yep, sorry. I tried to hit cancel and hit send instead. I'll shut up
now.

-Nate



Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen

> > > p.s. Has anybody already suggested that we ought to have a nicer
> > > solution to execute perl code inside a string, replacing "${\(...)}" and
> > > "@{[...]}", which also won't ever win a beauty contest?  Oops, wrong
> > > mailing list.
> >
> > The first one doesn't work, and never did.  You want
> > @{[]} and @{[scalar ]} instead.
>
> "Doesn't work"?
>
>    print "The sum of 1 + 2 is ${\(1+2)}.\n";
> --
>    The sum of 1 + 2 is 3.
>
> I'm surprised you wouldn't have known this. The principle is the same:
> "${...}" expects a scalar reference inside the block, and '\' provides
> one. Of course, there shouldn't be a real multi-element list inside the
> parens, but just one scalar. And often, the parens aren't needed.

I'm surprised that you still don't understand.  Notice what I showed
you for the replacement above: @{[scalar ]}.

Using ${\(...)} doesn't work in the sense that contrary to popular
belief, it fails to provide a scalar context to the contents of
those parens.  Thus ${ \( fn() ) } is still calling fn() in list
context, not scalar context.  Witness:

sub fn { sprintf "called in %s context", wantarray ? "list" : "scalar" } 

print "Test 1: ";
print "@{ [fn()] }\n";

print "Test 2: ";
print "${ \(fn()) }\n";

print "Test 3: ";
print "@{ [scalar fn()] }\n";

That, when executed, yields:

Test 1: called in list context
Test 2: called in list context
Test 3: called in scalar context

*That's* why test 2 "doesn't work".

--tom



Re: Overlapping RFCs 135 138 164

2000-08-29 Thread Tom Christiansen

> ($foo = $bar) =~ s/x/y/; will never make much sense to me.

What about these, which are much the same thing in that they all
use the lvaluability of assignment:

chomp($line = <STDIN>);
($foo = $bar) += 10;
($foo += 3) *= 2;
func($diddle_me = $protect_me);
$n = select($rout=$rin, $wout=$win, $eout=$ein, 2.5);

--tom



Re: Overlapping RFCs 135 138 164

2000-08-29 Thread Tom Christiansen

> What about these, which are much the same thing in that they all
> use the lvaluability of assignment:

And don't forget:

for (@new = @old) { s/foo/bar/ } 

--tom
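
Both idioms from the last two messages can be checked in a few lines. This is a small sketch with assumed sample data; it shows that the assignment hands back the copy as an lvalue, so only the copy is modified.

```perl
use strict;
use warnings;

my $bar = 'xyzzy';

# Copy $bar into $foo, then substitute in the copy only.
(my $foo = $bar) =~ s/x/y/;

my @old = ('foo fighters', 'food');
my @new;

# The list assignment is itself an lvalue, so the loop aliases $_ to
# the elements of @new; @old is left untouched.
for (@new = @old) { s/foo/bar/ }

print "$foo $bar | @new | @old\n";
```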



RFC 170 (v1) Generalize =~ to a special-purpose assignment operator

2000-08-29 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Generalize =~ to a special-purpose assignment operator

=head1 VERSION

   Maintainer: Nathan Wiger [EMAIL PROTECTED]
   Date: 29 Aug 2000
   Mailing List: [EMAIL PROTECTED]
   Version: 1
   Number: 170
   Status: Developing
   Requires: RFC 164

=head1 ABSTRACT

Currently, C<=~> is only available for use in specific builtin pattern
matches. This is too bad, because it's really a neat operator.

This RFC proposes a simple way to make it more general-purpose.

=head1 DESCRIPTION

First off, this assumes RFC 164. Second, it requires you drop any
knowledge of how C<=~> currently works. Finally, it runs directly
counter to RFC 139, which proposes another application for C<=~>.

This RFC proposes a simple use for C<=~>: as a last-argument rvalue
duplicator. What this means is that an expression such as this:

   $value = dostuff($arg1, $arg2, $value);

Could now be rewritten as:

   $value =~ dostuff($arg1, $arg2);

And C<$value> would be implicitly transferred over to the right side as
the last argument. It's simple, but it makes what is being operated on
very obvious.

This enables us to rewrite the following constructs:

   ($name) = split /\s+/, $name;
   $string = quotemeta($string);
   @array = reverse @array;
   @vals = sort { $a <=> $b } @vals;

   $string = s/\s+/SPACE/, $string;    # RFC 164
   $matches = m/\w+/, $string;         # RFC 164
   @strs = s/foo/bar/gi, @strs;        # RFC 164

As the shorter and more readable:

   ($name) =~ split /\s+/;
   $string =~ quotemeta;
   @array =~ reverse;
   @vals =~ sort { $a <=> $b };

   $string =~ s/\s+/SPACE/;    # looks familiar
   $string =~ m/\w+/;          # this too [1]
   @strs =~ s/foo/bar/gi;      # cool extension

It's a simple solution, true, but it has a good amount of flexibility
and brevity. It could also be the case that multiple values could be
called and returned, so that:

   ($name, $email) = special_parsing($name, $email);

Becomes:

   ($name, $email) =~ special_parsing;

Again, it's simple, but seems to have useful applications.

=head1 IMPLEMENTATION

Simplistic (hopefully).

=head1 MIGRATION

This introduces new functionality, which allows backwards compatibility
for regular expressions. As such, it should require no special
translation of code. This RFC assumes RFC 164 will be adopted (which it
may not be) for changes to regular expressions.

True void contexts may also render some parts of this moot, in which
case coming up with a more advanced use for C<=~> may be desirable.

=head1 NOTES

[1] That m// one doesn't quite work right, but that's a special case
that I would suggest should be caught by some other part of the grammar
to maintain backwards compatibility (like bare //).

=head1 REFERENCES

RFC 164: Replace =~, !~, m//, and s/// with match() and subst()

RFC 139: Allow Calling Any Function With A Syntax Like s///