Re: S5: substitutions

2006-10-10 Thread Markus Laire

On 10/9/06, Jonathan Lang wrote:

Smylers wrote:
 To be consistent your proposal should also suggest that these become
 equivalent:

 * "{ function() }"
 * qq[ {function() }]
 * qq{ function() }
 * eval "function()"

How so?  AFAIK, string literal syntax requires you to prepend a sigil
on the front of any embedded closure that you want to interpolate a
value from; otherwise, it isn't a closure - it's just a pair of
curly-brace characters.  My proposal isn't "curly braces _always_ act
like closures, no matter what"; it's "the second part of a s[]
construct doesn't have to be a literal; it can be anything that can be
evaluated as needed by the algorithm to provide substitute text."


According to S02 bare curlies do interpolate in double-quoted strings:

S02 =item *
S02
S02 A bare closure also interpolates in double-quotish context.  It may
S02 not be followed by any dereferencers, since you can always put them
S02 inside the closure.  The expression inside is evaluated in scalar
S02 (string) context.  You can force list context on the expression using
S02 the C<list> operator if necessary.

--
Markus Laire


Re: S5: substitutions

2006-10-10 Thread Jonathan Lang

Markus Laire wrote:

According to S02 bare curlies do interpolate in double-quoted strings:


Yeah; that was subsequently pointed out to me.  Oops.

--
Jonathan Dataweaver Lang


Re: S5: substitutions

2006-10-08 Thread Jonathan Lang

Larry Wall wrote:

On Sat, Oct 07, 2006 at 07:49:48PM -0700, Jonathan Lang wrote:
: Another possibility: make it work.  Add a "delayed" parameter trait
: that causes evaluation of that parameter to be postponed until the first
: time that the parameter actually gets used in the routine.  If it
: never gets used, then it never gets evaluated.  I could see uses for
: this outside of the narrow scope of implementing substitutions.

Tell me how you plan to do MMD on a value you don't have yet.


MMD is based on types, not values; and you don't necessarily have to
evaluate something in order to know its type.  Also, you don't
necessarily need every argument in order to do MMD: if there's a
semi-colon in any of the candidates' signatures prior to the argument
in question, MMD stands a decent chance of selecting a candidate
before the question of its type comes up.  Worst case scenario (a Code
object without a return type being compared to a non-Code parameter),
you can treat the argument's type as Any, and let the method
redispatch once the type is known, if it's appropriate to do so.

That said, the real problem here is figuring out what to do if some
candidates ask for a given parameter to be lazily evaluated and others
don't.  It would probably be best to restrict the lazy evaluation
option to the prototype's parameters, so that it always applies across
the board.

--

Consider this as another option: instead of a parameter trait, apply a
trait to the method prototype.  With this trait in play, all parameter
evaluations are postponed as long as possible.  If the first candidate
needs only the first two parameters to test its viability, only
evaluate the first two parameters before testing it.  If the dispatch
succeeds, the other parameters remain unevaluated until they actually
get used in the body.  If all of the two-parameter candidates fail,
evaluate the next batch of parameters and go from there.

This approach doesn't guarantee that a given parameter won't be
evaluated before its first appearance within the routine; but it does
remove the guarantee that it will be.
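
As a rough illustration of the staged evaluation described above, here
is a minimal sketch in Python (used only as an analogy; it is not Perl 6
and says nothing about how MMD is actually specified).  The names Lazy,
dispatch, tracked and the candidate table are assumptions invented for
the example.  Each candidate forces only the leading arguments it needs
to test its viability; anything it never touches stays unevaluated.

```python
from typing import Any, Callable, List, Tuple


class Lazy:
    """A thunk: wraps an expression and evaluates it at most once, on demand."""

    def __init__(self, expr: Callable[[], Any]) -> None:
        self._expr = expr
        self._done = False
        self._value: Any = None

    @property
    def forced(self) -> bool:
        return self._done

    def force(self) -> Any:
        if not self._done:
            self._value = self._expr()
            self._done = True
        return self._value


# A candidate is (types of the leading parameters it dispatches on, body).
Candidate = Tuple[Tuple[type, ...], Callable[..., Any]]


def dispatch(candidates: List[Candidate], *args: Lazy) -> Any:
    """Try candidates in order, forcing only the arguments each one inspects."""
    for types, body in candidates:
        if len(types) > len(args):
            continue
        # all() short-circuits, so checking stops at the first mismatch.
        if all(isinstance(arg.force(), t) for arg, t in zip(args, types)):
            return body(*args)
    raise TypeError("no matching candidate")


evaluated: List[str] = []  # records which arguments were actually evaluated


def tracked(name: str, value: Any) -> Lazy:
    def expr() -> Any:
        evaluated.append(name)
        return value
    return Lazy(expr)


candidates: List[Candidate] = [
    # Dispatches on two parameters and never uses the third in its body.
    ((int, int), lambda a, b, c: a.force() + b.force()),
    ((int, int, str), lambda a, b, c: c.force()),
]

third = tracked("c", "unused")
result = dispatch(candidates, tracked("a", 1), tracked("b", 2), third)
```

Here the first candidate succeeds after looking at two arguments, so the
third is never evaluated.  Note that this sketch dispatches on the type
of the forced value, which is the weaker guarantee described above: an
argument may be evaluated before the body runs, but need not be.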

--

In the case of subst, there's an additional wrinkle: you can't always
evaluate the expression without making reference to the pattern's
Match object, which won't be known until the pattern is applied to the
invocant.  In particular, closures that refer to $0, $1, etc. will
only work properly if called by the method itself, and only after $0,
$1, etc. have been set.

All things considered, the best solution for subst might be to treat
the timing of quote evaluation in a manner analogous to regex
evaluation.
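
The dependency on the Match object can be sketched with Python's re
module, purely as an analogy.  The subst helper below is a hypothetical
stand-in for the method under discussion, not its real signature: when
the replacement is a callable, it is invoked by the substitution itself,
once a match (and therefore its captures) exists.

```python
import re
from typing import Callable, Union

Replacement = Union[str, Callable[["re.Match[str]"], object]]


def subst(text: str, pattern: str, replacement: Replacement) -> str:
    """Hypothetical stand-in for .subst: accepts a plain string or a callable."""
    if callable(replacement):
        # The callable only runs after a match exists, so it can see captures.
        return re.sub(pattern, lambda m: str(replacement(m)), text)
    # A plain string is inserted literally; it cannot refer to captures.
    return re.sub(pattern, lambda m: replacement, text)


swapped = subst("key=value", r"(\w+)=(\w+)",
                lambda m: f"{m.group(2)}={m.group(1)}")
literal = subst("a-b", r"-", "+")
```

The callable form plays the role of a closure that refers to $0 and $1:
it cannot be evaluated by the caller ahead of time, because the values
it needs do not exist until the pattern has been applied.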

--
Jonathan Dataweaver Lang


Re: S5: substitutions

2006-10-08 Thread Smylers
Jonathan Lang writes:

 Translating this to perl 6, I'm hoping that perl6 is smart enough to
 let me say:
 
s(pattern) { doit() }
 
 Instead of
 
s(pattern) { { doit() } }

That special case is nasty if you don't know about it -- you
inadvertently execute as code something which you just expected to be a
string.  Not a good trap to have in the language.

Smylers


S5: substitutions

2006-10-08 Thread Jonathan Lang

Smylers wrote:

Jonathan Lang writes:

 Translating this to perl 6, I'm hoping that perl6 is smart enough to
 let me say:

s(pattern) { doit() }

 Instead of

s(pattern) { { doit() } }

That special case is nasty if you don't know about it -- you
inadvertently execute as code something which you just expected to be a
string.  Not a good trap to have in the language.


If you expected it to be a string, why did you use curly braces?

While I'm completely on board with the idea that _pattern_ delimiters
shouldn't affect the _pattern's_ semantics, the second half of the
search-and-replace syntax isn't a pattern.  Conceptually, it's either
a string or an expression that returns a string.

Larry pretty much summed up what I'm looking for in this regard -
change the s/// syntax so that it has two distinctive forms:

   s/pattern/string/

(where '/' can be replaced by any valid non-bracketing delimiter, and
string is always evaluated as an interpolated string)

or

   s[pattern] expression

(where '[' and ']' can be replaced by any valid pair of bracketing
delimiters, and expression is evaluated as a perl6 expression)

--
Jonathan Dataweaver Lang


Re: S5: substitutions

2006-10-08 Thread Smylers
Jonathan Lang writes:

 Smylers wrote:
 
  Jonathan Lang writes:
  
   Translating this to perl 6, I'm hoping that perl6 is smart enough
   to let me say:
  
  s(pattern) { doit() }
  
   Instead of
  
  s(pattern) { { doit() } }
  
  That special case is nasty if you don't know about it -- you
  inadvertently execute as code something which you just expected to
  be a string.  Not a good trap to have in the language.
 
 If you expected it to be a string, why did you use curly braces?

Because it isn't possible to learn all of Perl (5 or 6) in one go.  And
in general you learn rules before exceptions to rules.

In general in Perl the replacement part of a substitution is a string,
one that takes interpolation like double-quoted strings do.

In general in Perl if the default delimiter for something is
inconvenient you can pick a different delimiter -- this includes
patterns, and also strings.  And if you pick any sort of brackets for
your delimiters then they match -- which is handy, cos it means that
they can still be used even if the string inside contains some of those
brackets.

So it's quite possible for somebody to have picked up all the above, and
have got used to using C<qq[long string]> or C<qq{long string}> when he
wishes to quote long strings.  The form with braces has the advantage
that they are relatively uncommon in text (and HTML, and SQL, and many
other typically encountered long strings).

At which point if he wants to do substitution with slashes in at least
one of the pattern or the replacement text (perhaps it's a URL or a
filename) then he's likely to pick some other arbitrary characters for
doing the quoting.  And braces seem as likely to be picked as anything
else.  Unless he specifically knows about an exception there's no reason
not to pick them.

I refer simply to "Perl" above.  The above situation could just as
easily arise (or already have arisen) in Perl 5 -- in which case the
programmer's expectations would've been met and the code interpreted
fine.  Your proposal would make that no longer the case in Perl 6.

And, apart from people learning Perl fresh, there's also a large number
of existing Perl 5 programmers who also won't be expecting this
exception.

Yes, Perl 6 isn't supposed to be compatible with Perl 5, and obviously a
Perl 5 coder is going to have to learn lots of new things anyway.  But
usually they are significantly different, or the old way of doing things
will be a syntax error.  This is a situation where the old syntax
continues to work but does something quite different.

That's unfortunate, but probably liveable with in general.  But in this
particular case the particular behaviour involves _executing as Perl
code something which the programmer never intended to be code in the
first place_.  That's crazily dangerous.

It's like having a Perl 5 to Perl 6 translator that randomly sticks
eval statements in front of some of your double-quoted strings.

 While I'm completely on board with the idea that _pattern_ delimiters
 shouldn't affect the _pattern's_ semantics, the second half of the
 search-and-replace syntax isn't a pattern.  Conceptually, it's either
 a string or an expression that returns a string.

Sure.  Or rather, it's a string (but braces inside strings can be used
to embed expressions in them).

To be consistent your proposal should also suggest that these become
equivalent:

* "{ function() }"
* qq[ {function() }]
* qq{ function() }
* eval "function()"

and, naturally, that these no longer are:

* "string"
* qq[string]
* qq{string}

And if braces are special as delimiters for C<qq> consistency would say
they should be for C<q> as well -- effectively just another way of
spelling C<eval>, but one that doesn't stand out so much.

Smylers


Re: S5: substitutions

2006-10-08 Thread Dr.Ruud
Smylers wrote:

 in
 this particular case the particular behaviour involves _executing as
 Perl code something which the programmer never intended to be code in
 the first place_.  That's crazily dangerous.

I wouldn't mind eval() being off by default, so that you have to put a
"use eval" in every block that needs it.

-- 
Anyway, Ruud

Ordinary is a tiger.




S5: substitutions

2006-10-08 Thread Jonathan Lang

Smylers wrote:

Jonathan Lang writes:
 If you expected it to be a string, why did you use curly braces?

Because it isn't possible to learn all of Perl (5 or 6) in one go.  And
in general you learn rules before exceptions to rules.


Agreed.


In general in Perl the replacement part of a substitution is a string,
one that takes interpolation like double-quoted strings do.


Here's where I differ from you.  In general, string literals are
delimited by quotes; pattern literals are delimited by forward
slashes; and closures (i.e., code literals) are delimited by curly
braces.  And once you learn that other delimiters are possible for
patterns and strings, that knowledge comes with the added fact that in
order to use a non-standard delimiter, you have to preface it with a
short tag clearly identifying what is being delimited.

In general, you can use literals anywhere you can use variables, and
vice versa.  There are two crucial exceptions to this, both of which
apply to the topic at hand: one is minor, and the other is major.  The
minor exception is the pattern-matching macro, m//.  This macro takes
a pattern literal and applies it as a match criterion to either the
current topic ($_) or whatever is attempting to match (via ~~).  m//
_must_ take a pattern literal; it cannot take a variable containing a
Regex object.  To do the latter, you have to embed the Regex object in
a pattern literal.

m// is a _minor_ exception because it can be viewed as being the
complement of rx// - both can be thought of as pattern literals, with
m// being a pattern literal that always attempts to create a Match
object and rx// being a pattern literal that always tries to create a
Regex object.  Meanwhile, you have the .match method: unlike m//,
.match isn't choosy about where it gets its Regex object; it can come
from a pattern literal, as with m//; it can be passed in by means of a
variable; it can be composed on the spot by an expression; and so on.
The possibilities are endless.

The s/// macro is a Frankenstein Monster, stitched together from the
bodies of a pattern literal and an interpolating string literal and
infused with the spirit of a search-and-replace algorithm.  Like the
m// macro, s/// can only work on literals, never on variables (unless,
as above, you embed them in literals).  In addition, if you choose a
non-bracketing delimiter for the pattern literal, you _must_ use the
same delimiter for the string literal.  (This is more of a handicap
than at first it seems: in general, different kinds of literals use
different delimiters.  With s///, you're forced to use the same
delimiter for two different kinds of literals: a pattern and a
string.)  Using bracketed delimiters for the pattern gets around this
problem, but you're still left with the fact that the delimiters for
the string no longer follow the common-sense rule of either
double-quotes or 'qq' followed by something else - no matter what
delimiters you apply here, the semantics remain the same - unlike
anywhere else that string literals are used.  And short of embedding
them in the literal (notice a trend here?), you cannot apply any
modifiers to the string - only to the pattern or to the
search-and-replace algorithm.  In this regard, the string literal is
the odd man out - s/// could be thought of as a pattern literal with
an auxiliary string literal attached to it, but not the other way
around.

There's nothing natural about this beastie; and if it wasn't so darn
useful, I'd advocate dropping it.

The .subst method bypasses _all_ of these problems, letting you use
distinct and independently modifiable literals for each of the pattern
and the string, or even letting you use a variable or expression to
supply the pattern (or string) in lieu of literals.  On the downside,
the .subst syntax isn't nearly as streamlined as the s/// syntax.  In
addition, there's the issue about delayed evaluation (or lack thereof)
of the string argument, currently being discussed.


In general in Perl if the default delimiter for something is
inconvenient you can pick a different delimiter -- this includes
patterns, and also strings.  And if you pick any sort of brackets for
your delimiters then they match -- which is handy, cos it means that
they can still be used even if the string inside contains some of those
brackets.


As noted above, if you choose non-standard delimiters, you have to
explicitly tag them; and with the exception of s/// and tr///, a given
set of delimiters delimits one thing at a time.  s/// and tr/// are
exceptions to the general rule.


So it's quite possible for somebody to have picked up all the above, and
have got used to using C<qq[long string]> or C<qq{long string}> when he
wishes to quote long strings.  The form with braces has the advantage
that they are relatively uncommon in text (and HTML, and SQL, and many
other typically encountered long strings).


And he will be used to saying 'qq{long string}', as opposed to '{long
string}', when he expects the 

S5: substitutions

2006-10-07 Thread Jonathan Lang

S5 says:

There is no /e evaluation modifier on substitutions; instead use:

 s/pattern/{ doit() }/

Instead of /ee say:

 s/pattern/{ eval doit() }/


In my perl5 code, I would occasionally take advantage of the pairs of
brackets quoting mechanism to do something along the lines of:

   s(pattern) { doit() }e

Translating this to perl 6, I'm hoping that perl6 is smart enough to let me say:

   s(pattern) { doit() }

Instead of

   s(pattern) { { doit() } }

--

In a similar vein, I tend to write other perl5 substitutions using
parentheses for the pattern so that I can use double-quotes for the
substitution expression:

   s(pattern) "expression"

This highlights to me the fact that the expression is _not_ a pattern,
and uses a syntax more akin to interpolated strings than to patterns.
The above bit about executables got me to thinking: _if_ perl6 is
smart enough to recognize curly braces and automatically treat the
second argument as an executable expression, would there be any
benefit to letting perl6 apply customized quoting semantics to the
second argument as well, based on the choice of delimiters?  e.g.,
using single quotes would disable variable substitutions and the like
(useful in cases where the substitution doesn't make use of the
captures done by the pattern, if any).

--
Jonathan Dataweaver Lang


Re: S5: substitutions

2006-10-07 Thread Juerd
Jonathan Lang wrote 2006-10-07 15:07 (-0700):
 Translating this to perl 6, I'm hoping that perl6 is smart enough to let me 
 say:
s(pattern) { doit() }
 Instead of
s(pattern) { { doit() } }

I would personally hope that Perl isn't that clever, but treats all
bracketing delimiters the same there. Partly for future-proofness,
partly for least surprise.
-- 
Kind regards,

  juerd waalboer:  perl hacker  http://juerd.nl/sig
  convolution: ict solutions and consultancy

I don't trust voting computers.
See http://www.wijvertrouwenstemcomputersniet.nl/.


Re: S5: substitutions

2006-10-07 Thread Larry Wall
On Sat, Oct 07, 2006 at 03:07:49PM -0700, Jonathan Lang wrote:
: S5 says:
: There is no /e evaluation modifier on substitutions; instead use:
: 
:  s/pattern/{ doit() }/
: 
: Instead of /ee say:
: 
:  s/pattern/{ eval doit() }/
: 
: In my perl5 code, I would occasionally take advantage of the pairs of
: brackets quoting mechanism to do something along the lines of:
: 
:    s(pattern) { doit() }e
: 
: Translating this to perl 6, I'm hoping that perl6 is smart enough to let me 
: say:
: 
:    s(pattern) { doit() }

Well, the () are illegal without intervening whitespace because that
makes s() a function call, but we'll leave that alone.

: Instead of
: 
:    s(pattern) { { doit() } }

Perl 5 let certain choose-your-own quotes introduce various kinds of
odd semantics, and that was generally viewed as a mistake.  That is why
S02 says:

For these "q" forms the choice of delimiters has no influence on the
semantics.  That is, C<''>, C<"">, C<< <> >>, C<«»>, C<``>, C<()>,
C<[]>, and C<{}> have no special significance when used in place of
C<//> as delimiters.

We could make an exception for the second part of s///, but certainly
for this case I think it's easy enough to write:

.subst(/pattern/, { doit })

However, taken as a macro, s/// is a rather odd fish.  The right side
isn't just a string, but a deferred string, which implies that there
are always curlies there, much like the right side of && implies
deferred evaluation.

: In a similar vein, I tend to write other perl5 substitutions using
: parentheses for the pattern so that I can use double-quotes for the
: substitution expression:
: 
:    s(pattern) "expression"

Because the right side must be deferred, the .subst form of that would be:

.subst(/pattern/, {"expression"})

Otherwise, the double quotes interpolate too early.  That's getting a
little more cumbersome.
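
The difference between interpolating "too early" and deferring can be
shown with a small Python analogy (not Perl 6; next_label and the
counter are made up for the illustration).  A replacement passed as a
string is computed once, before matching starts; a replacement passed
as a callable is computed afresh for every match.

```python
import re
from itertools import count

counter = count(1)


def next_label() -> str:
    """Return a fresh label each time it is called: #1, #2, ..."""
    return f"#{next(counter)}"


text = "x x x"

# Eager: next_label() runs once, before any matching happens,
# so every match receives the same already-built string.
eager = re.sub(r"x", next_label(), text)

# Deferred: the callable runs once per match, at substitution time.
deferred = re.sub(r"x", lambda m: next_label(), text)
```

The eager call yields "#1 #1 #1", while the deferred call yields
"#2 #3 #4", which is the behaviour the curlies are there to provide.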

: This highlights to me the fact that the expression is _not_ a pattern,
: and uses a syntax more akin to interpolated strings than to patterns.
: The above bit about executables got me to thinking: _if_ perl6 is
: smart enough to recognize curly braces and automatically treat the
: second argument as an executable expression, would there be any
: benefit to letting perl6 apply customized quoting semantics to the
: second argument as well, based on the choice of delimiters?  e.g.,
: using single quotes would disable variable substitutions and the like
: (useful in cases where the substitution doesn't make use of the
: captures done by the pattern, if any).

Well, again, that's maybe just:

.subst(/pattern/, {'expression'})

or even, since we don't need to delay evaluation:

.subst(/pattern/, 'expression')

But it's possible that some syntactic relief of a dwimmy sort is
in order here.  One could view s[pattern] as a kind of metaprefix
on the following expression, sort of a self-contained unary .
I wonder how often we'd have to explain why

s/pattern/ expression

doesn't do that, though.  'Course, it's already like that in Perl 5.
Unlike in Perl 5, this approach would rule out things like:

s[pattern] !foo!

which would instead have to be written:

s[pattern] qq!foo!

As a unary lazy prefix, you could even just say

s[pattern] doit();

Of course, then people will wonder why

.subst(/pattern/, doit())

doesn't work.  Which makes me want to build it into the pattern somewhere
where there's already deferred evaluation that just happens to be triggered
at the right moment:

/pattern {subst doit}/
/pattern {subst ($0)}/
/pattern {subst q:to'END'}/
a new line
END

We can give the user even more rope to shoot themselves in the dark with:

/pattern {$/ = doit}/
/pattern {$0 = ($0)}/
/pattern {$() = q:to'END'}/
a new line
END

The possibilities are endless...

Well, not quite.  One syntax we *can't* allow is /pattern/{ doit }
because that's already used to pull named captures out of the match
object.

Well, enough random braindump for now.

Larry


Re: S5: substitutions

2006-10-07 Thread Jonathan Lang

Larry Wall wrote:

Jonathan Lang wrote:
: Translating this to perl 6, I'm hoping that perl6 is smart enough to let me
: say:
:
:    s(pattern) { doit() }

Well, the () are illegal without intervening whitespace because that
makes s() a function call, but we'll leave that alone.


Thank you; I noticed this after I had sent it.


Perl 5 let certain choose-your-own quotes introduce various kinds of
odd semantics, and that was generally viewed as a mistake.  That is why
S02 says:

For these "q" forms the choice of delimiters has no influence on the
semantics.  That is, C<''>, C<"">, C<< <> >>, C<«»>, C<``>, C<()>,
C<[]>, and C<{}> have no special significance when used in place of
C<//> as delimiters.

We could make an exception for the second part of s///, but certainly
for this case I think it's easy enough to write:

.subst(/pattern/, { doit })

However, taken as a macro, s/// is a rather odd fish.  The right side
isn't just a string, but a deferred string, which implies that there
are always curlies there, much like the right side of && implies
deferred evaluation.


Perhaps quotes should be given the same "defer or evaluate as
appropriate to the context" capability that regexes and closures have?
That is, 'q (text)' is always a Quote object, which may be evaluated
immediately in certain contexts and be passed as an object in others.
As a first cut, consider using the same rule for this that regexes
use: in a value context (void, boolean, string, or numeric) or as an
explicit argument of ~~, a quote is immediately evaluated; otherwise,
it's passed as an object to be evaluated later.

The main downside I see to this is that there's no way to force one
approach or the other; a secondary issue has to do with the usefulness
of an unevaluated string: with regexes and closures, the unevaluated
versions are useful in part because they can be made to do different
things when evaluated, based on the circumstances: $x ~~ $regex will
do something different than $y ~~ $regex, and closures can potentially
be fed arguments that allow one closure to do many things.  A quote,
OTOH, isn't necessarily that flexible.

Or is it?  Is there benefit to extending the analogy all the way,
letting someone define a parameterized quote?
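
To make the idea concrete, here is one way a deferred, parameterized
quote might behave, sketched in Python as an analogy only.  The Quote
class, its constructor arguments and its call interface are all
assumptions for illustration; nothing here reflects an actual or
proposed Perl 6 API.

```python
from string import Template
from typing import Any, Mapping, Optional


class Quote:
    """Hypothetical deferred string: interpolates only when evaluated."""

    def __init__(self, text: str,
                 env: Optional[Mapping[str, Any]] = None) -> None:
        self._template = Template(text)
        self._env = dict(env or {})

    def __call__(self, **params: Any) -> str:
        # Parameterized use: the caller supplies or overrides values
        # at evaluation time rather than at definition time.
        return self._template.substitute({**self._env, **params})

    def __str__(self) -> str:
        # String context forces evaluation with the stored environment.
        return self()


greeting = Quote("Hello, $name!", {"name": "world"})
in_string_context = str(greeting)
with_parameter = greeting(name="Larry")
```

Used in string context the object evaluates immediately, much as a
regex does in a value context; passed around as an object it can be
evaluated later, possibly with different arguments each time.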


But it's possible that some syntactic relief of a dwimmy sort is
in order here.  One could view s[pattern] as a kind of metaprefix
on the following expression, sort of a self-contained unary .
I wonder how often we'd have to explain why

s/pattern/ expression

doesn't do that, though.  'Course, it's already like that in Perl 5.


Probably not too often - although I _would_ recommend that you
emphasize the distinction between standard regex notation used
everywhere else and the extended regex notation used by s///.

I _do_ like the idea of reserving this behavior to situations where
the pattern delimiters are a matched set, letting you freely choose
some other delimiter for the expression.  In particular, I'm not
terribly fond of the idea of

   s'pattern'expression'

applying single-quote semantics to the expression.


Unlike in Perl 5, this approach would rule out things like:

s[pattern] !foo!

which would instead have to be written:

s[pattern] qq!foo!


Fine by me.  This would also let you easily apply quote modifiers to
the expression.


As a unary lazy prefix, you could even just say

s[pattern] doit();

Of course, then people will wonder why

.subst(/pattern/, doit())

doesn't work.


Perhaps.  But people quickly learn that different approaches in perl
often have their own unique quirks; this would just be one more
example.


Which makes me want to build it into the pattern somewhere
where there's already deferred evaluation that just happens to be triggered
at the right moment:

/pattern {subst doit}/
/pattern {subst ($0)}/
/pattern {subst q:to'END'}/
a new line
END

We can give the user even more rope to shoot themselves in the dark with:

/pattern {$/ = doit}/
/pattern {$0 = ($0)}/
/pattern {$() = q:to'END'}/
a new line
END

The possibilities are endless...


These aren't syntaxes that I'd want to use; but then, TIMTOWTDI.  The
main problem that I have with this approach is that it could interfere
with being able to use the venerable s/pattern/expression/ notation;
I'm looking to open up new possibilities, not to remove a perfectly
workable existing one.


Well, not quite.  One syntax we *can't* allow is /pattern/{ doit }
because that's already used to pull named captures out of the match
object.


...which brings up another potential conflict with the s/// notation:
how _do_ you pull named captures out of the match object in s///?

--

On a related subject: it seems to me that the notion of extending the
pattern notation to include a replace clause is at the heart of the
issue here.  In addition to the above issues, it seems to be too much
of a one-trick pony as currently 

Re: S5: substitutions

2006-10-07 Thread Jonathan Lang

Larry Wall wrote:

As a unary lazy prefix, you could even just say

s[pattern] doit();

Of course, then people will wonder why

.subst(/pattern/, doit())

doesn't work.


Another possibility: make it work.  Add a "delayed" parameter trait
that causes evaluation of that parameter to be postponed until the first
time that the parameter actually gets used in the routine.  If it
never gets used, then it never gets evaluated.  I could see uses for
this outside of the narrow scope of implementing substitutions.
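
A minimal sketch of the intended behaviour, in Python as an analogy
(the delayed helper, expensive and routine are hypothetical names, and
this is not a claim about how such a trait would be implemented): the
argument is wrapped so that it is evaluated on first use, at most once,
and not at all if the routine never touches it.

```python
from functools import lru_cache
from typing import Callable, List


def delayed(expr: Callable[[], int]) -> Callable[[], int]:
    """Wrap a zero-argument callable so it runs at most once, on first use."""
    return lru_cache(maxsize=None)(expr)


calls: List[str] = []  # records each real evaluation of the argument


def expensive() -> int:
    calls.append("evaluated")
    return 42


def routine(flag: bool, value: Callable[[], int]) -> int:
    # 'value' is only evaluated if this branch actually uses it.
    return value() + value() if flag else 0


unused = routine(False, delayed(expensive))
calls_after_unused = list(calls)

used = routine(True, delayed(expensive))
```

In the first call the argument is never evaluated; in the second it is
used twice but evaluated only once.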

--
Jonathan Dataweaver Lang


Re: S5: substitutions

2006-10-07 Thread Jonathan Lang

Jonathan Lang wrote:

Another possibility: make it work.  Add a "delayed" parameter trait...


...although "lazy" might be a better name for it.  :)

--
Jonathan Dataweaver Lang


Re: S5: substitutions

2006-10-07 Thread Larry Wall
On Sat, Oct 07, 2006 at 07:49:48PM -0700, Jonathan Lang wrote:
: Another possibility: make it work.  Add a "delayed" parameter trait
: that causes evaluation of that parameter to be postponed until the first
: time that the parameter actually gets used in the routine.  If it
: never gets used, then it never gets evaluated.  I could see uses for
: this outside of the narrow scope of implementing substitutions.

Tell me how you plan to do MMD on a value you don't have yet.

Larry