RFC 128 (v2) Subroutines: Extend subroutine contexts to include name parameters and lazy arguments

Perl6 RFC Librarian Thu, 17 Aug 2000 14:25:47 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Subroutines: Extend subroutine contexts to include name parameters and lazy arguments

=head1 VERSION

  Maintainer: Damian Conway <[EMAIL PROTECTED]>
  Date: 18 August 2000
  Last Modified: 18 August 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 128

=head1 ABSTRACT

This RFC proposes that subroutine argument context specifiers be
extended in several ways, including allowing parameters to be typed and
named, and that a syntax be provided for binding arguments to named
parameters.

=head1 CHANGES

Added section describing named parameter interaction with named higher-order
function placeholders.

=head1 DESCRIPTION

It is proposed that the existing subroutine "prototype" mechanism
be replaced by optional formal parameter lists that allow parameters
to be named and their contexts specified.

The syntax for this would be:

        sub subname ( type context(s) parameter_name : parameter_attributes ,
                      type context(s) parameter_name : parameter_attributes ,
                      type context(s) parameter_name : parameter_attributes ;
                      # end of required parameters
                      type context(s) parameter_name : parameter_attributes ,
                      # etc.
                    ) : subroutine_attributes
        { body }

Each of the four components of a parameter specification -- type,
context, name, and attributes -- would be optional.

=head2 Contexts

The context specifiers would be:

        $       parameter is scalar
        @       parameter is array (eats remaining args)
        %       parameter is hash (eats remaining args)
        /       parameter is qr'd string
        &       parameter is subroutine reference or block
        *       parameter is typeglob (assuming they still exist)

Note that any of these specifiers may appear in any position in a
parameter list (especially C<&>, which would no longer be constrained to
the first position).

The following context modifiers would be available:

        \       parameter must be a reference,
                magically en-reference arg if necessary

        ?       argument is lazily evaluated

        ^       (& only) terminate curry propagation on argument

Note that the semantics of the \ modifier would be altered somewhat
so that a reference is I<always> passed for that parameter.
It would retain its magical en-referencing coercion:

        \$      argument must be scalar ref or start with $,
                scalar var magically en-referenced

        \@      argument must be array ref or start with @,
                array var magically en-referenced

        \%      argument must be hash ref or start with %,
                hash var magically en-referenced

        \/      argument must be qr'd string or /.../ or m/.../
                /.../ or m/.../ magically qr'd to en-reference

        \&      arg must be sub reference, curried function, or block
                block converted to anonymous sub ref

        \*      arg must be something convertible to a typeglob
                typeglob magically en-referenced


=head3 Context classes

The revised syntax would also allow I<context classes> to be specified.
A context class aggregates two or more alternative contexts, allowing
any one of them to be the context for the corresponding argument.

For example:

        sub mymap ([\/&?$]@) {...}
        
Here, the first argument must be either a /.../ pattern (or qr), or a
block (or sub ref), or a lazily evaluated scalar (see below). In parsing
that argument the various possible contexts are considered left-to-right
and the first context that allows the argument to be parsed is used.

Note that context classes may also have modifiers:

        sub mymap (^[\/&?$]@) {...}

In this example, no matter what the first argument is, it does not
propagate currying (see below).

A context class may only contain context specifiers that yield scalar
parameters. Hence, a context class may contain any of the following
specifiers (any of which may also have C<^> or C<?> modifiers):

        $       /       \$      \/
        &       *       \&      \*      
                        \@      \%

but not:

        @       %

A context class always yields a scalar parameter.
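
The dispatch implied by a context class can be approximated in
present-day Perl 5 by inspecting the first argument at run time. The
sketch below is only an emulation, not the proposed syntax: this RFC does
not give a body for C<mymap>, so the behaviour chosen for each alternative
is an assumption, and the laziness of the C<?$> alternative is not
modelled.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 stand-in for: sub mymap ([\/&?$]@) {...}
# The first argument may be a compiled pattern, a code reference,
# or a plain scalar; the remaining arguments form the list.
sub mymap {
    my ($first, @list) = @_;
    my $kind = ref $first;
    if ($kind eq 'Regexp') {        # qr// objects report 'Regexp' in Perl 5
        return grep { $_ =~ $first } @list;
    }
    elsif ($kind eq 'CODE') {       # assumed behaviour: apply to each element
        return map { $first->($_) } @list;
    }
    elsif ($kind eq '') {           # assumed behaviour: repeat the scalar
        return map { $first } @list;
    }
    die "mymap: unsupported first argument of type $kind\n";
}

my @matched = mymap(qr/^a/, qw(apple banana avocado));
my @doubled = mymap(sub { $_[0] * 2 }, 1, 2, 3);
my @filled  = mymap('x', 1, 2, 3);

print "@matched\n";    # apple avocado
print "@doubled\n";    # 2 4 6
print "@filled\n";     # x x x
```

Under the proposal the choice among the alternatives would be made while
parsing the call, left to right, rather than by a run-time C<ref> test as
it is here.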


=head3 Lazy evaluation

If the C<?> modifier is used for a particular parameter, that parameter
is lazily evaluated. This means that it is only evaluated when the
corresponding named parameter (see below) -- or the corresponding element
of @_ -- is accessed in some way. Passing the parameter to another
subroutine or returning it as an lvalue does not count as an access.

If the C<?> modifier is applied to a C<@> parameter (which eats the
remaining arguments), those remaining arguments are not evaluated
until the corresponding element of the array is accessed. Iteration
through such an array (i.e. in a C<for> or C<foreach>) only evaluates
one element per iteration.

If the C<?> modifier is applied to a C<%> parameter (which eats the
remaining arguments), the odd arguments (that are mapped to keys) are 
immediately evaluated, but the even arguments (that map to values)
are not evaluated until the corresponding entry of the hash is accessed.
Iteration through such a hash (i.e. via C<each> or C<values>) only
evaluates one element per iteration.

For example:

        sub firstdef(?@) { defined($_) && return $_ for (@_); }

        sub enervate(?$) { return $_[0] }

        sub Klingon::OPERATOR_?: ($,?$,?$)      # nb: proposed new operator
                                                #     overloading mechanism
        {
                if ( $_[0]->debaseToTerran() ) { return eval{$_[1]} }
                return eval{$_[2]};
        }

Note the use of explicit C<eval>s in the last example, to force the
lazy arguments to evaluate before being returned.
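
The effect of a lazy C<@> parameter can be imitated in present-day Perl 5
by passing closures and calling each one only when it is reached. The
following is a sketch under that assumption: the C<thunk> helper and the
C<@evaluated> log are illustrative names that form no part of this
proposal.

```perl
use strict;
use warnings;

my @evaluated;    # records which arguments have actually been evaluated

# Illustrative helper: wrap a value in a closure that logs when it is forced.
sub thunk {
    my ($label, $value) = @_;
    return sub { push @evaluated, $label; return $value };
}

# Stand-in for: sub firstdef(?@) { defined($_) && return $_ for (@_); }
# Each argument is a code ref that is forced only when the loop reaches it.
sub firstdef {
    for my $arg (@_) {
        my $value = $arg->();
        return $value if defined $value;
    }
    return;
}

my $result = firstdef(
    thunk(a => undef),
    thunk(b => 42),
    thunk(c => 99),    # never forced
);

print "result: $result\n";       # result: 42
print "forced: @evaluated\n";    # forced: a b
```

The third argument is never evaluated, which is the behaviour described
above for iteration over a lazy array.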

=head3 Controlling curry propagation

RFC 23 proposes the addition of higher order functions, via argument/operand
placeholders. However, when a subroutine call includes a curried argument,
there is an ambiguity as to how far "outwards" the currying should propagate.
For example:

        $num_nodes = traverse( $root, $sum += ^_ );

might mean:

        $num_nodes = sub{ traverse( $root, $sum += $_[0] ) };

if currying continued to the outermost subroutine, or:

        $num_nodes = traverse( $root, sub{$sum += $_[0]} );

if it were restricted to the second argument.

As the former interpretation is the proposed default behaviour, some
syntactic means of requesting the latter interpretation is required.

It is proposed that a parameter context modifier -- C<^> -- be
added to handle this. Any parameter with the C<^> modifier would
prevent curry propagation to the surrounding subroutine call.
Thus, with the declaration:

        sub traverse ($,^$);

the call:

        $num_nodes = traverse( $root, $sum += ^_ );

would be equivalent to:

        $num_nodes = traverse( $root, sub{$sum += $_[0]} );

whereas the declaration:

        sub traverse ($,$);

would allow the curried argument to "infect" the entire surrounding call:

        $num_nodes = sub{ traverse( $root, $sum += $_[0] ) };

Note that the curry control only applies to the argument whose parameter
has the C<^> modifier. So:

        sub traverse ($,^$);
        $num_nodes = traverse( ^_ , $sum += ^_ );

means:

        $num_nodes = sub { traverse( $_[0], sub{$sum += $_[0]} ) };

The currying of the second argument is restricted to its argument slot, whilst
the currying of the first argument propagates outwards to encompass the entire
call to C<traverse>.
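
The two interpretations can be written out by hand in present-day Perl 5,
which makes the difference in what is assigned to C<$num_nodes> concrete.
This is a sketch only: the body of C<traverse> and the shape of the tree
are assumptions made for illustration, and the separate C<$total> variable
is introduced so that the two cases do not interfere.

```perl
use strict;
use warnings;

# Hypothetical traverse: calls $visit on each node value in a nested
# array-ref tree of the form [ value, child, child, ... ] and returns
# the number of nodes visited.
sub traverse {
    my ($node, $visit) = @_;
    $visit->($node->[0]);
    my $count = 1;
    $count += traverse($_, $visit) for @{$node}[1 .. $#$node];
    return $count;
}

my $root = [ 1, [ 2, [ 4 ] ], [ 3 ] ];

# Restricted interpretation (parameter declared with ^): only the second
# argument becomes a subroutine, and traverse runs immediately.
my $sum = 0;
my $num_nodes = traverse($root, sub { $sum += $_[0] });
print "$num_nodes nodes, sum $sum\n";    # 4 nodes, sum 10

# Propagating interpretation (the proposed default): the whole call
# becomes a subroutine, and nothing runs until it is invoked.
my $total = 0;
my $deferred = sub { traverse($root, $total += $_[0]) };
print ref($deferred), ", total $total\n";    # CODE, total 0
```

In the first case the result is a node count; in the second it is a
subroutine that has not yet traversed anything.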


=head2 Parameter names

Each parameter may optionally (and independently) be given a name.
This name is specified after the parameter's context specifier.
The declaration of a parameter name creates a lexical variable of the
same name in the scope of the subroutine body. Named C<@> and C<%>
parameters create a lexical array or hash respectively. All other
named parameters create a lexical scalar.

For example:

        sub doublemap (&mapsub, @args) {        # creates my($mapsub,@args)
                my @mapped;
                push @mapped, $mapsub->(splice @_, 0, 2) while @_;
                return @mapped;
        }

Note that the context specifier can still be any valid specifier:

        sub lazymap (^[&\/?$]mapper, $max, ?@args) {
                my @mapped;
                switch (ref $mapper) {
                        case 'CODE'  { push @mapped, $mapper->(shift)
                                                while @_ && $max--; }
                        case 'REGEX' { push @mapped, shift() =~ m/$mapper/
                                                while @_ && $max--; }
                        case ""      { push @mapped, $mapper
                                                while @_ && $max--; }
                }
                return @mapped;
        }


=head3 Named arguments

It is further proposed that arguments may be passed by name, and that
named arguments may be passed in any order.

An argument would be associated with a named parameter by prefixing it
with a standard Perl label (i.e. an identifier-colon sequence). For example:

        @mapped = doublemap(args: @list, mapsub: ^a+^b);

On encountering labelled arguments in a subroutine call, the interpreter
would evaluate those arguments (in left-to-right sequence) in the
context specified by the corresponding named parameters (or I<not>
evaluate them for lazy contexts!). The resulting values would then be
assigned to the corresponding named parameters.

Any unlabelled arguments would then be evaluated and assigned (again in
left-to-right sequence) to any remaining parameters. Those nameless
evaluations would be carried out in the respective contexts specified by
the remaining parameters.

It would be an error to:

        * Define two named parameters with the same name, unless they
          can be distinguished by context. 

        * Label two arguments with the same name, unless there are 
          two context-distinguishable named parameters of that name.

        * Label an argument such that there is no corresponding named
          parameter.
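
The binding rules can be approximated in present-day Perl 5 with
label/value pairs. The following is a minimal sketch, assuming a
hypothetical C<bind_named> helper; it enforces only the second and third
error rules, ignores context-distinguishable duplicates, and passes the
array by reference because Perl 5 flattens argument lists.

```perl
use strict;
use warnings;

# Hypothetical helper: bind a flat list of label/value pairs to a set of
# declared parameter names, rejecting unknown and repeated labels.
sub bind_named {
    my ($declared, @pairs) = @_;
    my %allowed = map { $_ => 1 } @$declared;
    my %bound;
    while (@pairs) {
        my ($label, $value) = splice @pairs, 0, 2;
        die "no parameter named '$label'\n"      unless $allowed{$label};
        die "argument '$label' labelled twice\n" if exists $bound{$label};
        $bound{$label} = $value;
    }
    return %bound;
}

# Stand-in for: sub doublemap (&mapsub, @args) {...}
sub doublemap {
    my %arg = bind_named([qw(mapsub args)], @_);
    my ($mapsub, @args) = ($arg{mapsub}, @{ $arg{args} });
    my @mapped;
    push @mapped, $mapsub->(splice @args, 0, 2) while @args;
    return @mapped;
}

# Arguments given in the opposite order to the declaration.
my @mapped = doublemap(args => [1, 2, 3, 4], mapsub => sub { $_[0] + $_[1] });
print "@mapped\n";    # 3 7

# A label with no corresponding parameter is rejected.
my $ok = eval { doublemap(list => [1, 2], mapsub => sub { 0 }); 1 };
print "error: $@" unless $ok;    # error: no parameter named 'list'
```

Under the proposal these checks would be made by the interpreter when the
call is bound, rather than by code in the subroutine body.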


=head3 Interaction with named placeholders

It is further proposed that when named placeholders are used to curry a
function, the resulting subroutine would have named parameters. If the
curried function mixed named, ordinal, and anonymous placeholders, the
resulting subroutine would have a mixture of named and unnamed parameters.

For example:

        my $selector = ^condition ? ^2 : ^_;

would be equivalent to:

        my $selector = sub ($condition,$,$) { $condition ? $_[2] : $_[1] };

This would make currying out the condition clearer:

        my $select_on_val = $selector->(condition: $val);


=head2 Types

It is proposed that parameters may be given types: either the name of
a class, or the name of a builtin type (such as 'ARRAY', 'HASH',
'CODE', etc.).

If a parameter has a type (C<T>) then the following additional
constraints are placed upon it and its value:

=over 4

=item 1.

The parameter's specified (or implicit) context must yield a scalar value.

=item 2.

The scalar value of the bound argument (say, $val) must satisfy
C<UNIVERSAL::isa($val,'T')>.

=item 3.

If the parameter is named, the corresponding lexical variable will be
typed to class C<T>, unless C<T> is the name of a built-in type
('SCALAR', 'HASH', 'CODE', etc.), and perhaps even then, if typed
lexicals were to be extended to built-in types.

=item 4.

If the subroutine has the attribute C<:multi>, then the typed parameter
takes part in the multiple dispatching of the subroutine (see forthcoming
RFC).

=back

For example:

        sub traverse (Tree $root, ^$subref) {...}

This specifies that the first argument must be a Tree object, or an object of a
class derived from Tree. The corresponding lexical variable would be equivalent
to:

        my Tree $root;
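
Constraint 2 can be approximated in present-day Perl 5 with an explicit
check at the top of the subroutine body. The sketch below assumes two
hypothetical classes, C<Tree> and C<BinaryTree>, and uses C<blessed> from
C<Scalar::Util> with the C<isa> method in place of the proposed automatic
check; curry control and typed lexicals are not modelled.

```perl
use strict;
use warnings;

package Tree;
sub new { my ($class) = @_; return bless {}, $class }

package BinaryTree;
our @ISA = ('Tree');    # hypothetical subclass of Tree

package main;
use Scalar::Util qw(blessed);

# Stand-in for: sub traverse (Tree $root, ^$subref) {...}
# The type constraint is checked by hand before the body proper runs.
sub traverse {
    my ($root, $subref) = @_;
    die "first argument must be a Tree\n"
        unless blessed($root) && $root->isa('Tree');
    return $subref->($root);
}

# An object of a derived class satisfies the constraint.
my $class = traverse(BinaryTree->new, sub { ref $_[0] });
print "$class\n";    # BinaryTree

# An unblessed array reference does not.
my $ok = eval { traverse([], sub { 1 }); 1 };
print "error: $@" unless $ok;    # error: first argument must be a Tree
```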


=head3 Using builtin type names

The ability to specify the names of builtin types as parameter types offers
additional flexibility in controlling argument interpretation. For example,
the specification:

        sub demo(ARRAY $a, @b) {...}    # version 1

constrains the first argument to be an array reference, but does not invoke a
magical en-referencing context, the way this would:

        sub demo(\@a, @b) {...}         # version 2

Thus, a call like:

        demo(@LOL);

will succeed under version 1 (binding $LOL[0] to $a,
and the rest of @LOL to @b), provided $LOL[0] is an array reference.

Under version 2, the call to C<demo> would fail, since C<\@LOL> will be
bound to $a and there will be nothing left to bind to @b.
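
The difference in binding can be observed in present-day Perl 5, where a
C<\@> prototype already en-references the array named at the call site.
The following is a sketch only: C<demo_v1> and C<demo_v2> are illustrative
names, version 1 is emulated with a hand-written C<ref> check, and Perl 5
does not treat an empty trailing list as an error, so the example shows
what is bound to each parameter rather than the proposed failure.

```perl
use strict;
use warnings;

# Version 1 stand-in: sub demo(ARRAY $a, @b) {...}
# An ordinary scalar slot whose value must be an array reference.
sub demo_v1 {
    my ($first, @rest) = @_;
    die "first argument must be an ARRAY reference\n"
        unless ref($first) eq 'ARRAY';
    return scalar(@$first) . " in first, " . scalar(@rest) . " in rest";
}

# Version 2 stand-in: an existing Perl 5 prototype with a backslashed
# array slot, which passes a reference to the array named in the call.
sub demo_v2 (\@@) {
    my ($first, @rest) = @_;
    return scalar(@$first) . " in first, " . scalar(@rest) . " in rest";
}

my @LOL = ( [1, 2], [3], [4, 5, 6] );

my $v1 = demo_v1(@LOL);    # $LOL[0] is bound first, the rest follow
my $v2 = demo_v2(@LOL);    # \@LOL is bound first, nothing is left over

print "$v1\n";    # 2 in first, 2 in rest
print "$v2\n";    # 3 in first, 0 in rest
```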


=head2 Parameter attributes

These are identical to variable attributes.


=head2 Banishment of the term "prototype"

It is further proposed that parameter lists I<never> be referred to
as "prototypes", and that use of the term be a flameworthy offence.


=head1 MIGRATION ISSUES

This proposal has the potential to break a small number of cases
where a backslashed context specifier would now match a reference
argument that it previously complained about.

Also, the suggested regularization of semantics for backslash means
that a C<\$> argument is passed as a reference, not a value.


=head1 IMPLEMENTATION

Definitely S.E.P.


=head1 REFERENCES

Forthcoming RFC on changing ref(qr/.../) to return "REGEX", not "Regexp".

Forthcoming RFC on multiple dispatch.

Forthcoming RFC on operator overloading.

Forthcoming RFC on extending C<UNIVERSAL::isa>.

Forthcoming RFC on restricting assignments to typed lexicals.

RFC 21 (v1): Replace C<wantarray> with a generic C<want> function

RFC 22 (v1): Builtin switch statement

RFC 23 (v2): Higher order functions

RFC 57 (v1): Subroutine prototypes and parameters

RFC 84 (v1): Replace => (stringifying comma) with => (pair constructor)

RFC 97 (v1): prototype-based method overloading