This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Function-call named parameters (with compiler optimizations)

=head1 VERSION

  Maintainer: Michael Maraist <[EMAIL PROTECTED]>
  Date: 25 Aug 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number: 160

=head1 ABSTRACT

Function parameters and their positions can be ambiguous in
function-oriented programming. Hashes offer tremendous help in this realm,
except that error checking can be very tedious. Also, hashes, in general,
take a performance hit. Thus either of the following syntaxes is suggested:

    sub foo<a, b, c> {
        return $a + $b * $c;
    } # end foo

    $res = foo( b => 5, c => 8, a => 9 );

OR

    sub foo: params_req_defined(a, b, c) {
        return $a + $b * $c;
    } # end foo

    $res = foo( b => 5, c => 8, a => 9 );

The goal is to enhance functionality, convenience, and performance where
possible with regard to named parameters, with a minimum of changes. At the
same time, this should remain a completely optional and virtually
transparent process. The following is an in-depth analysis of various ways
of accomplishing these goals.

=head1 DESCRIPTION

The current method of parameter prototypes only fulfills a tiny niche, which
is mainly to offer compile-time checking and to disambiguate context (as in
sub foo($) { }, or sub foo(&$) { } ). No support, however, is given to
hashes, even though they are one of perl's greatest strengths. We see them
pop up in parameterized function calls all over the place (CGI, tk, SQL
wrapper functions, etc). As above, however, it is left to the coder to check
the existence of required parameters, since in this realm the current
prototypes are of no help.

It should not be much additional work to provide an extension to prototypes
that allows the definition of hashes. The following is a complex example of
robust code:

    #!/usr/bin/perl -w
    use strict;

    # IN: hash:
    #   a => '...'   # req
    #   b => '...'   # req, defined
    #   c => '...'   # req, 0 <= c < MAX_C
    #   d => '..'    # opt
    #   e => '..'    # opt
    #   f => '..'    # opt
    # OUT: xxx
    sub foo {
        my %args = @_;

        # Requires $a
        my $a;
        die "No a provided" unless exists $args{a};
        $a = $args{a};

        # Requires non-null $b
        my $b;
        die "invalid b" unless exists $args{b} && defined ($b = $args{b});

        # Requires non-null and bounded $c
        my $c;
        die "Invalid c" unless exists $args{c} && defined ($c = $args{c})
            && ($c >= 0 && $c < $MAX_C);

        my ( $d, $e, $f ) = @args{ qw( d e f ) };
        ...
    } # end foo

Becomes:

    # IN: ...
    # OUT: ...
    sub foo<a, b, c, d, e, f> {
        # Implicitly defines and assigns my $a through $f

        # Requires non-null $b
        die "invalid b" unless defined $b;

        # Requires non-null and bounded $c
        die "invalid c" unless defined $c && ($c >= 0 && $c < $MAX_C);
        ...
    } # end foo

Essentially, perl's compiler can be put to use for hashed function calls in
much the same way as pseudo-hashes work for structs/objects. Making this a
compile-time check would drastically reduce run-time errors in code that
uses hash-based parameters. It would also make the code both more readable
AND more efficient.

There are several ways this could go. The least obtrusive would be the
above: no errors are generated by the compiler. This simply serves as an aid
to the programmer, alleviating the need for all those "exists" checks and
%args fetches (which are necessary to avoid "use of undefined" warnings,
which in turn are very helpful in large-scale code). This model, however, is
obviously limited in its usefulness. One could almost write a pre-processor
to perform this activity.

At the other extreme, the use of <arg-list> could require these, and only
these, fields. In this manner, the compiler could easily convert the hash
into a fixed parameter listing in a manner similar to the following:

    sub foo<a, b, c> { ... }

    foo( c => 1, b => 2, a => 3 );
    foo( 8, 9, 10 );
    foo( %myhash );

Translates to:

    sub foo($$$) {
        my ( $a, $b, $c ) = @_;
        ...
    }

    foo( 3, 2, 1 );
    foo( 8, 9, 10 );
    foo( @myhash{ 'a', 'b', 'c' } );

To my knowledge, the GNU C compiler does this sort of parameter
reorganizing. This is also similar to the way named arguments are passed to
functions in Python. Before anyone makes the comment "you know where to find
C and Python": you can't tell me that this doesn't take the not-fun parts
out of coding, helping both the developer and (through the proliferation of
named parameters) the maintainer. For nontrivial function calls, it is a
great benefit to the maintainer to understand what each parameter is by
simply looking at the function call.

An obvious limitation is in the treatment of @_ within the function. Either
the function assumes that it was defined without hashed parameters, or the
following construct would be needed:

    sub foo($$$$$$) {
        my ( $a, $b, $c ) = @_[ 1, 3, 5 ]; # Minor performance penalty
        ...
    }

    foo( a, 3, b, 2, c, 1 );  # Simple compile-time reordering
    foo( a, 8, b, 9, c, 10 ); # reverses the trend above
    foo( map { ( $_, $myhash{ $_ } ) } qw( a b c ) ); # Obviously undesirable

I would suggest the former approach, although it does limit a special class
of function calls (which I refer to hereafter as chained function calls),
where a developer may only be applying a wrapper to a deeper function. In
this case, the wrapper function will want to examine only one or two
parameters, [optionally] passing everything on to its wrapped function.
Essentially, the above would require explicit naming of all parameters
instead of just passing @_ or making use of perl's "&func;" form, which
efficiently hands the caller's @_ on to the called function. A mere
inconvenience at most, though.

Another obvious problem with forced named positions is with heterogeneous
arguments. First and foremost is the mixing with class method invocation:

    $obj->foo( a => 1, b => 2 );

Also with function parameters:

    sub my_cmp(&@) { }
    my_cmp { $_[0] < $_[1] } 5, 6;

Likewise, there are entire classes of functions that have scalars
intermingled with hashes.
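
For comparison, here is a minimal sketch of how such a mixed call has to be
handled by hand in today's perl: the invocant is shifted off first, and the
remaining name/value pairs are checked and reordered at run time. The helper
named_to_positional, the package Counter, and its sum method are all
hypothetical names invented for this illustration; none of them is part of
the proposal.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the RFC): check that every
# required name is present in a flat list of name/value pairs and return the
# values in the declared order.
sub named_to_positional {
    my ( $names, @pairs ) = @_;
    die "odd number of elements in named-argument list\n" if @pairs % 2;
    my %args = @pairs;
    for my $name (@$names) {
        die "missing required parameter '$name'\n"
            unless exists $args{$name};
    }
    return @args{@$names};    # hash slice, ordered as in $names
}

package Counter;    # hypothetical class, only here to show a method call

sub new { my $class = shift; return bless {}, $class }

sub sum {
    my $self = shift;    # the invocant is positional ...
    # ... and everything after it is treated as named parameters
    my ( $x, $y ) = main::named_to_positional( [qw( a b )], @_ );
    $self->{sum} = $x + $y;
    return $self;
}

package main;

my $obj = Counter->new->sum( b => 2, a => 1 );
print "sum = $obj->{sum}\n";    # prints "sum = 3"
```

Every check and every hash fetch here happens on each call, which is the
run-time cost that a compile-time reordering would avoid.
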
Though objects could be taken as a special case, and the use of
explicitly named parameters could be optional, I feel more could be done.
The simplest hybrid could mingle the two syntaxes:

    sub foo($%) <a,b,c> {
        # Here @_[ 1 .. $#_ ] are handled prior to function call
        my $self = shift;
        ...
    } # end foo

    sub my_cmp(&$$%) <a,b,c> {
        # Here @_[ 3 .. $#_ ] are handled prior to function call
        my ( $sub, $lcmp, $rcmp ) = @_;
        ...
    } # end my_cmp

Another possible hybrid could make use of function attributes:

    sub foo <self,a,b,c> : method, method_self, params_fixed {
        $self->{a} = $a;
        $self->{sum} = $b + $c;
        return $self;
    } # end foo

Here, params_fixed would allow only the named parameters, and would be a
candidate for compiler optimization. The use of $self in this fashion is a
separate discussion. The name method_self was a simple fix for backward
compatibility.

Method attributes of this type could include:

=over 4

=item locked

In multithreaded code, locks the function unless it is also declared as a
method, in which case the object is locked instead.

=item method

Compatibility; used in conjunction with 'locked' to perform object locking
in multithreaded code.

=item method_self

Specifies that a self-object reference should be implicitly created based
upon the context. Other attributes determine exactly how the lexical
variable is generated. For most cases, this is equivalent to prepending the
function with:

    my $self = shift;

=item params_relaxed

Provides no enforcement of parameters, nor any real compiler optimizations.
It serves simply to guarantee the generation of the lexical variables (and
assignments from the passed hash if present), while at the same time
explicitly defining the function (for potential use in a sort of dynamic
"reflections" function-attribute query). Extra passed parameters are
ignored, and missing ones produce undefs.
This would typically be equivalent to:

    sub foo {
        my ( $a, $b, $c );
        {
            no warnings;
            my %args = @_;
            $a = $args{ a };
            $b = $args{ b };
            $c = $args{ c };
        }
        ...
    }

=item params_min

Same as params_fixed, except that extra parameters are ignored. This
accommodates chaining function calls, where each function will pick and
choose its own parameters and pass the rest down the chain. Compiler
optimizations for this might be difficult (if at all possible). Perhaps
something like the following could work:

    sub foo<a, b>: params_min { ... }

    foo( b => 1, a => 2, c => 3 );

Translates to:

    sub foo($$$$;@) {
        my ( $a, $b );
        # Internal if, inserted by the compiler:
        if ( called_statically ) {
            ( $a, $b ) = @_[ 1, 3 ];
        } else { # Called with a dynamic hash
            no warnings;
            my %args = @_;
            ( $a, $b ) = @args{ 'a', 'b' };
        }
        ...
    }

    foo( a, 2, b, 1, c, 3 );

Obviously, this adds complexity, and additional information has to be passed
to the function to determine whether an optimization may occur. Sadly, this
might even require adding information to the caller() function to fulfill
the if statement. This should, however, be able to be handled under the
covers, optimization or no.

=item params_defined

Works in conjunction with params_min / params_fixed and adds the additional
constraint that no fields can be undef. This would actually be less
optimized than the fixed case, since its code would become:

    sub foo($$$) {
        defined $_ or die "undefined parameter" foreach @_;
        my ( $a, $b, $c ) = @_;
        ...
    } # end foo

And the params_min becomes:

    sub foo($$$$;@) {
        my ( $a, $b );
        # Internal if, inserted by the compiler:
        if ( called_statically ) {
            ( $a, $b ) = @_[ 1, 3 ];
        } else { # Called with a dynamic hash
            no warnings;
            my %args = @_;
            ( $a, $b ) = @args{ 'a', 'b' };
        }
        defined $_ or die "undefined parameter" foreach ( $a, $b );
        ...
    } # end foo

=back

=head1 IMPLEMENTATION

Various possible implementations are the following:

=over 4

=item sub foo(a, b, c) { ... }

Still compatible with ($$$), since the presence of \w characters
distinguishes the new form, but you can't intermix the old and new styles of
prototype for the same function.

=item sub foo<a, b, c> { ... }

Currently proposed method. It looks awkward, but allows mixing prototype
styles in case there are still needs for (&@) <a, b, c>, and the like.

=item sub foo [ < ( ] $a, $b, $c [ > ) ] { ... }

This just adds '$'s to either style above, which is a matter of taste. All
hash values are scalar, so why should we have to prefix their counterparts
with '$'? The only answer I can figure is readability: it stands out more
this way, and is more consistent with the rest of the language. More to
type, in my opinion.

=item sub foo: params(a,b,c) { ... }

Which makes good use of attribute fields, except that it just looks a little
odd (not that <..> doesn't).

=item sub foo: method_self, params_req(a,b), params_req_defined(c,d), params_opt(e), params_extra { ... }

This style makes use of four optional function attributes, which can be
applied in any combination, so long as their parameter names are mutually
exclusive. (Again, method_self is an independent issue, but is suggested
since mixing hashes and objects can be common.)

=over 4

=item method_self

As with the above, this is a suggested optional enhancement which takes care
of named parameters in OO design, by implicitly defining a $self variable.

=item params_req(a,b,c)

Used to specify which parameters are fixed. If this is the only params_*
attribute given, then full optimizations can occur.

=item params_req_defined(a,b,c)

Same as params_req, but requires fields to be defined as well.

=item params_opt(a,b,c)

Used to specify optional parameters. Parameters that are not passed become
undef by default. (Obviously there can be no params_opt_defined equivalent.)

=item params_extra

Allows @_ to contain additional fields. This negates optimizations for
required fields, since the user may have other plans for @_.

=back

=back

=head1 SUMMARY

In summary, in keeping with perl's spirit, we should definitely not enforce
a new function / method invocation process, not even for hash-based / named
parameters. Also, it makes little sense to produce an entirely new syntax
for one or two special cases which only obtain performance benefits under
certain conditions. This would needlessly produce legacy code which would be
difficult to maintain in the future.

This RFC suggests a compatible method of named parameters through the use of
an optional compiler-level hash. Initial implementations could all be
applied as a form of pre-processor. Subsequent versions could internally
optimize various special cases.

=head1 REFERENCES

Thread pseudo-hashes
