Author: allison
Date: Sat Oct 27 21:22:00 2007
New Revision: 22543

Modified:
   trunk/docs/pdds/draft/pdd19_pir.pod

Log:
[pdd] Solidifying PDD 19 coverage on macros and .pcc_* directives.


Modified: trunk/docs/pdds/draft/pdd19_pir.pod
==============================================================================
--- trunk/docs/pdds/draft/pdd19_pir.pod (original)
+++ trunk/docs/pdds/draft/pdd19_pir.pod Sat Oct 27 21:22:00 2007
@@ -27,6 +27,27 @@
 A valid PIR program consists of a sequence of statements, directives, comments
 and empty lines.
 
+=head3 Statements
+
+A statement starts with an optional label, contains an instruction, and is
+terminated by a newline (<NL>). Each statement must be on its own line.
+
+  [label:] [instruction] <NL>
+
+An instruction may be either a low-level opcode or a higher-level PIR
+operation, such as a subroutine call, a method call, or PIR syntactic sugar.
+
+=head3 Directives
+
+A directive provides information for the PIR compiler that is outside the
+normal flow of executable statements. Directives are all prefixed with a ".",
+as in C<.local> or C<.sub>.
+
+=head3 Comments
+
+Comments start with C<#> and last until the following newline. PIR also allows
+comments in Pod format. Comments, Pod content, and empty lines are ignored.
+
 =head3 Identifiers
 
 Identifiers start with a letter or underscore, then may contain additionally
@@ -44,25 +65,6 @@
 
 NOTE: The use of C<::> in identifiers is deprecated.
 
-=head3 Comments
-
-Comments start with C<#> and last until the following newline. PIR also allows
-comments in Pod format. Comments, Pod content, and empty lines are ignored.
-
-=head3 Statements
-
-A I<statement> starts with an optional label, contains an instruction (a Parrot
-operation or opcode), and is terminated by a newline (<NL>). Each statement
-must be on its own line.
-
-  [label:] [instruction] <NL>
-
-=head3 Directives
-
-A directive provides information for the PIR compiler that is outside the
-normal flow of executable statements. Directives are all prefixed with a ".",
-as in C<.local> or C<.sub>.
-
 =head3 Labels
 
 A label declaration consists of a label name followed by a colon. A label name
@@ -101,7 +103,7 @@
 variables is largely a matter of allocation. If you directly reference C<P99>,
 Parrot will blindly allocate 100 registers for that compilation unit. If you
 reference C<$P99> or a named variable C<foo>, on the other hand, Parrot will
-intelligently allocate a literal register in the background, so C<$P99> may be
+intelligently allocate a literal register in the background. So, C<$P99> may be
 stored in C<P0>, if it is the only register in the compilation unit.
 
 =head2 Constants
@@ -201,9 +203,9 @@
 
 =item .local <type> <identifier> [:unique_reg]
 
-Define a local name I<identifier> for this I<compilation unit> and of the
-given I<type>. You can define multiple identifiers of the same type by
-separating them with commas:
+Define a local name I<identifier> for this compilation unit with the given
+I<type>. You can define multiple identifiers of the same type by separating
+them with commas:
 
   .local int i, j
 
@@ -239,18 +241,14 @@
 =item .const <type> <identifier> = <const>
 
 Define a constant named I<identifier> of type I<type> and assign value
-I<const> to it.
-
-{{ NOTE: C<.const> is deprecated, replaced with C<.constant>. }}
+I<const> to it. The constant is stored in the constant table of the current
+bytecode file.
 
 =item .globalconst <type> <identifier> = <const>
 
 As C<.const> above, but the defined constant is globally accessible.
 
-{{ Proposal: Change name to C<.globalconstant> for consistency with
-C<.constant>. }}
-
-=item .namespace <identifier>
+=item .namespace <identifier> [deprecated]
 
 Open a new scope block. This "namespace" is not the same as the
 .namespace [ <identifier> ] syntax, which is used for storing subroutines
@@ -268,7 +266,11 @@
 All types of common language constructs such as if, for, while, repeat and such
 that have nested scopes, can use this directive.
 
-=item .endnamespace <identifier>
+{{ NOTE: this variation of C<.namespace> and C<.endnamespace> are deprecated.
+They were a hackish attempt at implementing scopes in Parrot, but didn't
+actually turn out to be useful.}}
+
+=item .endnamespace <identifier> [deprecated]
 
 Closes the scope block that was opened with .namespace <identifier>.
 
@@ -276,8 +278,8 @@
 
 Defines the namespace from this point onwards.  By default the program is not
 in any namespace.  If you specify more than one, separated by semicolons, it
-creates nested namespaces, by storing the inner namespace object with a C<\0>
-prefix in the outer namespace's global pad.
+creates nested namespaces, by storing the inner namespace object in the outer
+namespace's global pad.
 
 =item .pragma n_operators
 
@@ -299,17 +301,17 @@
 A library loaded this way is also available at runtime, as if it has been
 loaded again in C<:load>, so there is no need to call C<loadlib> at runtime.
 
-=item .HLL "hll_name", "hll_lib"
+=item .HLL "<hll_name>", "<hll_lib>"
 
-Define the HLL for the current file. If the string C<hll_lib> isn't empty
+Define the HLL for the current file. If the string I<hll_lib> isn't empty
 this compile time pragma also loads the shared lib for the HLL, so that
 integer type constants are working for creating new PMCs.
 
-=item .HLL_map 'CoreType', 'UserType'
+=item .HLL_map '<CoreType>', '<UserType>'
 
 Whenever Parrot has to create PMCs inside C code on behalf of the running
 user program it consults the current type mapping for the executing HLL
-and creates a PMC of type I<'UserType'> instead of I<'CoreType'>, if such
+and creates a PMC of type I<UserType> instead of I<CoreType>, if such
 a mapping is defined.
 
 E.g. with this code snippet ...
@@ -332,7 +334,7 @@
   .sub <quoted string> [:<flag> ...]
 
 Define a compilation unit. All code in a PIR source file must be defined in a
-compilation unit. See L<PIR Calling Conventions|imcc/calling_conventions> for
+compilation unit. See L<PDD03|docs/pdds/pdd03_calling_conventions.pod> for
 available flags.  Optional flags are a list of I<flag>, separated by empty
 spaces, and empty spaces only.
 
@@ -351,30 +353,31 @@
 
 =item .emit
 
-Define a compilation unit containing PASM code. Always paired with
-C<.eom>.
+Define a compilation unit containing PASM code (only opcodes and a limited
+subset of directives). Always paired with C<.eom>.
 
 =item .eom
 
 End a compilation unit containing PASM code. Always paired with
 C<.emit>.
 
-=item .pcc_*
+=item .begin_*, .end_*, .call
 
-Directives used for Parrot Calling Conventions. These are:
+Directives used for Parrot calling conventions. These are:
 
 =over 4
 
-=item .pcc_begin and .pcc_end
+=item .begin_call and .end_call
 
-=item .pcc_begin_return and .pcc_end_return
+=item .begin_return and .end_return
 
-=item .pcc_begin_yield and .pcc_end_yield
+=item .begin_yield and .end_yield
 
-=item .pcc_call
+=item .call
 
 =back
 
+
 {{ REVIEW: Do we still want/need the "pcc_" prefix? See #45925. }}
 
 =back
@@ -398,10 +401,9 @@
 
 =item .return <var> [:<flag> ...]
 
-Between C<.pcc_begin_return> and C<.pcc_end_return>, specify one or
+Between C<.begin_return> and C<.end_return>, specify one or
 more of the return value(s) of the current subroutine.  Available
-flags:
-C<:flat>, C<:named>.
+flags: C<:flat>, C<:named>.
 
 =back
 
@@ -411,73 +413,18 @@
 
 =item .arg <var> [:<flag> ...]
 
-Between C<.pcc_begin> and C<.pcc_call>, specify an argument to be
+Between C<.begin_call> and C<.call>, specify an argument to be
 passed.  Available flags: C<:flat>, C<:named>.
 
 =item .result <var> [:<flag> ...]
 
-Between C<.pcc_call> and C<.pcc_end>, specify where one or more return
+Between C<.call> and C<.end_call>, specify where one or more return
 value(s) should be stored.  Available flags:
 C<:slurpy>, C<:named>, C<:optional>, and C<:opt_flag>.
 
 =back
 
-=head3 Shorthand directives for PCC call and return
-
-=over 4
-
-=item ([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])
-
-This is short for:
-
-  .pcc_begin
-  .pcc_arg <arg1> <flag2>
-  ...
-  .pcc_call <var2>
-  .result <var1> <flag1>
-  ...
-  .pcc_end
-
-=item <var> = <var>([arg [:<flag> ...], ...])
-
-=item <var>([arg [:<flag> ...], ...])
-
-=item <var>."_method"([arg [:<flag> ...], ...])
-
-=item <var>._method([arg [:<flag> ...], ...])
-
-Function or method call. These notations are shorthand for a longer
-PCC function call with C<.pcc_*> directives. I<var> can denote a
-global subroutine, a local I<identifier> or a I<reg>.
-
-{{We should review the (currently inconsistent) specification of the
-method name. Currently it can be a bare word, a quoted string or a
-string register. See #45859.}}
-
-=item .return ([<var> [:<flag> ...], ...])
-
-Return from the current compilation unit with zero or more values.
-
-The surrounded parentheses are mandatory. Besides making sequence
-break more conspicuous, this is necessary to distinguish this syntax
-from other uses of the C<.return> directive that will be probably
-deprecated.
-
-=item .return <var>(args)
-
-=item .return <var>."somemethod"(args)
-
-=item .return <var>.somemethod(args)
-
-Tail call: call a function or method and return from the sub with the
-function or method call return values.
-
-Internally, the call stack doesn't increase because of a tail call, so
-you can write recursive functions and not have stack overflows.
-
-=back
-
-=head2 Parameter Passing and Getting Flags
+=head3 Parameter Passing and Getting Flags
 
 See L<PDD03|pdds/pdd03_calling_conventions.pod> for a description of
 the meaning of the flag bits C<SLURPY>, C<OPTIONAL>, C<OPT_FLAG>,
@@ -526,14 +473,14 @@
 =item if <var1> <relop> <var2> goto <identifier>
 
 The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate
-to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If C<var1
-relop var2> evaluates as true, jump to the named I<identifier>.
+to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If 
+I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
 
 =item unless <var1> <relop> <var2> goto <identifier>
 
 The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate
-to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless C<var1
-relop var2> evaluates as true, jump to the named I<identifier>.
+to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless 
+I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
 
 =item <var1> = <var2>
 
@@ -574,6 +521,10 @@
 
 =item <var> = <var> [ <key> ]
 
+{{ NOTE: keyed assignment is still valid in PIR, but the C<..> notation in keys
+is deprecated, so this syntactic sugar for slices is also deprecated. See the
+(currently experimental) C<slice> opcode instead. }}
+
 where C<key> is:
 
  <var1> .. <var2>
@@ -593,8 +544,10 @@
 
 =item <var> [ <var> ] = <var>
 
-A keyed C<set> operation or the assign C<substr> op with a length of
-1.
+A keyed C<set> operation.
+
+{{ DEPRECATION NOTE: this syntactic sugar will no longer be used for the assign
+C<substr> op with a length of 1. }}
 
 =item <var> = new '<type>'
 
@@ -640,17 +593,62 @@
 
 Return the address of a label.
 
-=back
+=item ([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])
 
+This is short for:
 
+  .begin_call
+  .arg <arg1> <flag2>
+  ...
+  .call <var2>
+  .result <var1> <flag1>
+  ...
+  .end_call
 
-=head2 Macros
+=item <var> = <var>([arg [:<flag> ...], ...])
 
-This section describes the macro layer of the PIR language.
+=item <var>([arg [:<flag> ...], ...])
 
-=head3 Current Situation
+=item <var>."_method"([arg [:<flag> ...], ...])
 
-The macro layer of the PIR compiler handles the following directives:
+=item <var>._method([arg [:<flag> ...], ...])
+
+Function or method call. These notations are shorthand for a longer
+PCC function call. I<var> can denote a global subroutine, a local I<identifier>
+or a I<reg>.
+
+{{We should review the (currently inconsistent) specification of the
+method name. Currently it can be a bare word, a quoted string or a
+string register. See #45859.}}
+
+=item .return ([<var> [:<flag> ...], ...])
+
+Return from the current compilation unit with zero or more values.
+
+The surrounded parentheses are mandatory. Besides making sequence
+break more conspicuous, this is necessary to distinguish this syntax
+from other uses of the C<.return> directive that will be probably
+deprecated.
+
+=item .return <var>(args)
+
+=item .return <var>."somemethod"(args)
+
+=item .return <var>.somemethod(args)
+
+Tail call: call a function or method and return from the sub with the
+function or method call return values.
+
+Internally, the call stack doesn't increase because of a tail call, so
+you can write recursive functions and not have stack overflows.
+
+=back
+
+
+=head2 Macros
+
+This section describes the macro layer of the PIR language. The macro layer of
+the PIR compiler handles the following directives:
 
 =over 4
 
@@ -663,44 +661,72 @@
 
 The C<.macro> directive starts the definition of a macro.
 
-=item * C<.constant>
+=item * C<.macro_const>
+
+The C<.macro_const> directive is a special type of macro; it allows the user to
+use a symbolic name for a constant value. Like C<.macro>, the substitution
+occurs at compile time.
 
-The C<.constant> directive is a special type of macro; it allows the
-user to use a symbolic name for a constant value or a register.
+{{ NOTE: C<.constant> is deprecated, replaced by C<.macro_const>. }}
 
 =back
 
+The macro layer is completely implemented in the lexical analysis phase.
+The parser does not know anything about what happens in the lexical
+analysis phase.
 
-=head3 Proposed Situation
+When the C<.include> directive is encountered, the specified file is opened
+and the following tokens that are requested by the parser are read from
+that file.
 
-The current macro layer has a few limitations. These are listed below.
+A macro expansion is a dot-prefixed identifier. For instance, if a macro
+was defined as shown below:
 
-=over 4
+ .macro foo(bar)
+ ...
+ .endm
+
+this macro can be expanded by writing C<.foo(42)>. The body of the macro
+will be inserted at the point where the macro expansion is written.
 
-=item * Macro parameter list
+A C<.macro_const> expansion is more or less the same as a C<.macro> expansion,
+except that a constant expansion cannot take any arguments, and the
+substitution of a C<.macro_const> contains no newlines, so it can be used within
+a line of code.
 
-If a macro defines no parameter list (not even the parentheses), then
-the macro expansion should not specify any parenthesis. This means that
-a macro defined as:
+=head3 Macro parameter list
+
+The parameter list for a macro is specified in parentheses after the name of
+the macro. Macro parameters are not typed.
+
+ .macro foo(bar, baz, buz)
+ ...
+ .endm
+
+The number of arguments in the call to a macro must match the number of
+parameters in the macro's parameter list. Macros do not perform multidispatch,
+so you can't have two macros with the same name but different parameters.
+Calling a macro with the wrong number of arguments gives the user an error.
+
+If a macro defines no parameter list, parentheses are optional on both the
+definition and the call.  This means that a macro defined as:
 
  .macro foo
  ...
  .endm
 
-can only be expanded by writing C<.foo>. Writing C<.foo()> is an error.
-If, however, the macro definition is written as:
+can be expanded by writing either C<.foo> or C<.foo()>. And a macro definition
+written as:
 
  .macro foo()
  ...
  .endm
 
-then writing C<.foo> is an error, and instead the user should write this
-as C<.foo()>. On the one hand this behavior is consistent, but on the other
-hand the error message is somewhat dubious; if the user writes C<.foo> when
-the macro was defined as above (C<foo()>), then the error message indicates
-that the macro needs 1 argument.
+can also be expanded by writing either C<.foo> or C<.foo()>.
 
-Some rationalization would be desirable.
+{{ NOTE: this is a change from the current implementation, which requires the
+definition and call of a zero-parameter macro to match in the use of
+parentheses. }}
 
 =item * Heredoc arguments
 
@@ -717,7 +743,11 @@
 
  EOS
 
-=item * Unique local variables
+{{ NOTE: This is likely because the parsing of heredocs happens later than the
+preprocessing of macros. Might be nice if we could parse heredocs at the macro
+level, but not a high priority. }}
+
+=head3 Unique local labels
 
 Within the macro body, the user can declare a unique label identifier using
 the value of a macro parameter, like so:
@@ -728,20 +758,46 @@
   ...
   .endm
 
-Currently, IMCC still allows for writing C<.local> to declare a local label,
-but that is deprecated. Use C<.label> instead.
+{{ NOTE: Currently, IMCC still allows for writing C<.local> to declare a local
+label, but that is deprecated. Use C<.label> instead. }}
+
+=head3 Unique local variables
+
+Within the macro body, the user can declare a local variable with a unique
+name.
+
+  .macro foo()
+  ...
+  .macro_local int b
+  ...
+  .b = 42
+  print .b # prints the value of the unique variable (42)
+  ...
+  .endm
+
+The C<.macro_local> directive declares a local variable with a unique name in
+the macro. When the macro C<.foo()> is called, the resulting code that is given
+to the parser will read as follows:
+
+  .sub main
+    .local int local__foo__b
+    ...
+    local__foo__b = 42
+    print local__foo__b
+
+  .end
 
-However, it would be helpful if it were possible to declare unique local variables
-as well. The syntax for this could be as follows:
+The user can also declare a local variable with a unique name set to the
+symbolic value of one of the macro parameters.
 
   .macro foo(b)
   ...
-  .local int $b
+  .macro_local int $b
   ...
   .$b = 42
   print .$b # prints the value of the unique variable (42)
-  print .b  # prints the name of the variable, which is the value
-            # of parameter "b".
+  print .b  # prints the value of parameter "b", which is
+            # also the name of the variable.
   ...
   .endm
 
@@ -749,108 +805,99 @@
 the value of the parameter, or that the variable by that name is meant. Obviously,
 the value of C<b> should be a string.
 
-Defining a non-unique variable can still be done, using the normal syntax:
+The automatic name munging on C<.macro_local> variables allows for using
+multiple macros, like so:
 
-  .macro foo(b)
-  .local int b
-  .local int $b
+  .macro foo(a)
+  .macro_local int $a
   .endm
 
-When invoking the macro C<foo> as follows:
-
-  .foo("x")
-
-there will be two variables: C<b> and C<x>. When the macro is invoked twice:
+  .macro bar(b)
+  .macro_local int $b
+  .endm
 
   .sub main
     .foo("x")
-    .foo("y")
+    .bar("x")
   .end
 
-the resulting code that is given to the parser will read as follows:
+This will result in code for the parser as follows:
 
   .sub main
-    .local int b
-       .local int x
-       .local int b
-       .local int y
+    .local int local__foo__x
+    .local int local__bar__x
   .end
 
-Obviously, this will result in an error, as the variable C<b> is defined twice.
-Of course, it would be a good idea to give the unique variable in the macro a
-special prefix, like so:
 
-    .local int local__foo__x
+{{ PROPOSAL: should C<.macro_local> also add a random value to the munged name,
+to allow multiple calls to the same macro from within the same compilation
+unit? May not be used often enough to be worth adding it. The same effect can
+be achieved by using a symbolic parameter name for the macro local, it's just
+slightly less convenient.  }}
 
-This allows for using multiple macros, like so:
+=head3 Ordinary local variables
 
-  .macro foo(a)
-  .local int $a
-  .endm
+Defining a non-unique variable can still be done, using the normal syntax:
 
-  .macro bar(b)
-  .local int $b
+  .macro foo(b)
+  .local int b
+  .macro_local int $b
   .endm
 
+When invoking the macro C<foo> as follows:
+
+  .foo("x")
+
+there will be two variables: C<b> and C<x>. When the macro is invoked twice:
+
   .sub main
     .foo("x")
-       .bar("x")
+    .foo("y")
   .end
 
-This will result in code for the parser as follows:
+the resulting code that is given to the parser will read as follows:
 
   .sub main
+    .local int b
     .local int local__foo__x
-       .local int local__bar__x
+    .local int b
+    .local int local__foo__y
   .end
 
-An additional special character, not allowed for user-defined variables,
-could be added to the generated name, so that a user-defined variable
-cannot conflict (if the user were to declare a variable by name of
-C<local__foo__x>.)
+Obviously, this will result in an error, as the variable C<b> is defined twice.
+If you intend the macro to create unique variable names, use C<.macro_local>
+instead of C<.local> to take advantage of the name munging.
 
 =back
 
-=head2 Implementation
+=head2 Assignment and Morphing
 
-The macro layer is completely implemented in the lexical analysis phase.
-The parser does not know anything about what happens in the lexical
-analysis phase.
+The C<=> syntactic sugar in PIR, when used in the simple case of:
 
-When the C<.include> directive is encountered, the specified file is opened
-and the following tokens that are requested by the parser are read from
-that file, instead of the original file that was given to the parser.
+  <var1> = <var2>
 
-A macro expansion is a dot-prefixed identifier. For instance, if a macro
-was defined as shown below:
+directly corresponds to the C<set> opcode. So, two low-level arguments (int,
+num, or string registers, variables, or constants) are a direct C assignment,
+or a C-level conversion (int cast, float cast, a string copy, or a call to one
+of the conversion functions like C<string_to_num>).
 
- .macro foo(bar)
- ...
- .endm
-
-this macro can be expanded by writing C<.foo(42)>. The body of the macro
-will be inserted at the point where the macro expansion is written.
+A PMC source with a low-level destination calls the C<get_integer>,
+C<get_number>, or C<get_string> vtable function on the PMC. A low-level source
+with a PMC destination calls the C<set_integer_native>, C<set_number_native>,
+or C<set_string_native> vtable function on the PMC (assign to value semantics).
+Two PMC arguments are a direct C assignment (assign to container semantics).
 
-A C<.constant> expansion is more or less the same as a C<.macro> expansion,
-except that a constant expansion cannot take any arguments, and it is only
-allowed in PASM mode, or within a C<.emit> block.
+For assign to value semantics for two PMC arguments use C<assign>, which calls
+the C<assign_pmc> vtable function.
 
-{{ Is there any reason to not allow C<.constant> directives in PIR mode?
-   (Except for the fact that we have C<.const> and C<.globalconst>) }}
 
-
-
-=head1 QUESTIONS
-
-=over 4
-
-=item * morph
+{{ NOTE: response to the question:
 
        <pmichaud>      I don't think that 'morph' as a method call is a good idea
        <pmichaud>      we need something that says "assign to value" versus "assign to container"
        <pmichaud>      we can't eliminate the existing 'morph' opcode until we have a replacement
 
-=back
+}}
 
 =head1 ATTACHMENTS
 
