Author: allison
Date: Tue Jul 29 12:34:52 2008
New Revision: 29859
Modified:
trunk/docs/pdds/draft/pdd19_pir.pod
Log:
[pdd] Architectural review of PIR PDD.
Modified: trunk/docs/pdds/draft/pdd19_pir.pod
==============================================================================
--- trunk/docs/pdds/draft/pdd19_pir.pod (original)
+++ trunk/docs/pdds/draft/pdd19_pir.pod Tue Jul 29 12:34:52 2008
@@ -12,20 +12,15 @@
=head1 ABSTRACT
-This document outlines the architecture and core syntax of the Parrot
+This document outlines the architecture and core syntax of Parrot
Intermediate Representation (PIR).
-This document describes PIR, a stable, middle-level language for both
-compiler and human to target on.
-
=head1 DESCRIPTION
PIR is a stable, middle-level language intended both as a target for the
generated output from high-level language compilers, and for human use
developing core features and extensions for Parrot.
-=head1 IMPLEMENTATION
-
=head2 Basic Syntax
A valid PIR program consists of a sequence of statements, directives, comments
@@ -75,14 +70,14 @@
A label declaration consists of a label name followed by a colon. A label name
conforms to the standard requirements for identifiers. A label declaration may
occur at the start of a statement, or stand alone on a line, but always within
-a compilation unit.
+a subroutine.
A reference to a label consists of only the label name, and is generally used
as an argument to an instruction or directive.
-A PIR label is accessible only in the compilation unit where it's defined. A
-label name must be unique within a compilation unit, but it can be reused in
-other compilation units.
+A PIR label is accessible only in the subroutine where it's defined. A label
+name must be unique within a subroutine, but it can be reused in other
+subroutines.
goto label1
...
@@ -90,13 +85,8 @@
=head3 Registers and Variables
-There are three ways of referencing Parrot's registers. The first is direct
-access to a specific register by name In, Sn, Nn, Pn. The second is through a
-temporary register variable $In, $Sn, $Nn, $Pn. I<n> consists of digit(s)
-only. There is no limit on the size of I<n>.
-
-The third syntax for accessing registers is through named local variables
-declared with C<.local>.
+There are two ways of referencing Parrot's registers. The first is
+through named local variables declared with C<.local>.
.local pmc foo
@@ -104,12 +94,16 @@
corresponding to the types of registers. No other types are used. [See
RT#42769]
-The difference between direct register access and register variables or local
-variables is largely a matter of allocation. If you directly reference C<P99>,
-Parrot will blindly allocate 100 registers for that compilation unit. If you
-reference C<$P99> or a named variable C<foo>, on the other hand, Parrot will
-intelligently allocate a literal register in the background. So, C<$P99> may
-be stored in C<P0>, if it is the only register in the compilation unit.
+The second way of referencing a register is through a register variable
+C<$In>, C<$Sn>, C<$Nn>, or C<$Pn>. The capital letter indicates the type
+of the register (integer, string, number, or PMC). I<n> consists of
+digit(s) only. There is no limit on the size of I<n>. There is no direct
+correspondence between the value of I<n> and the position of the
+register in the register set; C<$P42> may be stored in the zeroth PMC
+register, if it is the only register in the subroutine.
+
+{{DEPRECATION NOTE: PIR will no longer support the old PASM-style syntax
+for registers without dollar signs: C<In>, C<Sn>, C<Nn>, C<Pn>.}}
=head2 Constants
@@ -194,11 +188,17 @@
set S0, utf8:unicode:"«"
-The encoding and charset gets attached to the string, no further processing
-is done, specifically escape sequences are not honored.
+The encoding and charset are attached to the string constant, and
+adopted by any string container the constant is assigned to.
+
+The standard escape sequences are honored within strings with an
+alternate encoding, so in the example above, you can include a
+particular Unicode character as either a literal sequence of bytes, or
+as an escape sequence.
=item numeric constants
+Both integers (C<42>) and numbers (C<3.14159>) may appear as constants.
C<0x> and C<0b> denote hex and binary constants respectively.
=back
@@ -209,15 +209,15 @@
=item .local <type> <identifier> [:unique_reg]
-Define a local name I<identifier> for this compilation unit with the given
-I<type>. You can define multiple identifiers of the same type by separating
-them with commas:
+Define a local name I<identifier> within a subroutine with the given
+I<type>. You can define multiple identifiers of the same type by
+separating them with commas:
.local int i, j
The optional C<:unique_reg> modifier will force the register allocator to
associate the identifier with a unique register for the duration of the
-compilation unit.
+subroutine.
=item .lex <string constant>, <reg>
@@ -239,44 +239,34 @@
=item .const <type> <identifier> = <const>
-{{ PROPOSAL: add
- .const <string constant> <identifier> = <const>
- as an alternative to allow ".const 'Sub' ... "
-}}
-
Define a constant named I<identifier> of type I<type> and assign value
-I<const> to it. The constant is stored in the constant table of the current
+I<const> to it. The I<type> may be either an integer value or a string
+constant. The constant is stored in the constant table of the current
bytecode file.
=item .globalconst <type> <identifier> = <const>
As C<.const> above, but the defined constant is globally accessible.
-=item .namespace <identifier> [deprecated: See RT #48737]
+=item .sub
-Open a new scope block. This "namespace" is not the same as the
-.namespace [ <identifier> ] syntax, which is used for storing subroutines
-in a particular namespace in the global symbol table.
-This directive is useful in cases such as (pseudocode):
+ .sub <identifier> [:<flag> ...]
+ .sub <quoted string> [:<flag> ...]
- local x = 1;
- print(x); # prints 1
- do # open a new namespace/scope block
- local x = 2; # this x hides the previous x
- print(x); # prints 2
- end # close the current namespace
- print(x); # prints 1 again
+Define a subroutine. All code in a PIR source file must be defined in a
+subroutine. See the section L<Subroutine flags> for available flags.
+Optional flags are a list of I<flag>, separated by spaces.
-All types of common language constructs such as if, for, while, repeat and
-such that have nested scopes, can use this directive.
+The name of the sub may be either a bare identifier or a quoted string
+constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
+above), but string sub names can contain any characters, including characters
+from different character sets (see L<Constants> above).
-{{ NOTE: this variation of C<.namespace> and C<.endnamespace> are deprecated.
-They were a hackish attempt at implementing scopes in Parrot, but didn't
-actually turn out to be useful.}}
+Always paired with C<.end>.
-=item .endnamespace <identifier> [deprecated: See RT #48737]
+=item .end
-Closes the scope block that was opened with .namespace <identifier>.
+End a subroutine. Always paired with C<.sub>.
=item .namespace [ <identifier> ; <identifier> ]
@@ -295,21 +285,8 @@
The brackets are not optional, although the string inside them is.
-{{ NOTE: currently the brackets *are* optional. TODO: make decision whether
- we want the brackets optional. }}
-
-
-=item .pragma n_operators
-
-Convert arithmethic infix operators to n_infix operations. The unary opcodes
-C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_>
-prefix.
-
- .pragma n_operators 1
- .sub foo
- ...
- $P0 = $P1 + $P2 # n_add $P0, $P1, $P2
- $P2 = abs $P0 # n_abs $P2, $P0
+{{ NOTE: currently the brackets *are* optional, so this is an
+implementation change. }}
=item .loadlib "lib_name"
@@ -319,73 +296,87 @@
A library loaded this way is also available at runtime, as if it has been
loaded again in C<:load>, so there is no need to call C<loadlib> at runtime.
-=item .HLL <hll_name>, <hll_lib>
+=item .HLL <hll_name>
-Define the HLL for the current file. Takes two string constants. If the string
-I<hll_lib> isn't empty this compile time pragma also loads the shared lib for
-the HLL, so that integer type constants are working for creating new PMCs.
-
-{{ PROPOSAL: make the ",<hll_lib>" part optional, so you don't have to
- specify an empty string for the library.
- (Alternatively, make this two different directives: .HLL_name, .HLL_lib)
-}}
+Define the HLL for the current file. Takes one string constant, the name
+of the HLL.
-=item .HLL_map <core_type>, <user_type>
+=item .HLL <hll_name>, <hll_lib> [deprecated]
-{{ PROPOSAL: make the ',' an "->", "=>", "=", for instance, so it's easier
- to remember what argument comes first, the core type or the user type.
-}}
+An old form of the .HLL directive that also loaded a shared lib for the
+HLL. Use C<.loadlib> instead.
+
+=item .HLL_map <core_type> = <user_type>
+
+{{ NOTE: the '=' used to be ','. }}
Whenever Parrot has to create PMCs inside C code on behalf of the running
-user program it consults the current type mapping for the executing HLL
+user program, it consults the current type mapping for the executing HLL
and creates a PMC of type I<user_type> instead of I<core_type>, if such
a mapping is defined. I<core_type> and I<user_type> may be any valid string
constant.
-For example, with this code snippet ...
+For example, with this code snippet:
.loadlib 'dynlexpad'
- .HLL "Foo", ""
+ .HLL "Foo"
- .HLL_map 'LexPad', 'DynLexPad'
+ .HLL_map 'LexPad' = 'DynLexPad'
.sub main :main
...
-... all subroutines for language I<Foo> would use a dynamic lexpad pmc.
+all subroutines for language I<Foo> would use a dynamic lexpad PMC.
-{{ PROPOSAL: stop using integer constants for types RT#45453 }}
+=item .line <integer>, <string>
-=item .sub
+Set the line number and filename to the values specified. This is useful
+when the PIR code is generated from some other source file, and error
+messages should report the line number and filename of that source file,
+not those of the generated file.
- .sub <identifier> [:<flag> ...]
- .sub <quoted string> [:<flag> ...]
+{{ DEPRECATION NOTE: was C<< #line <integer> <string> >>. See [RT#45857],
+[RT#43269], and [RT#47141]. }}
-Define a compilation unit. All code in a PIR source file must be defined in a
-compilation unit. See the section C<Subroutine flags> for
-available flags. Optional flags are a list of I<flag>, separated by empty
-spaces.
+=item .namespace <identifier> [deprecated: See RT #48737]
-The name of the sub may be either a bare identifier or a quoted string
-constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
-above), but string sub names can contain any characters, including characters
-from different character sets (see L<Constants> above).
+{{ DEPRECATION NOTE: these variations of C<.namespace> and
+C<.endnamespace> are deprecated. They were a hackish attempt at
+implementing scopes in Parrot, but didn't actually turn out to be
+useful.}}
-Always paired with C<.end>.
+Open a new scope block. This "namespace" is not the same as the
+.namespace [ <identifier> ] syntax, which is used for storing subroutines
+in a particular namespace in the global symbol table.
+This directive is useful in cases such as (pseudocode):
-=item .end
+ local x = 1;
+ print(x); # prints 1
+ do # open a new namespace/scope block
+ local x = 2; # this x hides the previous x
+ print(x); # prints 2
+ end # close the current namespace
+ print(x); # prints 1 again
-End a compilation unit. Always paired with C<.sub>.
+Language constructs that have nested scopes, such as if, for, while,
+and repeat, can use this directive.
-=item .line <integer>, <string>
+=item .endnamespace <identifier> [deprecated: See RT #48737]
-Set the line number and filename to the value specified. This is useful in
-case the PIR code is generated from some source file, and any error messages
-should print the source file, not the line number and filename of the
-generated file.
+Closes the scope block that was opened with .namespace <identifier>.
+
+=item .pragma n_operators [deprecated]
+
+Convert arithmetic infix operators to n_infix operations. The unary opcodes
+C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_>
+prefix.
+
+ .pragma n_operators 1
+ .sub foo
+ ...
+ $P0 = $P1 + $P2 # n_add $P0, $P1, $P2
+ $P2 = abs $P0 # n_abs $P2, $P0
-{{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857],
-[RT#43269], and [RT#47141]. }}
=back
@@ -483,26 +474,34 @@
=item :method
-The marked C<.sub> is a method. In the method body, the object PMC
-can be referred to with C<self>.
+ .sub bar :method
+ .sub bar :method("foo")
+
+The marked C<.sub> is a method, added as a method in the class that
+corresponds to the current namespace, and not stored in the namespace.
+In the method body, the object PMC can be referred to with C<self>.
+
+If a string argument is given to C<:method>, the method is stored with
+that name instead of the C<.sub> name.
=item :vtable
-The marked C<.sub> overrides a v-table method. By default, a sub with the same
-name as a v-table method does not override the v-table method. To specify that
-there should be no namespace entry (that is, it just overrides the v-table
-method but is callable as a normal method), use B<:vtable :anon>. To give the
-v-table method a different name, use B<:vtable("...")>. For example, to have
-the method B<ToString> also be the v-table method B<get_string>), use
-B<:vtable("get_string")>.
+ .sub bar :vtable
+ .sub bar :vtable("foo")
+
+The marked C<.sub> overrides a vtable function, and is not stored in the
+namespace. By default, it overrides a vtable function with the same name
+as the C<.sub> name. To override a different vtable function, use
+C<:vtable("...")>. For example, to have a C<.sub> named I<ToString> also
+be the vtable function C<get_string>, use C<:vtable("get_string")>.
When the B<:vtable> flag is set, the object PMC can be referred to with
C<self>, as with the B<:method> flag.
-
=item :outer(subname)
-The marked C<.sub> is lexically nested within the sub known by B<subname>.
+The marked C<.sub> is lexically nested within the sub known by
+I<subname>.
=item :lexid( <string_constant> )
@@ -591,7 +590,10 @@
be stored. Available flags:
C<:slurpy>, C<:named>, C<:optional>, C<:opt_flag> and C<:unique_reg>.
-=item .param <type> "<identifier>" => <identifier> [:<flag>]*
+=item .param <type> "<identifier>" => <identifier> [:<flag>]* [deprecated]
+
+{{ NOTE: if this is already implemented, deprecate it; otherwise, just
+delete it from the spec.}}
Define a named parameter. This is syntactic sugar for:
@@ -648,59 +650,56 @@
=item if <var> goto <identifier>
-If I<var> evaluates as true, jump to the named I<identifier>. Translate to
-C<if var, identifier>.
+If I<var> evaluates as true, jump to the named I<identifier>.
=item unless <var> goto <identifier>
-Unless I<var> evaluates as true, jump to the named I<identifier>. Translate
-to C<unless var, identifier>.
+Unless I<var> evaluates as true, jump to the named I<identifier>.
=item if null <var> goto <identifier>
-If I<var> evaluates as null, jump to the named I<identifier>. Translate to
-C<if_null var, identifier>.
+If I<var> evaluates as null, jump to the named I<identifier>.
=item unless null <var> goto <identifier>
-Unless I<var> evaluates as null, jump to the named I<identifier>. Translate
-to C<unless_null var, identifier>.
+Unless I<var> evaluates as null, jump to the named I<identifier>.
=item if <var1> <relop> <var2> goto <identifier>
-The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate
-to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If
+The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>. If
I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
=item unless <var1> <relop> <var2> goto <identifier>
-The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate
-to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless
+The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>. Unless
I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
=item <var1> = <var2>
-Assign a value. Translates to C<set var1, var2>.
+Assign a value.
=item <var1> = <unary> <var2>
-The unaries C<!>, C<-> and C<~> generate C<not>, C<neg> and C<bnot> ops.
+Unary operations C<!> (NOT), C<-> (negation) and C<~> (bitwise NOT).
=item <var1> = <var2> <binary> <var3>
-The binaries C<+>, C<->, C<*>, C</>, C<%> and C<**> generate
-C<add>, C<sub>, C<mul>, C<div>, C<mod> and C<pow> arithmetic ops.
-binary C<.> is C<concat> and only valid for string arguments.
+Binary arithmetic operations C<+> (addition), C<-> (subtraction), C<*>
+(multiplication), C</> (division), C<%> (modulus) and C<**> (exponent).
+Binary C<.> is concatenation and only valid for string arguments.
-C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts C<shl> and C<shr>.
-C<E<gt>E<gt>E<gt>> is the logical shift C<lsr>.
+C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts left and right.
+C<E<gt>E<gt>E<gt>> is the logical shift right.
-C<&&>, C<||> and C<~~> are logic C<and>, C<or> and C<xor>.
+Binary logic operations C<&&> (AND), C<||> (OR) and C<~~> (XOR).
-C<&>, C<|> and C<~> are binary C<band>, C<bor> and C<bxor>.
+Binary bitwise operations C<&> (bitwise AND), C<|> (bitwise OR) and C<~>
+(bitwise XOR).
{{PROPOSAL: Change description to support logic operators (comparisons) as
-implemented (and working) in imcc.y.}}
+implemented (and working) in imcc.y. ANR: proposal not clear.}}
=item <var1> <op>= <var2>
@@ -712,8 +711,10 @@
=item <var> = <var> [ <var> ]
-This generates either a keyed C<set> operation or C<substr var, var,
-var, 1> for string arguments and an integer key.
+A keyed C<set> operation for PMCs or a substring operation for string
+arguments and an integer key.
+
+{{ DEPRECATION NOTE: Possibly deprecate the substring variant. }}
=item <var> = <var> [ <key> ]
@@ -783,30 +784,30 @@
=item <var>."_method"([arg [:<flag> ...], ...])
-=item <var>._method([arg [:<flag> ...], ...])
+=item <var>.<var>([arg [:<flag> ...], ...])
Function or method call. These notations are shorthand for a longer PCC
function call. I<var> can denote a global subroutine, a local I<identifier> or
a I<reg>.
-{{We should review the (currently inconsistent) specification of the
-method name. Currently it can be a bare word, a quoted string or a
-string register. See #45859.}}
+{{ DEPRECATION NOTE: bare word method names (e.g. C<foo.bar()> where
+C<bar> is not a local variable name) are deprecated. Use a quoted string
+instead. See #45859. }}
=item .return ([<var> [:<flag> ...], ...])
-Return from the current compilation unit with zero or more values.
+Return from the current subroutine with zero or more values.
-The surrounded parentheses are mandatory. Besides making sequence
-break more conspicuous, this is necessary to distinguish this syntax
-from other uses of the C<.return> directive that will be probably
+The parentheses surrounding the arguments are mandatory. Besides making
+sequence break more conspicuous, this is necessary to distinguish this
+syntax from other uses of the C<.return> directive that will probably be
deprecated.
=item .return <var>(args)
=item .return <var>."somemethod"(args)
-=item .return <var>.somemethod(args)
+=item .return <var>.<var>(args)
Tail call: call a function or method and return from the sub with the
function or method call return values.
@@ -827,28 +828,16 @@
or a C-level conversion (int cast, float cast, a string copy, or a call to one
of the conversion functions like C<string_to_num>).
-A PMC source with a low-level destination, calls the C<get_integer>,
-C<get_number>, or C<get_string> vtable function on the PMC. A low-level source
-with a PMC destination calls the C<set_integer_native>, C<set_number_native>,
-or C<set_string_native> vtable function on the PMC (assign to value
-semantics). Two PMC arguments are a direct C assignment (assign to container
-semantics).
+Assigning a PMC argument to a low-level argument calls the
+C<get_integer>, C<get_number>, or C<get_string> vtable function on the
+PMC. Assigning a low-level argument to a PMC argument calls the
+C<set_integer_native>, C<set_number_native>, or C<set_string_native>
+vtable function on the PMC (assign to value semantics). Two PMC
+arguments are a direct C assignment (assign to container semantics).
For assign to value semantics for two PMC arguments use C<assign>, which calls
the C<assign_pmc> vtable function.
-
-{{ NOTE: response to the question:
-
- <pmichaud> I don't think that 'morph' as a method call is a good idea
- <pmichaud> we need something that says "assign to value" versus
- "assign to container"
- <pmichaud> we can't eliminate the existing 'morph' opcode until we have a
- replacement
-
-}}
-
-
=head2 Macros
This section describes the macro layer of the PIR language. The macro layer of
@@ -867,7 +856,7 @@
runtime/parrot/include, in that order. The first file of that name to be found
is included.
-{{ Check the include directive's search order and whether it's complete }}
+{{ NOTE: the C<include> directive's search order is subject to change. }}
=item * C<.macro> <identifier> [<parameters>]
@@ -1275,6 +1264,10 @@
argument before a variable number of following arguments is the
argument count.
+=head1 IMPLEMENTATION
+
+There are multiple implementations of PIR, each of which will meet this
+specification for the syntax.
=head1 ATTACHMENTS