On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:
> I had to poke into the guts of HLLCompiler, the new PAST, and the new 
> POST a fair bit in the process of getting Punie to work with them, so my 
> comments here are a mixture of user experience and implementation 
> details. I've grouped my comments into general categories.

Excellent.  Just as a general overall response -- PAST-pm is by
no means "finished", so many of the items that seem to be missing
are simply cases of "I haven't gotten to them yet so they aren't
implemented yet."

> Available node types:
> 
> - There's no PAST::Stmt node type? I only see PAST::Stmts and PAST::Op. 
> But statements are composed of multiple ops.  So, everything is an op?

At present there's no PAST::Stmt node type, but one can be easily
added.  I thought about putting one in based on Punie's use of
PAST::Stmt, but I hadn't quite figured out exactly _why_ it's 
important so I thought I'd leave it out until I actually needed it
somewhere.  In many ways ops are already composed of multiple ops,
so a statement can be considered just another op.  (But I do see
why someone would want a PAST::Stmt abstraction -- on the other
hand, I didn't see how it changed the resulting POST/PIR output.)

> - There's no PAST::Label node type? How do you represent labels in the 
> HLL source?

I just haven't gotten to this part yet.

> - Is there no way to indicate what type of variable a PAST::Var is? 
> Scalar/Array/Hash? (high-level types, not low-level types)

Sure, that's what 'vtype' is -- it indicates the type of value
that the variable ought to hold.

My plan has been to follow the Perl6 concept of "implementation types"
and "value types" within PAST.  Thus far I've only put in the support
for the value types, as the "vtype" attribute (and vtype can be any
high-level type the language happens to support).  I'm expecting
to add an "itype" attribute at some point when we're a bit farther
along; I'm still working out the details.

> ---
> Meaningful naming: (Be kind to your compiler writers.)

I totally agree, and I'm not yet wedded to any particular naming
scheme.

> - In the PAST nodes, I grok 'name' as the operator/function name of a 
> PAST::Op and as the HLL variable name of a PAST::Var, but making it the 
> value of a PAST::Val is going to far. It was 'value' in the old PAST, 
> which makes more sense. You're passing named parameters into 'init', so 
> I can't see a reason not to use a more meaningful name for the attribute.

I don't have a problem with switching it to 'value', I went with
'name' primarily because every PAST::Node has a name and so it just
made sense to use it there.  But let me make another weak argument
in favor of 'name'.  If a HLL programmer writes

    $a = 1.23456789E6;

then the rhs becomes a PAST::Val node.  How should we represent the 
value?  The parse-to-past translation could evaluate the contents of 
"1.23456789E1" and store the result in 'value' as (.Float) 12.3456789, 
but unfortunately when convert that .Float back into a string for 
use as PIR code it comes out as "12.3457" -- i.e., the code looks
like:

    $P0 = new .Float
    $P0 = 12.3457
    set_global '$a', $P0

I decided that in a number of cases like this, what we really want
to retain in PAST::Val is a precise string representation of the
value that goes in the resulting output, and not a native
representation that may lose precision in translation through
POST/PIR.  So, what we're really storing is the value's "name"
and not its "value".    (I did say it was a weak argument.)

Anyway, we can switch to 'value' if that's ultimately better;
I was just thinking that 'name' might be equally appropriate.

> - In PAST nodes, the attribute 'ctype' isn't actually storing a C 
> language type. Better name?

It really stands for "constant type", and is one of 'i', 'n', or
's' depending on whether it can be treated as an int, num, or
string when being handled as a constant in PIR.

> - The attribute 'vtype' is both variable type in POST::Var and value 
> type in POST::Val. Handy generalization, but it's not clear from the 
> name that 'vtype' is either of those things.

I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var
or POST::Val.  But 'vtype' really stands for "value type" in both
cases -- it's the type of value returned by either a PAST::Var
or PAST::Val node.

> - The values for both 'ctype' and 'vtype' are obscure. Better to 
> establish a general system for representing types, than to include raw 
> Parrot types or 1-letter codes in the AST.

Ultimately I expect that the types that appear in 'vtype' will
be the types defined by the HLL itself.  For example, in perl6
one would see 'vtype'=>'Str' to indicate a Perl 6 string constant.
Unfortunately it's been difficult to illustrate this in real code 
because of the HLL classname conflicts that I've been reporting 
in other contexts.

I agree the values and name for 'ctype' are a bit obscure, and
will gladly accept any suggestions for improving it.  The 'ctype' 
attribute is really just code optimization in the final output, 
and it does assume some knowledge of the target.  If no ctype
is specified, past-pm assumes that the constant value must 
first be placed into a PMC in order to be useful.  With
a ctype present, then past-pm can match up the (PIR) opcode
contexts in which the value can be directly used as an 
int/num/string in an operation.  It's the difference between

    # $b + 2                             # $b + 2
    get_global $P0, '$b'                 get_global $P0, '$b'
    new $P2, .Undef                      new $P1, .Integer
    add $P2, $P0, 2                      assign $P1, 2
                                         new $P2, .Undef
                                         add $P2, $P0, $P1

or

    # say 3, 4, 5                        # say 3, 4, 5
    "say"(3, 4, 5)                       new $P1, .Integer
                                         assign $P1, 3
                                         new $P2, .Integer
                                         assign $P2, 4
                                         new $P3, .Integer
                                         assign $P3, 5
                                         "say"($P1, $P2, $P3)


> - In PAST nodes, consider the audience when choosing attribute names 
> like 'ismy' (PAST::Var). Something like 'islexical' or 'isdeclaration' 
> (I'm not sure which you mean), is friendlier to non-Perl users, and 
> actually clearer even for Perl users.

"Ismy" means "isdeclaration" here, and I can go ahead and change it.

> - In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op) 
> represents. Is it the literal name of a PIR opcode, or a generic 
> representation of standard low-level operations? I'm more in favor of 
> the latter. Better still, give compiler-writers a standard format lookup 
> table they can write to allow the PAST to POST tranformation to select 
> the right PIR operation from the HLL op name. (See the comments on 
> boundaries of abstraction.)

I think past-pm already has exactly what you want here, but it
may not be entirely clear.  First, 'pirop' does exactly what you
request in 'Better still, ...' -- it provides a way for the compiler
writer to identify the right PIR operation from the HLL op name.
In particular, in the operator-precedence specification a
compiler writer writes:

    proto infix:+ is pirop('add') { ... }
    proto infix:- is pirop('sub') { ... }

and this provides an easy way for the parse-to-past transformation
to associate the correct PIR operation from the HLL op name.
Essentially, the transformation looks for a 'pirop' trait on
the operator, and if found it puts it in the 'pirop' attribute
of the corresponding PAST::Op node.

The values of 'pirop' are really generic representations of
standard low-level operators.  Unfortunately, PIR is not as 
regular as we might like it to be -- some PIR operations will
work only with pmc operands, some will work with a variety of 
int/num/string/pmc operands, and still others won't work with
pmcs at all.  So, POST.pir has a lookup table (%pirtable)
that takes the generic name given by 'pirop' and does any
necessary coercions to get the operands to match.  So far
this table is incomplete -- I've been adding entries only
as I need them.

The idea is that a compiler writer can use 'pirop' to 
specify the mapping of HLL operators into PIR opcodes
directly in the grammar files where the HLL operators are
being defined.  Furthermore, the compiler writer doesn't have
to keep track of the low-level details for each PIR opcode;
i.e., when specifying 'pirop'=>'concat' the past-pm code
generation knows that concat needs string oprands.  (However,
if a compiler writer needs a specific PIR opcode, then they
could specify it with something like 'pirop'=>'concat_p_sc'.)

> - In PAST nodes, the 'clone' method is now 'init'.'clone' was a terrible 
> name, I agree, but 'init' isn't quite right either.

Currently Parrot uses '__init' as the method for initializing
new objects, thus I think 'init' is at least consistent with Parrot.

> - In PAST nodes, the 'add_child' method is now 'push'. I liked 
> 'add_child' better, but, maybe what we really want is not a method at 
> all, but a :vtable entry for an array push? Seems likely, since there's 
> really not any other array-like behavior the syntax-tree nodes need to have.

I went with 'push' because I know I'm also going to want an 'unshift'
operation, and those seemed more descriptive than the generic 'add_child'.
They also correspond well to the Parrot ops.

I've also thought about doing 'push' as a :vtable entry, and we can
still easily do that, but there are at least two items in favor of 
keeping a method-based approach:  (1) :vtable in subclassed items
still has some issues to be addressed (e.g., RT #40626), and
(2) when we get a high-level transformation language into TGE, it's
very likely that the operations on nodes will be method-based
and not opcode-based.

> ---
> 
> Clear boundaries between components: (Fuzzy boundaries of abstraction 
> make it difficult to allow for other implementations of the AST/OST or 
> customization of the compiler object.)
> 
> - The 'compile' method doesn't belong in the PAST object, it belongs in 
> HLLCompiler.
> ...

After a lot of thought and false starts, I ended up taking a
different approach to compilation than the "HLLCompiler specifies
the complete sequence of transformations".  Essentially I've taken
the approach that a "compiler" is simply something that transforms
a source data structure into a target data structure, and so
what we really have is a sequence of "compilers".  To this end,
I really wanted to call my compiler base class 'Compiler'
and not 'HLLCompiler', but unfortunately that classname is already
used by Parrot for something else and so 'HLLCompiler' is what I
chose until that could be resolved.  The 'HLL' probably implies
more than I intended to imply.

So, the 'Abc' compiler really is just something that converts the
'bc' language into a PAST structure, after doing that it simply 
hands the result off to the 'PAST-pm' compiler.  Similarly,
the 'PAST' compiler translates into POST and hands the result
off to the POST compiler, and POST simply does its thing and
returns a PIR or executable result.

> - The 'compile' method also doesn't belong in the main compiler 
> executable, it belongs in HLLCompiler.
> - Merge them into one 'compile' method in HLLCompiler.
> - Customization of HLLCompiler should be handled by creating a subclass 
> of HLLCompiler. (The current 'register' strategy is somewhat fragile.)

I don't have any problem with having each language subclass
HLLCompiler and override the 'compile' method in each, I'll
work on that soon.  Of course, the method still ends up one way
or another in the main compiler executable, it may simply change
the namespace.


> - Provide an 'init' method for HLLCompiler that lets the compiler writer 
> set which modules HLLCompiler will use for each stage of compilation. 
> This will cover the majority of compilers without requiring each 
> compiler writer to define their own 'compile' routine.

Because of the multi-stage approach I've taken, the compile
routines are already fairly short, and to me they're not at all
onerous for a compiler writer to create.  For each of languages/abc/,
languages/APL/, and languages/perl6/ the 'compile' method is
less than 30 lines of PIR.  (And it will only require a couple
of lines of code to abstract the existing call to 'compile' methods
of PAST/POST to instead use PAST/POST compilers.)

I also think that many compilers may end up with compiler-specific
option flags or other items that need to be taken care of, and it
seems to me that this is more easily handled by a method definition
than a module specification.

> - It would be easier to maintain (and create) the list of HLL to PIR 
> operator associations in something like a YAML file than embedded in the 
> parser grammar file. [...]

Hmm.  My feeling was that it was easier to put the operator
associations in the parser grammar file, but I can see the value
of placing them somewhere else, and I definitely would like to
keep Parrot-isms out of the AST as much as possible.

OTOH, there are many times when for optimization reasons or
other items it's useful to be able to drop some Parrot hints
directly into the AST (e.g., the 'ctype' attribute above), and
so I think that as long as full program semantics are captured
in the AST without any Parrot-specific items, it's okay to have
Parrot-specific items available in the AST as compiler hints
simply because it's sometimes easier to place them there than
elsewhere.

> At the very least, the 'pirop' property on parser rules could be handled 
> by the PAST-to-POST transformation, so the compiler writer doesn't have 
> to manually pull those values out of the parser grammar's optable when 
> creating the AST. 

Agreed -- I'll work on this.  

> (If the parser grammar module was specified in 
> HLLCompiler's 'init', then the compiler object would know where to look 
> for the optable.)

I'm thinking this is really a parameter to the AST compiler, along
with some useful support routines to make it easy to grab this
information from a YAML file or other source.  (In fact, it would
be easy to supply a utility routine to construct the 
HLL operator -> pirop translation from the optable, so that
compiler writers that wanted to specify the pirop in the
parser grammar could easily do so and still not have to manually
pull the values across.)


> ---
> Refactor into smaller units: (Easier to test, easier to maintain, easier 
> to debug, and easier to create subclasses.)
> 
> - In HLLCompiler, instead of a monolithic 'command_line' method, split 
> the components into independent methods all called from the 
> 'command_line' method. 

Agreed; HLLCompiler was thrown together quickly, and it was done
before it was even clear that we would have a standard compiler
object.

> - In HLLCompiler, split the 'compile' method out into independent 
> methods for each compilation stage ('compile_ast', 'compile_ost', 
> 'compile_pir', etc.), all called from 'compile'.

Again, I tend to think of this as being all separate compilers,
each of which automatically call its default next stage until
compiler options tell it to do otherwise.


> ---
> Useful error detection: (Be kind to your compiler writers.)
> 
> - Returning the source code string from the 'compile' method in 
> HLLCompiler when no compiler is registered isn't helpful. Throw an 
> exception or give an error. Give the compiler writer a clue to what's wrong.

I just threw in a quick default, and yes, throwing an exception
would be a better default.  Sorry.

> - Provide distinct errors/exceptions for failures at each stage of 
> compilation to make it easy to figure out which stage is failing.

Agreed -- however, exception handling in Parrot still needs
implementation and better flushing out (this is what prompted
my question about the status of exception handling implementation 
in last week's #parrotsketch, and my comment that I'm likely to
need them fairly soon.)

> ---
> Side comments:
>
> - In PGE grammars, what is the "{ ... }" at the end of every proto 
> declaration supposed to do? It seems like a dead weight. Is that where 
> the Perl 6 code defining the operator supposed to go? Can we allow 
> 'proto' declarations to end in a semi-colon when there is no Perl 6 code?

PGE grammars are trying to closely follow the Perl 6 syntax,
so the "{ ... }" is the Perl6 "yada-yada-yada" body that gets
used in function prototypes.  And yes, I could see some compilers
choosing to specify the code directly in the declaration
(which is why 'proto' can also be 'sub' or the other Perl 6
subroutine keywords).

But in the end, I didn't allow simple semicolon terminators
simply because it wasn't valid Perl 6 syntax, and in many cases
I think that having subtle differences isn't ideal as people
may get confused about what is allowed where.  But I don't have
a large objection to modifying the PGE::Grammar compiler to
represent empty declarations with semicolons as well as
yada-yada-yada blocks.

Pm

Reply via email to