Re: Optimizing PMC-based MMD

2008-12-24 Thread Allison Randal

chromatic wrote:


> Within the cmp op bodies, we *know* the arity and most of the types of MMD-
> participant arguments at compile time.  We can get the types of PMC
> participants within the body of the op itself.  Thus we could avoid most of
> the argument marshalling and counting and analysis if we had a way to perform
> cached MMD lookup without constructing a CallSignature PMC.  That would clear
> up a third of the work.


This we should open up to general discussion. The consequence of 
short-cutting like this is that individual PMCs will no longer be able 
to override 'cmp' to do something other than multi-dispatch. At the 
moment, developers still have the option of providing their own quick 
comparison, which gives an even more extreme speedup than this shortcut.
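To make the trade-off concrete, here is a minimal sketch in C. Every name in it (ToyPMC, toy_multi_dispatch_cmp, op_cmp_via_vtable, op_cmp_shortcut) is hypothetical and greatly simplified; none of it is Parrot's real code. It only shows that an op which goes through the vtable slot honours a per-class override, while an op that goes straight to multi-dispatch does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not Parrot's structures. */
typedef struct ToyPMC ToyPMC;
typedef int (*cmp_fn)(const ToyPMC *self, const ToyPMC *other);

struct ToyPMC {
    int    type_id;    /* which PMC class this object belongs to */
    long   value;      /* payload used by the comparisons */
    cmp_fn vtable_cmp; /* per-class 'cmp' vtable slot */
};

int dispatch_count = 0; /* counts trips through the slow path */

/* Stand-in for the full multi-dispatch path (signature building,
 * candidate search and so on); here it only counts and compares. */
int toy_multi_dispatch_cmp(const ToyPMC *a, const ToyPMC *b)
{
    dispatch_count++;
    return (a->value > b->value) - (a->value < b->value);
}

/* Default vtable entry: defer to multi-dispatch. */
int default_cmp(const ToyPMC *self, const ToyPMC *other)
{
    return toy_multi_dispatch_cmp(self, other);
}

/* Class-specific override: compares directly and never dispatches. */
int quick_cmp(const ToyPMC *self, const ToyPMC *other)
{
    return (self->value > other->value) - (self->value < other->value);
}

/* Op that goes through the vtable slot, so an override is honoured. */
int op_cmp_via_vtable(const ToyPMC *a, const ToyPMC *b)
{
    assert(a != NULL && b != NULL && a->vtable_cmp != NULL);
    return a->vtable_cmp(a, b);
}

/* Op that shortcuts straight to dispatch, ignoring the vtable slot. */
int op_cmp_shortcut(const ToyPMC *a, const ToyPMC *b)
{
    assert(a != NULL && b != NULL);
    return toy_multi_dispatch_cmp(a, b);
}
```

In this toy, a PMC whose slot holds quick_cmp never reaches the dispatcher through op_cmp_via_vtable, but always does through op_cmp_shortcut.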


So, question for language developers and other PMC developers, how 
important is the ability to define a 'cmp' vtable function that's called 
when the 'cmp' opcode is invoked? Or, is defining a 'cmp' multi for your 
PMC type enough?


> Another area for optimization is invoking a Sub from a signature PMC; I
> believe we're throwing away and recalculating valuable information, though we
> may have to wait for dramatic improvements until we can unify contexts and
> CallSignature.


Providing a new way of invoking Subs that uses CallSignatures all the 
way down is already planned in the coming series of calling conventions 
refactors.


> The final opportunity for optimization is making the PMC multis defined in
> PMCs use PCC instead of C calling conventions.  Corresponding multis written
> in PIR already use PCC, and we want to support that, so we should unify our
> approach.  That would remove the NCI expense here, though that's probably
> minor in comparison to the CallSignature PMC expense.


Changing all NCI calls to something more like PCC calls is already 
planned in the coming series of calling conventions refactors. Changing 
the Pmc2c generator to build PCC subs instead of NCI Subs is a quick 
change that could happen now.


The calling conventions refactors are non-critical (some will likely 
land after 1.0), because the interface will stay the same, it's only the 
internals that will change.


Allison


Re: Optimizing PMC-based MMD

2008-12-24 Thread Patrick R. Michaud
On Wed, Dec 24, 2008 at 09:55:58AM -0600, Allison Randal wrote:
>> Within the cmp op bodies, we *know* the arity and most of the types of MMD-
>> participant arguments at compile time.  We can get the types of PMC
>> participants within the body of the op itself.  Thus we could avoid
>> most of the argument marshalling and counting and analysis if we had a
>> way to perform cached MMD lookup without constructing a CallSignature
>> PMC.  That would clear up a third of the work.
>
> This we should open up to general discussion. The consequence of
> short-cutting like this is that individual PMCs will no longer be able
> to override 'cmp' to do something other than multi-dispatch.

Does individual PMCs here mean PMC instance or PMC classes?  I.e.,
are you saying that a specific PMC instance could choose to override
the cmp opcode for that individual PMC?  If so, do we have any examples
where this is being done now?

> At the
> moment, developers still have the option of providing their own quick
> comparison, which gives an even more extreme speedup than this shortcut.
>
> So, question for language developers and other PMC developers, how
> important is the ability to define a 'cmp' vtable function that's called
> when the 'cmp' opcode is invoked? Or, is defining a 'cmp' multi for your
> PMC type enough?

From a Rakudo perspective, the ability to define custom 'cmp' vtable
functions doesn't appear to be at all important.  Comparisons are
almost invariably done by invoking :multi Sub PMCs of one form or 
another and letting those handle the MMD dispatch.  The opcode form 
seems to impose too many limitations to be used directly.

To turn the question around a bit: I can tell that a lot of work
has gone into Parrot to make MMD possible at the vtable level,
but I haven't seen how vtable MMD is at all useful or usable in
languages where operator overloading is possible from the HLL itself.
And most dynamic languages I'm looking at seem to support that
in one form or another.

If someone (Allison) could make an example of how vtable MMD is 
intended to improve things -- i.e., taking an HLL language
statement and showing how that translates to PIR that is improved
by vtable MMD, that would be very helpful.

> The calling conventions refactors are non-critical (some will likely
> land after 1.0), because the interface will stay the same, it's only the
> internals that will change.

Oh, I'm very disappointed to hear this.  Named and positional argument
handling still has an odd behavior [*], and Perl 6 still really
needs the :lookahead option described earlier in the year.  I thought
that was going to be made possible by the refactor, and that is partly
why PDS had calling conventions scheduled for the December 2008 release.

[*]  Currently named parameters are filled from any leftover positionals
 in the argument list -- there's no way to declare an argument that
 can _only_ be filled by name, short of defining a :slurpy array 
 that grabs any extra positional arguments and then checking
 that the slurpy is empty.
   
And, Jonathan can correct me on this if I'm mistaken, but
I suspect the other big reason that calling convention refactor was 
scheduled for the December 2008 release is that it's likely a blocker 
or important component for the custom dispatcher that Jonathan will 
be creating for Rakudo as part of his funded grant.  That's due to be 
completed by the end of January, IIRC.

Pm


Optimizing PMC-based MMD

2008-12-21 Thread chromatic
The following code performs far more work than it has to, mostly due to 
crossing the C/PCC boundary multiple times, as well as throwing away known 
information:

$P0 = box 10
$I0 = cmp $P0, 10

This:

- calls VTABLE_cmp on $P0, reaching VTABLE_cmp in the Default PMC
- calls Parrot_mmd_multi_dispatch_from_c_args
- passing 'cmp', 'PP->I' signature, and args as varargs
- builds sig object from varargs
- loops through signature string
- creates a new CallSignature PMC
- creates a new return PMC for all return arguments
- creates a new CPointer for each return argument
- pushes arguments onto the CallSignature PMC
- builds a type tuple for MMD
- loops through signature stored in CallSignature to find MMD-participant arguments
- loops through type signature to set argument types
- checks MMD cache
- uses cached candidate if possible
- finds new candidate
- creates new array PMC for candidate list
- searches CallSignature's namespace for candidates (?)
- searches global MULTI namespace for candidates
- sorts candidate list by MMD type tuple
- loops through candidate list
- calculates distance to each candidate
- loops through each argument (parallel iteration over type tuple and argument list)
- loops over all elements in MRO for each argument type
- calls Parrot_pcc_invoke_sub_from_sig_object
- converts CallSignature string to C string
- creates array PMCs for arguments and results
- counts number of arguments and return values (looping over signature string)
- sets up input parameters in current context
- loops over the C signature
- assigns each parameter to the appropriate context
- invokes the Parrot sub (NCI)
- calls the NCI thunk (pcf_I_JPP)
- calls Parrot_init_arg_nci
- inits data structures
- calls Parrot_init_arg_indexes_and_sig_pmc
- calls Parrot_init_arg_sig
- calls C function
- calls set_nci_I to store return value
- converts argument to INTVAL if necessary
- stores argument into register
- assigns return values from the context to the CallSignature
- loops over the C signature
- assigns each return value appropriately
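The distance steps in the list above (parallel iteration over the type tuple and the argument list, walking the MRO for each argument type) can be sketched roughly as follows. This is a simplified illustration, not Parrot's actual metric: the class table, the type numbers and every toy_ name are assumptions, each type has a single parent, and the sketch picks the minimum directly instead of sorting a candidate list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_MATCH (-1)

/* Hypothetical class table: toy_parent[t] is the parent of type t,
 * or -1 at the root.
 * Types: 0 = Default, 1 = Scalar, 2 = Integer, 3 = String. */
static const int toy_parent[] = { -1, 0, 1, 1 };
#define TOY_NUM_TYPES ((int)(sizeof toy_parent / sizeof toy_parent[0]))

/* Number of steps up the MRO from arg_type to want_type,
 * or TOY_NO_MATCH if want_type is not an ancestor. */
int toy_type_distance(int arg_type, int want_type)
{
    int steps = 0;
    int t;

    assert(arg_type >= 0 && arg_type < TOY_NUM_TYPES);
    for (t = arg_type; t != -1; t = toy_parent[t], steps++) {
        if (t == want_type)
            return steps;
    }
    return TOY_NO_MATCH;
}

/* Total distance for one candidate: parallel iteration over the
 * argument type tuple and the candidate's signature types. */
int toy_candidate_distance(const int *arg_types, const int *sig_types,
                           size_t n)
{
    int    total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int d = toy_type_distance(arg_types[i], sig_types[i]);
        if (d == TOY_NO_MATCH)
            return TOY_NO_MATCH;
        total += d;
    }
    return total;
}

/* Index of the closest applicable two-argument candidate, or -1. */
int toy_best_candidate(const int arg_types[2], const int cands[][2],
                       size_t ncands)
{
    int    best = -1;
    int    best_dist = 0;
    size_t i;

    for (i = 0; i < ncands; i++) {
        int d = toy_candidate_distance(arg_types, cands[i], 2);
        if (d != TOY_NO_MATCH && (best == -1 || d < best_dist)) {
            best = (int)i;
            best_dist = d;
        }
    }
    return best;
}
```

Even in this toy form the cost is visible: every candidate costs one MRO walk per argument, which is why skipping the search on a cache hit matters.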

The default Integer case performs a C-level comparison.  Most of this 
codepath is new as of the MMD branch merge.

Within the cmp op bodies, we *know* the arity and most of the types of MMD-
participant arguments at compile time.  We can get the types of PMC 
participants within the body of the op itself.  Thus we could avoid most of 
the argument marshalling and counting and analysis if we had a way to perform 
cached MMD lookup without constructing a CallSignature PMC.  That would clear 
up a third of the work.
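As a rough sketch of what such a lookup might look like, assuming the op already has both type ids in hand: the cache below is keyed directly on the pair of type ids, so a hit needs no signature object at all. The direct-mapped table, the hash and all toy_ names are hypothetical; Parrot's real MMD cache is organised differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical implementation type for a two-argument 'cmp' multi. */
typedef int (*toy_cmp_impl)(long left, long right);

#define TOY_CACHE_SLOTS 64

typedef struct {
    int          left_type;
    int          right_type;
    toy_cmp_impl impl; /* NULL marks an empty slot */
} ToyCacheEntry;

static ToyCacheEntry toy_cache[TOY_CACHE_SLOTS];
int toy_slow_lookups = 0; /* counts full candidate searches */

int toy_int_cmp(long l, long r)
{
    return (l > r) - (l < r);
}

/* Stand-in for the full candidate search; a real one would inspect
 * the types, this toy always answers with toy_int_cmp. */
toy_cmp_impl toy_slow_find(int left_type, int right_type)
{
    (void)left_type;
    (void)right_type;
    toy_slow_lookups++;
    return toy_int_cmp;
}

/* Map a type pair to a slot in the direct-mapped cache. */
static size_t toy_slot(int left_type, int right_type)
{
    return ((size_t)left_type * 31u + (size_t)right_type)
           % TOY_CACHE_SLOTS;
}

/* Cached lookup keyed on the two type ids known inside the op body.
 * A colliding pair simply evicts the previous entry. */
toy_cmp_impl toy_cached_find(int left_type, int right_type)
{
    ToyCacheEntry *e;

    assert(left_type >= 0 && right_type >= 0);
    e = &toy_cache[toy_slot(left_type, right_type)];
    if (e->impl == NULL || e->left_type != left_type
            || e->right_type != right_type) {
        e->left_type  = left_type;
        e->right_type = right_type;
        e->impl       = toy_slow_find(left_type, right_type);
    }
    return e->impl;
}
```

The second call with the same pair of types returns the stored function pointer without touching the slow path, which is the saving under discussion.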

Another area for optimization is invoking a Sub from a signature PMC; I 
believe we're throwing away and recalculating valuable information, though we 
may have to wait for dramatic improvements until we can unify contexts and 
CallSignature.

The final opportunity for optimization is making the PMC multis defined in 
PMCs use PCC instead of C calling conventions.  Corresponding multis written 
in PIR already use PCC, and we want to support that, so we should unify our 
approach.  That would remove the NCI expense here, though that's probably 
minor in comparison to the CallSignature PMC expense.

-- c