Author: allison Date: Thu Aug 28 09:09:48 2008 New Revision: 30620 Added: trunk/docs/pdds/draft/pdd31_hll_interop.pod
Log: [pdd] Adding an early draft PDD for HLL interoperability, from Bob Rogers. Added: trunk/docs/pdds/draft/pdd31_hll_interop.pod ============================================================================== --- (empty file) +++ trunk/docs/pdds/draft/pdd31_hll_interop.pod Thu Aug 28 09:09:48 2008 @@ -0,0 +1,466 @@ +# Copyright (C) 2008, The Perl Foundation. +# $Id: $ + +=head1 NAME + +docs/pdds/pddxx_language_interop.pod - Inter-language calling + +=head1 VERSION + +$Revision: 28231 $ + +=head1 ABSTRACT + +This PDD describes Parrot's conventions and support for communication between +high-level languages (HLLs). It is focused mostly on what implementors should +do in order to provide this capability to their users. + +=head1 DESCRIPTION + +The ability to mix different high-level languages at runtime has always been +an important design goal of Parrot. Another important goal, that of +supporting all dynamic languages, makes language interoperability especially +interesting -- where "interesting" means the same as it does in the Chinese +curse, "May you live in interesting times." It is expected that language +implementers, package authors, and package users will have to be aware of +language boundaries when writing their code. It is hoped that this will not +become too burdensome. + +None of what follows is binding on language implementors, who may do whatever +they please. Nevertheless, we hope they will at least follow the spirit of +this document so that the code they produce can be used by the rest of the +Parrot community, and save the fancy footwork for intra-language calling. +However, this PDD B<is> binding on Parrot implementors, who must provide a +stable platform for language interoperability to the language implementors. 
+ +=head2 Ground rules + +In order to avoid N**2 complexity and the resulting coordination headaches, +each language compiler provides an interface as a target for other languages +that should be designed to require a minimum of translation. In the general +case, some translation may be required by both the calling language and the +called language: + + | + | + | Calling sub + | | + | Language X | + | V + | Calling stub + +================ | + | + "plain Parrot" | + | + +================ | + | V + | Called wrapper + | | + | | + | Language Y V + | Called sub + | + +Where necessary, a language may need to provide a "wrapper" sub to interface +external calls to the language's internal calling and data representation +requirements. Such wrappers are free to do whatever translation is required. + +Similarly, the caller may need to emit a stub that converts an internal call +into something more generic. + +{{ Of course, "stub" is really too close to "sub", so we should find a better +word. Doesn't the C community call these "bounce routines"? Or something? +-- rgr, 31-Jul-08. }} + +{{ I am discovering that there are five different viewpoints here, +corresponding to the five layers (including "plain Parrot") of the diagram +above. I need to make these viewpoints clearer, and describe the +responsibilities of each of these parties to each other. -- rgr, +31-Jul-08. }} + +Languages are free to implement the stub and wrapper layers (collectively +called "glue") as they see fit. In particular, they may be inlined in the +caller, or integral to the callee. + +Ideally, of course, the "plain Parrot" layer will be close enough to the +semantics of both languages that glue code is unnecessary, and the call can be +made directly. Language implementors are encouraged to dispense with glue +whenever possible, even if glue is sometimes required for the general case. 
+ +In summary: + +=over 4 + +=item * + +Each HLL gets its own namespace subtree, within which C<get_hll_global> and +C<set_hll_global> operate. In order to make external calls, the HLL must +provide a means of identifying the language, the function, and enough +information about the arguments and return values for the calling language to +generate the call correctly. This is necessarily language-dependent, and is +beyond the scope of this document. + +=item * + +When calling across languages, both the caller and the callee should try to +use "plain Parrot semantics" to the extent possible. This is explained in +more detail below, but essentially means to use the simplest calling +conventions and PMC classes possible. Ideally, if an API uses only PMCs that +are provided by a "bare Parrot" (i.e. one without any HLL runtime code), then +it should be possible to use this API from any other language. + +=item * + +It is acceptable for languages to define subs for internal calling that are +not suitable for external calling. Such subs should be marked as such, and +other languages should respect those distinctions. (Or, if they choose to +call intra-language subs, they should be very sure they understand that +language's calling conventions.) + +=back + +=head1 HALF-BAKED IDEAS + +{{ Every draft PDD should have one of these. ;-} -- rgr, 28-Jul-08. }} + +=head2 Common syntax for declaring exported functions? + +I assume we will need some additional namespace support. Not clear yet +whether it's better to mark the ones that are OK for external calling, or the +ones that are not. + +(As you can guess, I don't have a strong suggestion for what to call these +functions yet. Do we call them "external"? Would that get confused with +intra-language public interfaces?) + +Beyond that, we probably need additional metainformation on the external subs +so that calling compilers will know what code to emit. 
Putting them on the +subs means that the calling compiler just needs to load the PBC in order to +access the module API (though it may need additional hints). Of course, that +also requires a PIR API for accessing this metainformation . . . + +Crazy idea: This is more or less the same information (typing) required for +multimethods. If we encourage the export of multisubs, then the exporting +language could provide multiple interfaces, and the calling compiler could +query the set of methods for the one most suitable. + +=head2 More namespace complexity? + +It might be good to have some way for HLLs to define a separate external +definition for a given sub (i.e. one that provides the wrapper) that can be +done without too much namespace hair. I.e. + + .sub foo :extern + +defines the version that is used by interlanguage calling, and + + .sub foo + +defines the version that is seen by other code written in that language +(i.e. via C<get_hll_global>). If there is no plain C<foo>, the C<:extern> +version is used for internal calls. That way, the compiler can emit both +wrapper code and internal code without having to do anything special (much), +even if different calling conventions and/or data conversions are required. + +{{ Of course, this wouldn't be necessary if all external subs were multisubs. +-- rgr, 31-Jul-08. }} + +=head2 Multiple type hierarchies? + +Different languages will have to "dress up" the Parrot type/class hierarchy +differently. For example, Common Lisp specifies that C<STRING> is a subtype +of C<VECTOR>, which in turn is a subtype of C<ARRAY>. This is not likely to +be acceptable to other languages, so Lisp needs its own view of type +relationships, which must affect multimethod dispatch for Lisp generic +functions, i.e. a method defined for C<VECTOR> must be considered when passed +a string as a parameter. + +The language that owns the multisub gets to define the type hierarchy and +dispatch rules used when it gets called. 
In order to handle objects from +foreign languages, the "owning" language must decide where to graft the +foreign class inheritance graph into its own graph. {{ It would be nice if +some Parrot class, e.g. C<Object>, could be defined as the conventional place +to root language-specific object class hierarchies; that way, a language would +only have to include C<Object> in order to incorporate objects from all other +conforming languages. -- rgr, 26-Aug-08. }} + +Note that common Parrot classes will in general appear in different places in +different languages' dispatch hierarchies, so it is important to bear in mind +which language "owns" the dispatch. + +=head1 DEFINITIONS + +{{ Collect definitions of new jargon words here, once we figure out what they +should be. -- rgr, 29-Jul-08. }} + +=head1 IMPLEMENTATION + +=head2 Plain Parrot Semantics + +Fortunately, "plain Parrot" is pretty powerful, so the "common denominator" is +not in fact the lowest possible. For example, not all Parrot languages +support named, optional, or repeated arguments. For the called language, this +is never a problem; the calling module can only use the subset API anyway. +Implementers of subset calling languages are encouraged to provide their users +with an extended API for the interlanguage call; typically, this is only +required for named arguments. + +{{ This needs more? -- rgr, 28-Jul-08. }} + +=head2 Strings + + {{ I am probably not competent to write this section. At the very least, + it requires discussion of languages that expect strings to be mutable + versus . . . Java. -- rgr, 28-Jul-08. }} + +=head2 Other scalar data types + +All Parrot language implementations should stick to native Parrot PMC types +for scalar data, except in case of dire need. To see why this is so, take +the particular case of integer division, which differs significantly between +languages. + +In Tcl, "the integer three divided by the integer five" produces the integer +value 0. 
+ +In Perl 5 and Lua, this division produces the floating-point value 0.6. (This +happens to be Parrot's native behavior as well.) + +In Common Lisp, this division produces "3/5", a number of type C<RATIO> with +numerator 3 and denominator 5 that represents the mathematically-exact result. + +Furthermore, no Perl 5 code, when given two integers to divide, will expect a +Common Lisp ratio as a result. Any Perl 5 implementation that does this has a +bug, even if both those integers happen to come from Common Lisp. Ditto for a +floating-point result from Common Lisp code that happens to get two integers +from Perl or Lua (or both!). + +Even though these languages all use "/" to represent division, they do not all +mean the same thing by it, and similarly for most (if not all) other built-in +arithmetic operators. However, they pretty clearly B<do> mean the same thing +by (e.g.) "the integer with value five," so there is no need to represent the +inputs to these operations differently; they can all be represented by the +same C<Integer> PMC class. + +{{ Must also discuss morphing: If some languages do it and others do not, then +care must be taken at the boundaries. -- rgr, 31-Jul-08. }} + +=head3 Defining new scalar data types + +There will be cases where existing Parrot PMC classes cannot represent a +primitive HLL scalar type, and so a new PMC class is required. In this case, +interoperability cannot be guaranteed, since it may not be possible to define +behavior for such objects in other languages. But the choice of a new PMC is +forced, so we must make the best of it. + +A good case in point is that of complex rational numbers in Common Lisp. The +C<Complex> type provided by Parrot assumes that its components are +floating-point numbers. 
This is a suitable representation type for C<(COMPLEX +REAL)>, but CL partitions "COMPLEX" into C<(COMPLEX REAL)> and C<(COMPLEX +RATIONAL)>, with the latter being further divided into C<(COMPLEX RATIO)>, +C<(COMPLEX INTEGER)>, etc. The straightforward way to provide this +functionality is to define a C<ComplexRational> PMC that is built on +C<Complex> and has real and imaginary PMC components that are constrained to +be Integer, Bigint, or Ratio PMCs. + +So how do we make C<(COMPLEX RATIONAL)> arithmetic work as broadly as +possible? + +The first aspect is defining how the new type actually works within its own +language. The Lisp arithmetic operators will usually return a ComplexRational +if given one, but need to return a RATIONAL subtype if the imaginary part is +zero, and that may not be suitable for other languages, so Lisp needs its own +set of basic arithmetic operators. We must therefore define methods on these +multis that specialize ComplexRational (and probably the generic arithmetic to +redispatch on the type of the real and imaginary parts; you know the drill). +But, in case we are also passed another operand that is another language's +exotic type, we should take care to use the most general possible class to +specialize the other operands, in the hope that other exotics are subclasses +of these. + +The other aspect is extending other languages' arithmetic to do something +reasonable with our exotic types. If we're lucky, Parrot will provide a basic +multisub that takes care of most cases, and we just need to add method(s) to +that. If not, we will have to add specialized methods on the other language's +multisub, trying to redispatch to the other language's arithmetic ops passing +the (hopefully more generic) component PMCs. Doing so is still the +responsibility of the language that defines the exotic class, since it is in +charge of its internal representation. + +{{ We can define multimethods on another language without loading it, can't +we? 
If not, then making this work may require negotiation between language +implementors, if it is feasible at all. -- rgr, 31-Jul-08. }} + +This brings us to a number of guidelines for defining language-specific +arithmetic so as to maximize interoperability: + +=over 4 + +=item 1. + +Define language-specific operations using multimethods (to avoid conflict with +other languages). + +=item 2. + +Define them on the highest (most general) possible PMC classes (in order that +they continue to work if passed a subclass by a call from a different +language). + +=item 3. + +Don't define a language-specific PMC class unless there is clear need for a +different internal representation. (And even then, you might consider +donating it to become part of the Parrot core.) + +=back + +The rest of this section details exceptions and caveats in dealing with scalar +data types. + +=head3 "Fuzzy" scalars + +Some languages are willing to coerce strings to numbers and vice versa without +any special action on the part of the programmer and others are not. The +problem arises when such "fuzzy" scalars are passed (or returned) to languages +that do not support "fuzzy" coercion . . . + +{{ This section is meant to answer Geoffrey's "What does Lisp do with a Perl 5 +Scalar?" question. I gotta think about this more. -- rgr, 29-Jul-08. }} + +=head3 C<Complex> numbers + +Not all languages support complex numbers, so if an exported function requires +a complex argument, it should either throw a suitable error, or coerce an +acceptable numeric argument. In the latter case, be sure to advertise this in +the documentation, so that callers without complex numbers can tell their +compiler which numeric type is acceptable. + +=head3 C<Ratio> numbers + +Not all languages support ratios (rather few, actually), so if an exported +function requires a ratio as an argument, it should either throw a suitable +error, or convert an acceptable numeric value. 
+ +However, since ratios are rare (and it is rather eccentric for a program to +insist on a ratio as a parameter), it is strongly advised to accept a floating +point or integer value, and convert it in the wrapper. + + {{ Parrot does not support these yet, so this is not a current issue. -- + rgr, 28-Jul-08. }} + +=head2 Aggregate data types + +{{ I probably haven't done these issues justice; I don't know enough Java or +Tcl to grok this part of the list discussion. -- rgr, 28-Jul-08. }} + +Aggregates (hashes, arrays, and struct-like thingies) can either be passed +directly, or mapped by wrapper or caller code into something different. The +problem with mapping, besides being slow, is that if I<either> the caller or +the callee does this, the aggregate is effectively read-only. (It is possible +for the wrapper to stuff the changes back in the original structure by side +effect, but this has its own set of problems.) + +In other words, if the mapping is not straightforward, it may not be possible. +If the mapping I<is> straightforward it may not be necessary -- and an +unnecessary mapping may limit use of the called module's API. + +Struct-like objects are problematic. They are normally considered as +low-level and language-specific, and handled by emitting special code for slot +accessor/setter functions, which other language compilers won't necessarily +know how to do. The choices are therefore to (a) treat them like black boxes +in the other language, or (b) provide a separate functional or OO API (or +both) for calling from other languages. + +Several questions arise for languages with multiple representations for +aggregate types. Typically, this is because these types are more restricted +in some fashion. [finish. -- rgr, 29-Jul-08.] + +=head2 Functional data types + +In a sense, functional types (i.e. callable objects) are the easiest things to +pass across languages, since they require no mapping at all. 
On the other +hand, if a language doesn't support functional arguments, then there is no +hope of using an API written in another language that requires them. + +=head2 Datum vs. object + +Some languages present everything to the programmer as an object; in such +languages, code only exists in methods. A few languages have no methods, only +functions (and/or subroutines) and "passive" data. The remainder have both, +and pose no problem calling into the others. + +But how does an obligate OO language call a non-OO language, or vice versa? +An extreme case would be Ruby (which has only objects) and Scheme (which (as +far as Ruby is concerned) has none). What good is a Ruby object as a datum to +a Scheme program if Scheme can't access any of the methods? Similarly, what +could Ruby do with a Scheme list when it can't even get to the Scheme C<car> +function? + +{{ Methinks the right thing would be to define a common introspection API (a +good thing in its own right). Scheme and Ruby should each define their own +implementation of the same in "plain Parrot semantics" terms, independently. +The caller can then use his/her language's binding of the introspection API to +poke around in the other module, and find the necessary tools to call the +other. For Scheme, this would mean functions for finding Ruby classes and +providing functional wrappers around methods. For Ruby, I admit this would +probably be even weirder. In any case, it is important that the calling user +not need anything out of the ordinary, from either language or the called +module author. -- rgr, 29-Jul-08. }} + +=head3 Defining methods across language boundaries + +{{ Is the term "unimethod" acceptable here? -- rgr, 29-Jul-08. }} + +There will be cases where a module user wants to extend that module by +defining a new method on an externally-defined class, or add a multimethod to +an externally-defined multisub. 
Since a class with unimethod dispatch belongs +wholly to the external language, the calling language (i.e. the one adding the +method) must use the semantics of the external language. If the external +language uses a significantly different metamodel, simply adding the +C<:method> pragma may not cut it. + +There are two cases: (1) The calling language is adding a new method, which +cannot therefore interfere with existing usage in the called language; and (2) +the calling language is attempting to extend an existing interface provided by +the called language. In the first case, the calling compiler has the option +of treating the new method as part of the calling language, and dispensing +with the glue altogether. In the second case, the compiler must treat the new +method as part of the foreign language, and provide B<both> glue layers (as +necessary) around it. It is therefore not expected that all compilers will +provide a way to define methods on all foreign classes for all language pairs. + +Multimethods are easier; although the multisub does belong conceptually to one +language (from whose namespace the caller must find the multisub), multis are +more loosely coupled to their original language. + +The cases for multimethods are similar, though: (1) If the calling language +method is specialized to classes that appear only in the calling module, then +other uses of the multisub will never call the new method, and the calling +language can choose to treat it as internal. (2) If the calling method is +specialized only on Parrot or called-language classes, then the compiler +should take care to make it generally usable. + +=head3 Subclassing across language boundaries + +{{ This is an important feature, but requires compatible metamodels. -- rgr, +29-Jul-08. }} + +=head3 Method vs. multimethod + +{{ This is the issue where some languages (e.g. Common Lisp) use only +multimethods, where others (e.g. Ruby) use only unimethods. 
(S04 says +something about MMD "falling back" to unimethods, but so far this is not +described in Parrot.) Calling is easy; multimethods look like functions, so +the MM language just has to create a function (or MM) wrapper for the UM +language, and a UM language can similarly treat a MM call as a normal function +call. (Which will require the normal "make the function look like a method" +hack for obligate OO languages like Ruby.) Defining methods across the +boundary is harder, and may not be worth the trouble. -- rgr, 29-Jul-08. }} + +=cut + +__END__ +Local Variables: + fill-column:78 +End: