Author: allison
Date: Thu Aug 28 09:09:48 2008
New Revision: 30620

Added:
   trunk/docs/pdds/draft/pdd31_hll_interop.pod

Log:
[pdd] Adding an early draft PDD for HLL interoperability, from Bob Rogers.


Added: trunk/docs/pdds/draft/pdd31_hll_interop.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/draft/pdd31_hll_interop.pod Thu Aug 28 09:09:48 2008
@@ -0,0 +1,466 @@
+# Copyright (C) 2008, The Perl Foundation.
+# $Id: $
+
+=head1 NAME
+
+docs/pdds/pddxx_language_interop.pod - Inter-language calling
+
+=head1 VERSION
+
+$Revision: 28231 $
+
+=head1 ABSTRACT
+
+This PDD describes Parrot's conventions and support for communication between
+high-level languages (HLLs).  It is focused mostly on what implementors should
+do in order to provide this capability to their users.
+
+=head1 DESCRIPTION
+
+The ability to mix different high-level languages at runtime has always been
+an important design goal of Parrot.  Another important goal, that of
+supporting all dynamic languages, makes language interoperability especially
+interesting -- where "interesting" means the same as it does in the Chinese
+curse, "May you live in interesting times."  It is expected that language
+implementers, package authors, and package users will have to be aware of
+language boundaries when writing their code.  It is hoped that this will not
+become too burdensome.
+
+None of what follows is binding on language implementors, who may do whatever
+they please.  Nevertheless, we hope they will at least follow the spirit of
+this document so that the code they produce can be used by the rest of the
+Parrot community, and save the fancy footwork for intra-language calling.
+However, this PDD B<is> binding on Parrot implementors, who must provide a
+stable platform for language interoperability to the language implementors.
+
+=head2 Ground rules
+
+In order to avoid N**2 complexity and the resulting coordination headaches,
+each language compiler provides an interface as a target for other languages
+that should be designed to require a minimum of translation.  In the general
+case, some translation may be required by both the calling language and the
+called language:
+
+        |
+        |
+        |                        Calling sub
+        |                             |
+        |   Language X                |
+        |                             V
+        |                        Calling stub
+        +================             |
+                                      |
+          "plain Parrot"              |
+                                      |
+        +================             |
+        |                             V
+        |                      Called wrapper
+        |                             |
+        |                             |
+        |   Language Y                V
+        |                         Called sub
+        |
+
+Where necessary, a language may need to provide a "wrapper" sub to interface
+external calls to the language's internal calling and data representation
+requirements.  Such wrappers are free to do whatever translation is required.
+
+Similarly, the caller may need to emit a stub that converts an internal call
+into something more generic.
+
+{{ Of course, "stub" is really too close to "sub", so we should find a better
+word.  Doesn't the C community call these "bounce routines"?  Or something?
+-- rgr, 31-Jul-08. }}
+
+{{ I am discovering that there are five different viewpoints here,
+corresponding to the five layers (including "plain Parrot") of the diagram
+above.  I need to make these viewpoints clearer, and describe the
+responsibilities of each of these parties to each other.  -- rgr,
+31-Jul-08. }}
+
+Languages are free to implement the stub and wrapper layers (collectively
+called "glue") as they see fit.  In particular, they may be inlined in the
+caller, or integral to the callee.
+
+Ideally, of course, the "plain Parrot" layer will be close enough to the
+semantics of both languages that glue code is unnecessary, and the call can be
+made directly.  Language implementors are encouraged to dispense with glue
+whenever possible, even if glue is sometimes required for the general case.
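+
+As a rough illustration of the two glue layers (written in Python rather than
+PIR, purely as a sketch), suppose a hypothetical language Y stores strings as
+lists of characters and a hypothetical language X boxes its strings.  Every
+name below is invented for the example; none of them is a Parrot API.
+
```python
# Hypothetical sketch of the stub and wrapper glue layers; not Parrot code.

# --- Language Y (the callee) ---
def y_internal_upcase(chars):
    """Y's assumed internal convention: a string is a list of characters."""
    return [c.upper() for c in chars]

def y_upcase_wrapper(text):
    """Called wrapper: takes a plain string, converts it to Y's internal
    representation, calls the internal sub, and converts the result back."""
    return "".join(y_internal_upcase(list(text)))

# --- Language X (the caller) ---
class XString:
    """X's assumed internal convention: strings are boxed objects."""
    def __init__(self, value):
        self.value = value

def x_call_stub(external_sub, boxed):
    """Calling stub: unboxes X's argument into a plain value, makes the
    generic call, and boxes the plain result for X again."""
    return XString(external_sub(boxed.value))

result = x_call_stub(y_upcase_wrapper, XString("parrot"))
```
+
+In this sketch only a plain string crosses the boundary, so neither language
+has to know anything about the other's internal representation.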
+
+In summary:
+
+=over 4
+
+=item *
+
+Each HLL gets its own namespace subtree, within which C<get_hll_global> and
+C<set_hll_global> operate.  In order to make external calls, the HLL must
+provide a means of identifying the language, the function, and enough
+information about the arguments and return values for the calling language to
+generate the call correctly.  This is necessarily language-dependent, and is
+beyond the scope of this document.
+
+=item *
+
+When calling across languages, both the caller and the callee should try to
+use "plain Parrot semantics" to the extent possible.  This is explained in
+more detail below, but essentially means to use the simplest calling
+conventions and PMC classes possible.  Ideally, if an API uses only PMCs that
+are provided by a "bare Parrot" (i.e. one without any HLL runtime code), then
+it should be possible to use this API from any other language.
+
+=item *
+
+It is acceptable for languages to define subs for internal calling that are
+not suitable for external calling.  Such subs should be marked as such, and
+other languages should respect those distinctions.  (Or, if they choose to
+call intra-language subs, they should be very sure they understand that
+language's calling conventions.)
+
+=back
+
+=head1 HALF-BAKED IDEAS
+
+{{ Every draft PDD should have one of these.  ;-}  -- rgr, 28-Jul-08.  }}
+
+=head2 Common syntax for declaring exported functions?
+
+I assume we will need some additional namespace support.  Not clear yet
+whether it's better to mark the ones that are OK for external calling, or the
+ones that are not.
+
+(As you can guess, I don't have a strong suggestion for what to call these
+functions yet.  Do we call them "external"?  Would that get confused with
+intra-language public interfaces?)
+
+Beyond that, we probably need additional metainformation on the external subs
+so that calling compilers will know what code to emit.  Putting them on the
+subs means that the calling compiler just needs to load the PBC in order to
+access the module API (though it may need additional hints).  Of course, that
+also requires a PIR API for accessing this metainformation . . .
+
+Crazy idea:  This is more or less the same information (typing) required for
+multimethods.  If we encourage the export of multisubs, then the exporting
+language could provide multiple interfaces, and the calling compiler could
+query the set of methods for the one most suitable.
+
+=head2 More namespace complexity?
+
+It might be good to have some way for HLLs to define a separate external
+definition for a given sub (i.e. one that provides the wrapper) that can be
+done without too much namespace hair.  I.e.
+
+       .sub foo :extern
+
+defines the version that is used by interlanguage calling, and
+
+       .sub foo
+
+defines the version that is seen by other code written in that language
+(i.e. via C<get_hll_global>).  If there is no plain C<foo>, the C<:extern>
+version is used for internal calls.  That way, the compiler can emit both
+wrapper code and internal code without having to do anything special (much),
+even if different calling conventions and/or data conversions are required.
+
+{{ Of course, this wouldn't be necessary if all external subs were multisubs.
+-- rgr, 31-Jul-08. }}
+
+=head2 Multiple type hierarchies?
+
+Different languages will have to "dress up" the Parrot type/class hierarchy
+differently.  For example, Common Lisp specifies that C<STRING> is a subtype
+of C<VECTOR>, which in turn is a subtype of C<ARRAY>.  This is not likely to
+be acceptable to other languages, so Lisp needs its own view of type
+relationships, which must affect multimethod dispatch for Lisp generic
+functions, i.e. a method defined for C<VECTOR> must be considered when passed
+a string as a parameter.
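+
+A minimal sketch of such a language-owned view, written in Python for
+illustration only, might keep a private supertype table and walk it during
+dispatch.  The table, the type names as dictionary keys, and the C<dispatch>
+helper are all assumptions made for this example, not part of Parrot.
+
```python
# Hypothetical sketch: Lisp's private view of type relationships.
LISP_SUPERTYPES = {"STRING": "VECTOR", "VECTOR": "ARRAY", "ARRAY": None}

def lisp_type_of(value):
    """Map a shared (here: Python) value onto Lisp's own type names."""
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        return "VECTOR"
    raise TypeError("no Lisp type for " + type(value).__name__)

def dispatch(methods, value):
    """Find the most specific applicable method by walking up the
    owning language's hierarchy, then call it."""
    type_name = lisp_type_of(value)
    while type_name is not None:
        if type_name in methods:
            return methods[type_name](value)
        type_name = LISP_SUPERTYPES[type_name]
    raise TypeError("no applicable method")

# A generic function with a single method, defined for VECTOR only.
length_methods = {"VECTOR": len}

string_length = dispatch(length_methods, "abc")
vector_length = dispatch(length_methods, [1, 2])
```
+
+The string argument reaches the C<VECTOR> method only because the lookup uses
+Lisp's table; a language with a different table would dispatch differently on
+the very same value.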
+
+The language that owns the multisub gets to define the type hierarchy and
+dispatch rules used when it gets called.  In order to handle objects from
+foreign languages, the "owning" language must decide where to graft the
+foreign class inheritance graph into its own graph.  {{ It would be nice if
+some Parrot class, e.g. C<Object>, could be defined as the conventional place
+to root language-specific object class hierarchies; that way, a language would
+only have to include C<Object> in order to incorporate objects from all other
+conforming languages.  -- rgr, 26-Aug-08. }}
+
+Note that common Parrot classes will in general appear in different places in
+different languages' dispatch hierarchies, so it is important to bear in mind
+which language "owns" the dispatch.
+
+=head1 DEFINITIONS
+
+{{ Collect definitions of new jargon words here, once we figure out what they
+should be.  -- rgr, 29-Jul-08. }}
+
+=head1 IMPLEMENTATION
+
+=head2 Plain Parrot Semantics
+
+Fortunately, "plain Parrot" is pretty powerful, so the "common denominator" is
+not in fact the lowest possible.  For example, not all Parrot languages
+support named, optional, or repeated arguments.  For the called language, this
+is never a problem; the calling module can only use the subset API anyway.
+Implementers of subset calling languages are encouraged to provide their users
+with an extended API for the interlanguage call; typically, this is only
+required for named arguments.
+
+{{ This needs more?  -- rgr, 28-Jul-08. }}
+
+=head2 Strings
+
+    {{ I am probably not competent to write this section.  At the very least,
+    it requires discussion of languages that expect strings to be mutable
+    versus . . . Java.  -- rgr, 28-Jul-08. }}
+
+=head2 Other scalar data types
+
+All Parrot language implementations should stick to native Parrot PMC types
+for scalar data, except in case of dire need.  To see why this is so, take
+the particular case of integer division, which differs significantly between
+languages.
+
+In Tcl, "the integer three divided by the integer five" produces the integer
+value 0.
+
+In Perl 5 and Lua, this division produces the floating-point value 0.6.  (This
+happens to be Parrot's native behavior as well.)
+
+In Common Lisp, this division produces "3/5", a number of type C<RATIO> with
+numerator 3 and denominator 5 that represents the mathematically-exact result.
+
+Furthermore, no Perl 5 code, when given two integers to divide, will expect a
+Common Lisp ratio as a result.  Any Perl 5 implementation that does this has a
+bug, even if both those integers happen to come from Common Lisp.  Ditto for a
+floating-point result from Common Lisp code that happens to get two integers
+from Perl or Lua (or both!).
+
+Even though these languages all use "/" to represent division, they do not all
+mean the same thing by it, and similarly for most (if not all) other built-in
+arithmetic operators.  However, they pretty clearly B<do> mean the same thing
+by (e.g.) "the integer with value five," so there is no need to represent the
+inputs to these operations differently; they can all be represented by the
+same C<Integer> PMC class.
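+
+The point can be sketched in Python (chosen only for illustration): the three
+functions below stand in for each language's division operator, and all of
+them receive the same plain integers.  The function names are hypothetical,
+and C<Fraction> is merely a stand-in for a Lisp C<RATIO>.
+
```python
from fractions import Fraction

def tcl_divide(a, b):
    """Integer quotient, as in the Tcl example above."""
    return a // b

def perl5_divide(a, b):
    """Floating-point quotient, as in the Perl 5 and Lua example."""
    return a / b

def lisp_divide(a, b):
    """Exact rational quotient; Fraction stands in for a Lisp RATIO."""
    return Fraction(a, b)

# The inputs are the same two integers in every case; only the
# operation chosen by the calling language differs.
three, five = 3, 5
tcl_result = tcl_divide(three, five)
perl5_result = perl5_divide(three, five)
lisp_result = lisp_divide(three, five)
```
+
+Because the difference lives in the operators and not in the operands, the
+operands themselves need no language-specific representation.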
+
+{{ Must also discuss morphing:  If some languages do it and others do not, then
+care must be taken at the boundaries.  -- rgr, 31-Jul-08. }}
+
+=head3 Defining new scalar data types
+
+There will be cases where existing Parrot PMC classes cannot represent a
+primitive HLL scalar type, and so a new PMC class is required.  In this case,
+interoperability cannot be guaranteed, since it may not be possible to define
+behavior for such objects in other languages.  But the choice of a new PMC is
+forced, so we must make the best of it.
+
+A good case in point is that of complex rational numbers in Common Lisp.  The
+C<Complex> type provided by Parrot assumes that its components are
+floating-point numbers.  This is a suitable representation type for C<(COMPLEX
+REAL)>, but CL partitions "COMPLEX" into C<(COMPLEX REAL)> and C<(COMPLEX
+RATIONAL)>, with the latter being further divided into C<(COMPLEX RATIO)>,
+C<(COMPLEX INTEGER)>, etc.  The straightforward way to provide this
+functionality is to define a C<ComplexRational> PMC that is built on
+C<Complex> and has real and imaginary PMC components that are constrained to
+be Integer, Bigint, or Ratio PMCs.
+
+So how do we make C<(COMPLEX RATIONAL)> arithmetic work as broadly as
+possible?
+
+The first aspect is defining how the new type actually works within its own
+language.  The Lisp arithmetic operators will usually return a ComplexRational
+if given one, but need to return a RATIONAL subtype if the imaginary part is
+zero, and that may not be suitable for other languages, so Lisp needs its own
+set of basic arithmetic operators.  We must therefore define methods on these
+multis that specialize ComplexRational (and probably use the generic arithmetic
+to redispatch on the type of the real and imaginary parts; you know the drill).
+But, in case we are also passed another operand that is another language's
+exotic type, we should take care to use the most general possible class to
+specialize the other operands, in the hope that other exotics are subclasses
+of these.
+
+The other aspect is extending other languages' arithmetic to do something
+reasonable with our exotic types.  If we're lucky, Parrot will provide a basic
+multisub that takes care of most cases, and we just need to add method(s) to
+that.  If not, we will have to add specialized methods on the other language's
+multisub, trying to redispatch to the other language's arithmetic ops passing
+the (hopefully more generic) component PMCs.  Doing so is still the
+responsibility of the language that defines the exotic class, since it is in
+charge of its internal representation.
+
+{{ We can define multimethods on another language without loading it, can't
+we?  If not, then making this work may require negotiation between language
+implementors, if it is feasible at all.  -- rgr, 31-Jul-08. }}
+
+This brings us to a number of guidelines for defining language-specific
+arithmetic so as to maximize interoperability:
+
+=over 4
+
+=item 1.
+
+Define language-specific operations using multimethods (to avoid conflict with
+other languages).
+
+=item 2.
+
+Define them on the highest (most general) possible PMC classes (in order that
+they continue to work if passed a subclass by a call from a different
+language).
+
+=item 3.
+
+Don't define a language-specific PMC class unless there is clear need for a
+different internal representation.  (And even then, you might consider
+donating it to become part of the Parrot core.)
+
+=back
+
+The rest of this section details exceptions and caveats in dealing with scalar
+data types.
+
+=head3 "Fuzzy" scalars
+
+Some languages are willing to coerce strings to numbers and vice versa without
+any special action on the part of the programmer and others are not.  The
+problem arises when such "fuzzy" scalars are passed (or returned) to languages
+that do not support "fuzzy" coercion . . .
+
+{{ This section is meant to answer Geoffrey's "What does Lisp do with a Perl 5
+Scalar?" question.  I gotta think about this more.  -- rgr, 29-Jul-08.  }}
+
+=head3 C<Complex> numbers
+
+Not all languages support complex numbers, so if an exported function requires
+a complex argument, it should either throw a suitable error, or coerce an
+acceptable numeric argument.  In the latter case, be sure to advertise this in
+the documentation, so that callers without complex numbers can tell their
+compiler which numeric types are acceptable.
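+
+A sketch of the coercing variant, in Python for illustration only, could look
+like the following.  The function name is hypothetical, and Python's built-in
+C<complex> stands in for Parrot's C<Complex> PMC.
+
```python
def exported_magnitude(z):
    """Hypothetical exported sub that needs a complex argument.

    Real numbers are coerced to complex; anything else is rejected
    with an error, so callers get one of the two documented behaviors."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(z, bool) or not isinstance(z, (int, float, complex)):
        raise TypeError("expected a number, got " + type(z).__name__)
    return abs(complex(z))

from_complex = exported_magnitude(3 + 4j)
from_integer = exported_magnitude(-2)
```
+
+A caller whose language has no complex type can pass an ordinary integer or
+float here, provided the documentation says that this is accepted.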
+
+=head3 C<Ratio> numbers
+
+Not all languages support ratios (rather few, actually), so if an exported
+function requires a ratio as an argument, it should either throw a suitable
+error, or convert an acceptable numeric value.
+
+However, since ratios are rare (and it is rather eccentric for a program to
+insist on a ratio as a parameter), it is strongly advised to accept a floating
+point or integer value, and convert it in the wrapper.
+
+    {{ Parrot does not support these yet, so this is not a current issue.  --
+    rgr, 28-Jul-08. }}
+
+=head2 Aggregate data types
+
+{{ I probably haven't done these issues justice; I don't know enough Java or
+Tcl to grok this part of the list discussion.  -- rgr, 28-Jul-08. }}
+
+Aggregates (hashes, arrays, and struct-like thingies) can either be passed
+directly, or mapped by wrapper or caller code into something different.  The
+problem with mapping, besides being slow, is that if I<either> the caller or
+the callee does this, the aggregate is effectively read-only.  (It is possible
+for the wrapper to stuff the changes back in the original structure by side
+effect, but this has its own set of problems.)
+
+In other words, if the mapping is not straightforward, it may not be possible.
+If the mapping I<is> straightforward it may not be necessary -- and an
+unnecessary mapping may limit use of the called module's API.
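+
+The read-only effect of mapping can be seen in a small Python sketch, offered
+for illustration only.  All names are hypothetical: the caller is assumed to
+keep its data as a list of pairs, while the callee expects a hash.
+
```python
def callee_add_default(table):
    """Callee: expects a hash and modifies it in place."""
    table["color"] = "green"

def mapping_wrapper(pairs):
    """Wrapper: maps the caller's list of pairs into a new hash before
    calling.  The callee's changes land in the copy, not in the
    caller's original structure."""
    mapped = dict(pairs)
    callee_add_default(mapped)
    return mapped

caller_data = [("name", "polly")]
mapped = mapping_wrapper(caller_data)

# Passing a compatible aggregate directly lets the change show through.
direct = {"name": "polly"}
callee_add_default(direct)
```
+
+After the mapped call, C<caller_data> is unchanged, whereas the directly
+passed hash has gained the new key.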
+
+Struct-like objects are problematic.  They are normally considered as
+low-level and language-specific, and handled by emitting special code for slot
+accessor/setter functions, which other language compilers won't necessarily
+know how to do.  The choices are therefore to (a) treat them like black boxes
+in the other language, or (b) provide a separate functional or OO API (or
+both) for calling from other languages.
+
+Several questions arise for languages with multiple representations for
+aggregate types.  Typically, this is because these types are more restricted
+in some fashion.  [finish.  -- rgr, 29-Jul-08.]
+
+=head2 Functional data types
+
+In a sense, functional types (i.e. callable objects) are the easiest things to
+pass across languages, since they require no mapping at all.  On the other
+hand, if a language doesn't support functional arguments, then there is no
+hope of using an API written in another language that requires them.
+
+=head2 Datum vs. object
+
+Some languages present everything to the programmer as an object; in such
+languages, code only exists in methods.  A few languages have no methods, only
+functions (and/or subroutines) and "passive" data.  The remainder have both,
+and pose no problem calling into the others.
+
+But how does an obligate OO language call a non-OO language, or vice versa?
+An extreme case would be Ruby (which has only objects) and Scheme (which (as
+far as Ruby is concerned) has none).  What good is a Ruby object as a datum to
+a Scheme program if Scheme can't access any of the methods?  Similarly, what
+could Ruby do with a Scheme list when it can't even get to the Scheme C<car>
+function?
+
+{{ Methinks the right thing would be to define a common introspection API (a
+good thing in its own right).  Scheme and Ruby should each define their own
+implementation of the same in "plain Parrot semantics" terms, independently.
+The caller can then use his/her language's binding of the introspection API to
+poke around in the other module, and find the necessary tools to call the
+other.  For Scheme, this would mean functions for finding Ruby classes and
+providing functional wrappers around methods.  For Ruby, I admit this would
+probably be even weirder.  In any case, it is important that the calling user
+not need anything out of the ordinary, from either language or the called
+module author.  -- rgr, 29-Jul-08. }}
+
+=head3 Defining methods across language boundaries
+
+{{ Is the term "unimethod" acceptable here?  -- rgr, 29-Jul-08. }}
+
+There will be cases where a module user wants to extend that module by
+defining a new method on an externally-defined class, or add a multimethod to
+an externally-defined multisub.  Since a class with unimethod dispatch belongs
+wholly to the external language, the calling language (i.e. the one adding the
+method) must use the semantics of the external language.  If the external
+language uses a significantly different metamodel, simply adding the
+C<:method> pragma may not cut it.
+
+There are two cases:  (1) The calling language is adding a new method, which
+cannot therefore interfere with existing usage in the called language; and (2)
+the calling language is attempting to extend an existing interface provided by
+the called language.  In the first case, the calling compiler has the option
+of treating the new method as part of the calling language, and dispensing
+with the glue altogether.  In the second case, the compiler must treat the new
+method as part of the foreign language, and provide B<both> glue layers (as
+necessary) around it.  It is therefore not expected that all compilers will
+provide a way to define methods on all foreign classes for all language pairs.
+
+Multimethods are easier; although the multisub does belong conceptually to one
+language (from whose namespace the caller must find the multisub), multis are
+more loosely coupled to their original language.
+
+The cases for multimethods are similar, though:  (1) If the calling language
+method is specialized to classes that appear only in the calling module, then
+other uses of the multisub will never call the new method, and the calling
+language can choose to treat it as internal.  (2) If the calling method is
+specialized only on Parrot or called-language classes, then the compiler
+should take care to make it generally usable.
+
+=head3 Subclassing across language boundaries
+
+{{ This is an important feature, but requires compatible metamodels.  -- rgr,
+29-Jul-08. }}
+
+=head3 Method vs. multimethod
+
+{{ This is the issue where some languages (e.g. Common Lisp) use only
+multimethods, where others (e.g. Ruby) use only unimethods.  (S04 says
+something about MMD "falling back" to unimethods, but so far this is not
+described in Parrot.)  Calling is easy; multimethods look like functions, so
+the MM language just has to create a function (or MM) wrapper for the UM
+language, and a UM language can similarly treat a MM call as a normal function
+call.  (Which will require the normal "make the function look like a method"
+hack for obligate OO languages like Ruby.)  Defining methods across the
+boundary is harder, and may not be worth the trouble.  -- rgr, 29-Jul-08. }}
+
+=cut
+
+__END__
+Local Variables:
+  fill-column:78
+End:
