Re: [cellml-discussion] Describing rules for translating expressions into arbitrary languages

2007-05-01 Thread David Nickerson
Hi Andrew,

This is looking quite good. I'm just a bit worried about the use of the 
C style arrays and their names in the int definition. I guess there is 
no way around this and that similar functionality would be available for 
other languages. But perhaps there just needs to be something in your 
format specification defining the arrays which are available for the 
developer to include in their translation specification?


Andre.

Andrew Miller wrote:
 Hi,
 
 I am looking at refactoring the CCGS into several components, as 
 discussed in an earlier e-mail. As part of this, I am looking at how I 
 can separate out the language specific parts of code generation. At this 
 stage, I am focusing on how expressions get generated, rather than 
 entire assignments. This will then be combined with code to generate the 
 procedural steps required to evaluate a model. Writing a program to 
 generate code for a new language will then be as simple as iterating 
 through the procedural steps, writing out assignments of expressions 
 into variables, in addition to supplying all the language specific glue 
 to the integrator.
 
 I have defined a file format specification, called MAL (or 
 MathML-language mapping) designed to contain all the information needed 
 to generate expressions for a specific programming language. I would 
 welcome any feedback anyone may have on the specification. I would be 
 particularly interested in hearing if you can think of some extension to 
 the language which is needed to support generation for a certain 
 language. The specification follows...
 
 MAL Format is intended as a succinct but complete description of how to
 translate expressions from MathML into the syntax of another programming
 language. It is intended to be both simpler and more powerful (within the
 problem domain it is trying to address) than more generic approaches such as
 XSLT.
 
 Format:
 The format consists of a series of tags. Each tag has a series of alphanumeric
 characters (the tag name), followed by a colon and a space (: ), followed by a
 series of characters (the tag value). The tag is terminated by a carriage return
 or line-feed character, and the next tag starts at the first character which
 isn't a carriage return or line feed.
 
 Where line-length formatting transforms are required (such as for FORTRAN 77), a
 post-processing stage must be used to apply them. The reason for this design
 decision is that expressions alone do not determine line length.
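
The tag format above can be read with very little code. The following is a minimal Python sketch, not part of MaLaES: the function name parse_mal and the value given to the plus tag in the sample are assumptions made purely for illustration.

```python
import re


def parse_mal(text):
    """Parse MAL text into a dict mapping tag names to tag values."""
    tags = {}
    # A tag ends at a carriage return or line feed; any run of them
    # separates one tag from the next, so empty pieces are skipped.
    for line in re.split(r"[\r\n]+", text):
        if not line:
            continue
        # The tag name is everything before the first colon-plus-space.
        name, sep, value = line.partition(": ")
        if not sep or not name.isalnum():
            raise ValueError(f"malformed tag line: {line!r}")
        tags[name] = value
    return tags


# Hypothetical sample; the pattern for "plus" is illustrative only.
sample = "opengroup: (\r\nclosegroup: )\n\nplus: #prec[500]#exprs[+]\n"
tags = parse_mal(sample)
```

Because only the first colon and space separates name from value, a tag value may itself contain that sequence.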
 
 The following tags are defined:
 
 Name: opengroup
 Value: A string which can be prepended to another string to force that
   string to have the highest precedence.
 Examples:
   opengroup: (
   Sets the open group string to be (, which is the open group character in
   languages like C.
 
 Name: closegroup
 Value: A string which can be appended after another string to force that
   string to have the highest precedence.
 Examples:
   closegroup: )
   Sets the close group string to be ), which is the close group character in
   languages like C.
 
 Name: The name of any MathML operator.
 Value: A string describing the format. This string shall start with a
   description of operator precedence in the target language, and then describe
   a pattern for generating the target language expression.
 
   A precedence description is specified between #prec[ and ]. The following
   precedence descriptions can be used:
 
   #prec[n(m)] where n and m are integers between 0 and 1000. Sets the outer
   precedence to n (this is a precedence score for the resulting expression),
   and the inner precedence to m (this is a precedence score below which
   operands must be if they are not to require opengroup / closegroup strings
   around them).
 
   #prec[n] where n is an integer is a shorthand for #prec[n(n)]
 
   #prec[H] is a shorthand for #prec[1000(0)].
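
As a rough illustration of the three precedence forms above, here is a minimal Python sketch that splits an operator tag value into its outer precedence, inner precedence, and remaining pattern. The name split_precedence is hypothetical, and treating the 0 to 1000 range as inclusive is an assumption.

```python
import re

# Matches a leading "#prec[...]" whose body is "H", "n", or "n(m)".
_PREC = re.compile(r"#prec\[(?:(H)|(\d+)(?:\((\d+)\))?)\]")


def split_precedence(value):
    """Return (outer, inner, pattern) for an operator tag value."""
    match = _PREC.match(value)
    if match is None:
        raise ValueError(f"missing precedence description: {value!r}")
    if match.group(1):
        # #prec[H] is shorthand for #prec[1000(0)].
        outer, inner = 1000, 0
    else:
        outer = int(match.group(2))
        # #prec[n] is shorthand for #prec[n(n)].
        inner = int(match.group(3)) if match.group(3) is not None else outer
    # Assumption: the range 0..1000 is inclusive at both ends.
    if not (0 <= outer <= 1000 and 0 <= inner <= 1000):
        raise ValueError(f"precedence out of range in {value!r}")
    return outer, inner, value[match.end():]
```

The sketch only decodes the description; how the two scores are compared when deciding whether to emit the opengroup / closegroup strings is left to the expander.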
 
   In an operator description, character sequences which are not matched below
   are written directly out to the output mathematics.
 
   #expri references the recursive expansion (according to the rules
   in the MAL file) of the ith operand, where i is a positive integer (for
   example, #expr1 for the first operand). The highest i value present also
   acts as the number of operands which must be present in the MathML to
   avoid an error.
 
   #exprs[text] expands to the concatenation of each consecutive operand after
   expansion according to the rules. The string text intervenes between operands,
   but is not added before the first operand or after the last.
 
   #logbase expands to the expansion of the logbase element contents. This is
   only valid for log. If no logbase element is found, the string 10 will be
   inserted.
 
   #degree expands to the expansion of the degree element contents. It is only
   valid for root. If no degree element is found, the string 2 will be inserted.
 
   #bvarIndex expands to the text of the bvarIndex annotation (as retrieved by
   the AnnotationSet supplied to MaLaES) on the source of the bound 

Re: [cellml-discussion] Describing rules for translating expressions into arbitrary languages

2007-05-01 Thread Andrew Miller
David Nickerson wrote:
 Hi Andrew,

 This is looking quite good. I'm just a bit worried about the use of the 
 C style arrays and their names in the int definition. I guess there is 
 no way around this and that similar functionality would be available for 
 other languages. But perhaps there just needs to be something in your 
 format specification defining the arrays which are available for the 
 developer to include in their translation specification?
   
Hi Andre,

The intention is that MaLaES is a lower-level API, in the sense that it 
only describes how to convert MathML into some sort of flat text-based 
representation of the equations. It does not define any semantics for 
the interpretation of the results, because the semantics are up to the 
user of MaLaES services.

In this case, I have simply translated the table that drives the C code 
generation in CCGS into my new syntax. However, non-C code could find 
another way to write the same thing. For example, if we were generating 
code for a language that supported lambdas and closures, you could write 
a completely different definition, perhaps something like this (for a 
fictitious language, but it should give the idea):

int: #prec[H]evaluateDefiniteIntegral(variables, lambda x: (return #expr1), #bvarIndex)


while in a Ruby-style language allowing for blocks, you might write:

int: #prec[H]evaluateDefiniteIntegral variables, :#bvarIndex { |#bvarIndex| #expr1 }
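
To make the substitution concrete, here is a minimal Python sketch of how patterns like the two above might be filled in. It is not the MaLaES implementation: the function name expand is hypothetical, the #prec[...] prefix is assumed to have been stripped already, and the operands are assumed to be already expanded (and grouped where necessary) strings.

```python
import re

# One alternative per directive: "#exprs[text]", "#expr<i>", "#bvarIndex".
_DIRECTIVE = re.compile(r"#exprs\[([^\]]*)\]|#expr(\d+)|#bvarIndex")


def expand(pattern, operands, bvar_index=None):
    """Fill in a pattern whose precedence prefix was already removed."""
    def substitute(match):
        if match.group(0) == "#bvarIndex":
            if bvar_index is None:
                raise ValueError("no bvarIndex annotation supplied")
            return bvar_index
        if match.group(2) is not None:
            # #expr<i> refers to the ith operand, counted from 1.
            i = int(match.group(2))
            if not 1 <= i <= len(operands):
                raise ValueError(f"pattern needs operand {i}")
            return operands[i - 1]
        # #exprs[text]: join all operands, with text only between them.
        return match.group(1).join(operands)

    # Anything that is not a directive is copied through unchanged.
    return _DIRECTIVE.sub(substitute, pattern)


result = expand(
    "evaluateDefiniteIntegral(variables, lambda x: (return #expr1), #bvarIndex)",
    ["x * 2"],
    "3",
)
```

Here the caller supplies "3" as the annotation text; as with the array names, what that text means is entirely up to the program calling the expander.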


I think that solving definite integrals in FORTRAN77 is probably harder, 
because it doesn't allow for function pointers, so you would have 
to write out a complete numerical integrator for each integral to be 
evaluated (at least the top-level framework; much of the work could be 
pushed down to functions and subroutines which could be shared). 
However, I think that is a FORTRAN77 language issue, so I can't think of 
any features that I could add that would make this any easier (although 
perhaps allowing some sort of utility commands for constructing case 
table functions for FORTRAN, so the definite integral solver only has to 
be written once, might reduce generated code size if there are lots of 
such definite integrals).

Note that the MaLaES user gets to define bvarIndex by annotating the 
variable (it doesn't strictly speaking have to be an index, so I could 
rename it to bvarAnnotation or something).

As you can see, the array names are interpreted by the program which 
calls MaLaES (or any compiler which gets called as a result of this), 
and it is up to the caller to supply any surrounding context, and 
likewise the caller provides the MAL file input.  Therefore, defining 
array names wouldn't make sense at the level of the format specification.

Best regards,
Andrew
