Re: [cellml-discussion] Describing rules for translating expressions into arbitrary languages
Hi Andrew,

This is looking quite good. I'm just a bit worried about the use of the C-style arrays and their names in the int definition. I guess there is no way around this, and similar functionality would be available for other languages. But perhaps there just needs to be something in your format specification defining the arrays which are available for the developer to include in their translation specification?

Andre.

Andrew Miller wrote:

Hi,

I am looking at refactoring the CCGS into several components, as discussed in an earlier e-mail. As part of this, I am looking at how I can separate out the language-specific parts of code generation.

At this stage, I am focusing on how expressions get generated, rather than entire assignments. This will then be combined with code to generate the procedural steps required to evaluate a model. Writing a program to generate code for a new language will then be as simple as iterating through the procedural steps, writing out assignments of expressions into variables, in addition to supplying all the language-specific glue to the integrator.

I have defined a file format specification, called MAL (or MathML-language mapping), designed to contain all the information needed to generate expressions for a specific programming language. I would welcome any feedback anyone may have on the specification. I would be particularly interested in hearing if you can think of some extension to the language which is needed to support generation for a certain language.

The specification follows...

MAL Format is intended as a succinct but complete description of how to translate expressions from MathML into the syntax of another programming language. It is intended to be both simpler and more powerful (within the problem domain it is trying to address) than more generic approaches such as XSLT.

Format:

The format consists of a series of tags.
Each tag has a series of alphanumeric characters (the tag name), followed by a colon and a space (: ), followed by a series of characters (the tag value). The tag is terminated by a carriage return or line-feed character, and the next tag starts at the first character which isn't a carriage return or line feed.

Where line-length formatting transforms are required (such as for FORTRAN 77), a post-processing stage must be used to achieve this. The reason for this design decision is that expressions alone do not determine line length.

The following tags are defined:

Name: opengroup
Value: A string which can be placed before another string to force that string to have the highest precedence.
Examples:
opengroup: (
Sets the open group string to be (, which is the open group character in languages like C.

Name: closegroup
Value: A string which can be appended after another string to force that string to have the highest precedence.
Examples:
closegroup: )
Sets the close group string to be ), which is the close group character in languages like C.

Name: The name of any MathML operator.
Value: A string describing the format. This string shall start with a description of operator precedence in the target language, and then describe a pattern for generating the target language expression.

A precedence description is specified between #prec[ and ]. The following precedence descriptions can be used:

#prec[n(m)] where n and m are integers between 0 and 1000. Sets the outer precedence to n (this is a precedence score for the resulting expression), and the inner precedence to m (this is a precedence score below which operands must be if they are not to require opengroup / closegroup strings around them).

#prec[n] where n is an integer is a shorthand for #prec[n(n)].

#prec[H] is a shorthand for #prec[1000(0)].

In an operator description, character sequences which are not matched below are written directly out to the output mathematics.
#expri references the recursive expansion (according to the rules in the MAL file) of the ith operand, where i is a positive integer. The highest i value present also acts as the number of operands which must be present in the MathML to avoid an error.

#exprs[text] expands to the concatenation of each consecutive operand after expansion according to the rules. The string text intervenes between operands, but is not added before the first operand or after the last.

#logbase expands to the expansion of the logbase element contents. This is only valid for log. If no logbase element is found, the string 10 will be inserted.

#degree expands to the expansion of the degree element contents. It is only valid for root. If no degree element is found, the string 2 will be inserted.

#bvarIndex expands to the text of the bvarIndex annotation (as retrieved by the AnnotationSet supplied to MaLaES) on the source of the bound
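As a rough illustration of the tag syntax, the #prec descriptions, and the #expri / #exprs[text] rules described above, here is a minimal Python sketch. It is not the MaLaES implementation. The function names (parse_mal, parse_prec, expand), the tuple-based expression tree used in place of real MathML, and the plus / times / sin definitions in EXAMPLE_MAL are all hypothetical and chosen only for this example. It also assumes that an operand is wrapped in opengroup / closegroup when its outer precedence is lower than the inner precedence of the enclosing operator; that reading is picked because it lets #prec[H], i.e. #prec[1000(0)], never wrap its operands. Leaves are assumed to have precedence 1000.

```python
import re

# Matches #prec[H], #prec[n] or #prec[n(m)] at the start of an operator value.
PREC = re.compile(r"#prec\[(?:H|(\d+)(?:\((\d+)\))?)\]")
# Matches #exprs[text] (group 1) or #expri (group 2).
DIRECTIVE = re.compile(r"#exprs\[([^\]]*)\]|#expr(\d+)")


def parse_mal(text):
    """Split MAL text into a dict of tag name -> tag value."""
    tags = {}
    for line in re.split(r"[\r\n]+", text):
        if line:
            name, _, value = line.partition(": ")
            tags[name] = value
    return tags


def parse_prec(value):
    """Return (outer, inner, pattern) for an operator tag value."""
    m = PREC.match(value)
    if m is None:
        raise ValueError("operator value must start with #prec[...]")
    if m.group(1) is None:            # #prec[H] is shorthand for #prec[1000(0)]
        outer, inner = 1000, 0
    else:
        outer = int(m.group(1))
        # #prec[n] is shorthand for #prec[n(n)]
        inner = int(m.group(2)) if m.group(2) is not None else outer
    return outer, inner, value[m.end():]


def expand(node, tags):
    """Return (text, outer precedence) for a leaf string or (op, operand, ...)."""
    if isinstance(node, str):
        return node, 1000             # assumption: leaves never need grouping
    op, *operands = node
    outer, inner, pattern = parse_prec(tags[op])

    def operand_text(index):
        if index >= len(operands):
            raise ValueError(f"{op} needs at least {index + 1} operand(s)")
        text, prec = expand(operands[index], tags)
        if prec < inner:              # assumed direction of the grouping rule
            text = tags["opengroup"] + text + tags["closegroup"]
        return text

    def substitute(match):
        if match.group(1) is not None:                 # #exprs[text]
            parts = [operand_text(i) for i in range(len(operands))]
            return match.group(1).join(parts)
        return operand_text(int(match.group(2)) - 1)   # #expri is 1-based

    # Anything not matched by a directive is copied to the output unchanged.
    return DIRECTIVE.sub(substitute, pattern), outer


# Hypothetical operator definitions, for illustration only.
EXAMPLE_MAL = (
    "opengroup: (\n"
    "closegroup: )\n"
    "plus: #prec[500]#exprs[+]\n"
    "times: #prec[600]#exprs[*]\n"
    "sin: #prec[H]sin(#expr1)\n"
)

if __name__ == "__main__":
    tags = parse_mal(EXAMPLE_MAL)
    print(expand(("times", ("plus", "a", "b"), "c"), tags)[0])  # (a+b)*c
    print(expand(("sin", ("plus", "a", "b")), tags)[0])         # sin(a+b)
```

The #logbase, #degree and #bvarIndex directives are left out of the sketch, since they depend on MathML qualifier elements and on annotations supplied by the caller.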
Re: [cellml-discussion] Describing rules for translating expressions into arbitrary languages
David Nickerson wrote:

Hi Andrew,

This is looking quite good. I'm just a bit worried about the use of the C-style arrays and their names in the int definition. I guess there is no way around this, and similar functionality would be available for other languages. But perhaps there just needs to be something in your format specification defining the arrays which are available for the developer to include in their translation specification?

Hi Andre,

The intention is that MaLaES is a lower-level API, in the sense that it only describes how to convert MathML into some sort of flat text-based representation of the equations. It does not define any semantics for the interpretation of the results, because the semantics are up to the user of MaLaES services.

In this case, I have simply translated the table that drives the C code generation in CCGS into my new syntax. However, non-C code could find another way to write the same thing. For example, if we were generating code for a language that supported lambdas and closures, you could write a completely different definition, perhaps something like this (for a fictitious language, but it should give the idea):

int: #prec[H]evaluateDefiniteIntegral(variables, lambda x: (return #expr1), #bvarIndex)

while in a Ruby-style language allowing for blocks, you might write:

int: #prec[H]evaluateDefiniteIntegral variables, :#bvarIndex { |#bvarIndex| #expr1 }

I think that solving definite integrals in FORTRAN77 is probably harder, because it doesn't allow for function pointers, so you would have to write out a complete numerical integrator for each integral to be evaluated (at least the top-level framework; much of the work could be pushed down to functions and subroutines which could be shared).
However, I think that is a FORTRAN77 language issue, so I can't think of any features that I could add that would make this any easier (although perhaps allowing some sort of utility commands for constructing case table functions for FORTRAN, so that the definite integral solver only has to be written once, might reduce generated code size if there are lots of such definite integrals).

Note that the MaLaES user gets to define bvarIndex by annotating the variable (strictly speaking, it doesn't have to be an index, so I could rename it to bvarAnnotation or something).

As you can see, the array names are interpreted by the program which calls MaLaES (or any compiler which gets called as a result of this), and it is up to the caller to supply any surrounding context; likewise, the caller provides the MAL file input. Therefore, defining array names wouldn't make sense at the level of the format specification.

Best regards,
Andrew

Andre.

Andrew Miller wrote:

Hi,

I am looking at refactoring the CCGS into several components, as discussed in an earlier e-mail. As part of this, I am looking at how I can separate out the language-specific parts of code generation.

At this stage, I am focusing on how expressions get generated, rather than entire assignments. This will then be combined with code to generate the procedural steps required to evaluate a model. Writing a program to generate code for a new language will then be as simple as iterating through the procedural steps, writing out assignments of expressions into variables, in addition to supplying all the language-specific glue to the integrator.

I have defined a file format specification, called MAL (or MathML-language mapping), designed to contain all the information needed to generate expressions for a specific programming language. I would welcome any feedback anyone may have on the specification.