Re: [cellml-discussion] Using CellML to represent huge CellML models: Has anyone worked on this already?

2007-04-23 Thread David Nickerson
 I am working on developing a CellML model (using external code) of 
 transcriptional control  in yeast which is 23 MB in size. I hope to 
 eventually do a similar thing for organisms which have much more 
 complicated sets of interactions, in which case this size may grow 
 substantially.

so you have 23MB of XML? Cool! Even combining all my models I have less 
than 7MB, and even then I'm sure that figure includes some simulation 
results.

I guess an interesting test would be uploading it to the model 
repository to see how that handles such a large model (presuming you 
have a CellML 1.0 model).

 If anyone on this list is interested in similar problems (I presume 
 similar issues come up in a range of systems biology problems, whether 
 you are working with CellML or SBML), I would welcome your feedback and 
 suggestions, and perhaps we could collaborate .

I really have no idea what a model of transcriptional control in yeast 
looks like, but my initial thought would be to abstract out any similar 
math and import common declarations - I'm guessing you have already done 
this if it's possible.

 This creates some unique issues for CellML processing tools:
 1) Just parsing the CellML model (especially with a DOM-type parser 
 which stores all the nodes into a tree, but probably with any type of 
 parser) is very slow.

It might be interesting to run some simple task to compare the 
performance of DOM-based and SAX-based tools. I have found in the past, 
with 500MB FieldML files, that the SAX parser used in CMGUI was quite 
fast at parsing the file - especially when reading from a gzip-compressed 
file.
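
A minimal sketch of such a test, assuming Python's standard xml.sax and 
gzip modules rather than the CMGUI parser mentioned above. The handler 
class and function names are made up for illustration. It streams a 
gzip-compressed CellML file and counts components and variables without 
ever building a tree:

```python
import gzip
import xml.sax
from xml.sax.handler import feature_namespaces

CELLML_NS = "http://www.cellml.org/cellml/1.0#"


class ComponentCounter(xml.sax.ContentHandler):
    """Counts CellML components and variables as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.components = 0
        self.variables = 0

    def startElementNS(self, name, qname, attrs):
        namespace, local = name
        if namespace != CELLML_NS:
            return
        if local == "component":
            self.components += 1
        elif local == "variable":
            self.variables += 1


def count_elements(stream):
    """Stream-parse an open binary file object; return (components, variables)."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = ComponentCounter()
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler.components, handler.variables


def count_in_gzip(path):
    """Same as count_elements, but decompresses a .gz file on the fly."""
    with gzip.open(path, "rb") as stream:
        return count_elements(stream)
```

Timing count_in_gzip against a DOM load of the same file would give a 
rough first comparison, although the numbers will depend heavily on the 
parser implementation.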

 2) The CellML model might not all fit in memory at the same time, 
 especially if the model gets to be multi-gigabyte. It might be possible 
 to make use of swap to deal with this, but if the algorithms don't have 
 explicit control over when things are swapped in and out, it will be 
 hard to work with such a model.

I think if you have a model getting that large then there needs to be 
some serious thinking about how to handle such models...but generally 
can't you just let the OS worry about swapping in and out as required? 
Or would you expect a customised scheme for a particular application to 
be more efficient?
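
One middle ground between relying on swap and a fully customised scheme 
is to process the model one component at a time and discard each 
component once it has been handled. The following is only a sketch, 
assuming Python's xml.etree.ElementTree.iterparse; the function name is 
hypothetical and it is not part of any CellML tool:

```python
import xml.etree.ElementTree as ET

CELLML = "{http://www.cellml.org/cellml/1.0#}"


def iter_component_inputs(stream):
    """Yield (component name, number of 'in' variables), one component at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != CELLML + "component":
            continue
        inputs = sum(
            1
            for var in elem.iter(CELLML + "variable")
            if var.get("public_interface") == "in"
        )
        yield elem.get("name"), inputs
        # Drop the finished component's children so memory use stays
        # roughly proportional to one component, not the whole model.
        elem.clear()
```

The emptied component elements still hang off the root, so memory use is 
not strictly constant, but it grows far more slowly than a full tree.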

 C) Another, leaner, read-only CellML API (perhaps based off the same 
 IDLs, but with certain functionality, like the ability to modify the 
 model, or set mutation event listeners, unavailable). We could add a 
 SAX-style event dispatcher instead, to allow users to save any 
 information they do want from extension elements, which will also not be 
 kept in the model. Comments, white-space, and so on would all be 
 stripped unlike in the current CellML API implementation. Tools which 
 are currently using the full CellML API but only require read-only 
 access (e.g. the CCGS) might be able to just 'flick the switch' and 
 benefit from the leaner API.

This would probably be beneficial even for those of us without such 
large models - especially if it is as easy as flicking a switch to swap 
between the complete and restricted implementations.


Andre.
___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion


Re: [cellml-discussion] Using CellML to represent huge CellML models: Has anyone worked on this already?

2007-04-23 Thread Andrew Miller
David Nickerson wrote:
 I am working on developing a CellML model (using external code) of 
 transcriptional control  in yeast which is 23 MB in size. I hope to 
 eventually do a similar thing for organisms which have much more 
 complicated sets of interactions, in which case this size may grow 
 substantially.
 

 so you have 23MB of XML? Cool! Even combining all my models I have less 
 than 7MB, and even then I'm sure that figure includes some simulation 
 results.
   
My model is entirely generated from experimental data; none of it is 
written by hand (aside from a one-page script used to generate CellML 
from the relational database).
 I guess an interesting test would be uploading it to the model 
 repository to see how that handles such a large model (presuming you 
 have a CellML 1.0 model).
   
It is currently a CellML 1.0 model. However, I'm not sure I want to risk 
breaking the live Plone site, and I doubt the model is much use to 
anyone else at this stage.
   
 If anyone on this list is interested in similar problems (I presume 
 similar issues come up in a range of systems biology problems, whether 
 you are working with CellML or SBML), I would welcome your feedback and 
 suggestions, and perhaps we could collaborate .
 

 I really have no idea what a model of transcriptional control in yeast 
 looks like, but my initial thought would be to abstract out any similar 
 math and import common declarations - I'm guessing you have already done 
 this if it's possible.
   
My model only has machine-learning external code in it; it doesn't have 
any equations at the moment. Just to give you an idea of what it looks 
like...


<model xmlns="http://www.cellml.org/cellml/1.0#" name="interactions">
  <component name="PAU8">
    <variable name="sig_PAU8" initial_value="0" units="signal_level" public_interface="out"/>
    <variable name="sig_SUT1" units="signal_level" public_interface="in"/>
    <variable name="sig_STE12" units="signal_level" public_interface="in"/>
    <variable name="sig_ADR1" units="signal_level" public_interface="in"/>
    <variable name="sig_YAP5" units="signal_level" public_interface="in"/>
    <variable name="sig_RME1" units="signal_level" public_interface="in"/>
    <variable name="sig_TEC1" units="signal_level" public_interface="in"/>
    <variable name="sig_SWI5" units="signal_level" public_interface="in"/>
    <variable name="sig_ARR1" units="signal_level" public_interface="in"/>
    <variable name="sig_MET31" units="signal_level" public_interface="in"/>
    <variable name="sig_RLM1" units="signal_level" public_interface="in"/>
    <variable name="sig_INO4" units="signal_level" public_interface="in"/>
    <variable name="sig_RAP1" units="signal_level" public_interface="in"/>
    <variable name="sig_MOT3" units="signal_level" public_interface="in"/>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply><eq/>
        <ci>sig_PAU8</ci>
        <apply>
          <csymbol definitionURL="http://www.bioeng.auckland.ac.nz/people/miller/black_box/k-nearest-neighbours">blackbox</csymbol>
          <ci>sig_SUT1</ci>
          <ci>sig_STE12</ci>
          <ci>sig_ADR1</ci>
          <ci>sig_YAP5</ci>
          <ci>sig_RME1</ci>
          <ci>sig_TEC1</ci>
          <ci>sig_SWI5</ci>
          <ci>sig_ARR1</ci>
          <ci>sig_MET31</ci>
          <ci>sig_RLM1</ci>
          <ci>sig_INO4</ci>
          <ci>sig_RAP1</ci>
          <ci>sig_MOT3</ci>
        </apply>
      </apply>
    </math>
  </component>
  <component name="YAL067W_A">
    <variable name="sig_YAL067W_A" initial_value="0" units="signal_level" public_interface="out"/>
    <variable name="sig_SPT23" units="signal_level" public_interface="in"/>
    <variable name="sig_STE12" units="signal_level" public_interface="in"/>
    <variable name="sig_DAL80" units="signal_level" public_interface="in"/>
    <variable name="sig_YAP5" units="signal_level" public_interface="in"/>
    <variable name="sig_BAS1" units="signal_level" public_interface="in"/>
    <variable name="sig_DIG1" units="signal_level" public_interface="in"/>
    <variable name="sig_PHO2" units="signal_level" public_interface="in"/>
    <variable name="sig_HAP2" units="signal_level" public_interface="in"/>
    <variable name="sig_PHD1" units="signal_level" public_interface="in"/>
    <variable name="sig_GLN3" units="signal_level" public_interface="in"/>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply><eq/>
        <ci>sig_YAL067W_A</ci>
        <apply>
          <csymbol definitionURL="http://www.bioeng.auckland.ac.nz/people/miller/black_box/k-nearest-neighbours">blackbox</csymbol>
          <ci>sig_SPT23</ci>
          <ci>sig_STE12</ci>
          <ci>sig_DAL80</ci>
          <ci>sig_YAP5</ci>
          <ci>sig_BAS1</ci>
          <ci>sig_DIG1</ci>
          <ci>sig_PHO2</ci>
          <ci>sig_HAP2</ci>
          <ci>sig_PHD1</ci>
          <ci>sig_GLN3</ci>
        </apply>
      </apply>
    </math>
  </component>
  ...

Note that the initial_value=0 is a place-holder,
I could abstract out my blackbox function calls based on the number of 
parameters (it is variable, from 1 through to 41, in this case, although 
there is no theoretical limit on how many putative transcription factors 
could affect a signal). However, I suspect that this would not solve the 
performance problems (it takes 

Re: [cellml-discussion] Dimensional consistency and units conversions (was [Fwd: Re: ten Tusscher model])

2007-04-23 Thread Matt
So one way to avoid having to hope that software does the right
thing is to make units consistency compulsory for each dimension
across all variables (defined in variable elements) in a CellML
component, so that the only units conversions that need to take place
are at the interfaces. Quantities that are dimensionless after
simplification aren't going to affect dimensional analysis of the
math, so we would be safe there.
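
A rough sketch of what such a per-component check could look like. The
data layout (variable name mapped to a dimension label and a units
name) and the function name are assumptions made for illustration, not
anything defined by CellML or by an existing tool:

```python
from collections import defaultdict


def inconsistent_dimensions(variables):
    """Return dimensions that appear with more than one unit in a component.

    `variables` maps a variable name to (dimension, units name), for
    example {"V": ("voltage", "millivolt")}. Both fields are
    hypothetical labels, not CellML attributes.
    """
    units_by_dimension = defaultdict(set)
    for dimension, units in variables.values():
        units_by_dimension[dimension].add(units)
    return {
        dimension: sorted(units)
        for dimension, units in units_by_dimension.items()
        if len(units) > 1
    }
```

An empty result would mean the component passes the proposed rule, so
conversions would only ever be needed at its interfaces.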




On 4/23/07, David Nickerson wrote:
  So where does the problem lie? This says that all your supposedly
  dimensionless constants should have units. Does it need to be clearer
  that you are not allowed to simplify them out into dimensionless
  yourself?

 yes - I think this is the issue. Also that tools shouldn't simplify them
 into dimensionless before doing the multiplication and/or units
 consistency checking.

  Also, what does it matter that some software simplifies them out
  before multiplying them? So long as it checks units consistency
  prior to simplifying them (if it really needs to do that anyway) then
  the result should be the same.

 yep - that is the key. The units must be there when an application
 checks units consistency.


 Andre.


[cellml-discussion] Proposal: Refactoring the CCGS into smaller, re-usable components

2007-04-23 Thread Andrew Miller
Hi,

I have been wanting to use some functionality present in the CellML Code 
Generation Service, for a quite different type of problem (not a system 
of ODEs). The problem is, the functionality isn't exposed. I think there 
is actually quite a lot of useful functionality in the CCGS which could 
be exposed. I am therefore proposing that the CCGS be split into a 
number of smaller, independently useful components...

1) CUSES: The CellML Unit Simplification And Expansion Service.

This service will allow a string, describing a unit, together with a 
component or units element in which the unit appears, to be passed in. 
It will then return another object, which represents the unit in a 
special canonical form. This canonical form will consist of an ordered 
sequence of base units (for some unique ordering of base units). Base 
units will include built-in base units as well as user-defined base 
units. The canonical sequence will also carry multipliers, offsets, and 
exponents for each base unit.

It will be possible to compare two canonical forms of units.

There will also be an option for whether to combine exponents when units 
have a different multiplier, so that, for example, millimetre . 
(millisecond ^ -1) . microsecond will only be simplified to micrometre 
if you turn the option on. I think it is this specific type of 
simplification that has led to the recent debate over units.
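
To make the option concrete, here is a rough sketch of one possible 
canonical form. It is not the proposed CUSES interface: the tuple layout 
and function name are assumptions, multipliers are restricted to powers 
of ten, and offsets are left out entirely.

```python
from collections import defaultdict

# Each factor is (base unit, power-of-ten prefix, exponent), so
# millisecond^-1 is written ("second", -3, -1).


def canonical(factors, combine_prefixes=False):
    """Return (power-of-ten multiplier, sorted tuple of unit terms)."""
    exponents = defaultdict(int)
    scale = 0
    for base, prefix, exponent in factors:
        if combine_prefixes:
            # Fold every prefix into one overall multiplier and merge
            # exponents of the same base unit.
            scale += prefix * exponent
            exponents[(base, 0)] += exponent
        else:
            # Only merge exponents when base unit AND prefix agree.
            exponents[(base, prefix)] += exponent
    terms = tuple(
        sorted((b, p, e) for (b, p), e in exponents.items() if e != 0)
    )
    return scale, terms


# millimetre . (millisecond ^ -1) . microsecond
product = [("metre", -3, 1), ("second", -3, -1), ("second", -6, 1)]
micrometre = [("metre", -6, 1)]

print(canonical(product, True) == canonical(micrometre, True))
print(canonical(product, False) == canonical(micrometre, False))
```

With the option on, the product and micrometre reduce to the same form; 
with it off, millisecond and microsecond are kept apart and the two 
forms differ.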

It is hoped that this module will be useful for editing tools, 
validators, and code generators, as well as any other software that 
needs to work with units. If you work on such software, let me know if 
this sounds useful.

2) CeVAS: The CellML Variable Association Service.

This service provides an efficient way to find all the variables which 
are connected to each other, even if they are in different components, 
and perhaps in different imported models. It also allows annotations to 
be made per actual variable (i.e. group of connected variables), and it 
uses CUSES to compute conversion factors and offsets needed to convert 
between one CellML variable and another one connected to it.

This should be useful for a range of simulators and code generators, 
hopefully even ones which are not ODE-based.
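
One common way to group connected variables is a union-find 
(disjoint-set) structure. The sketch below assumes that approach; it is 
not a statement of how CeVAS will be implemented, and the class and 
method names are hypothetical. Variables are identified by (component, 
variable name) pairs.

```python
class VariableGroups:
    """Groups CellML variables that are joined by connections."""

    def __init__(self):
        self._parent = {}

    def _find(self, variable):
        self._parent.setdefault(variable, variable)
        root = variable
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[variable] != root:
            self._parent[variable], variable = root, self._parent[variable]
        return root

    def connect(self, a, b):
        """Record that variables a and b are connected."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_variable(self, a, b):
        """True if a and b belong to the same group of connected variables."""
        return self._find(a) == self._find(b)

    def group(self, variable):
        """Return every variable connected, directly or not, to `variable`."""
        root = self._find(variable)
        return sorted(v for v in list(self._parent) if self._find(v) == root)
```

Per-group annotations could then be stored in a dictionary keyed on the 
group's root.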

3) MaLaES: The MathML to Language Expression Service.

This provides facilities for translating individual MathML expressions 
into code in a specific language. The code will provide a programmatic 
interface for setting up tables used to drive the code generation, which 
will aim to support a range of common languages (earlier FORTRANs being 
the hardest, due to the line-length restrictions). The code will use 
CeVAS annotations to look up the name of the variables, so you will be 
able to use arbitrary variable names.

This service will also be able to determine certain information about 
the mathematics, such as returning a set of variables on each side of 
the equation, and determining whether external code is used.
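
As an illustration of the table-driven idea only, here is a small 
sketch using Python's xml.etree.ElementTree. The operator table, 
function names, and the names mapping (standing in for CeVAS 
annotations) are all assumptions, and only a handful of MathML 
operators are handled.

```python
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"

# Table driving the output; swapping it would retarget the generator.
C_LIKE = {"eq": " = ", "plus": " + ", "minus": " - ",
          "times": " * ", "divide": " / "}


def translate(node, operators, names):
    """Turn a MathML element into an infix expression string."""
    tag = node.tag.replace(MATHML, "")
    if tag == "ci":
        ident = node.text.strip()
        return names.get(ident, ident)  # fall back to the CellML name
    if tag == "cn":
        return node.text.strip()
    if tag == "apply":
        operator, *operands = list(node)
        op = operator.tag.replace(MATHML, "")
        parts = [translate(child, operators, names) for child in operands]
        if op == "minus" and len(parts) == 1:
            return "(-" + parts[0] + ")"  # unary minus
        joined = operators[op].join(parts)
        return joined if op == "eq" else "(" + joined + ")"
    raise ValueError("unsupported MathML element: " + tag)


def variables(node):
    """Return the set of variable names referenced under a MathML element."""
    return {ci.text.strip() for ci in node.iter(MATHML + "ci")}


doc = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><apply><eq/>'
    "<ci>y</ci><apply><plus/><ci>a</ci>"
    "<apply><times/><cn>2</cn><ci>b</ci></apply></apply></apply></math>"
)
equation = doc[0]
_eq, lhs, rhs = list(equation)
print(translate(equation, C_LIKE, {"y": "RATES[0]"}))
print(variables(lhs), variables(rhs))
```

The first print gives RATES[0] = (a + (2 * b)); the second lists the 
variables on the left-hand and right-hand sides of the equation.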

The CCGS will keep the code used to determine the order of the 
expressions, as well as generation of the actual expressions.

I welcome any opinions on whether this is useful, suggested 
improvements, and so on.

Best regards,
Andrew



[cellml-discussion] some repository stats

2007-04-23 Thread James Lawson
Hi Folks,

I have now checked every model in the repository in PCEnv and recorded
whether they run, and if so, whether they integrate, or if they don't
run, what the error message is. The great majority contain undefined
variables.

As of 24/04/07:
Out of a total of 250 different models (I only counted once, so make
that plus or minus a few), 58 models are functional in PCEnv and have
therefore made it to a one-star rating. That's 23.2%, which isn't too bad.

There are 412 files in the repository, so that equates to an average of
1.65 versions/variants per model.

Most of the variants got uploaded automatically from the old repository
as versions, so that's something I'll fix eventually.

As of now, a star next to the model name in the repository list
definitively means that at least one version of that model runs and
integrates in PCEnv.

James