Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Michael Cooling
First, just to check: in the connection elements, should 'flux'
be 'fluxes'? For example, the 'substance_a' component contains no variable
'flux'; I assume you meant the set_of_lambda_of_real 'fluxes'?

I think I like this! When will it be released?  ;-P

I don't know enough about the lambda elements in MathML to check the
syntax (BTW, which MathML spec would we use for CellML 1.2, MathML
2.0?) but I think I get the idea and, presuming it works, I like how the
species are tracking their own ODEs and the reactions just deal in
fluxes for the species involved. We can do this with CellML 1.0, but it is
better to minimise the 'gluing code' of course.

Minor question: what is the purpose of the component abxy_system? Is  
this to help monitor the individual fluxes from one location?

Very minor point: I notice the units for your kb terms should be just  
'per_second'?

directionality - I suspect we can just leave those redundant (in/out)  
tags out?

connections structure - makes me think about something that is more a  
toolset UI issue:
One thing I've found when constructing models is that I don't think  
about connections
from an A-B, then A-C then A-D perspective (letters here are  
components), but by considering all connections from A to other  
components. For example, if I have a new component A  I'd much rather  
declare something like:

connections from A
variable whatever1 to B:whateverinB
variable whatever2 to C:whateverinC
/connections from A

if you see what I mean, rather than hunt around the CellML for each  
variable connection which is, in my opinion, tedious and poor  
workflow. Now I don't think what I've written there is as good a  
structure as the current CellML and I'm not suggesting CellML should  
be changed, but what I'm getting to is I think it would be helpful for  
the PCEnv/COR tool if from the UI a more workflow-friendly linking was  
possible, and it sorted out the more elegant connection components as  
written in CellML from that.

Cheers,


Quoting Andrew Miller:

 Hi all,

 To aid in working out what features we should include in CellML 1.2, I
 have been looking into one of the major difficulties with creating
 re-usable metabolic models at the moment: that to compute a derivative,
 you need to know all the fluxes, but when a model is extended, new
 fluxes can be added to the model. Ideally, we should be able to define
 CellML 1.2 so that structured types can be leveraged to avoid this issue.

 I have come up with an example of how this might look in CellML 1.2.
 This is based in part off a discussion that I had earlier with Poul, so
 most of the credit for this goes to him, while most of the blame for the
 inelegant parts goes to me.

 Note that the lambda constructs without any bound variables are there to
 defer interpretation of the fluxes until they are taken from the set.
 The rationale for needing this is that if we had two statements that a
 certain real-valued flux belongs in the set, and two of them happened to
 have the same value at a particular point in time, then summing over the
 set would only include the flux value once (e.g. saying x is an element
 of N, where N is in the natural numbers, means that the value of x is a
 natural number. Saying x=5, y=5, x in N, y in N does not mean that 5 is
 in the set of natural numbers twice). On the other hand, we can have two
 different zero argument lambda functions in a set, which just happen to
 evaluate to the same value. This is consistent with the declarative
 nature of CellML - we are not saying 'add this flux to the set of
 fluxes', but rather, we are making a series of statements about what is
 in the set of fluxes, and the processing software is then summing over
 all fluxes which have been explicitly mentioned. Because we can connect
 the set of fluxes up to an importing model, doing things this way gives
 a great deal of extra flexibility.
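The point about wrapping each flux in a zero-argument lambda can be illustrated outside CellML. The following is only a minimal Python sketch of the set semantics described above, not CellML 1.2 syntax; the names flux_1, flux_2 and the value 0.5 are made up for illustration.

```python
# Plain real values: a set collapses equal members, so when two fluxes
# happen to have the same value, one of them is lost from the sum.
flux_values = {0.5, 0.5}
sum_of_values = sum(flux_values)

# Zero-argument lambdas: two distinct function objects remain distinct
# members of the set, and each is only evaluated when taken from the set.
flux_1 = lambda: 0.5  # hypothetical flux
flux_2 = lambda: 0.5  # a different flux with the same value right now
fluxes = {flux_1, flux_2}
sum_of_lambdas = sum(f() for f in fluxes)

print(sum_of_values)   # only one of the two equal fluxes is counted
print(sum_of_lambdas)  # both fluxes are counted
```

In Python the distinction comes from function objects being compared by identity rather than by the value they return, which plays the same role here as deferring interpretation of the flux.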

 Notes:
 1) In practice, the substance could become an import which is re-used
 many times, and likewise for components representing various general
 types of chemical reactions.
 2) I have invented a possible way in which we could remove
 directionality from connections. No one has come up with a formal
 proposal to do it this way yet.
 3) I have followed Randall's earlier suggestion about how to structure
 connections without using component_ref. This again needs a formal
 proposal and discussion.
 4) The model uses a potential way in which we could get rid of the
 generality of groups, by replacing group and relationship_ref with a
 simple encapsulation element. This again needs to be formally proposed
 and discussed at some point.

 <?xml version="1.0" encoding="UTF-8"?>
 <model
   xmlns="http://www.cellml.org/cellml/1.1#"
   xmlns:c11="http://www.cellml.org/cellml/1.1#"
   xmlns:c12="http://www.cellml.org/cellml/1.2#"
   xmlns:m="http://www.w3.org/1998/Math/MathML"
   name="_1_2_example"
   >
   <component name="substance_x">
     <c12:variable

Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Andrew Miller
Michael Cooling wrote:
 First just to check that in the connection elements 'flux' should  
 be 'fluxes'? For example, 'substance_a' component contains no variable  
 'flux', I assume you meant the set_of_lambda_of_real 'fluxes'?
   

Thanks for pointing that out - it is hard to validate something before 
there is a specification or tools, so I'm sure there are lots of little 
mistakes like that in my example.

 I think I like this! When will it be released?  ;-P

 I don't know enough about the lambda elements in MathML to check the  
 syntax (BTW which MathML spec would we use for CellML 1.2, MathML  
 2.0?)

We really can't refer to the drafts for MathML 3, so we are stuck with 
MathML 2 unless MathML 3 goes final before CellML 1.2 is frozen.

  but I think I get the idea and presuming it works like how the  
 species are tracking their own ODEs and the reactions just deal in  
 fluxes for the species involved.

I'm not sure what you mean by 'species are tracking their own ODEs' - 
there is one state variable per species, representing the concentration, 
and the ODE for [species] is in the component for that species.

  We can do this with cellml 1.0 but  
 better to minimise the 'gluing code' of course.
   

You could express an equivalent mathematical model in CellML 1.0 or 
CellML 1.1, but the point is that the model would not be re-usable if 
you wanted to add a new chemical reaction to it. Instead, you would be 
forced to take one of the following strategies:

  a) add an extra real scalar term into the rate equation, and expose 
that all the way up through abxy_system, and then either connect it to a 
variable with constant value 0 if you don't want to add anything to the 
system, or connect it to the new fluxes otherwise. You would need to do 
this for every species in the model, making it very cumbersome to work with.
  b) actually change the original model to add in the new fluxes to the 
rate equation every time you add in a new reaction (i.e. copy and edit 
rather than re-use by reference).
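To make the contrast concrete, here is a rough Python sketch of strategy (a) next to the set-based form. It is an illustration under assumptions, not CellML: the function names rate_with_hook and rate_from_set and all numeric values are hypothetical.

```python
def rate_with_hook(known_fluxes, extra_flux=0.0):
    """Strategy (a): a fixed rate equation plus one exposed scalar hook."""
    return sum(known_fluxes) + extra_flux


def rate_from_set(fluxes):
    """Set-based form: sum whatever zero-argument fluxes were declared."""
    return sum(f() for f in fluxes)


# Unextended model: the hook still has to be wired to a constant 0.
unextended = rate_with_hook([2.0, -1.0])

# Extended model: the importer adds up its new fluxes and feeds the hook.
extended = rate_with_hook([2.0, -1.0], extra_flux=3.0)

# Set-based: the importer only states that one more flux is in the set.
fluxes = {lambda: 2.0, lambda: -1.0}
fluxes.add(lambda: 3.0)
extended_set = rate_from_set(fluxes)

print(unextended, extended, extended_set)
```

With the hook, every species needs its own extra variable exposed through every level of encapsulation; with the set, the rate equation never changes when a reaction is added.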

 Minor question: what is the purpose of the component abxy_system? Is  
 this to help monitor the individual fluxes from one location?
   

abxy_system is a convenience component which provides a top-level 
interface to the system. The model could be run as it is right now, or 
alternatively, you could import abxy_system into another model, and you 
might also want to import the bcyz_system, which has some species in 
common and some species different. The top-level model would import the 
convenience components from abxy_system and bcyz_system, and would then 
connect flux_b from abxy_system to flux_b from bcyz_system, and flux_y 
from abxy_system to flux_y from bcyz_system.

The mathematical model you get from this would be sub-optimal without 
some intelligence on the part of software tools - there would be two 
state variables with identical rates and initial values for the 
concentration of b and the concentration of y (something tools might be 
able to detect and optimise if there was sufficient demand to do so), 
but nevertheless, the two models would have been correctly composed into 
a single model.
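The effect of connecting the two flux sets can be sketched numerically. This is a toy Python illustration with made-up fluxes and a forward Euler step; the names abxy_flux_b, bcyz_flux_b and euler are assumptions for the example only.

```python
# Each system states which fluxes act on species b (zero-argument callables).
abxy_flux_b = {lambda: -2.0}  # hypothetical: b is consumed in abxy_system
bcyz_flux_b = {lambda: 0.5}   # hypothetical: b is produced in bcyz_system

# Connecting flux_b to flux_b means both systems see one shared set.
shared_flux_b = abxy_flux_b | bcyz_flux_b


def euler(initial, fluxes, dt=0.1, steps=10):
    """Integrate d[b]/dt = sum of all fluxes in the set (forward Euler)."""
    value = initial
    for _ in range(steps):
        value += dt * sum(f() for f in fluxes)
    return value


# Two state variables for [b], one per imported system. They share the
# same initial value and the same rate, so they stay identical.
b_in_abxy = euler(10.0, shared_flux_b)
b_in_bcyz = euler(10.0, shared_flux_b)

print(b_in_abxy, b_in_bcyz)
```

The duplicated state variable costs extra computation but, as long as both copies start from the same value, it does not change the result.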

 Very minor point: I notice the units for your kb terms should be just  
 'per_second'?
   

Good catch.

 directionality - I suspect we can just leave those redundant (in/out)  
 tags out?
   

I'm not sure what you mean there. In CellML 1.1, there can be attributes 
like public_interface="in" private_interface="out". I have used 
something more like public_interface="yes" or public_interface="no" 
here, thereby removing the directionality: "in" and "out" correspond to 
"yes", and "none" corresponds to "no".

 connections structure - makes me think about something that is more a  
 toolset UI issue:
 One thing I've found when constructing models is that I don't think  
 about connections
 from an A-B, then A-C then A-D perspective (letters here are  
 components), but by considering all connections from A to other  
 components. For example, if I have a new component A  I'd much rather  
 declare something like:

 connections from A
   variable whatever1 to B:whateverinB
   variable whatever2 to C:whateverinC
 /connections from A

 if you see what I mean, rather than hunt around the CellML for each  
 variable connection which is, in my opinion, tedious and poor  
 workflow. Now I don't think what I've written there is as good a  
 structure as the current CellML and I'm not suggesting CellML should  
 be changed, but what I'm getting to is I think it would be helpful for  
 the PCEnv/COR tool if from the UI a more workflow-friendly linking was  
 possible, and it sorted out the more elegant connection components as  
 written in CellML from that.
   
Yes, this would have to be provided by tools, because if connections 
don't have directionality, then it makes no sense in the language to say 
that a connection is from A to B, as opposed to from B to A, and we 
wouldn't want to force users to duplicate 

Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Michael Cooling
 because if connections don't have directionality, then it makes no  
 sense in the language to say
 that a connection is from A to B, as opposed to from B to A, and we
 wouldn't want to force users to duplicate information and provide both

Oops I didn't mean to imply directionality. I shouldn't have used the  
words 'from' and 'to', sorry. It's not the language that I want to  
change.

 I'm not sure what you mean there. In CellML 1.1, there can be attributes
 like public_interface=in private_interface=out, I have used
 something more like public_interface=yes or public_interface=no
 here, thereby removing the directionality - in and out correspond to
 yes, none corresponds to no.

Ah I see. I suspect just having an optional public_interface=yes  
would be the way to go, private=yes being always
implicit. I.e. private by default, and public includes private. As you  
say this will no doubt be discussed later.

 The mathematical model you get from this would be sub-optimal without
 some intelligence on the part of software tools - there would be two
 state variables with identical rates and initial values for the
 concentration of b and the concentration of y

This is why I am wary of such convenience components... they imply an
interface that doesn't really exist. I think they can be useful, but
since one never
knows what might get connected to what...
If one tries to use them as an interface to the system that MUST be
used (not that you were doing this),
then you limit the usefulness of your model... essentially you are
assuming that your interface is sufficient for all future
requirements. Even for ion channels, that might not be true in the future.
I think duplications may also be resolved by choosing a particular substance_b
(i.e. a human decision) at model aggregation time, facilitated by tools
identifying this situation.


Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Andrew Miller
Michael Cooling wrote:
 because if connections don't have directionality, then it makes no  
 sense in the language to say
 that a connection is from A to B, as opposed to from B to A, and we
 wouldn't want to force users to duplicate information and provide both
 

 Oops I didn't mean to imply directionality. I shouldn't have used the  
 words 'from' and 'to', sorry. It's not the language that I want to  
 change.
   
   
 I'm not sure what you mean there. In CellML 1.1, there can be attributes
 like public_interface=in private_interface=out, I have used
 something more like public_interface=yes or public_interface=no
 here, thereby removing the directionality - in and out correspond to
 yes, none corresponds to no.
 

 Ah I see. I suspect just having an optional public_interface=yes  
 would be the way to go, private=yes being always
 implicit. I.e. private by default, and public includes private. As you  
 say this will no doubt be discussed later.
   

This particular aspect has already been discussed. See 
http://www.cellml.org/pipermail/cellml-discussion/2008-January/001144.html 
and the follow on discussion for why public and private interfaces are 
not analogous to public and private in most object orientated 
programming languages, and why public does not imply private.

By default, we have public_interface="no" private_interface="no", which 
means that the variable cannot be used outside of the component. 
public_interface="yes" means that encapsulating components can connect 
to the variable, while private_interface="yes" means that encapsulated 
components can connect to the variable.
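As a rough illustration of these rules, the check a tool might perform could look like the Python sketch below. It only encodes the two statements above; the Variable class and can_connect function are hypothetical, and connections between sibling components are deliberately left out because they are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    public_interface: bool = False   # default corresponds to "no"
    private_interface: bool = False  # default corresponds to "no"


def can_connect(variable, other_is):
    """Return True if a component in the given relation may connect.

    other_is is 'encapsulating' (the other component contains this one)
    or 'encapsulated' (the other component is contained by this one).
    """
    if other_is == "encapsulating":
        return variable.public_interface
    if other_is == "encapsulated":
        return variable.private_interface
    raise ValueError("other_is must be 'encapsulating' or 'encapsulated'")


hidden = Variable("k")
fluxes = Variable("fluxes", public_interface=True)

print(can_connect(hidden, "encapsulating"))  # no interface at all
print(can_connect(fluxes, "encapsulating"))  # public: visible to the parent
print(can_connect(fluxes, "encapsulated"))   # public does not imply private
```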

   
 The mathematical model you get from this would be sub-optimal without
 some intelligence on the part of software tools - there would be two
 state variables with identical rates and initial values for the
 concentration of b and the concentration of y
 

 This is why I am wary of such convenience components... they imply an  
 interface that doesn't really exist - I think they can be useful but  
 since one never
 knows what might get connected to what
   

Removing directionality, and using sets for the list of all fluxes, means 
that it doesn't matter what the flux set is connected to behind the 
interface. The duplication of state variables is unfortunate, but it 
doesn't actually have any theoretical implications for the correctness 
of the model, if both state variables have the same initial values and 
rates (which they would, since they are the sum over the same set of 
fluxes). This means that the model should just work in all cases.

 If one tries to use them as an interface to the system that MUST be  
 used (not that you were doing this)
 then you limit the usefulness of your model...essentially you are  
 assuming that your interface is sufficient for all future
 requirements.

Yes, and that is a good assumption if you provide everything on the 
interface.

  Even for ion channels, that might not be true in the future.
 I think duplications may also be resolved by choosing a particular substance_b
 (ie human decision) at model aggregation-time facilitated by tools  
 identifying this situation.
   

I don't think we want to do that manually. The duplication is only a 
problem because it artificially inflates the dimensionality of the 
model, creating a performance problem, and the algorithm to detect this 
artificial inflation of dimensionality and merge the state variables for 
computational purposes should be tractable if we limit it to the case 
where identical variables are being put through identical equations, 
rather than going for a more general and costly isomorphic mathematical 
structure detection.
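One way the restricted detection could work is sketched below in Python. It assumes, purely for illustration, that each state variable can be summarised by its initial value and the identifiers of the fluxes it sums over; merge_duplicate_states and the variable names are hypothetical and not part of any existing tool.

```python
from collections import defaultdict


def merge_duplicate_states(states):
    """Map each state variable to a representative of its duplicate group.

    states maps a variable name to (initial_value, frozenset_of_flux_ids).
    Two variables are treated as duplicates only when both parts match
    exactly, i.e. identical inputs going through an identical equation.
    """
    groups = defaultdict(list)
    for name, (initial, flux_ids) in states.items():
        groups[(initial, flux_ids)].append(name)

    representative = {}
    for names in groups.values():
        first = sorted(names)[0]  # deterministic choice of representative
        for name in names:
            representative[name] = first
    return representative


states = {
    "b_abxy": (1.0, frozenset({"r1", "r2"})),
    "b_bcyz": (1.0, frozenset({"r1", "r2"})),
    "y_abxy": (0.0, frozenset({"r2"})),
}
merged = merge_duplicate_states(states)
print(merged)
```

Because the grouping key is an exact match, this is a single pass over the state variables rather than a general search for isomorphic mathematical structure.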

Best regards,
Andrew

___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion


Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Andrew Miller
Michael Cooling wrote:
 if both state variables have the same initial values and
 rates (which they would...
 

 why should they have the same initial values? I agree if they did then  
 it makes
 no difference to the correctness of the model but it seems very possible to
 create a model of system 1 with substance_b and a model of system 2  
 with substance_b and give them
 both different initial conditions, then try to combine them. In  
 practice I think this would happen more often than not, at least for  
 the systems I've dealt with so far.

 Are you talking about AFTER you've realised the conflict and have  
 already decided which value(s) to go with? Or have I missed something?
   
If two models contradict each other (such as by each stating a different initial 
value for the concentration of the same species, or a different mechanism 
for the exact same reaction), then this contradiction has to be fixed 
before the models can be composed.

I am therefore focusing on clean ways to compose non-contradictory 
models which involve some overlap of chemical species (and so the 
assumption is that the initial values are the same).
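A tool could flag such contradictions before composition. The following Python sketch is only an assumption about how that check might look; find_initial_value_conflicts and the example values are made up.

```python
def find_initial_value_conflicts(*models):
    """Return species whose initial values disagree between models.

    Each model maps a species name to its initial concentration.
    The result maps each conflicting species to the set of values seen.
    """
    seen = {}
    conflicts = {}
    for model in models:
        for species, value in model.items():
            if species in seen and seen[species] != value:
                conflicts.setdefault(species, {seen[species]}).add(value)
            seen.setdefault(species, value)
    return conflicts


system_1 = {"substance_a": 1.0, "substance_b": 2.0}
system_2 = {"substance_b": 3.0, "substance_c": 0.0}

conflicts = find_initial_value_conflicts(system_1, system_2)
print(conflicts)
```

An empty result corresponds to the non-contradictory case assumed above; anything else has to be resolved by a person before the models are composed.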

Best regards,
Andrew


Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Michael Cooling
 If two models contradict each other (such as by each stating a different
 initial value for concentrations of the same species, or a different
 mechanism for the exact same reaction), then this contradiction has
 to be fixed before the models can be composed.

It's not just about composition - during model simulation work often  
initial conditions are altered before running. If we had two (or more)  
'substances' that actually relate to the same substance, it would be  
easy to make a contradiction after the model has been composed just by  
altering one of those initial conditions.

 I am therefore focusing on clean ways to compose non-contradictory  
 models which involve some overlap of chemical species (and so the  
 assumption is that the initial values are the same).

I wonder if in general allowing duplications is not the best we could  
do. Especially if we hope to build larger models from smaller ones  
with the least cellml code editing (including resetting initial  
conditions).

I think that, as well as the good work you have done here, it would be
valuable to explore, as an extension, ways of resolving such conflicts,
possibly with CellML language elements, or at least to avoid advocating
the creation of higher-level structures which could easily lead to such
conflicts (not that we can prevent people building them if they try hard
enough :) ).

Metadata and tools contribution to the model composition process  
notwithstanding.





Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread David Nickerson
Hi Andrew,

This looks quite intriguing. As you mention later in this thread I 
currently take the approach of adding an extra real scalar term and 
exposing that as a way to hook future (in my case) currents into ion 
concentration rate equations. Such an approach works well when I only 
have three ion concentrations to deal with, but gets overwhelming when 
considering many different species, such as happens when adding 
metabolic processes to an electrophysiology model.

Just to try and better understand your proposal by use of an example 
closer to my own work and perhaps to also check that this approach is 
not limited to metabolic models...

Using this method, I would be able to say that each calcium membrane 
current is in the set of calcium fluxes and then formulate the time 
derivative of intracellular calcium concentration to be a sum over all 
fluxes in that set. Then as I extend the model I simply specify any new 
calcium currents as being in the set of calcium fluxes and the 
intracellular calcium concentration automatically has them included.
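That reading can be sketched in Python as follows. It is a toy illustration only: the Species class, the flux names leak and uptake, and all numbers are hypothetical, and a real model would also convert membrane currents into concentration fluxes.

```python
class Species:
    """A species that owns its rate equation and a set of declared fluxes."""

    def __init__(self, concentration):
        self.concentration = concentration
        self.fluxes = set()  # zero-argument callables, evaluated on demand

    def rate(self):
        # The time derivative is the sum over every flux stated to be in
        # the set, whatever component declared it.
        return sum(f() for f in self.fluxes)


ca_i = Species(concentration=1.0)

# Original model: a single, constant calcium flux (hypothetical).
leak = lambda: 0.5
ca_i.fluxes.add(leak)
rate_before = ca_i.rate()

# Extension: a new flux that depends on the current concentration is
# stated to be in the set; the rate equation itself is untouched.
uptake = lambda: -2.0 * ca_i.concentration
ca_i.fluxes.add(uptake)
rate_after = ca_i.rate()

print(rate_before, rate_after)
```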

Have I got that right?

Assuming I have understood this, how hard do you expect it to be to 
implement support for such set summation techniques in the current 
Auckland API implementation? I would guess this falls into the code 
generation services rather than the core API implementation?


Andre.

Andrew Miller wrote:
 Hi all,
 
 To aid in working out what features we should include in CellML 1.2, I 
 have been looking into one of the major difficulties with creating 
 re-usable metabolic models at the moment: that to compute a derivative, 
 you need to know all the fluxes, but when a model is extended, new 
 fluxes can be added to the model. Ideally, we should be able to define 
 CellML 1.2 so that structured types can be leveraged to avoid this issue.
 
 I have come up with an example of how this might look in CellML 1.2. 
 This is based in part off a discussion that I had earlier with Poul, so 
 most of the credit for this goes to him, while most of the blame for the 
 inelegant parts goes to me.
 
 Note that the lambda constructs without any bound variables are there to 
 defer interpretation of the fluxes until they are taken from the set. 
 The rationale for needing this is that if we had two statements that a 
 certain real-valued flux belongs in the set, and two of them happened to 
 have the same value at a particular point in time, then summing over the 
 set would only include the flux value once (e.g. saying x is an element 
 of N, where N is in the natural numbers, means that the value of x is a 
 natural number. Saying x=5, y=5, x in N, y in N does not mean that 5 is 
 in the set of natural numbers twice). On the other hand, we can have two 
 different zero argument lambda functions in a set, which just happen to 
 evaluate to the same value. This is consistent with the declarative 
 nature of CellML - we are not saying 'add this flux to the set of 
 fluxes', but rather, we are making a series of statements about what is 
 in the set of fluxes, and the processing software is then summing over 
 all fluxes which have been explicitly mentioned. Because we can connect 
 the set of fluxes up to an importing model, doing things this way gives 
 a great deal of extra flexibility.
 
 Notes:
 1) In practice, the substance could become an import which is re-used 
 many times, and likewise for components representing various general 
 types of chemical reactions.
 2) I have invented a possible way in which we could remove 
 directionality from connections. No one has come up with a formal 
 proposal to do it this way yet.
 3) I have followed Randall's earlier suggestion about how to structure 
 connections without using component_ref. This again needs a formal 
 proposal and discussion.
 4) The model uses a potential way in which we could get rid of the 
 generality of groups, by replacing group and relationship_ref with a 
 simple encapsulation element. This again needs to be formally proposed 
 and discussed at some point.
 
 <?xml version="1.0" encoding="UTF-8"?>
 <model
   xmlns="http://www.cellml.org/cellml/1.1#"
   xmlns:c11="http://www.cellml.org/cellml/1.1#"
   xmlns:c12="http://www.cellml.org/cellml/1.2#"
   xmlns:m="http://www.w3.org/1998/Math/MathML"
   name="_1_2_example"
   >
   <component name="substance_x">
     <c12:variable name="fluxes" type="set_of_lambda_of_real"
       units="mol_per_litre_per_second" public_interface="yes" />
     <c12:variable name="concentration" units="mol_per_litre"
       public_interface="yes" />
     <c12:variable name="time" units="second" public_interface="yes" />
     <m:math>
       <m:apply><m:eq/>
         <m:apply><m:diff/>
           <m:bvar><m:ci>time</m:ci></m:bvar>
           <m:ci>concentration</m:ci>
         </m:apply>
         <m:apply><m:sum />
           <m:bvar><m:ci>f</m:ci></m:bvar>
           <m:condition>
             <m:apply>
               <m:in/>
               <m:ci>f</m:ci>
               <m:ci>fluxes</m:ci>
             </m:apply>
   

Re: [cellml-discussion] Using proposed CellML 1.2 features to create more re-usable metabolic models

2008-01-30 Thread Andrew Miller
David Nickerson wrote:
 Hi Andrew,

 This looks quite intriguing. As you mention later in this thread I 
 currently take the approach of adding an extra real scalar term and 
 exposing that as a way to hook future (in my case) currents into ion 
 concentration rate equations. Such an approach works well when I only 
 have three ion concentrations to deal with, but gets overwhelming when 
 considering many different species, such as happens when adding 
 metabolic processes to an electrophysiology model.

 Just to try and better understand your proposal by use of an example 
 closer to my own work and perhaps to also check that this approach is 
 not limited to metabolic models...

 Using this method, I would be able to say that each calcium membrane 
 current is in the set of calcium fluxes and then formulate the time 
 derivative of intracellular calcium concentration to be a sum over all 
 fluxes in that set. Then as I extend the model I simply specify any new 
 calcium currents as being in the set of calcium fluxes and the 
 intracellular calcium concentration automatically has them included.

 Have I got that right?
   

That sounds right to me - metabolic modelling was indeed only a specific 
example.

 Assuming I have understood this, how hard do you expect it to be to 
 implement support for such set summation techniques in the current 
 Auckland API implementation? I would guess this falls into the code 
 generation services rather than the core API implementation?
   

I think that supporting CellML 1.2 will require additions to both the 
core API implementation and the code generation services. Internally, 
the code generation services don't have support for anything other than 
scalars, and so adding in data types such as sets (and implementing 
operators over them) will require some fairly significant internal 
changes. However, there is no reason to believe that adding such support 
is infeasible in the long term.

Best regards,
Andrew
