Re: [cellml-discussion] Call for community input: decision about the addition of a CellML API side RDF parsing service
I have appended my comments to the tracker item. In short, I believe a custom implementation will help you to manage RDF internally and help resolve the dependencies you are currently bound to, but it should not be expected to provide useful RDF interfaces for reading and writing specific types of metadata; the most obvious implication of that is that you would also need to write an RDF Schema library. So the public API considered here may be very small - consume or produce triples.

cheers
Matt

On 1/05/2008, at 9:49 PM, David Nickerson wrote:

Hi Justin,

As I mentioned on the tracker item, it would be really good if you could put together a proposal (perhaps as a document under your cellml.org member page) which describes exactly what it is you are proposing here. Something along the lines of what Andrew presented when putting forward the proposed refactoring of the code generation service. I'm really not sure how an RDF parsing service on its own is going to help meet the goals you describe. I am also wondering exactly what you mean by an intermediate conclusion?

Thanks,
Andre.

Justin Marsh wrote:

Hi all,

For those who may be interested, there has been some discussion amongst those involved with the CellML API recently about a proposed addition of a CellML API side RDF parsing service; this would, for example, allow us to remove our dependency on patching Mozilla, allowing us to build PCEnv from an unmodified build of the Mozilla framework. The discussion has moved over to tracker item 358 ( https://tracker.physiomeproject.org/show_bug.cgi?id=358 ). Other reasons for such an addition have been for use in any future metadata service, the increasing use of RDF, and for use in annotating systems of equations.
Reasons against such an addition have included the availability of preexisting libraries, the possibility of scope creep, the possibility of introducing changes or dependencies in the existing CellML API, the broadness of the current proposal, and a possible conceptual uncleanness or incorrectness. I would appreciate any feedback, comments about, or refinements of this; however, unless the discussion is still raging, we want to come to at least an intermediate conclusion by Friday the 9th of May.

Best Regards,
Justin Marsh

___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion
Re: [cellml-discussion] Representing the next version of the CellML Specification
On 8/11/2007, at 10:27 AM, Randall Britten wrote:

Hi all

Another option to add to the mix: using a Wiki. In this case, I would specifically suggest MediaWiki (a la Wikipedia).

Pros:
- Widely used, lots of user familiarity.
- Easy collaboration: edits done via web interface.
- Built in diffs and revision history.
- Linking when done via web interface works well.
- Can be rendered as PDF on demand (haven't tested this myself, but docs say it can be done).

Cons:
- Requires setup and maintenance of another content management system.

Unsure:
- Math is usually handled with LaTeX substrings; not sure how to handle MathML in MediaWiki.

For the purpose of presentation math, LaTeX substrings get my vote. They are simple to express and their diffs in my opinion are easier to interpret than diffs of MathML/XML. More generally, I think XML source formats should be avoided if possible.

Regards,
Randall
Re: [cellml-discussion] Representing the next version of the CellML Specification
More generally, I think XML source formats should be avoided if possible.

Just wondering if you can explain your reasoning for this?

- reading plain text in a text editor is more pleasant
- reading diffs of plain text is more pleasant
[cellml-discussion] HFM domains
Hi Aaron and Dane,

I need the following set up for HFM. They are obviously not the final ones ... but we need to use them heavily soon for testing and then for letting HFM add real content.

The names are:

www.production.hfm.endev.co.nz
admin.production.hfm.endev.co.nz
supervisor.production.hfm.endev.co.nz

HFM is hosted on 10.0.0.18. All of these can be pointed at port 80.

One big request is that this is not set up using the current apache system on the firewall. Either a layer 2 approach or something like pound would be preferable. I don't want HFM to be part of the apache deathmatch that exists there at the moment.

cheers
Matt
[cellml-discussion] CellML Versioning Strategy
Andrew was opposed to the idea of changing all the namespaces, and suggested changing the namespace of a particular element in only some circumstances:

I agree very strongly with this. It would make writing out XPath expressions simpler, since you know exactly which namespace to target for which elements. The namespace argument also applies to new attributes - they need to be placed into a new namespace too, and referenced explicitly as such in a document, since the rule for CellML is that unnamespaced attributes acquire the namespace of the element owning them.

Poul thinks that mixing namespaces means you have to scan the entire document before you can determine that you don't support a particular version of the model.

I don't understand that. You might want to scan a document to see what versions the model conforms up to, but one of the nice things about pushing these new elements/attributes into new namespaces is that you can still treat a model as, say, 1.1 even if it contains 1.2 elements and attributes. So the scanning is already done implicitly by a library that is simply trying to use a CellML model and is reading it at the version level it is capable of. Of course CellML 1.1 is broken in this sense.

There was some discussion about what namespace the model element should be in CellML 1.2. Randall suggested it should be in CellML 1.1 and not CellML 1.0.

Can we apply this to all existing elements and attributes then? So that when 1.2 and its interpretation come along, we only really have 1.2 and 1.1 to deal with.

cheers
Matt

On 9/19/07, Andrew Miller [EMAIL PROTECTED] wrote:

Hi all,

At the break-away session on the versioning strategy for CellML (which followed the Auckland CellML meeting today) we discussed the future of how we would version CellML, including whether we would put all elements for the next version of CellML in a completely different namespace, or only the elements that had changed.
A summary of the discussion is up at http://www.cellml.org/meeting_minutes/MeetingMinutes19September2007/ under Breakaway session on versioning strategy for CellML. Note that the participants at the session have not had a chance to correct errors in it yet, and it may not yet accurately reflect everyone's view. However, it does lay out the options, and so may provide a starting point for any suggestions or comments from the community. Please send any such suggestions or comments to the CellML discussion mailing list prior to the 3rd October 2007.

Best regards,
Andrew
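The point about reading a model "at the version level it is capable of" can be sketched roughly as follows. This is only an illustration under stated assumptions: the CellML 1.1 namespace URI is the published one, but the 1.2 namespace URI and the `extension` element are hypothetical, since no 1.2 namespace had been agreed in this thread. A reader that only understands 1.1 keeps the elements in the namespace it knows and records which other namespaces it skipped, so no separate up-front scan is needed.

```python
import xml.etree.ElementTree as ET

CELLML_1_1 = "http://www.cellml.org/cellml/1.1#"
# Hypothetical namespace, used only to illustrate mixed-namespace documents.
CELLML_1_2 = "http://www.cellml.org/cellml/1.2#"


def split_tag(tag):
    """Split an ElementTree '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return None, tag


def read_at_supported_level(xml_text, supported):
    """Return the local names of understood elements and the skipped namespaces."""
    root = ET.fromstring(xml_text)
    understood, skipped = [], set()
    for elem in root.iter():  # document order, root first
        ns, local = split_tag(elem.tag)
        if ns in supported:
            understood.append(local)
        else:
            skipped.add(ns)
    return understood, skipped


MODEL = f"""
<model xmlns="{CELLML_1_1}" xmlns:new="{CELLML_1_2}" name="example">
  <component name="membrane">
    <variable name="V" units="volt"/>
    <new:extension name="hypothetical_1_2_feature"/>
  </component>
</model>
"""

understood, skipped = read_at_supported_level(MODEL, supported={CELLML_1_1})
print(understood)
print(skipped)
```

Whether silently skipping unknown namespaces is acceptable for a given tool is exactly the policy question raised in the minutes; the sketch only shows that the information needed to decide falls out of a normal read.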
Re: [cellml-discussion] CellML Versioning Strategy
On 9/19/07, Andrew Miller [EMAIL PROTECTED] wrote:

Matt Halstead wrote:

Andrew was opposed to the idea of changing all the namespaces, and suggested changing the namespace of a particular element in only some circumstances:

I agree very strongly with this. It would make writing out XPath expressions simpler, since you know exactly which namespace to target for which elements. The namespace argument also applies to new attributes - they need to be placed into a new namespace too, and referenced explicitly as such in a document, since the rule for CellML is that unnamespaced attributes acquire the namespace of the element owning them.

This is something which I think we should change ASAP - it is a deviation from the XML specification which we should not be declaring at the CellML level. I think that once this is sorted out, versioning the elements is sufficient, and there is no need to mix namespaces of attributes within the same element (if the attribute definitions change, then the semantics of the element have changed, so we change its namespace).

Yup

Poul thinks that mixing namespaces means you have to scan the entire document before you can determine that you don't support a particular version of the model.

I don't understand that. You might want to scan a document to see what versions the model conforms up to, but one of the nice things about pushing these new elements/attributes into new namespaces is that you can still treat a model as, say, 1.1 even if it contains 1.2 elements and attributes. So the scanning is already done implicitly by a library that is simply trying to use a CellML model and is reading it at the version level it is capable of. Of course CellML 1.1 is broken in this sense.

There was some discussion about what namespace the model element should be in CellML 1.2. Randall suggested it should be in CellML 1.1 and not CellML 1.0.

Can we apply this to all existing elements and attributes then?
So that when 1.2 and its interpretation come along, we only really have 1.2 and 1.1 to deal with.

I think that was the intention - model was only an example of an element with semantics that we don't plan to change, and any other element which is neither new nor changed in CellML 1.2 would be treated along the same lines. Then we can just implement 1.2 (and perhaps 1.0) without worrying about explicitly implementing 1.1 as a separate task.

Best regards,
Andrew

cheers
Matt

On 9/19/07, Andrew Miller [EMAIL PROTECTED] wrote:

Hi all,

At the break-away session on the versioning strategy for CellML (which followed the Auckland CellML meeting today) we discussed the future of how we would version CellML, including whether we would put all elements for the next version of CellML in a completely different namespace, or only the elements that had changed. A summary of the discussion is up at http://www.cellml.org/meeting_minutes/MeetingMinutes19September2007/ under Breakaway session on versioning strategy for CellML. Note that the participants at the session have not had a chance to correct errors in it yet, and it may not yet accurately reflect everyone's view. However, it does lay out the options, and so may provide a starting point for any suggestions or comments from the community. Please send any such suggestions or comments to the CellML discussion mailing list prior to the 3rd October 2007.

Best regards,
Andrew
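To make the attribute-namespace deviation discussed above concrete, here is a small sketch. Under Namespaces in XML, an unprefixed attribute is in no namespace (the default namespace declaration does not apply to attributes), whereas the CellML rule described in this thread treats it as being in the namespace of its owning element. The helper names below are illustrative only, not part of any CellML API.

```python
import xml.etree.ElementTree as ET

CELLML_1_1 = "http://www.cellml.org/cellml/1.1#"
CMETA = "http://www.cellml.org/metadata/1.0#"


def split_name(qualified):
    """Split an ElementTree '{namespace}local' name into (namespace, local)."""
    if qualified.startswith("{"):
        ns, local = qualified[1:].split("}", 1)
        return ns, local
    return None, qualified


def attribute_namespaces(elem, inherit_from_element):
    """Map each attribute's local name to its namespace under the chosen rule."""
    elem_ns, _ = split_name(elem.tag)
    result = {}
    for key in elem.attrib:
        ns, local = split_name(key)
        if ns is None and inherit_from_element:
            # The CellML rule as described in the thread: unprefixed
            # attributes take the namespace of the owning element.
            ns = elem_ns
        result[local] = ns
    return result


variable = ET.fromstring(
    f'<variable xmlns="{CELLML_1_1}" xmlns:cmeta="{CMETA}" '
    'name="V" cmeta:id="V_id"/>'
)

xml_view = attribute_namespaces(variable, inherit_from_element=False)
cellml_view = attribute_namespaces(variable, inherit_from_element=True)
print(xml_view)     # 'name' has no namespace under plain XML rules
print(cellml_view)  # 'name' takes the CellML 1.1 namespace under the CellML rule
```

A generic XML toolkit will always report the first view, which is why the thread suggests dropping the CellML-specific rule and versioning elements only.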
Re: [cellml-discussion] Include_in_CellML_1.2 requested: [Tracker Item 153] Allow multiple connections between the same pair of components
Semantically I think this is fine and theoretically does not change the meaning of connections. It's important to highlight that software developers will need to:

1) relax the validation constraint for the existing rule (i.e. only one connection between any two components)
2) understand that component_1 and component_2 of map_components can appear in either order across connection elements between the same components (some software may have used the current notion of there being only one connection and one order to component_1 and component_2 to optimise in-memory object references)

I think this could have some pronounced effects on some software. I wouldn't mind reworking the connection syntax altogether ... but that's another proposal.

On 8/29/07, Andrew Miller [EMAIL PROTECTED] wrote:

Hi all,

Are there any objections to marking this as something we should include in CellML 1.2?

Best regards,
Andrew

[EMAIL PROTECTED] wrote:

Andrew Miller [EMAIL PROTECTED] has asked for Include_in_CellML_1.2: Tracker Item 153: Allow multiple connections between the same pair of components
http://bowmore.elyt.com/bugzilla/show_bug.cgi?id=153

--- Additional Comments from Andrew Miller [EMAIL PROTECTED]

Section 3.2.4 of CellML 1.1 states, in the second sentence of the second paragraph: Only one connection may be created between any given pair of components in a model.

This is a fairly pointless restriction from all fronts:

* From a model author's perspective, it creates a burden on the author to consolidate all their connections, which may have been created for different purposes, and current model authors claim that such consolidation is time consuming and error prone.
* From a model readability perspective, it is also burdensome because connections between variables may not be in a logical order (this is less of an issue if tools are used, but the point still holds).
* Implementation experience suggests that it is no harder to allow multiple connections between the same pair of components when writing simulation software, but the extra constraint imposes more work on developers when writing tools which try to validate the model.

To fix this, we could simply drop the first two sentences of the second paragraph of Section 3.2.4, and perhaps replace them with a short explanation.
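The two developer-facing points raised in the reply (several connection elements per component pair, and component_1/component_2 appearing in either order) can be handled by normalising each pair as the connections are read. The sketch below is one possible approach, not how any existing CellML tool does it; the function name and the example model are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"c": "http://www.cellml.org/cellml/1.1#"}


def merged_variable_maps(model_xml):
    """Group all variable mappings by unordered component pair.

    Keys are (component_a, component_b) with component_a <= component_b;
    values are sets of (variable_in_a, variable_in_b) tuples.
    """
    root = ET.fromstring(model_xml)
    merged = defaultdict(set)
    for conn in root.findall("c:connection", NS):
        comps = conn.find("c:map_components", NS)
        c1, c2 = comps.get("component_1"), comps.get("component_2")
        for mv in conn.findall("c:map_variables", NS):
            v1, v2 = mv.get("variable_1"), mv.get("variable_2")
            if c1 <= c2:
                merged[(c1, c2)].add((v1, v2))
            else:
                # Swap so the variable order follows the canonical pair order.
                merged[(c2, c1)].add((v2, v1))
    return dict(merged)


MODEL = """
<model xmlns="http://www.cellml.org/cellml/1.1#" name="example">
  <connection>
    <map_components component_1="membrane" component_2="sodium_channel"/>
    <map_variables variable_1="V" variable_2="V"/>
  </connection>
  <connection>
    <map_components component_1="sodium_channel" component_2="membrane"/>
    <map_variables variable_1="i_Na" variable_2="i_Na_total"/>
  </connection>
</model>
"""

connections = merged_variable_maps(MODEL)
print(connections)
```

Software that caches one object reference per component pair could keep doing so by keying on the normalised pair, which limits the impact of relaxing the rule.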
Re: [cellml-discussion] curation of BioModels Database
Hi Nicolas,

Users can currently submit models in CellML and SBML (VCML is coming). All the models are then converted to SBML, which is our internal format.

Can you point me to the transforms/code/algorithm for this?

cheers
Matt
Re: [cellml-discussion] CellML 1.1.1 specification
More generally I think this issue is tapping on the bigger issue of a core modelling language and specification, and then special use cases, of which biological modelling would be a particular case. Without reactions, we pretty much have that core language. My inclination is that biological modelling is a specification of minimum required annotation of mathematical models using the 'core' language.

Poul's turn.

On 7/20/07, James Lawson [EMAIL PROTECTED] wrote:

David Nickerson wrote:

Matt wrote:

It seems there is some misunderstanding as to whether we are discussing a proposal to remove the reaction element from CellML or a proposed new specification. I thought it was the latter but you seem to be talking about the former...

Both. I think. So I will try another explanation. If we had our specification in a version control system and tagged out releases and release candidates etc, and if we followed a protocol of releasing at least one stable minor release that marks deprecation only, then the following would be the result (in my mind):

- The current trunk is the development version of CellML 1.2 (i.e. unreleased-dev).
- This current trunk looks like CellML 1.1 and the associated definitions in DTDs etc.
- We update this to mark out that reaction elements are going to be deprecated; this includes comments in DTDs etc. We don't remove reaction elements from the specification at this stage because that's where we hang the deprecation notices.
- We tag this as 1.1.1 and release it.
- We then delete reaction elements from the specification that is on trunk.

Now, this is the kind of process I think covers the steps you have been talking about, and at the end makes available a trunk version of 1.2-dev-unreleased that doesn't have reaction elements, which people can check out and play with (this is essentially the proposal page that Andrew wrote up - though I think there are issues remaining now with the absence of biology from a CellML standard).
yep - thats how I would see the specification evolving over time, subject, of course, to the various proposals being accepted and assigned to an appropriate version. I think the absence of biology from the core specification is probably a good thing, It might be worth adding an editorial comment or similar to note that the metadata is where the vast majority of biological information is defined. but there needs to be clear annotation of the specification describing how reactions should now be represented in a world without reactions - another best practice recommendation and examples in the model repository at the least, I would hope. Yep, the 'signal transduction' tutorial will no longer be needed, since its main purpose is to describe the best practice for use of reaction elements to describe biochemical pathways. The question will be, do we just remove it, or do we create a new tutorial that is biochem specific, but doesn't talk about reaction elements. For example, there are two main ways to code up a biochem model: either you can use equations that describe, for example, the rate of change of conc. of species A, where species A might have a few different processes acting on it, so this equation would be a summation of the effect of these processes, OR, you can split the equation into 'fluxes,' which represent just the effect of one process on a species at a time. I think it would be worth writing a best practice guide for writing biochem models, even if it is relatively short, since there are a few things that are different from how an ephys model should be coded up. But what I am also saying is that this is still just an idea, so it should be put forward as a proposal that has not been accepted. I.e. that the steps I described above are purely hypothetical at the moment, since we haven't had the chance to hear arguments from people about it - it might turn out to be a silly proposal. definitely. 
Your steps describe the process for how the specification may be updated, developed, etc., but each release will be the result of a set of proposals being accepted and assigned to that particular release. this is why the proposal to remove the reaction element should have first been put forward independently of any specific future version of CellML. Perhaps we could start by formalising the actual proposal to get rid of the rxn element. Why are we doing it? How will we replace the purpose it served, using metadata for example? I realise that this has been discussed a lot in an informal manner, and was in fact decided well before my time, but I guess if we're going to do something as major as create a new spec version, we should build the foundations first. In this discussion forum we could then debate the merits of this proposal and, if deemed suitable, develop a schedule for the implementation of the proposal (i.e., mark
Re: [cellml-discussion] Concerning the CellML Model Repository
This is my view of where things should be heading:

The main impetus for this thread is moving the cellml.org site forward. In this sense I would like to see a description of what it currently does and what features have been informally slated. Then I'd like to see a document that rewrites these as use-cases that don't depend on technology (but can certainly borrow ideas from various technologies). A large part of this is the cellml.org use-cases around the use of metadata in the models.

While the underlying implementation of the repository is something to discuss, I think that it is a red herring at the moment. The issues seem to have more to do with various use-cases being difficult to represent in the current style of model naming and the difficulty of reflecting someone's local filesystem workflow/layout. I think there is too much of a rush to solve the repository issue quickly based on these idiosyncrasies of the cellml.org model naming problem.

Some(!!) considerations:

- how is a modeler's local workspace organized? e.g. we have talked about the possible need for a manifest file; the possibility of metadata sitting separate from the model itself; etc. Is the idea of a workspace appropriate? Would people have multiple workspaces, say one for each model, or one workspace for all their models, or both?
- do people want to use a single central repository? Or should they be able to work independently in their own instance of a repository and perhaps at some point transfer their project to another one?
- there has been an assumption that the base unit stored in a repository should be a cellml/xml model - why is this? Check the reasons why this is believed to be the way it should be.
- don't try to figure out the URI scheme right now - even in use cases.
The only attention to URIs will be the behaviour they might exhibit in the modeling process: for example, you want someone to be able to move from tracking a volatile branch of a model in their imports to a stable one (that's all you have to say, not what the URIs might look like).
- don't attach specific technologies to the repository system until the use-case space has been filled out

The evolution of the repository is a non-small task (it's actually someone's PhD topic). So once there is a pretty certain idea of what the repository may be, then how does the current system in the plone site sit with respect to this? Are there technologies that take us a step closer that could be woven into the current product? etc.

What are the priorities for cellml.org?
Re: [cellml-discussion] [team-cellml] @cellml.org addresses
This seems like it's going in circles. I'm not really sure why anyone would want to contact us personally with something they didn't want to send to the list. Thinking about this more we should probably try: 1) cellml-discussion@cellml.org 2) [EMAIL PROTECTED] - for specific enquiries that you don't want publicly available. It would make sense to have a nominated person or persons that address mail in there - James I would think - who decides if an email should be forwarded to the list because it really is a public issue - or respond and acknowledge the email and seek a response from those in the team that it seems appropriate to. 3) a team page where everyone who is on the team-cellml list has a picture and a small blurb (kind of like http://sbml.org/contacts/) ... which is really just to give a face to those who are quite deeply involved. I would imagine people like Penny Noble to be on that. If someone contacts someone on that page then it will likely be quite personal. On 6/25/07, Andrew Miller [EMAIL PROTECTED] wrote: David Brooks wrote: See below... On 25/06/2007 4:32 p.m., Andrew Miller wrote: James Lawson wrote: Andrew Miller wrote: I don't think we should use the word 'project team' because there is no formal project team. Perhaps we can just have a list of people categorised by their interest in the CellML project, and then a contact page which helps people find certain people (for example, we could have a category for technical issues with cellml.org, which would list Tommy, a category for people with the ability to curate cardiac electrophysiology models, which would list James, and a category for people with an interest in cardiac electrophysiological modelling, which would list anyone who wanted to be on the list). There is then the issue of whether we use our own email addresses or @cellml.org addresses. Andre is keen on the latter, and I agree. 
Although I am not entirely convinced that it is necessary or beneficial, and I think that we risk harming the community nature of CellML by saying that only certain people can get a cellml.org e-mail address. Surely there's no harm in having a small number of generic @cellml.org email addresses that reflect the roles people play? (eg [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]). I don't think it's a good idea to have lots of these addresses (as this can get confusing), nor should the roles be too specialised. We have tried this in the past, and it resulted in the fragmentation of the community, and it had several negative outcomes: 1) People were sending all messages of a given type to the aliases, instead of to the list. However, because these aliases were closed mailing lists with generally out of date membership, mails sent to the lists were essentially getting forgotten about when there were people on the main list who could have answered the message. 2) There was no archive so there was no way to tell if a question was answered. 3) People often referred to e-mails sent to these lists at the CellML meetings, but it was hard to tell what they were talking about because only some people at the meeting got the messages. 4) Because the aliases were open, they got a lot of spam, which made it hard to see the signal over the noise. 5) Because the traffic was fragmented, it looked to anyone looking at the cellml-discussion archives like there was nothing happening with the CellML project. As a result of this, we decided over a year ago to get rid of info, tools and other lists like that and consolidate them all into cellml-discussion. I don't really think we want to go back to the way it was before without addressing all the problems it caused last time. 
Best regards,
Andrew

Regards,
Dave
Re: [cellml-discussion] [team-cellml] @cellml.org addresses
On 6/25/07, David Nickerson [EMAIL PROTECTED] wrote:

Matt wrote:

On 6/25/07, David Nickerson [EMAIL PROTECTED] wrote:

Matt wrote:

This seems like it's going in circles. I'm not really sure why anyone would want to contact us personally with something they didn't want to send to the list. Thinking about this more we should probably try:

1) cellml-discussion@cellml.org

2) [EMAIL PROTECTED] - for specific enquiries that you don't want publicly available. It would make sense to have a nominated person or persons that address mail in there - James I would think - who decides if an email should be forwarded to the list because it really is a public issue - or respond and acknowledge the email and seek a response from those in the team that it seems appropriate to.

for all the same reasons why we dropped [EMAIL PROTECTED], I can't see this being a good idea.

Right. But I don't see any other resolution given the current circles. I think 'team' is a little more focussed than 'info'. I just vote to give it a try with James managing it actively and see how it goes.

but if we go with 3 below why do we still need a [EMAIL PROTECTED] at all?

a catchall for the whole team; for example, I don't really know who I would want to bother personally if I had a personal problem with the sbml site or wanted to invite the team to a conference, or was rejected from the mailing list, etc.; I would just use the Email: [EMAIL PROTECTED] link they have at the top of the team page.

3) a team page where everyone who is on the team-cellml list has a picture and a small blurb (kind of like http://sbml.org/contacts/) ... which is really just to give a face to those who are quite deeply involved. I would imagine people like Penny Noble to be on that. If someone contacts someone on that page then it will likely be quite personal.

this is exactly what we were originally discussing and the SBML page is what I thought we'd be working the cellml.org/team page into.
good

--
David Nickerson, PhD
Research Fellow
Division of Bioengineering
Faculty of Engineering
National University of Singapore
Email: [EMAIL PROTECTED]
Re: [cellml-discussion] Concerning the CellML Model Repository
On 6/22/07, Tommy Yu [EMAIL PROTECTED] wrote: Matt wrote: Hi Tommy, Can you continue to update/fill out your document as well as begin associated proposals with information contained in the replies people are submitting. The goal of this process is a scoping document with associated content. It will be done when I am done refining all my thoughts about the threads here, along with the other thoughts I already have but not written down there. More comments below. Likewise. On 6/22/07, Tommy Yu [EMAIL PROTECTED] wrote: Matt wrote: Hi Tommy, I found the document seemed to be too far ahead of itself. I also didn't find any of the pros and cons very compelling because they don't address specific problems and those problems are not described. 1) What are you actually trying to achieve? It would be useful to describe the parts of the current system that are giving you grief and look to give you more grief based on the use cases and any axes of scale. Starting with what I envisioned. Who is the repository catered for? 1) People who would like to work on models, using it as a place to store work-in-progress models. 2) Reviewers to review models. 3) Website users to browse models. 1) What do the model builders want? - Their own workspace (home directory) - A place to let reviewers review their models - Also to publish their models First point is not addressed by what we have now. Second and third point is quite ad-hoc. Also, version control is very ad-hoc right now. Each of these points need to be filled out, e.g. what does it mean to have a workspace for a CellML modeller?, What are the scenarios and workflows for reviewers of CellML models? Workspace is like a home directory. Or are you comfortable with a flat filesystem where each file is owned by different people are all over the place. This is about organization according to what the model builders want. 
I'm more comfortable with the latter; but exactly what that looks like is difficult to know or perhaps ever to predict. Some work on a manifest description and a best practice/hint would be a good start.

Models are by default private to the owner, but s/he can expose them via the layer that binds subversion and the database together, which manages permissions.

Try and stay away from specific underlying pieces at the moment. The key is the description of the workflow states, transitions, and actions.

Other modelers could import their colleagues' models (provided permissions are given).

CellML import element kind of imports?

Reviewers simply get access to a model; a URI to a specific revision of a model (and associated files, at the model builder's option) will be generated which s/he could use. If a reviewer has rights s/he can publish the model to the public.

This should probably be model workspace; the concept of a single model is a bit vague at the moment unless we define some rule that there will always be a single top level model. I presume where we are heading is that TTW, people will be accessing an index.html that processes a manifest file and creates a pretty view of the workspace.

2,3) Reviewers and website users
- A centralized location to browse models.
- They would like to see how models may relate to each other.

How do models relate to each other? Relations between models come from all sorts of data within models, and within any associated metadata (so more than just our current cellml metadata specification). It would be useful to write out the details of the relationships that are important here as these pretty much form the basis of many of the queries that will need to be performed.

It will be done. I can see users wanting to know which component of a model was imported by other models, and finding all other dependencies of a particular model. More will come.
First point is already addressed, but second point is definitely not possible as the current repository does not support 1.1. Why does it not support CellML 1.1? i.e. what is the technology block here to extending the current system to support it? None, aside from the lack of a proper code versioning system in the backend. With a few changes to the copy/paste code, CellML 1.1 will then be able to be stored into the repository. I could go ahead and do this, but it will only further compound the issues we have now. Okay, fine, refactor Model.py and have new classes inherit from that, but we still lack certain key features, such as a proper versioning backend. So lack of support for CellML 1.1 is not a reason to rebuild the system, but implementing CellML 1.1 support means pressure on other ugly bits like dealing with import dependencies (perhaps uploaded as separate files) and would mean more work for people to manually ensure that versions used in import URIs are correct? Maybe as an experiment I could
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Tommy, I found the document seemed to be too far ahead of itself. I also didn't find any of the pros and cons very compelling because they don't address specific problems and those problems are not described. 1) What are you actually trying to achieve? It would be useful to describe the parts of the current system that are giving you grief and look to give you more grief based on the use cases and any axes of scale. 2) What are the use cases? An initial set should be extracted from the current site. You have written out some, but they only covered a small set of function of the site, especially when it comes to relations between models or workflow and curation states. I understand some of the details that are causing you pain with the current implementation, but I think the first part of this is to be charitable to the current system and adequately describe the two points above. Before rethinking the implementation of this site I think the following need to also be done: - a specification for assigning a URI to these models (as would be used by CellML 1.1 imports) - a specification for how a manifest file is to be constructed, or some set of rules for interpreting a directory structure of models, especially in those cases where there are multiple local models used in imports and we need to point to at least the top level model. - a suggested solution to the bqs problem. Research existing standards. Generally: Relational databases are useful, but so are the combination of ZCatalog and Sets. It really depends on the structure of the data and the queries you want to perform. You should write out a reasonable set of these in natural language to get the focus right. Maybe a proof of concept using various mechanisms is required. The frustration with metadata handling at the moment is a result of some difficulties in the metadata specification for the metadata you are using the most and also the use of a quite esoteric system: 4Suite's Versa RDF query interface. 
RDQL or SPARQL are better SQL-like equivalents and certainly have a wide acceptance. Subversion offers a nice philosophy of code management and the guess is that this would apply well to the modeling process. It also offers the potential for building URIs for versioned material - individual files and whole changesets (which is something we are after). The default webdav URI scheme may not be what we want, so it is also worth looking at others; for example, the trac browser interface to a subversion repository form quite nice URIs. Workflow and security as defined and implemented by Zope/CMF/Plone is a very nice model that should be reflected in our workflow and security use-cases. We discussed a few weeks ago that if this environment is going to provide the security layer, then there needs to be a relationship between this and the subversion repository at quite a detailed level. cheers Matt On 6/21/07, Tommy Yu [EMAIL PROTECTED] wrote: Hi, I have written down some of my thoughts on how the model repository could be put together. http://www.cellml.org/Members/tommy/repository_redesign.html It is still a pretty rough document. The usage example section gives a rough outline on what I see people might be doing with the repository and how this design could address those issues, which I think it will be of interest to users. It is not an exhaustive list, yet. I must also note the design outlined is quite a drastic departure from what we have now (it will be yet another new repository). However, it is more true to the one envisioned before according to http://www.cellml.org/wiki/CellMLModelRepositories, except I have an addition layer that will assist in pulling content and drawing relationships between models. Feel free to take it apart and/or build on top of it. Cheers, Tommy. 
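The import-related questions raised in this thread (which models import a given file, what a model depends on, and which file is the top level model of a workspace) can be written out as a small graph walk. This is only a minimal sketch, assuming a hypothetical in-memory table of imports; the file names, the table, and the function names are illustrative and are not part of any repository design discussed here.

```python
from collections import deque

# Hypothetical import table: each model file maps to the files it imports.
IMPORTS = {
    "cell.cellml": ["ina.cellml", "ik.cellml"],
    "ina.cellml": ["units.cellml"],
    "ik.cellml": ["units.cellml"],
    "units.cellml": [],
}


def dependencies(model, imports):
    """Return every file reachable from `model` by following imports."""
    seen = set()
    queue = deque(imports.get(model, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(imports.get(current, []))
    return seen


def imported_by(target, imports):
    """Return the models that directly import `target`."""
    return {model for model, deps in imports.items() if target in deps}


def top_level(imports):
    """Return the models that nothing else in the workspace imports."""
    imported = {dep for deps in imports.values() for dep in deps}
    return set(imports) - imported
```

Writing the queries out this way, in natural language first and then as code, is one way to check whether a manifest file needs to name the top level model explicitly or whether it can be derived from the imports.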
___ cellml-discussion mailing list cellml-discussion@cellml.org http://www.cellml.org/mailman/listinfo/cellml-discussion
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Tommy, Can you continue to update/fill out your document as well as begin associated proposals with information contained in the replies people are submitting. The goal of this process is a scoping document with associated content. More comments below. On 6/22/07, Tommy Yu [EMAIL PROTECTED] wrote: Matt wrote: Hi Tommy, I found the document seemed to be too far ahead of itself. I also didn't find any of the pros and cons very compelling because they don't address specific problems and those problems are not described. 1) What are you actually trying to achieve? It would be useful to describe the parts of the current system that are giving you grief and look to give you more grief based on the use cases and any axes of scale. Starting with what I envisioned. Who is the repository catered for? 1) People who would like to work on models, using it as a place to store work-in-progress models. 2) Reviewers to review models. 3) Website users to browse models. 1) What do the model builders want? - Their own workspace (home directory) - A place to let reviewers review their models - Also to publish their models First point is not addressed by what we have now. Second and third point is quite ad-hoc. Also, version control is very ad-hoc right now. Each of these points need to be filled out, e.g. what does it mean to have a workspace for a CellML modeller?, What are the scenarios and workflows for reviewers of CellML models? 2,3) Reviewers and website users - A centralized location to browse models. - They would like to see how models may relate to each other. How do models relate to each other? Relations between models come from all sorts of data within models, and within any associated metadata (so more than just our current cellml metadata specification). It would be useful to write out the details of the relationships that are important here as these pretty much form the basis of many of the queries that will need to be performed. 
First point is already addressed, but second point is definitely not possible as the current repository does not support 1.1. Why does it not support CellML 1.1? i.e. what is the technology block here to extending the current system to support it? Issues: - Flat file system. Sure, using ZCatalog it is possible to emulate users' home directories and the like, but it still does not get away from what we have now. I don't understand this. What are you aiming for in a home space and why doesn't the current system support it? - Version/Variant It already clogged up the system. There is no proper revision control mechanism, what we have now is an ad-hoc emulated system. I don't think it has clogged the system I just think it has been improperly used both by authors and by the user interface. This is no fault of the authors, there is simply a specification for versioning that is missing. The hope is that subversion applies well to this. - It's CellML Code, right? Why not put code in a real code management system, like Subversion? Subversion works well for filesystems of code and text data and to some extent binary data that we don't really need to query the contents of. If this applies well for CellML modelling, then subversion is probably a good match. Subversion will bring its own complexities when we are dealing with applying security to file objects, and security/publishing in general will get even more complex if we are proxying remote repositories - which we talked about a few weeks ago. Generally, I think the concept of cellml modelling being laid out in a filesystem and subversion versioning concepts applied to it is good, but untested. For instance, take a reasonably complex model of Andre's and work out how it will look on the filesystem and what subversion versioning would result in. While in this thread, I don't believe metadata should be treated any differently to model data. 
Adding special rules for versioning of some data and not others is going to complicate the versioning process and I can't see any compelling reason to do this. Remember that the subversion system is versioning file objects which will contain both metadata and cellml model data. What is important is how and where metadata is stored. Perhaps metadata should be separated into its own document sitting next to the model in the filesystem. My inclination is that an implementation using subversion plus some subversion hooks will be ok, but we haven't worked out details or done any proof of concept for this - which should be agnostic to cellml and focussed on how to apply zope+cmf security and workflows to data objects stored in subversion repositories. - Zope has revision control Until someone packs the database. Perhaps you should look at http://plone.org/products/plone/roadmap/8 (which is now completed and merged into Plone 3). There are some other add on products - some listed in http://plone.org/products/by-category
Re: [cellml-discussion] PMR categories
I think that list is a good start for a top level set of terms. I agree with Andre that other should not be a selectable term. I would probably offer a primary keyword which forces a selection from the current list of terms ('none of these') being one of the items, and then a dynamic set of free text keyword fields (by using say the DataGridField plone product http://plone.org/products/datagridfield) so that Authors can add their own (especially if they don't fit into one of the primaries). There can be a note to say that if you don't fit into a primary your model will be listed under other, but to point out that it is important for them to add the free form keywords so that 'we' can look over the models that end up in other and perhaps add another primary keyword option if something looks as if it should be. I really want to be able to capture author coined keywords since this is the best input for trying to organise an ontology of keywords (including synonyms etc that come up). cheers Matt On 6/7/07, Peter Hunter [EMAIL PROTECTED] wrote: Dear All, The intention of this discussion was to decide on a list of items for a drop-down list of predefined terms that would be available when choosing 'key words' for a new model and which would be the list of terms used to display models on www.cellml.org/models (together with the default 'All models' item). The idea was that choosing one or more of these key words terms would be mandatory when defining model metadata but that one could also enter additional keywords for more advanced searching. It may be that the additional key words should adhere to terms from an ontology as Matt suggests and should use the predictive completion facility that Andre suggests. But I am keen to keep this first list of terms fairly short. My suggestion is the following list. I've checked through the repository and less than 10% of the models would end up solely under 'Other'. 
I am sure we will need to expand this list as the repository grows and I suggest we have a policy of keeping the number that end up solely in the 'Other' category to less than 10% of the total. We may also later need a policy to refine the classification when too many models are displayed under one term. Calcium dynamics Cell cycle Cell migration Circadian rhythms Electrophysiology Excitation-contraction coupling Gene regulation Mechanical constitutive laws Metabolism Myofilament mechanics Signal transduction Other (the default key word in the list of predefined terms) Let me know if you can think of other more appropriate terms or additional ones, then I'll ask Tommy to implement it. I'm happy to then go through and classify all current models in the repository into these categories. Cheers, Peter David Nickerson wrote: One thing I have found useful in other taxonomy/keyword type web interfaces (e.g., see drupal) is that when entering such keywords the interface dynamically completes the terms and/or presents alternatives based on what the user enters. I'd imagine such an interface would work well at pulling terms out of the ontologies Matt is talking about. David. Tommy Yu wrote: Just had a discussion with Peter, Randall and James about this. The keywords are in the metadata for the models, and there is no limit to what can go in there. The concern about that is the list could get too big (for minor categories), or variations in the name (electrophysiology vs electrophysiological), or just spelling in general. What was decided is to have the same category list, but it would act as a blessed list of keywords that will serve as a guide to what should be added to the model, and as a broad category filter for the main repository listing. Users would still be able to add or search by other keywords (from the advance search interface) if they wish. Tommy. 
Matt wrote: On 6/6/07, David Nickerson [EMAIL PROTECTED] wrote: James Lawson wrote: David Nickerson wrote: Would I be correct in assuming that these terms will be key words added to the model metadata and that the division into categories on the main repository page will be assembled from queries on each of these predefined key words? Well potentially, there could be many many different keywords, so Peter suggested that we might not necessarily want to base the categories on just the keywords. At the moment, Tommy's sorting function is based on keywords but he suggested that we could have both a keyword and a more general category selection system. not sure I like the idea of a separate category, seems to me adding some special piece of metadata to models just to make a repository dump look pretty isn't the way to go. It would be nicer to make use of the keywords (which are genuinely useful metadata to more than just the model repository), possibly with the addition of a guided part of the metadata editing workflow which prompts the user
Re: [cellml-discussion] PMR categories
I'm not sure what the physiome ontology is. Currently the anatomy ontology is the one I've been working on and this has no physiological processes in it yet. I was hoping I had been clear in my previous emails that I want the current and future author supplied keywords to help drive the ontology, not the other way around. On 6/8/07, James Lawson [EMAIL PROTECTED] wrote: Peter Hunter wrote: It may be that the additional key words should adhere to terms from an ontology as Matt suggests and should use the predictive completion facility that Andre suggests. Will we use the Physiome ontology for this? It will require changing the current keywords that are defined in the metadata for many of the models so they fit an ontology. Should we be using ontology terms for the major categories as well? A quick flick through the Physiome ontology suggests that we might have trouble finding terms in it that would fit what we want.
Re: [cellml-discussion] PMR categories
Some of those are subsets of others. You might want to generalise a bit more and then fit some of the useful specifics into that. I would be interested to see what you come up with. cheers Matt On 6/6/07, James Lawson [EMAIL PROTECTED] wrote: Hi folks, Tommy is currently working on a sorting function for the main model list of the PMR. Peter is looking for some ideas on what categories people think should be included (with respect to biology, not curation - that will be separate, coming soon!) The old repository obviously has models listed under categories. Do we want to keep those categories? We don't want too many categories, so what is important? As well as just biological function, perhaps the model type is important too. Particularly when the repository goes 1.1 we'll start getting multiscale models. Here's a list I compiled, which represents most of the models we currently have. Multiscale Cardiobiology/Cardiophysiology Neurobiology/Neurophysiology Beta Cells/Insulin Electrophysiology Biomechanics Biochemistry Pharmacology Signal Transduction/Signalling Metabolism Energy Metabolism Cell Cycle Immunology Virology HIV Circadian Rhythms Calcium Dynamics Protein Structure Function Please discuss. Kind regards, James Lawson
Re: [cellml-discussion] PMR categories
On 6/6/07, David Nickerson [EMAIL PROTECTED] wrote: Would I be correct in assuming that these terms will be key words added to the model metadata and that the division into categories on the main repository page will be assembled from queries on each of these predefined key words? I would hope so. And if so, I'm gonna further assume that there are no issues with having a model in more than one category, right? I would hope so. And what are the consequences for a model not fitting into any of these categories? It has to fit somewhere, I don't think the list is easily determined from the top down like this. I would prefer that keywords were added for each model and then we look at the accumulation of terms post this process and work out sets and subsets and maybe places where merging or specialisation can occur. I'd suggest that multiscale is a bit too general to be useful in this sort of setting, as it's conceivable that pretty much every model in the repository is multiscale in some sense. Yeah, it's too vague a term at the moment. David. James Lawson wrote: Hi folks, Tommy is currently working on a sorting function for the main model list of the PMR. Peter is looking for some ideas on what categories people think should be included (with respect to biology, not curation - that will be separate, coming soon!) The old repository obviously has models listed under categories. Do we want to keep those categories? We don't want too many categories, so what is important? As well as just biological function, perhaps the model type is important too. Particularly when the repository goes 1.1 we'll start getting multiscale models. Here's a list I compiled, which represents most of the models we currently have.
Multiscale Cardiobiology/Cardiophysiology Neurobiology/Neurophysiology Beta Cells/Insulin Electrophysiology Biomechanics Biochemistry Pharmacology Signal Transduction/Signalling Metabolism Energy Metabolism Cell Cycle Immunology Virology HIV Circadian Rhythms Calcium Dynamics Protein Structure Function Please discuss. Kind regards, James Lawson -- David Nickerson, PhD Research Fellow Division of Bioengineering Faculty of Engineering National University of Singapore Email: [EMAIL PROTECTED]
Re: [cellml-discussion] PMR categories
I think you need to re-read my last email. Ontologies are blessed lists at the most simple interpretation. I don't care too much about different terms for the same thing at the moment, the impetus should be to retrieve all the current keywords, then 'bless' the list and update the keywords and add others, then give this list to me to try and formalize in the ontology. On 6/6/07, Tommy Yu [EMAIL PROTECTED] wrote: Just had a discussion with Peter, Randall and James about this. The keywords are in the metadata for the models, and there is no limit to what can go in there. The concern about that is the list could get too big (for minor categories), or variations in the name (electrophysiology vs electrophysiological), or just spelling in general. What was decided is to have the same category list, but it would act as a blessed list of keywords that will serve as a guide to what should be added to the model, and as a broad category filter for the main repository listing. Users would still be able to add or search by other keywords (from the advance search interface) if they wish. Tommy. Matt wrote: On 6/6/07, David Nickerson [EMAIL PROTECTED] wrote: James Lawson wrote: David Nickerson wrote: Would I be correct in assuming that these terms will be key words added to the model metadata and that the division into categories on the main repository page will be assembled from queries on each of these predefined key words? Well potentially, there could be many many different keywords, so Peter suggested that we might not necessarily want to base the categories on just the keywords. At the moment, Tommy's sorting function is based on keywords but he suggested that we could have both a keyword and a more general category selection system. not sure I like the idea of a separate category, seems to me adding some special piece of metadata to models just to make a repository dump look pretty isn't the way to go.
It would be nicer to make use of the keywords (which are genuinely useful metadata to more than just the model repository), possibly with the addition of a guided part of the metadata editing workflow which prompts the user to choose at least one of the predefined category keywords and a filter smart enough to put models without one of the special keywords into an other category. This way the main repository page layout could be easily changed to add or remove keywords that get pulled out as categories without having to change the models. It would also be nice if we can analyse all the repository searching to keep track of the most popular keywords and adjust the categories on the main page accordingly :-) Well, I'm hoping to steal all the keywords and lay them out in the physiome ontology and then put them back in as bioentities (or math related) metadata pointing into this. So the long term relationship between keywords and this ontology metadata is where my thinking lies. I like the idea of reflecting this information into keywords, e.g. for 'cardiovascular' the bioentity would be some big long uri pointing into the instance of the term 'cardiovascular' in the ontology, so it would be nice if the keywords were at least dynamically generated from the labels of these ontological term instances.
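The category filter discussed in this thread (models grouped under every blessed keyword they carry, with anything else falling into an other category) could look roughly like the following. This is a minimal sketch only: the blessed list is the one Peter proposed, while the model names, the keyword data, and the function name are made up for illustration and do not reflect how the repository actually stores metadata.

```python
# Blessed list as proposed by Peter in this thread; "Other" is the fallback.
BLESSED = [
    "Calcium dynamics",
    "Cell cycle",
    "Cell migration",
    "Circadian rhythms",
    "Electrophysiology",
    "Excitation-contraction coupling",
    "Gene regulation",
    "Mechanical constitutive laws",
    "Metabolism",
    "Myofilament mechanics",
    "Signal transduction",
]


def categorise(models, blessed=BLESSED):
    """Group model names under each blessed keyword they carry.

    `models` maps a model name to its free-text keywords. A model may land
    in several categories; a model with no blessed keyword goes to "Other".
    """
    lookup = {term.lower(): term for term in blessed}
    categories = {term: [] for term in blessed}
    categories["Other"] = []
    for name, keywords in models.items():
        hits = sorted({lookup[k.lower()] for k in keywords if k.lower() in lookup})
        for category in hits or ["Other"]:
            categories[category].append(name)
    return categories


# Hypothetical keyword metadata for two models.
example = categorise({
    "model_a": ["Electrophysiology", "calcium dynamics"],
    "model_b": ["electrophysiological"],
})
```

Note that the match is case-insensitive but otherwise exact, so the variant spelling on model_b ends up under "Other". That is the situation where the author-supplied keywords would need to be reviewed and possibly folded into the blessed list.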
Re: [cellml-discussion] removal of reaction components from models
Hi, First off ... Hooray for the demise of the reaction element. From your description, the script of Andrew's pulls apart the components and adds in the formerly implicit rate*stoichiometry as I would expect, though I would like stoichiometry to be represented as an explicit variable so we can refer to it and not have to infer from constants in the math. There may be some argument to more decomposition of this into separate components, but that is work in progress for Best Practices and Sarala's annotation work. As for keeping some sane biology in the models that is lost by removing reactions. Yes, Sarala's work will end up being the practice we want to take for this, but at the moment it is not proven and considering this is a public resource let's stick to something already specified for cellml metadata which in time can be automatically migrated to (or simply complemented with) something in Sarala's domain. Briefly, Biopax addresses the role of an entity by way of its place in an interaction process, whereas our reaction elements were quite explicit about a role. I would propose something very simple. Use the biological entity metadata (see section 4.10 Biological Entity of http://www.cellml.org/specifications/metadata/cellml_metadata_1.0#sec_general_metadata) to refer to a prescribed role within our own controlled vocab that is designed only for the purpose of maintaining this role data. The cmeta:bio_entity can contain a collection of references which allow for general concepts to be mapped to a CellML element as well as a specific physical entity from say a protein database. This means that each variable that had a role in the reaction element would now need a cmeta id assigned to it and the respective rdf written out for the bioentity data. The rdf:value of the identifier would be the URI of the respective role in the role vocab.
It would make sense (and help the migration/complement of Sarala's work) if the roles for modulators could follow those set out in biopax (see pages 17-19 of http://www.biopax.org/release/biopax-level2-documentation.pdf). So we would end up with simple URIs such as the following (which map into terms within some ontological context in the imaginary document http://cellml.org/vocabularies/2007/05/17/reactionmapping):

For the role of entities

http://cellml.org/vocabularies/2007/05/17/reactionmapping#reactant
http://cellml.org/vocabularies/2007/05/17/reactionmapping#product
http://cellml.org/vocabularies/2007/05/17/reactionmapping#catalyst

For the kinds of modulators

http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-allosteric
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-competitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-irreversable
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-noncompetitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-other
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-uncompetitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation-nonallosteric
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation-allosteric

Sarala has a rule for generating the cmeta ids of variables and math, so perhaps it is best to make up an xslt that takes a cellml model and generates the cmeta ids for the elements and puts them back in to the elements and perhaps even creates stub rdf description elements for each of them. If you send me a component that has been decomposed from a reaction element using Andrew's script, then I'll add in the rdf metadata in the way I was thinking and post it back here.
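To make the proposed mapping concrete, here is a minimal sketch of how a tool might turn a role term into its URI and attach it to a variable's cmeta id. The base URI and the terms are the ones listed above (spelled exactly as given, and the vocabulary document is imaginary at this point); the function names and the example cmeta ids are assumptions made up for illustration, and no RDF is written out here.

```python
# Base of the (imaginary) role vocabulary proposed in this message.
BASE = "http://cellml.org/vocabularies/2007/05/17/reactionmapping"

# Terms copied from the list in this message, spelled as given there.
ENTITY_ROLES = ("reactant", "product", "catalyst")
MODULATOR_KINDS = (
    "activation",
    "inhibition",
    "inhibition-allosteric",
    "inhibition-competitive",
    "inhibition-irreversable",
    "inhibition-noncompetitive",
    "inhibition-other",
    "inhibition-uncompetitive",
    "activation-nonallosteric",
    "activation-allosteric",
)


def role_uri(term):
    """Return the vocabulary URI for a role term, rejecting unknown terms."""
    if term not in ENTITY_ROLES and term not in MODULATOR_KINDS:
        raise ValueError(f"unknown role term: {term!r}")
    return f"{BASE}#{term}"


def annotate(variable_roles):
    """Map each (hypothetical) cmeta id to the URI of its role."""
    return {cmeta_id: role_uri(term) for cmeta_id, term in variable_roles.items()}


# Hypothetical cmeta ids for three variables of a decomposed reaction.
annotations = annotate({
    "glucose_var": "reactant",
    "g6p_var": "product",
    "hexokinase_var": "catalyst",
})
```

The resulting URI is what would go into the rdf:value of the identifier in the bio_entity metadata for each variable.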
cheers Matt On 5/17/07, James Lawson [EMAIL PROTECTED] wrote: Dear All, This email is aimed at anyone who has comments, but we particularly want to draw Matt's attention. In the CellML meeting yesterday I brought up the issue of replacement of reaction components with straight math. At present PCEnv isn't handling reaction components well - models which use reaction components aren't integrating, for one. There are also issues with math elements not being picked up if they are under role elements. Andrew has written a script to pull these math elements up a level so that they're a direct child of the component, not the role element. The script also defines delta variables as rate * stoichiometry. Running this script on the models which contain reaction components has cleared up most of the errors with undefined delta variables, so now many of the models with reaction components can now be loaded in PCEnv. The problem is that *none of them* will integrate properly. I am making the assumption that this effect is due to the reaction component, not the models, since it is so widespread among many very different models. I'm going to start
Re: [cellml-discussion] Dimensional consistency and units conversions (was [Fwd: Re: ten Tusscher model])
So one way to avoid having to hope that software does the right thing is to make it compulsory that there is units consistency for each dimension across all variables (defined in variable elements) in a CellML component so that the only units conversions that need to take place are at the interfaces. Quantities that are dimensionless after simplification aren't going to affect dimensional analysis of the math, so we would be safe there. On 4/23/07, David Nickerson [EMAIL PROTECTED] wrote: So where does the problem lie? This says that all your supposedly dimensionless constants should have units. Does it need to be clearer that you are not allowed to simplify them out into dimensionless yourself? yes - I think this is the issue. Also that tools shouldn't simplify them into dimensionless before doing the multiplication and/or units consistency checking. Also. What does it matter that some software simplifies them out before multiplying them? So long as it checks units consistency prior to simplifying them (if it really needs to do that anyway) then the result should be the same. yep - that is the key. The units must be there when an application checks units consistency. Andre.
Re: [cellml-discussion] Dimensional consistency and units conversions (was [Fwd: Re: ten Tusscher model])
The CellML specification distinguishes expansion and simplification. The expansion algorithm given maintains a scale factor in the expanded unit, and hence the *expansion* of m/km is *not* the same as dimensionless. It is only when you then simplify that the scale factor is lost. For checking dimensional consistency, this is fine. For performing units conversions, it is not, and so the units conversion algorithm in the spec does *not* drop the scale factor, even though it claims in the text that the units involved are fully expanded and simplified, this is according to C.3.4 (expansion) not C.3.1 (simplification). So there are two separate questions here: 1) checking the dimensional consistency of mathematics Which we can already do given our variables and constants need to all have dimensions and it is easy to discover if they do not. 2) ensuring a sensible semantics for simulating a model What are sensible semantics in the case of 'units'? How do you ensure something like units consistency has been applied correctly by an author? You are never going to be able to ensure any unit multiplication factors are correct, even if they are given units of their own. going back to the following equation Andrew supplied: amount_Na [nmol] = 10^-6 [dimensionless] * conc_Na [nmol / L] * vol[microL] How do you make this more 'valid'? (it is already valid dimensionally and numerically). Also, I'm not sure what 'converting units at connections' has to do with this discussion - that only ensures value consistency in the assigning of values to inputs and outputs of components and has nothing to do with ensuring units consistency inside the math. cheers Matt
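The difference between expansion (scale factor kept) and simplification (scale factor dropped) can be illustrated with a small sketch. This is not the algorithm from the specification's appendix; it is an assumed, simplified representation in which a unit is a scale factor relative to SI base units plus a table of base-dimension exponents, and all names are made up for illustration.

```python
import math


def unit(scale=1.0, **dims):
    """A unit as (scale relative to SI base units, base-dimension exponents)."""
    return (scale, {d: e for d, e in dims.items() if e != 0})


def multiply(a, b):
    """Multiply two units, keeping the combined scale factor."""
    dims = dict(a[1])
    for d, e in b[1].items():
        dims[d] = dims.get(d, 0) + e
    return (a[0] * b[0], {d: e for d, e in dims.items() if e != 0})


def power(a, n):
    """Raise a unit to an integer power."""
    return (a[0] ** n, {d: e * n for d, e in a[1].items()})


def same_dimensions(a, b):
    """Dimensional consistency check: the scale factor is ignored."""
    return a[1] == b[1]


def conversion_factor(src, dst):
    """Factor that converts a value in `src` units into `dst` units."""
    if not same_dimensions(src, dst):
        raise ValueError("units are not dimensionally equivalent")
    return src[0] / dst[0]


dimensionless = unit()
metre = unit(metre=1)
kilometre = unit(1e3, metre=1)
m_per_km = multiply(metre, power(kilometre, -1))  # same dimensions as dimensionless, scale about 1e-3

# Andrew's equation: amount_Na [nmol] = 10^-6 * conc_Na [nmol / L] * vol [microL]
nmol = unit(1e-9, mole=1)
litre = unit(1e-3, metre=3)
microlitre = unit(1e-9, metre=3)
rhs_units = multiply(multiply(nmol, power(litre, -1)), microlitre)
factor = conversion_factor(rhs_units, nmol)  # about 1e-6
```

Under these assumptions the right-hand side is dimensionally consistent with nmol, and the factor needed to express it in nmol comes out at about 10^-6, which is the constant the author wrote by hand as a dimensionless number. A dimensional check alone cannot tell whether that hand-written constant is the right one, which is the point made above.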
Re: [cellml-discussion] CellML Validation, Curation and Model Composition
I'll reply way back here but forward reference where appropriate.

On 4/14/07, Randall Owen [EMAIL PROTECTED] wrote:

One topic that seemed to come up a lot during the recent CellML Workshop in Auckland was model validation, curation, and composition. This caused me to think about a couple of questions about which I would really appreciate the CellML community's thoughts:

1. What does validation mean to the CellML community?

I agree with Andre's list. I would add that if we introduce the ability to add external code to the models through the bindings to the math that Andrew has proposed (http://www.cellml.org/Members/miller/bcp-external-models/) then in addition to testing the MathML for schema validation there would need to be some 'test' to validate the external code. What this means is a bit unclear, but would at least include:
- code compiles/loads without language based errors
- code passes tests provided. What tests these are will be a lively discussion, but at least they should demonstrate that the code performs what the external code metadata says it does.

I also think there is room for some more theoretical validation, especially in the composition/model reuse area. A simple example would be Mike Cooling's accounting cycles: do such cycles exist? Have they been broken? Another simple example would be mixing components that have math which doesn't share operating domains - e.g. the range of an output variable will not overlap (or minimally overlaps) with the domain of another variable which uses this as input. Another one would be assumptions of a chosen model, e.g. the steady state assumptions of various chemical kinetics equations. I think that this involves a lot of work, but would provide very useful data for filtering and choosing models or parts of models to reuse.

2. What does decomposition and composition mean to the CellML community?

I agree with Andre's comments. I would like to stress the part about best practices. Many of these could be programmatically applied to existing models, and should be. There is also a case that needs to be debated where all models are decomposed into models of single components with single pieces of math that represent only one top level apply and then the original models recomposed out of these. This has come up, and I'm not sure I would support it in the practical sense (it's a nice theoretical challenge). The problem would be how to create the intermediate submodels that form the logical subtrees/submodels making up the 'parts' of a more complex model. E.g. a single channel could be made up of many submodels, and the value of the model of the channel is its functional unit. What this does say to me is that a high level of human decision making in the creation of submodels would provide us with the most meaningful decomposition graphs - where the nodes of a graph are models and the edges are imports, and a model represents some logical unit.

I would also add to the composition argument what it means to combine the metadata of two or more models. Some of this may require processing - such as connecting up a biological graph model. E.g. can you generate a composite biopax model from a collection of biopax models, and does this represent the intention of the composition in the CellML model? So while the composition and decomposition in CellML is explicitly topological, the intention will be quite functional.

3. What does re-usability mean to the CellML community?

I agree with Andre's comments. Also, there should be an emphasis on at least the following:
- best practices that allow for reuse without having to heavily remodel the submodel you want to reuse
- methods for identifying the best choices from a library for your intended use
- methods of decomposing models where practices haven't exposed a useful structure - e.g. need to bubble up some variables from encapsulated variables.
- methods to automatically compose simple reuse such as hooking up a different set of parameter values to a generic model.

I guess I am pointing towards the 'practical application' of reuse here.

cheers
Matt

Each of the above have very specific meanings within the software engineering and computing communities.
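The 'operating domains' check mentioned under validation above could start from something as simple as an interval comparison. The sketch below is an assumption-laden illustration, not part of any CellML tool: it supposes that a component's output range and a consuming variable's valid domain are each available as a (low, high) pair, and the function name `overlap_fraction` is made up for this example.

```python
def overlap_fraction(output_range, input_domain):
    """Fraction of the output range that falls inside the consumer's domain.

    Both arguments are (low, high) pairs in the same units. A result of 0.0
    means the two components do not share an operating domain at all; a small
    value flags the 'minimally overlaps' case for a human to review.
    """
    low = max(output_range[0], input_domain[0])
    high = min(output_range[1], input_domain[1])
    width = output_range[1] - output_range[0]
    if width <= 0 or high <= low:
        return 0.0
    return (high - low) / width

# Hypothetical numbers: an output spanning 0..10 feeding an input valid on 5..20.
print(overlap_fraction((0.0, 10.0), (5.0, 20.0)))
# Disjoint intervals: no shared operating domain.
print(overlap_fraction((0.0, 1.0), (2.0, 3.0)))
```

Where such ranges would come from (metadata, simulation, or the modeller) is exactly the open question raised in the message; the check itself is the easy part.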
Re: [cellml-discussion] model variants(?): e.g. where multiple cell types are described by the same model in different files
This scratches a couple of important pending issues:

1) I feel the term 'variant' is odd (even though I originally suggested it). It was intended to mean that the model labelled as a variant is a variation of the one it is a variant of. However, this isn't really a very applicable definition, especially if one may consider a model to be a variant of more than one other model. Since variant is bound up in the URI and name, this makes for dilemmas. My suggestion would be that we drop variants altogether. We can mark relations in a better way through metadata. I am also querying whether the flatness of our URI scheme is appropriate for our uses. e.g. perhaps:

http://www.cellml.org/models/bondarenko_szigeti_bett_kim_rasmusson_2004_version01

should be something like

http://www.cellml.org/models/bondarenko_szigeti_bett_kim_rasmusson/2004/ventricular?rev=1

(no that is not a formal proposal)

But this doesn't really help you now. The technically correct method at the moment would be that the new models that are similar but different only in their parameterisations are added as variants. Another alternative in the short term would be to simply name the models as separate models (which they are) and define now an RDF relation scheme that is very explicit about how different models at different URIs relate to the one you are editing. This would mean that Tommy needs to update the rendering of the pages for these models to reflect this information.

2) These models should use imports so that we can at least point to the generic model and then the specialised parameterised ones. But that won't work right now because the repository can't handle 1.1 models.

cheers
Matt

On 4/12/07, James Lawson [EMAIL PROTECTED] wrote:

Addendum: The model in the repository describes the apex cells only.

James Lawson wrote:

Hi folks, I'm just fishing for some comments on how to handle cases where there are models which describe, for example, the properties of multiple cell types.
Bondarenko et al. 2004 is a good example of this: In the Bondarenko et al. 2004 publication described here, the authors develop a computer model of the mouse ventricular action potential (see below). The model includes parameters for both the apex and the septum regions of the heart (the apex parameters have been substituted into the CellML version of the model described in ), and this helps to illustrate how there are regional differences in myocyte repolarisation in the mouse heart. Penny Noble has just sent me a zip file containing the latest versions of all the models she has curated for COR. I'm currently in the process of comparing these versions to the latest versions we have on the repository. In some cases, she has provided several variants of the same model which describe different cell types, as above. Firstly, are these true 'variants'? If so, then the matter is relatively simple. Unfortunately, if one goes through the repository list, variants simply come up as a duplication of the model listing, with no information concerning what the variant represents. This is something I imagine will be fixed in time, but is presently rather frustrating considering the issue at hand. Does anyone have any comments on this? Should I simply put the files up as different variants, with a note in the documentation? Thanks, James Lawson
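To make the difference between the flat and the hierarchical URI concrete, here is a small sketch of how a tool might take apart the hierarchical form floated earlier in this thread (which was explicitly not a formal proposal). The field names and the function `parse_model_uri` are assumptions made for illustration; only the example URI itself comes from the message.

```python
from urllib.parse import urlsplit, parse_qs

def parse_model_uri(uri):
    """Split a /models/<authors>/<year>/<name>?rev=<n> style URI into fields."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 4 or segments[0] != "models":
        raise ValueError("not a hierarchical model URI: %r" % uri)
    _, authors, year, name = segments
    rev = parse_qs(parts.query).get("rev", [None])[0]
    return {
        "authors": authors.split("_"),
        "year": int(year),
        "name": name,
        "revision": int(rev) if rev is not None else None,
    }

info = parse_model_uri(
    "http://www.cellml.org/models/bondarenko_szigeti_bett_kim_rasmusson/2004/ventricular?rev=1"
)
print(info)
```

With the flat form, the same fields can only be recovered by guessing where the author list ends and the year and version begin, which is part of the dilemma described above.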
Re: [cellml-discussion] model variants(?): e.g. where multiple cell types are described by the same model in different files
On 4/12/07, James Lawson [EMAIL PROTECTED] wrote: Matt wrote: This scratches a couple of important pending issues: 1) I feel the term 'variant' is odd (even though I originally suggested it). It was intended to mean that the model labelled as a variant is a variation of the one it is a variant of. However, this isn't really a very applicable definition, especially if one may consider a model to be a variant of more than one other model. Since variant is bound up in the URI and name then this makes for dilemmas. My suggestion would be that we drop variants altogether. We can mark relations in a better way through metadata. I am also querying whether the flatness of our URI scheme is appropriate for our uses. e.g. perhaps: http://www.cellml.org/models/bondarenko_szigeti_bett_kim_rasmusson_2004_version01 should be something like http://www.cellml.org/models/bondarenko_szigeti_bett_kim_rasmusson/2004/ventricular?rev=1 (no that is not a formal proposal) But this doesn't really help you now. The technically correct method at the moment would be that the new models that are similar but different only in their parameterisations are added as variants. Right. So given the current system I would do the following: 1) create a generic form of the model without parameters set - a broken model in one sense 2) add these other models that are different parameterisations of this model as variants of this generic-ish one 3) add the existing apical cell one in as a variant of this generic model also. 4) put an HTTP redirect in for the old URI to point to the variant form. I take it the existing one is: http://www.cellml.org/models/bondarenko_szigeti_bett_kim_rasmusson_2004_version01 which actually says: The model includes parameters for both the apex and the septum regions of the heart (the apex parameters have been substituted into the CellML version of the model described in ) Which I find confusing. 
Another alternative in the short term would be to simply name the models as separate models (which they are)

That's an interesting proposal. Given the current way that the models are listed, that would be a good way of displaying that the models are variants. If you upload two variants of a model, they come up as duplicate listings (i.e. no information is displayed about the nature of the variant), so simply making them two different models would get around this. Anyone else have comments on this?

and we define now an RDF relation scheme that is very explicit about how different models at different URIs relate to the one you are editing. This would mean that Tommy needs to update the rendering of the pages for these models to reflect this information.

2) These models should use imports so that we can at least point to the generic model and then the specialised parameterised ones. But that won't work right now because the repository can't handle 1.1 models.

In this case there is no generic model available. The model we have on the repository for Bondarenko et al. 2004 is the one describing the apical cells.

cheers
Matt
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
against them, then we are guaranteed to be able to interpret them.

First, the type of metadata evolves very rapidly. We already have 29 types in MIRIAM resources, but I anticipate that number to grow very rapidly as libSBML3 (that implements the RDF annotation scheme) is adopted by the developers.

How does that fail externalisation of metadata type through publishing schemas?

Because MIRIAM resources can be updated in a second, and then the webservices make it immediately available to resolve annotations. To develop a schema takes time, energy and people.

I don't see a whole lot of difference between an RDF Schema and a custom lookup table except that the Schema is a whole lot more flexible, especially if people want to customise it for in-house purposes but still be able to produce valid metadata for the wider community. It is in fact where I expect most pressure to add new properties and datatypes to the global schemas to come from.

Who will do it? The SBML-team is actually providing XML-schemas for SBML, and this is quite a hard job to do properly.

How does this relate to RDF Schemas or annotation in general?

But more importantly, software developers often use local versions of the schemas. Finally MIRIAM resources can be completed by anybody. No need to wait for the SBML team or the CellML team to be ready to make the change.

I don't understand what you mean. What sort of 'completion' would take place that may require one of our teams to have to make a change?

-- Nicolas LE NOVERE, Computational Neurobiology, EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074 http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:[EMAIL PROTECTED]

Matt
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On 4/3/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

You will always need to pull apart the 'URI' (table2 MIRIAM document) to retrieve the datatype and identifier.

Well, yes you have to recognise what belongs to the data-type and what belongs to the identifier. But you do that all the time in RDF anyway. And besides, we do it for you and can return one or more URLs.

In what part of RDF is this done?

I guess I'm not sure why it isn't easier to keep the meaning of datatype and identifier separate within the language context you are using - which is basically RDF. So that instead you would have something like:

$X $QUALIFIER $DATATYPEINSTANCE
$DATATYPEINSTANCE isA $DATATYPE
$DATATYPEINSTANCE hasIdentifier $IDENTIFIER
$DATATYPEINSTANCE hasPhysicalUrl $URL

Something like:

species calmodulin is calmodulin_in_uniprot
calmodulin_in_uniprot isA UniProt_entry
calmodulin_in_uniprot hasIdentifier P62158
calmodulin_in_uniprot hasPhysicalUrl http://www.ebi.uniprot.org/entry/P62158

- One cannot recognize UniProt_entry if it is a free string? This is why there are MIRIAM data-types
- The physical URL being not stable, one cannot store actual URLs in the models themselves.

Nope. I should be clearer about what would be published globally and what should be in a particular model.

Published globally:

calmodulin_in_uniprot isA UniProt_entry
calmodulin_in_uniprot hasIdentifier P62158
calmodulin_in_uniprot hasPhysicalUrl http://www.ebi.uniprot.org/entry/P62158
UniProt_entry subClassOf DatabaseRecord (that's quite generic)
hasUniProtEntry subPropertyOf hasDatabaseRecord
hasDatabaseRecord subPropertyOf isDescribedBy

In the model:

x hasUniProtEntry calmodulin_in_uniprot

Now one can filter all annotations in a model for database links only, or uniprot links only etc. It all just works when using RDF Schema.
So we are down to:

species calmodulin is calmodulin_in_uniprot
calmodulin_in_uniprot isA http://www.uniprot.org/
calmodulin_in_uniprot hasIdentifier P62158

How is it different (in terms of information content and of computing steps necessary to parse) from:

species calmodulin is http://www.uniprot.org/#P62158

For the reasons I give above.

We do not need to relate them to specify that they all deal with publications. It is already done by the bqmodel:isDescribedBy

No, isDescribedBy has no semantic meaning - there is nothing to say that it explicitly defines a publication in a journal article or a vocabulary term.

Of course not, because we do not want to restrict the type of data used to describe the component. This is what I said just above. What do you mean by isDescribedBy has no semantic meaning? isDescribedBy means exactly 'is described by'. How is that not a meaning?

In the RDF world you have not attributed any meaning. It could be used for anything.

What comes after the 'by' can be a journal article, a webpage, a song, a poem, a controlled vocabulary term, or a telepathic transmission.

Right, and so if it can be followed by anything then it has no meaning: there is nothing to distinguish it from any property that someone may make up.

So there is no way to determine if some set of URIs are controlled vocab terms and some set are journal articles and some set are experimental result sets?

No. It is up to the user to decide what to do with what. See my ChEBI example before. For some people ChEBI is a controlled vocabulary, for some it is a database of chemical compounds. For me, CAS is a database of chemical compounds, for CellML, it is a bibliographic resource.

But when a person uses it, they will have a context. It may be one or more of those from your list, but it is still useful for them to restrict the intention of the property to something close to what they mean.
I don't particularly want to guess whether someone is intending the DataType to mean 'chemical compound' as opposed to 'bibliographic resource'. For all I know the record that I may be able to locate based on this URI might have value in both domains and I would want to know what they were intending. Of course you'd hope the record was RDF anyway and you were able to point to the record attribute by a full URI. -- Nicolas LE NOVERE, Computational Neurobiology, EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074 http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:[EMAIL PROTECTED]
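The filtering argument made earlier in this message (publish subPropertyOf relations globally, then filter a model's annotations by a more general property) can be sketched without any RDF library. This is a simplified illustration, not RDF Schema inference proper: each property is assumed to have at most one super-property, `hasPublication` and `some_paper` are hypothetical names added for contrast, and the other names are taken from the example triples in the thread.

```python
# Globally published schema: property -> its direct super-property.
SUBPROPERTY_OF = {
    "hasUniProtEntry": "hasDatabaseRecord",
    "hasDatabaseRecord": "isDescribedBy",
    "hasPublication": "isDescribedBy",  # hypothetical, for contrast only
}

def property_chain(prop):
    """Return prop followed by all of its super-properties, most specific first."""
    chain = [prop]
    while chain[-1] in SUBPROPERTY_OF:
        chain.append(SUBPROPERTY_OF[chain[-1]])
    return chain

def annotations_matching(triples, wanted):
    """Keep the triples whose predicate is `wanted` or a sub-property of it."""
    return [t for t in triples if wanted in property_chain(t[1])]

# Triples stored in the model itself.
model = [
    ("x", "hasUniProtEntry", "calmodulin_in_uniprot"),
    ("x", "hasPublication", "some_paper"),
]

print(annotations_matching(model, "hasUniProtEntry"))    # uniprot links only
print(annotations_matching(model, "hasDatabaseRecord"))  # any database link
print(annotations_matching(model, "isDescribedBy"))      # everything
```

The point of the sketch is that the model only stores the most specific property; the broader readings come from the published schema, so a tool that knows nothing about UniProt can still treat the annotation as an isDescribedBy link.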
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
for isDescribedBy when you are referring to a publication? From the dublin core spec:

The isReferencedBy and References refinements enable the expression of relationships that aid the user but are not necessarily tied to the life cycle or necessary for the intended use of the resource. This relationship might be used to link an article critical of a resource to that resource, a satire of a speech to the original speech, etc.

Surely that provides some semantic value? :-)

This is what we used until the end of 2006 in fact! But then we decided to stop confusing people with DC qualifiers for some metadata and not for the others. Besides, isReferencedBy really links a document to another document. Here we want to describe the relationship between a document and a model. Finally, we will use isDescribedBy also to link parameters and the literature that described the measure of this parameter

I think overall we are trying to achieve the same thing in terms of expressiveness and interpretability, and I guess I am just left wondering why RDF schemas (or something built off them) didn't interest you so much.

cheers
Matt

-- Nicolas LE NOVERE, Computational Neurobiology, EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074 http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:[EMAIL PROTECTED]
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
By all means, step in as much as possible. Can you explain in more detail or point to explanations of bqmodel:isDescribedBy? Specifically:
- what is its intended meaning?
- when more than one of these is defined on a resource, how is this interpreted? For example: is there some precedence implied somehow?
- how do you determine the kind of reference it is - for example a pubmed uri? You have a datatype for vocab/database IDs in the annotation scheme you described, but I don't see this in the bqmodel:isDescribedBy examples.
- how would you address auxiliary references as opposed to primary references so that a machine interpreting it can make the distinction?

snip

I entirely agree with Melanie, people should be able to pick the resource they want, as far as they uniquely identify it. This is clearly described in the MIRIAM paper.

I'm not sure what benefits one gains from letting people arbitrarily choose what they want to use to identify something with. For example, how do you work out if particular entities in one SBML model match entities in another SBML model? Also, given that most of these resources are controlled vocabularies, there is a lot of room for misunderstanding someone's intention when using their choices of identifiers.

An annotation is formed of three parts:
The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
The identifier of the particular information, e.g. 123456789, GO:0001234 ...
An optional qualifier that describes the relationship between the concept represented by the model component and the concept represented by the particular information.

To help people implement that, we developed MIRIAM resources (http://www.ebi.ac.uk/compneur-srv/miriam/).
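As a rough illustration of the three-part annotation just described, the sketch below splits URIs of the `data-type#identifier` form used in this thread and resolves them to a physical URL through a local lookup table. It is not the MIRIAM web service and makes no claim about its API: the `Annotation` type, the function names, and the table are assumptions, and the URL templates are reconstructed from the page snippets quoted in the thread.

```python
from typing import NamedTuple, Optional

class Annotation(NamedTuple):
    qualifier: Optional[str]  # optional, e.g. "bqmodel:isDescribedBy"
    datatype: str             # stable data-type URI, e.g. "http://www.pubmed.gov/"
    identifier: str           # e.g. "8983160"

def parse_annotation(uri, qualifier=None):
    """Split 'data-type#identifier' into its two parts and attach the qualifier."""
    datatype, sep, identifier = uri.partition("#")
    if not sep or not identifier:
        raise ValueError("expected <data-type URI>#<identifier>: %r" % uri)
    return Annotation(qualifier, datatype, identifier)

# Illustrative local table: data-type URI -> template for a current physical URL.
# Only this table changes when a resource moves; the URIs stored in models do not.
URL_TEMPLATES = {
    "http://www.pubmed.gov/": "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
                              "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids={id}",
    "http://www.doi.org/": "http://dx.doi.org/{id}",
}

def resolve(annotation):
    """Turn a stored annotation into a physical URL, if the data-type is known."""
    return URL_TEMPLATES[annotation.datatype].format(id=annotation.identifier)

pubmed = parse_annotation("http://www.pubmed.gov/#8983160", "bqmodel:isDescribedBy")
doi = parse_annotation("http://www.doi.org/#10.1063/1.1681288", "bqmodel:isDescribedBy")
print(resolve(pubmed))
print(resolve(doi))
```

Keeping the data-type and identifier as separate fields after parsing also makes it straightforward to filter annotations by data-type, which is one of the questions raised at the top of this message.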
If you download a model from BioModels DB in SBML (not in CellML at the moment, for obvious reasons highlighted by the current discussion), you will see something like:

<bqmodel:isDescribedBy> <rdf:Bag> <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/> </rdf:Bag> </bqmodel:isDescribedBy>

But on the webpage, there is:

<b>Publication ID:</b>&nbsp;<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8983160" target="_blank">8983160</a>

The URL is dynamically generated by MIRIAM webservices. In fact in the new version of BioModels DB, to be released in the fall, the URL does not point to PubMed anymore, but to the EBI extended Medline, more comprehensive. BUT the URI stored in the model is still the SAME. Similarly for a DOI:

<bqmodel:isDescribedBy> <rdf:Bag> <rdf:li rdf:resource="http://www.doi.org/#10.1063/1.1681288"/> </rdf:Bag> </bqmodel:isDescribedBy>

is transformed into:

<b>Publication ID:</b>&nbsp;<a href="http://dx.doi.org/10.1063/1.1681288" target="_blank">10.1063/1.1681288...</a>

That system is very flexible. You can use any resource listed in MIRIAM resources, and this resource can be extended at will (note that we distribute an XML version of the resource for local use). But it is still robust and expressive.

Cheers,

On Wed, 28 Mar 2007, Melanie Nelson wrote:

Wow, I haven't posted to this list in a long time... But I feel compelled to give a little advice as someone who's spent a lot of time integrating biological information and therefore has made a lot of mistakes! By all means, have a best practice encouraging people to use the GO cellular_component ontology to describe organelles and cells. You could probably also use the molecular_function ontology for proteins (although this will be messier). However, neither is likely to be complete, i.e., there will be models that reference a biological entity not in the GO ontologies.
Also, there will be cases where the entity the model references is most properly thought of as related in some way (e.g., a subset, a superset, or a sibling) to the GO entity. You can spend ages sorting this sort of thing out and coming up with consistent rules for handling all the relationships. Since you aren't really interested in sorting out this biological mess, you may want to consider letting people choose their own ontology and just reference it. An example of this practice is in the MIAME project: http://www.mged.org/Workgroups/MIAME/miame_1.1.html

About the citations: my memory of this is fuzzy, but I think the original intent was that people should provide the PubMed ID where possible. However, not all journals are indexed in PubMed (for instance, there is a CellML paper published in one that is not), so the model needs to handle full citation info, too. The BQS model handles both, and then some, which is why we chose it.

Hope this is helpful,
Melanie

--- Andrew Miller [EMAIL PROTECTED] wrote:

Matt wrote:

I don't think this is a good idea.
- I think bioentity should be deprecated, it has no intrinsic semantic value.

It does, unfortunately, seem to usually target a literal node at the moment. It would
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On 3/29/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

On Thu, 29 Mar 2007, Matt wrote:

Can you explain in more detail or point to explanations of bqmodel:isDescribedBy?

You can find some explanations at: http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers

So there is no simple way to determine if this is a reference to a journal article except through interpreting the URI?

Note that qualifiers are optional to be MIRIAM-compliant.

I personally think we should always use some qualification, otherwise an annotation becomes very difficult to use except for jumping from webpage to webpage.

Specifically:
- what is its intended meaning?

Cf above. Note that the list of qualifiers is by no means frozen. We are already aware of several gaps (e.g. how do we qualify the relation between a peptide and the gene that encodes it?)

- when more than one of these is defined on a resource, how is this interpreted? For example: is there some precedence implied somehow?

This is up to the tool using the qualifiers. SBML does not allow nested qualifications. There is only an implicit hasVersion if several identical qualifiers are present:

bqmodel:isDescribedBy toto
bqmodel:isDescribedBy tata

means is described by toto and is described by tata. In other words toto or tata describe the component. NOT toto and tata are necessary to describe the component. On top of that, BioModels DB adds some precedence: http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html But all that is not part of MIRIAM rules.

- how do you determine the kind of reference it is - for example a pubmed uri? You have a datatype for vocab/database IDs in the annotation scheme you described, but I don't see this in the bqmodel:isDescribedBy examples.

<rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>

http://www.pubmed.gov/ means the following identifier has to be interpreted as pointing to a piece of data in PubMed. http://www.pubmed.gov/ is unique and should not normally change.
However, sometimes it may nevertheless change for various reasons: URI too confusing, badly chosen, fusion of two resources etc. For instance, the old PubMed URI was http://www.ncbi.nlm.nih.gov/PubMed/ It was misleading because it was tied to a particular physical resource at the NCBI. We have a deprecation system in place that allows us to resolve the old URIs and provide the new ones.

- how would you address auxiliary references as opposed to primary references so that a machine interpreting it can make the distinction?

I am not sure I understand that. Like primary and secondary accessions of UniProt?

For journal articles, or other publications, being able to identify the primary reference(s) is useful. For database records, it would also be useful to label a group as being the most important (or defining) set, and others as 'helpful'. It was why I suggested that CellML bibliographic referencing separated these two, and that the latter would need to be bound to a reason (a natural language comment would be fine) that described why that reference was made.

snip

I entirely agree with Melanie, people should be able to pick the resource they want, as far as they uniquely identify it. This is clearly described in the MIRIAM paper.

I'm not sure what benefits one gains from letting people arbitrarily choose what they want to use to identify something with. For example, how do you work out if particular entities in one SBML model match entities in another SBML model? Also, given that most of these resources are controlled vocabularies, there is a lot of room for misunderstanding someone's intention when using their choices of identifiers.

An annotation is formed of three parts:
The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
The identifier of the particular information, e.g. 123456789, GO:0001234 ...
An optional qualifier that describes the relationship between the concept represented by the model component and the concept represented by the particular information. To help people implement that, we developed MIRIAM resources (http://www.ebi.ac.uk/compneur-srv/miriam/). If you download a model from BioModels DB in SBML (not in CellML at the moment, for obvious reasons highlighted by the current discussion), you will see something like: <bqmodel:isDescribedBy> <rdf:Bag> <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/> </rdf:Bag> </bqmodel:isDescribedBy> But on the webpage, there is: <b>Publication ID:</b>&nbsp;<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8983160" target="_blank">8983160</a> The URL is dynamically generated by MIRIAM webservices. In fact, in the new version of BioModels DB, to be released in the fall, the URL no longer points to PubMed but to the EBI extended Medline, which is more comprehensive. BUT the URI
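As a rough illustration of the three-part annotation described above (data-type, identifier, optional qualifier), here is a minimal Python sketch. It assumes the "data-type URI, then '#', then identifier" layout seen in the PubMed example. The names Annotation, parse_annotation and DEPRECATED_DATA_TYPES are hypothetical, and the small lookup table only stands in for the deprecation system mentioned in the thread; it is not the MIRIAM web services API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the deprecation system: old data-type URI -> current one.
# The single entry is the PubMed example quoted in the thread.
DEPRECATED_DATA_TYPES = {
    "http://www.ncbi.nlm.nih.gov/PubMed/": "http://www.pubmed.gov/",
}


@dataclass(frozen=True)
class Annotation:
    data_type: str            # URI naming the resource, e.g. PubMed
    identifier: str           # entry within that resource
    qualifier: Optional[str]  # optional relationship, e.g. bqmodel:isDescribedBy


def parse_annotation(uri: str, qualifier: Optional[str] = None) -> Annotation:
    """Split a 'data-type#identifier' URI and map deprecated data-types forward."""
    data_type, sep, identifier = uri.partition("#")
    if not sep or not identifier:
        raise ValueError(f"no '#identifier' part in {uri!r}")
    data_type = DEPRECATED_DATA_TYPES.get(data_type, data_type)
    return Annotation(data_type, identifier, qualifier)


if __name__ == "__main__":
    print(parse_annotation("http://www.pubmed.gov/#8983160", "bqmodel:isDescribedBy"))
```

With this split, a tool can decide that something is a journal reference by comparing the data-type part alone, and two annotations written against the old and the new PubMed URI compare equal after parsing.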
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
I don't think this is a good idea. - I think bioentity should be deprecated, it has no intrinsic semantic value. - If it is used currently, it should be left at its current minimum specification, which is to label and point to other bioinformatics database IDs. - The problem is not 'biologically related papers' per se, but one of identifying what was the primary publication or publications that motivated a model. - There is also the case where a single publication that contains a mathematical model is the one and only primary source for the model itself - a rather common case at the moment. I would prefer that the primary publication(s) be identified as such, which covers the case where there are some models in the repository built from general review papers of biology with no math. I would prefer references to other related publications to be bound explicitly to a comment in the model metadata - there should be a reason identified by the author/editor/reviewer as to why such an association has been made. As an aside, we also need to determine whether the BQS schema provides enough detail to match publications across metadata instances for different models, and whether we should be complementing bibliographic data with PubMed IDs and the like. cheers Matt On 3/29/07, Andrew Miller [EMAIL PROTECTED] wrote: Hi, As discussed at the last CellML meeting, there are some models which reference both the paper about the model, and a reference about the biology. Since there is no way to distinguish between them, this creates problems for CellML metadata processing tools which want to identify the paper about the model (such as the CellML repository). However, it would still be a good thing to include references about the biology / experiments on which a model is based, as well as papers on underlying mathematical techniques (and perhaps earlier papers?) The CellML Metadata specification already describes a predicate cmeta:bio_entity, and another cmeta:math_problem.
Although the cmeta specification suggests that these be used to provide references to identifiers for the biological entity a part of the model relates to, and likewise for the mathematical problem, it would also be possible to create a list of references inside the resource targeted by the bio_entity or math_problem predicate. I would therefore suggest that the following be considered best practice: 1) Only refer to the paper about the model from the metadata for the model. 2) Any other papers should be in another resource referred to from the bio_entity and math_problem entities. Does anyone else have any opinion on this? Best regards, Andrew ___ cellml-discussion mailing list cellml-discussion@cellml.org http://www.cellml.org/mailman/listinfo/cellml-discussion
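The alternative raised in this thread (mark the primary publication(s) as such, and bind every other related publication to a stated reason) can be sketched in a few lines of Python. This is only an illustration under assumed names: Citation and primary_citations are hypothetical and are not part of the CellML Metadata specification or the BQS schema, and the reference strings are plain placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    reference: str                # e.g. a PubMed ID or full citation text
    primary: bool                 # True for the publication(s) the model is built from
    reason: Optional[str] = None  # required for non-primary (related) citations

    def __post_init__(self) -> None:
        # A related citation with no stated reason is rejected outright.
        if not self.primary and not (self.reason and self.reason.strip()):
            raise ValueError("a related (non-primary) citation needs a reason")


def primary_citations(citations: List[Citation]) -> List[Citation]:
    """Return only the citations flagged as primary, preserving order."""
    return [c for c in citations if c.primary]


if __name__ == "__main__":
    cites = [
        Citation("paper describing the model", primary=True),
        Citation("paper on the biology", primary=False,
                 reason="source of the experimental data"),
    ]
    print([c.reference for c in primary_citations(cites)])
```

A repository tool could then list the paper about the model by filtering on the primary flag, without having to guess from the order or the target of the references.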
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
Melanie! Thanks for your thoughts. You are right about the mess that mapping to different ontologies and vocabs produces. We have been working on trying to integrate explicitly with BioPAX (http://www.biopax.org/ states and generics proposal - level 2 was too limiting) in the hope that other databases (like GO, Reactome, KEGG, DIP, signalling gateway, etc) get dragged along too. Some seem to be. At the moment, I think the cleanup begins at the repository interface. So long as we validate that the biological annotation is unambiguous, complete, and represented in a way that agrees with one or more of an accepted set of ontologies and vocabs, then I think we might be in a position to perform useful queries. We also have a person doing a PhD here now on the visualisation of CellML models which relies solely on the biological annotation. It's a pretty good failure point to get nothing in your picture :-) thanks again cheers Matt On 3/29/07, Melanie Nelson [EMAIL PROTECTED] wrote: Wow, I haven't posted to this list in a long time... But I feel compelled to give a little advice as someone who's spent a lot of time integrating biological information and therefore has made a lot of mistakes! By all means, have a best practice encouraging people to use the GO cellular_component ontology to describe organelles and cells. You could probably also use the molecular_function ontology for proteins (although this will be messier). However, neither is likely to be complete, i.e., there will be models that reference a biological entity not in the GO ontologies. Also, there will be cases where the entity the model references is most properly thought of as related in some way (e.g., a subset, a superset, or a sibling) to the GO entity. You can spend ages sorting this sort of thing out and coming up with consistent rules for handling all the relationships.
Since you aren't really interested in sorting out this biological mess, you may want to consider letting people choose their own ontology and just reference it. An example of this practice is in the MIAME project: http://www.mged.org/Workgroups/MIAME/miame_1.1.html About the citations: my memory of this is fuzzy, but I think the original intent was that people should provide the PubMed ID where possible. However, not all journals are indexed in PubMed (for instance, there is a CellML paper published in one that is not), so the model needs to handle full citation info, too. The BQS model handles both, and then some, which is why we chose it. Hope this is helpful, Melanie --- Andrew Miller [EMAIL PROTECTED] wrote: Matt wrote: I don't think this is a good idea. - I think bioentity should be deprecated, it has no intrinsic semantic value. It does, unfortunately, seem to usually target a literal node at the moment. It would be nice for this to at least be a resource, which could provide further information about the biological entity (or if we decide not to do that, at least a resource, with a dictionary and a process for adding new words to the dictionary to avoid duplication). It seems that GO (Gene Ontology) has terms for cell types, biological compartments, and so on, which would offer a better way to provide this information. I still think that this metadata is useful, even if the automated interpretation of it is currently difficult. - If it is used currently, it should be left at its current minimum specification, which is to label and point to other bioinformatics database IDs. There are three layers of information here: Layer 1: What biological entity are we describing? (could be answered with a GO term). Layer 2: What information about that biological entity are we using? (could be answered with a reference to a paper, and perhaps even a reference to raw experimental data).
Layer 3: How was that information translated into a model? (could be answered with a reference to a paper on the model). Layer 3 is clearly information about the model, and should be described as an arc of the model resource. Layer 1 is described by a literal at the moment. Layer 2 is therefore a gap, which we don't have any proper way to represent now. - The problem is not 'biologically related papers' per se, but one of identifying what was the primary publication or publications that motivated a model. The publication which motivated the expression of a model in CellML, or the publication which motivated the creation of the model? Most of the models in the repository were motivated by a paper about a model which was not initially expressed in CellML. However, the way that the metadata specification works now is that the paper which describes the model (not the paper which motivated it) is referenced from the information about the model (not information about the CellML file
Re: [cellml-discussion] Proposal: BCP for including external code in CellML models
to that. So I think ultimately the domain that wants to guarantee that the source is perpetually available should be the domain that forms the base of the URL. cheers Matt On 3/18/07, Andrew Miller [EMAIL PROTECTED] wrote: David Nickerson wrote: ECMAScript is not practical for use in modelling, because it is an interpreted, non-typed language, which necessarily means that it cannot be compiled and will be slower than compiled code. But CellML is a language for the description and exchange of mathematical models. It is not meant to be a one-off wonder describing the most efficient and best performing method for executing numerical computations. To turn a CellML model description into something useful for computation, that description has to be interpreted and compiled into some other format suitable for the environment using it... Surely in the same manner, a standard description of procedural code could then be interpreted by any number of applications in whatever manner they feel best suits their environment? No, because due to the restriction of CellML to expression, it is much easier to work with, and this is what makes it declarative. You can perform a variety of manipulations on declarative expressions, but procedural code can basically only be run in the way it was written to run (for example, even working out whether procedural code will ever terminate, 'The Halting Problem', has been proved to be non-Turing computable in the general case, and this is likely to be the case for other types of manipulations too). Code can often be optimised and compiled, but the features of ECMAScript preclude many of the optimisations that a C compiler, for example, can make. For example, objects can have arbitrary properties, and there is no way to tell at compile-time what set of properties an object will have, or whether a property is a simple property or a getter.
While a C compiler might take a value from an offset into a structure, ECMAScript code would end up searching a dictionary of properties on an object. Therefore, ECMAScript is not a good language if you want to be able to interpret it in different ways (and for any Turing-complete language, the ways in which you can interpret it are severely limited). Remember also that CellML models can be used to solve a range of different problem types (fitting, ODE time course, and so on), but one procedural code implementation might not be useful for all of them. My BCP document is intended as a way to maintain as much of the model as possible in CellML, but simply leave the rest of the model unspecified. Given the amount of history and development of procedural languages, I don't think we can hope to 'standardise' anything more in a widely acceptable way when it comes to procedural languages. External code needs to be extensible, and hence outside the scope of the CellML specifications, for several reasons: 1) Performance. Code may need to be written in a way which is specific to a particular platform in order to be able to perform well. same response as above. Sometimes, human intervention is always going to be required to save a model from unfeasible performance issues. If we take an ideological approach and try to block this from happening, it will just result in CellML not being used at all. Instead, it is better to encourage people to use CellML features whenever possible, but allow external code when it is not possible. 2) Access to existing libraries. There are often extensive libraries and other software packages into which a model needs to be integrated. This could be in practically any language, and so it would be necessary to access the data structures of these libraries to have the model work. I believe that this is the case for much of the CMISS-CellML work (I don't really think that a proposal to re-write CMISS in ECMAScript would be very popular!).
In every case of people using CMISS that I know of, the use of CellML is to define model specific mathematical equations for integration into a larger model. In other words, the model consists of parts which can be expressed as mathematical equations, and parts that cannot be expressed in mathematical equations (in CMISS). You are proposing that the parts which cannot be expressed in mathematical equations be written in ECMAScript. I'm not suggesting re-writing CMISS in ECMAScript - rather you seem to be suggesting including CMISS in a CellML model?!? The question of which model is included in which is more an artificial distinction than anything more meaningful. However, there needs to be a mechanism for data flow from CMISS into the CellML models (otherwise, CMISS can only set initial conditions, it can't have any time dependent influence on the model). This would hold for most such cases of using existing libraries that I can think of, with the exception of someone wanting to solve
Re: [cellml-discussion] Proposal: BCP for including external code in CellML models
On 3/19/07, Andrew Miller [EMAIL PROTECTED] wrote: Matt wrote: I have often thought referencing external code through a clearly defined interface would be useful, and mostly because procedural code is another natural way to solve problems. But I have always banged my head up against validation. With procedural code this amounts to passing tests - good tests - and being confident that the code will break in useful ways when it does break. I don't see this as being any different to the intended outcome of valid CellML models that are purely declarative. At first glance it might seem that it is more taxing for a developer wanting to use CellML in their application if they need to handle external code; but this proposal for external code is very specific to the math declarations, and I think independent of whether the math is represented in MathML or as an external source of procedural code, the decisions of an application that are investigating the math are going to be difficult without sufficient annotation that tries to classify the math formulations in a way that a machine can filter what it is capable of and not capable of processing. I deliberately don't address how to let the tools associate certain external code with a given code-identifier URI. Initially, this would have to be tool specific, but there could be another specification for this process in the future. Right. I don't think your specification needs to address this; I think it is a more general problem of model exchange between software systems where at present there can be quite a bit of work required by the system to figure out if it can interpret the math. At the moment the model interchange we know of - PCEnv, COR, JSim - all interpret models that are systems of ordinary differential equations and mostly modelling in similar ways the same kinds of biological processes. This is very convenient. I don't think the external code proposal complicates this. 
In some cases I imagine the application developer would welcome a particular math problem being already coded in a language that could be compiled and run. If that thought is continued, then there is a place for a model representation that has all math represented by external code, with the model structure being represented in CellML. This would obviously be under the assumption that some particular decisions for simulation of the model had been made; it is indeed a different scenario from the pure declarative model that seeks to explain the mathematical problem at a higher level and leave it to applications to resolve the simulation from this. At the moment we don't actually have a useful way of providing a CellML model with enough machine readable information for someone to rerun our model in exactly the same way as we had. It is getting closer. PCEnv has its own non-standard metadata for the exact algorithm used, but we haven't been able to agree on a more general standard way to represent algorithms with graceful fallback yet. By referencing and/or including external code, we allow the step of exchanging a model at the simulation level, which is actually not a bad thing if our goal is to promote collaboration of model building. I don't know if this is true, as I am not suggesting that we allow the stepping (integrator) algorithm itself to be exchanged (although that could be a different, future specification if needed). Hmm, not necessarily the integrator algorithm, but perhaps the math of individual components that are computed. People may share a generic CellML integrator application, but find it easier to exchange model descriptions that have already had the math reformulated into library calls where they share the libraries and possibly develop them in parallel without the need to hook them up into the integration environment through an automatic interpretation of the MathML.
That could be a productive win, but at the cost of quite possibly having the MathML version out of sync with the external code/library version. Allowing external code is going to allow this workflow to happen. One of the values of using MathML was to allow us to publish the math in a model and know that it faithfully represents the code that was used to generate the results published alongside it. I think this falls over a little where we, say, represent something awkwardly in MathML for the sake of keeping it in MathML - e.g. mega huge piecewise functions for representing perturbations - would we really want to render these out for publication? You allude to something similar in your response below. What would be the most appropriate way to produce these parts for publication but still remain as error free in translation as possible? I think as you suggest, publishing the code that is actually used or at least a set of tests the library used passed to be applicable to your context, would be the minimum to ensure
[cellml-discussion] simulation metadata editing
(peter wrote) 2. Need ability to edit metadata on website models - e.g. for sensible defaults on time integration parameters and graphical
We need to be very careful to preserve a relationship between a simulation and the data obtained from it. I see a problem occurring where we have metadata describing a simulation which is bound to a model, and a graphical output (or set of data points) that is supposed to represent the output of this simulation, which is also bound to the model. There is only an implicit relation between the two, such that updating the simulation metadata now produces an inconsistency with the graphs (or associated data points) of results. I think that in the simulation metadata we need to think about:
1) uniquely identifying simulations (an RDF ID within the model).
2) referencing the model URI this simulation is referring to (there shouldn't be anything stopping the simulation metadata being picked up and processed in isolation from the model).
3) binding graphs of results to the simulation and not the model.
4) changing the metadata of a simulation needs to force a version change (or variant), in a similar way to models, so that a mismatch between graphs or result sets can be detected.
This part of the discussion thread seems to belong on CellML discussion now. cheers Matt
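The four points above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class names, the make_simulation helper, and the idea of deriving the version from a hash of the simulation settings are not taken from any CellML metadata specification, and the model URI is a placeholder.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Simulation:
    sim_id: str                               # point 1: unique ID within the model
    model_uri: str                            # point 2: explicit reference to the model
    parameters: Tuple[Tuple[str, float], ...]  # sorted (name, value) pairs

    def version(self) -> str:
        """Point 4: any change to the settings yields a different version tag."""
        payload = json.dumps([self.model_uri, list(self.parameters)])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class ResultSet:
    sim_id: str       # point 3: results bind to the simulation, not the model
    sim_version: str  # version of the simulation that produced them
    data_uri: str     # where the graph or data points live


def make_simulation(sim_id: str, model_uri: str, **params: float) -> Simulation:
    """Build a Simulation with parameters in a stable, sorted order."""
    return Simulation(sim_id, model_uri, tuple(sorted(params.items())))


def is_stale(result: ResultSet, sim: Simulation) -> bool:
    """True when the result no longer matches the current simulation metadata."""
    return result.sim_id != sim.sim_id or result.sim_version != sim.version()


if __name__ == "__main__":
    sim = make_simulation("sim1", "http://example.org/model.cellml", step=0.01, end=10.0)
    graph = ResultSet(sim.sim_id, sim.version(), "graph.png")
    edited = make_simulation("sim1", "http://example.org/model.cellml", step=0.001, end=10.0)
    print(is_stale(graph, sim), is_stale(graph, edited))
```

Because the result set records the version of the simulation that produced it, editing the integration parameters on the website would make the stored graph detectably out of date rather than silently inconsistent.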
Re: [cellml-discussion] CellML meetings - can we shift to conference calls?
Hi All, Maybe we aim for one every 2 months at the moment. What's probably useful is for people to email me or the list with items they would like to bring up in a conference call discussion. Maybe the first one should be kept to something short - like an hour. Then from there we can see what we think should happen. I'd like to have more visible project plans on the CellML site. My experiences with SBML and BioPAX seem to suggest to me that we can work out a simple system that is transparent and works for everyone. What are the time zones we are dealing with? I realize at least some of us will be at opposite ends of the day; I just want to make sure we don't end up with mid-nighters. It's potentially the thing that will make large conference calls difficult. We could also try ones that are focused around particular projects if that seems better. Depends what sort of agendas we come up with. cheers Matt On 10/30/06, Nigel Lovell [EMAIL PROTECTED] wrote: Dear All, Thanks for also CCing us in Sydney. We would certainly be happy to contribute to a conference call once every two months. We have been continuing to develop tools for parameter optimisation of CellML models of excitable tissue and would value feedback and comments. As always we are happy to contribute to the wider agenda as well. We tend to use Skype for our research collaborations - cheaper for us than conference calls. Regards, Nigel. At 06:29 PM 30/10/06, [EMAIL PROTECTED] wrote: Dear All, Thank you for including me too on this. I have very similar views to Steve on this. It'd feel very nice to be more involved in that way. My main worry is timing as I only have limited availability. Best Regards, Penny Steve McKeever wrote: Dear All, This is a *very* interesting debate, thanks for including me in on it. Personally I would be happy for there to be a conference call every two months.
This would allow us all to hear what you guys in Auckland have been up to and for us to pipe in with our various bits and pieces. I don't think we need more contact and conference calls are a bit hit and miss in terms of people all talking at the same time (or not at all) but once established we could change the frequency or allow for an important session when the need arises. Warm regards, Steve -- --- | Prof Nigel Lovell | Graduate School of Biomedical Engineering, | Adjunct Professor, School of Electrical Engineering and Telecommunications, | University of New South Wales | UNSW Sydney NSW 2052 Australia | Ph: +61-2-93853922 FAX: +61-2-96632108 | Email: [EMAIL PROTECTED]