Re: [cellml-discussion] Concerning the CellML Model Repository
This is my view of where things should be heading: the main impetus for this thread is moving the cellml.org site forward. In this sense I would like to see a description of what it currently does and what features have been informally slated. Then I'd like to see a document that rewrites these as use-cases that don't depend on technology (but can certainly borrow ideas from various technologies). A large part of this is the cellml.org use-cases around the use of metadata in the models.

While the underlying implementation of the repository is something to discuss, I think that it is a red herring at the moment. The issues seem to have more to do with various use-cases being difficult to represent in the current style of model naming, and with the difficulty of reflecting someone's local filesystem workflow/layout. I think there is too much of a rush to solve the repository issue quickly based on these idiosyncrasies of the cellml.org model naming problem.

Some(!!) considerations:
- How is a modeler's local workspace organized? E.g. we have talked about the possible need for a manifest file, the possibility of metadata sitting separate from the model itself, etc. Is the idea of a workspace appropriate? Would people have multiple workspaces, say one for each model, or one workspace for all their models, or both?
- Do people want to use a single central repository? Or should they be able to work independently in their own instance of a repository and perhaps at some point transfer their project to another one?
- There has been an assumption that the base unit stored in a repository should be a CellML/XML model. Why is this? Check the reasons why this is believed to be the way it should be.
- Don't try to figure out the URI scheme right now, even in use cases. The only attention to URIs will be the behaviour they might exhibit in the modelling process: for example, you want someone to be able to move from tracking a volatile branch of a model in their imports to a stable one (that's all you have to say, not what the URIs might look like).
- Don't attach specific technologies to the repository system until the use-case space has been filled out.

The evolution of the repository is a non-trivial task (it's actually someone's PhD topic). So once there is a pretty certain idea of what the repository may be, how does the current system in the Plone site sit with respect to this? Are there technologies that take us a step closer that could be woven into the current product? Etc.

What are the priorities for cellml.org?

___ cellml-discussion mailing list cellml-discussion@cellml.org http://www.cellml.org/mailman/listinfo/cellml-discussion
Re: [cellml-discussion] Concerning the CellML Model Repository
On 6/26/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > Matt wrote: > > I don't understand the purpose of this. > > > > It looks like you are inventing a versioning system to implement from > > scratch. > > > > That's what it looks like, but if you recall the software choices I have > written down I have been considering them, and have been going through the > features they offer that would be of use to us. I am being agnostic to > software right here, as someone would like me to stay away from underlying > pieces at the moment. I am conveying a generic concept of a repository in > the context of CellML model development, outlining possible hints to what may > be best practices. What I meant was to look at these other projects to see the use-cases they solve. It seems like you are reinventing them from the beginning, starting at the lowest level of a database identifier. Look at the problems these systems are trying to solve, and compare them to CellML models. > > > I don't see how this system would work with someone working on a > > filesystem and not wanting to use a browser - you'd have to invent > > client software for this. > > > > Start by reviewing things like: > > > > subversion > > svk > > darcs > > monotone > > arch > > etc > > > > I did, I already suggested to use Subversion as a possible backend (I also > reviewed how GIT might be a reasonable choice if we proxy remote > repositories), possibly a RDBMS to help with the relationship aspect of > models, and Plone/Zope for the workflow states and presentation front-end via > WWW. And the next line below here says to review them in the context of the use-cases. > > > Review them in the context of the use-cases that need to be satisfied. > > > > Include use-cases such as someone working on a complex model that uses > > imports of models in a local space. Include use-cases of someone > > wanting to follow volatile vs non-volatile versions/branches, etc.
> > > > If a model builder develops their models in their local space they could > import items from within their projects via relative paths (no different than > working locally on their storage device). If they rely on other models they > could import a specific frozen version of a model, or the development > version, from the repository. Volatile versions are provided for anyone who > need it. So write them out as examples, i.e. what their local filesystem looks like for that particular project, what the import URIs look like, and the commands they use to manipulate versions/branches/etc. of this. What does it mean to want to follow a volatile version? What are the implications of this? Do changesets make sense in such contexts? > > > Include the environments from which you expect this versioning system > > to work (e.g. commands on a filesystem, webdav, etc). > > > > If it's subversion someone could do a svn ci or use their GUI clients to > update models. They could also update via WWW. Not the actual commands ... just the environments/contexts that are important. For example, is it important for someone to be able to start a project on a local filesystem and add it to the repository? What conditions are necessary for various roles to add something to the repository? > > > What are the kinds of relationships between permissions and roles. I > > know you have some ideas here, but it's not very replete and perhaps > > needs to be put in a table. > > > > They will be put on the table. In a table. Something readable. > > > I think aliases in for web URIs are the least of the problems at the moment. > > > > On 6/26/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > >> Hi, > >> > >> I thought Andrew's ideas here is worth expanding, and I wrote a page based > >> on that. > >> > >> http://www.cellml.org/Members/tommy/BaseRepository > >> > >> Cheers, > >> Tommy.
> >> > >> > >> > >> Andrew Miller wrote: > >>> Matt wrote: > > - Version/Variant > > It already clogged up the system. There is no proper revision control > > mechanism, what we have now is an ad-hoc emulated system. > > > I don't think it has clogged the system I just think it has been > improperly used both by authors and by the user interface. This is no > fault of the authors, there is simply a specification for versioning > that is missing. The hope is that subversion applies well to this. > > >>> I think that the versioning system itself is the root of the problem, > >>> because it is simultaneously too complicated and too limited. > >>> > >>> In particular: > >>> Branching is inherently a hierarchical process with arbitrary depth, in > >>> the sense that branches can be made from branches to an arbitrary depth. > >>> However, the variant / version system does not really provide the proper > >>> tools to deal with this, because it is limited to two levels (variant > >>> and version) before its utility in tracking what is a derivative of what > >>> is exhausted. > >>> > >>> It is
Re: [cellml-discussion] Concerning the CellML Model Repository
Matt wrote: > I don't understand the purpose of this. > > It looks like you are inventing a versioning system to implement from scratch. > That's what it looks like, but if you recall the software choices I have written down, I have been considering them, and have been going through the features they offer that would be of use to us. I am being agnostic to software right here, as someone would like me to stay away from underlying pieces at the moment. I am conveying a generic concept of a repository in the context of CellML model development, outlining possible hints at what may be best practices. > I don't see how this system would work with someone working on a > filesystem and not wanting to use a browser - you'd have to invent > client software for this. > > Start by reviewing things like: > > subversion > svk > darcs > monotone > arch > etc > I did; I have already suggested using Subversion as a possible backend (I also reviewed how GIT might be a reasonable choice if we proxy remote repositories), possibly an RDBMS to help with the relationship aspect of models, and Plone/Zope for the workflow states and presentation front-end via WWW. > Review them in the context of the use-cases that need to be satisfied. > > Include use-cases such as someone working on a complex model that uses > imports of models in a local space. Include use-cases of someone > wanting to follow volatile vs non-volatile versions/branches, etc. > If a model builder develops their models in their local space, they could import items from within their projects via relative paths (no different than working locally on their storage device). If they rely on other models, they could import a specific frozen version of a model, or the development version, from the repository. Volatile versions are provided for anyone who needs them. > Include the environments from which you expect this versioning system > to work (e.g. commands on a filesystem, webdav, etc).
> If it's subversion someone could do a svn ci or use their GUI clients to update models. They could also update via WWW. > What are the kinds of relationships between permissions and roles. I > know you have some ideas here, but it's not very replete and perhaps > needs to be put in a table. > They will be put on the table. > I think aliases in for web URIs are the least of the problems at the moment. > > On 6/26/07, Tommy Yu <[EMAIL PROTECTED]> wrote: >> Hi, >> >> I thought Andrew's ideas here is worth expanding, and I wrote a page based >> on that. >> >> http://www.cellml.org/Members/tommy/BaseRepository >> >> Cheers, >> Tommy. >> >> >> >> Andrew Miller wrote: >>> Matt wrote: > - Version/Variant > It already clogged up the system. There is no proper revision control > mechanism, what we have now is an ad-hoc emulated system. > I don't think it has clogged the system I just think it has been improperly used both by authors and by the user interface. This is no fault of the authors, there is simply a specification for versioning that is missing. The hope is that subversion applies well to this. >>> I think that the versioning system itself is the root of the problem, >>> because it is simultaneously too complicated and too limited. >>> >>> In particular: >>> Branching is inherently a hierarchical process with arbitrary depth, in >>> the sense that branches can be made from branches to an arbitrary depth. >>> However, the variant / version system does not really provide the proper >>> tools to deal with this, because it is limited to two levels (variant >>> and version) before its utility in tracking what is a derivative of what >>> is exhausted. >>> >>> It is also inadequate because a new model might combine parts of other >>> models, especially if it is a 1.1 model, and these parts need to be >>> tracked individually. 
>>> >>> I think that the solution is to simplify down to a single global version >>> number that is common across the repository or the model (like in >>> Subversion), and then let either the CellML metadata, or perhaps the >>> Subversion copy history, describe the way a model has been derived. >>> >>> I see the following workflow as being both simpler and more general... >>> >>> John Doe creates a new model directory which has its primary URL at: >>> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ >>> >>> John now owns this model and is the only one who can change it. John >>> also gets to decide the visibility of different revisions of the model. >>> >>> John makes several revisions to the model (each of which bumps the >>> global revision number). There is a URL by which each historic version >>> can be referred to. >>> >>> John then publishes the model in a journal, referring to it by the >>> primary URL (or perhaps a short-form if we want to offer authors the >>> option of assigning one). After the paper is accepted by a peer-reviewed >>> journal, John updates the metada
Re: [cellml-discussion] Concerning the CellML Model Repository
I don't understand the purpose of this. It looks like you are inventing a versioning system to implement from scratch. I don't see how this system would work with someone working on a filesystem and not wanting to use a browser - you'd have to invent client software for this.

Start by reviewing things like:
subversion
svk
darcs
monotone
arch
etc

Review them in the context of the use-cases that need to be satisfied. Include use-cases such as someone working on a complex model that uses imports of models in a local space. Include use-cases of someone wanting to follow volatile vs non-volatile versions/branches, etc. Include the environments from which you expect this versioning system to work (e.g. commands on a filesystem, webdav, etc).

What are the kinds of relationships between permissions and roles? I know you have some ideas here, but it's not very replete and perhaps needs to be put in a table.

I think aliases for web URIs are the least of the problems at the moment.

On 6/26/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > Hi, > > I thought Andrew's ideas here is worth expanding, and I wrote a page based on > that. > > http://www.cellml.org/Members/tommy/BaseRepository > > Cheers, > Tommy. > > > > Andrew Miller wrote: > > Matt wrote: > >>> - Version/Variant > >>> It already clogged up the system. There is no proper revision control > >>> mechanism, what we have now is an ad-hoc emulated system. > >>> > >> I don't think it has clogged the system I just think it has been > >> improperly used both by authors and by the user interface. This is no > >> fault of the authors, there is simply a specification for versioning > >> that is missing. The hope is that subversion applies well to this. > >> > > I think that the versioning system itself is the root of the problem, > > because it is simultaneously too complicated and too limited.
> > > > In particular: > > Branching is inherently a hierarchical process with arbitrary depth, in > > the sense that branches can be made from branches to an arbitrary depth. > > However, the variant / version system does not really provide the proper > > tools to deal with this, because it is limited to two levels (variant > > and version) before its utility in tracking what is a derivative of what > > is exhausted. > > > > It is also inadequate because a new model might combine parts of other > > models, especially if it is a 1.1 model, and these parts need to be > > tracked individually. > > > > I think that the solution is to simplify down to a single global version > > number that is common across the repository or the model (like in > > Subversion), and then let either the CellML metadata, or perhaps the > > Subversion copy history, describe the way a model has been derived. > > > > I see the following workflow as being both simpler and more general... > > > > John Doe creates a new model directory which has its primary URL at: > > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ > > > > John now owns this model and is the only one who can change it. John > > also gets to decide the visibility of different revisions of the model. > > > > John makes several revisions to the model (each of which bumps the > > global revision number). There is a URL by which each historic version > > can be referred to. > > > > John then publishes the model in a journal, referring to it by the > > primary URL (or perhaps a short-form if we want to offer authors the > > option of assigning one). After the paper is accepted by a peer-reviewed > > journal, John updates the metadata on the model. When he commits these > > changes, the repository sees this and creates a new alias, e.g. at: > > http://www.cellml.org/models/citation/doe_2007_1/ > > > > John makes some further changes to his model post-publication and > > commits them. 
However, by some mechanism (perhaps by the change > > metadata?) the repository knows that this is a change which occurred > > post-publication by John. > > > > Mary notices that there was a discrepancy between the model and John's > > published paper (assuming that he didn't reference the CellML model in > > the paper). She creates a new primary URL containing a copy of John's > > published model, at: > > http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/ > > She gets John to check this. When John agrees, she updates the metadata > > on her model to indicate that her version is a more correct version of > > John's paper. The repository then updates so that > > http://www.cellml.org/models/citation/doe_2007_1/ is a reference to > > John's fixed version. > > > > John merges in Mary's changes to > > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ > > and continues working on more changes. He starts collaborating with > > Mary, so he grants her write access to > > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/. > > > > Ming wants to create a derivative of John's paper, so he cre
Re: [cellml-discussion] Concerning the CellML Model Repository
David Nickerson wrote: > Hi Tommy, > > That looks good - its all starting to make sense to me now. > > I'm just wondering how your system would handle a case where two authors > independently encode the same published model. The first author to > upload their encoding would get "ownership" of the publication alias (if > I have the terminology right). Is there any way for the second author to > get a similar alias to their encoding of the model? This is starting to > sound like a version/variant theme, but its probably a situation that > will crop up quite frequently...

I guess it depends on how those two model creators work. If John and Mary work independently and two different models describing the same item were created, each will need to have a separate project directory. If John did get the publication alias set up first, it would obviously point to his model, but now Mary comes along and wants to have a separate model up as well. What could happen is one of these:

1) The publication alias is no longer an alias, but a directory holding aliases to users' models.
2) A new model directory is created, and John's and Mary's model directories are copied into it.

While the outcome is similar, 1) separates publications from models a lot more, and may reflect the situation where a paper has multiple models, each created by a different person: John, Mary and Ming create models a, b and c based on Doe's paper. All three get approved, and repos://publication/doe_2007_1/ is created, containing

repos://publication/doe_2007_1/pathway_a -> repos://!rev/45/home/john/a
repos://publication/doe_2007_1/pathway_b -> repos://!rev/60/home/mary/b
repos://publication/doe_2007_1/pathway_c -> repos://!rev/54/home/ming/c

created by their respective creators. Each published model is treated differently; note their revision numbers.
2) has the benefit of encouraging model creators to work together and grouping the same models in one place, and may reflect this situation: a publication that describes multiple models, with different people coding up each one, could have a shared UUID-named workspace owned by the people working on it, with each separate model in its own directory. The publication alias could be owned by the whole group that worked on the model.

I just wrote this down off the top of my head; both of these suggestions may have very interesting consequences that are not noted here. This was a very good question.

> > This is a slightly different example from your example workflow and > could be viewed as John and Mary both having "valid and correct" but > different encodings of the doe_2007_1 paper. Actually, I just saw the > '_1' on the publication link - is that some kind of version/variant that > would be _2 for Mary in my example? I had been assuming the 2007_1 meant > January 2007. >

It could conceivably mean the first paper John Doe published in 2007, or January, as that hasn't been decided yet.

Thanks, Tommy.

> > Thanks, > Andre. > > Tommy Yu wrote: >> Hi, >> >> I thought Andrew's ideas here is worth expanding, and I wrote a page based >> on that. >> >> http://www.cellml.org/Members/tommy/BaseRepository >> >> Cheers, >> Tommy.
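Option 1 can be made concrete with a small sketch. It uses the `repos://` notation from the message, treats `repos://!rev/N/path` as a revision-pinned (frozen) reference, and, as a hypothetical extension not proposed in the thread, treats a URI without `!rev/N/` as following the latest (volatile) revision. The names `RepoRef`, `parse_repo_uri`, `resolve` and `DOE_2007_1` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REPO_PREFIX = "repos://"
PIN_MARKER = "!rev/"


@dataclass(frozen=True)
class RepoRef:
    """A path in the repository, optionally pinned to a global revision."""
    path: str
    revision: Optional[int]  # None = hypothetical "follow latest" (volatile) reference

    @property
    def is_pinned(self) -> bool:
        return self.revision is not None


def parse_repo_uri(uri: str) -> RepoRef:
    """Parse repos://!rev/N/path (frozen) or repos://path (volatile, assumed form)."""
    if not uri.startswith(REPO_PREFIX):
        raise ValueError(f"not a repository URI: {uri!r}")
    rest = uri[len(REPO_PREFIX):]
    if rest.startswith(PIN_MARKER):
        rev_text, _, path = rest[len(PIN_MARKER):].partition("/")
        if not rev_text.isdigit() or not path:
            raise ValueError(f"malformed pinned URI: {uri!r}")
        return RepoRef(path=path, revision=int(rev_text))
    return RepoRef(path=rest, revision=None)


# Option 1: the publication "alias" becomes a directory of per-model aliases.
# Each entry is created by that model's author and pins its own revision.
DOE_2007_1: Dict[str, str] = {
    "pathway_a": "repos://!rev/45/home/john/a",
    "pathway_b": "repos://!rev/60/home/mary/b",
    "pathway_c": "repos://!rev/54/home/ming/c",
}


def resolve(publication: Dict[str, str], entry: str) -> RepoRef:
    """Look up one model listed under a publication directory."""
    return parse_repo_uri(publication[entry])


for name in sorted(DOE_2007_1):
    ref = resolve(DOE_2007_1, name)
    print(f"publication/doe_2007_1/{name} -> {ref.path} @ r{ref.revision}")
```

Because each entry carries its own revision, John, Mary and Ming can each publish at a different point in the shared revision history without any one of them owning the whole publication entry.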
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Tommy, That looks good - it's all starting to make sense to me now.

I'm just wondering how your system would handle a case where two authors independently encode the same published model. The first author to upload their encoding would get "ownership" of the publication alias (if I have the terminology right). Is there any way for the second author to get a similar alias to their encoding of the model? This is starting to sound like a version/variant theme, but it's probably a situation that will crop up quite frequently...

This is a slightly different example from your example workflow and could be viewed as John and Mary both having "valid and correct" but different encodings of the doe_2007_1 paper. Actually, I just saw the '_1' on the publication link - is that some kind of version/variant that would be _2 for Mary in my example? I had been assuming the 2007_1 meant January 2007.

Thanks, Andre.

Tommy Yu wrote: > Hi, > > I thought Andrew's ideas here is worth expanding, and I wrote a page based on > that. > > http://www.cellml.org/Members/tommy/BaseRepository > > Cheers, > Tommy.
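Andre's concern follows from the workflow in Andrew's proposal: one global revision counter, models identified by UUIDs, and a citation alias that points at exactly one (model, revision) pair. The minimal in-memory sketch below uses hypothetical class and method names and a simplified rule, assumed here rather than taken from the thread, that only writers of the currently referenced model may repoint the alias. It shows why a second independent encoding cannot simply claim `doe_2007_1`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Model:
    owner: str
    writers: Set[str]
    revisions: List[int] = field(default_factory=list)  # global revisions touching this model


class Repository:
    """Toy in-memory repository with one global revision counter (Subversion-style)."""

    def __init__(self) -> None:
        self.global_revision = 0
        self.models: Dict[str, Model] = {}
        # citation alias, e.g. "doe_2007_1" -> (model UUID, revision)
        self.citations: Dict[str, Tuple[str, int]] = {}

    def create_model(self, owner: str) -> str:
        # Primary URL would be http://www.cellml.org/models/id/<model_id>/
        model_id = str(uuid.uuid4())
        self.models[model_id] = Model(owner=owner, writers={owner})
        return model_id

    def commit(self, user: str, model_id: str) -> int:
        model = self.models[model_id]
        if user not in model.writers:
            raise PermissionError(f"{user} may not commit to {model_id}")
        self.global_revision += 1  # every commit anywhere bumps the shared counter
        model.revisions.append(self.global_revision)
        return self.global_revision

    def grant_write(self, owner: str, model_id: str, user: str) -> None:
        model = self.models[model_id]
        if owner != model.owner:
            raise PermissionError("only the owner can grant write access")
        model.writers.add(user)

    def set_citation(self, user: str, alias: str, model_id: str, revision: int) -> None:
        model = self.models[model_id]
        if user not in model.writers:
            raise PermissionError(f"{user} may not publish {model_id}")
        if revision not in model.revisions:
            raise ValueError(f"revision {revision} is not a revision of {model_id}")
        current = self.citations.get(alias)
        # Simplified rule: a single-valued alias can only be repointed by
        # someone who can write to the model it currently refers to.
        if current is not None and user not in self.models[current[0]].writers:
            raise PermissionError(f"{alias} already refers to another author's model")
        self.citations[alias] = (model_id, revision)
```

Under this rule Mary's independent encoding only gets its own entry if the alias becomes multi-valued, which is essentially option 1 in Tommy's reply.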
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi,

I thought Andrew's ideas here are worth expanding, and I wrote a page based on that.

http://www.cellml.org/Members/tommy/BaseRepository

Cheers,
Tommy.

Andrew Miller wrote: > Matt wrote: >>> - Version/Variant >>> It already clogged up the system. There is no proper revision control >>> mechanism, what we have now is an ad-hoc emulated system. >>> >> I don't think it has clogged the system I just think it has been >> improperly used both by authors and by the user interface. This is no >> fault of the authors, there is simply a specification for versioning >> that is missing. The hope is that subversion applies well to this. >> > I think that the versioning system itself is the root of the problem, > because it is simultaneously too complicated and too limited. > > In particular: > Branching is inherently a hierarchical process with arbitrary depth, in > the sense that branches can be made from branches to an arbitrary depth. > However, the variant / version system does not really provide the proper > tools to deal with this, because it is limited to two levels (variant > and version) before its utility in tracking what is a derivative of what > is exhausted. > > It is also inadequate because a new model might combine parts of other > models, especially if it is a 1.1 model, and these parts need to be > tracked individually. > > I think that the solution is to simplify down to a single global version > number that is common across the repository or the model (like in > Subversion), and then let either the CellML metadata, or perhaps the > Subversion copy history, describe the way a model has been derived. > > I see the following workflow as being both simpler and more general... > > John Doe creates a new model directory which has its primary URL at: > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ > > John now owns this model and is the only one who can change it.
John > also gets to decide the visibility of different revisions of the model. > > John makes several revisions to the model (each of which bumps the > global revision number). There is a URL by which each historic version > can be referred to. > > John then publishes the model in a journal, referring to it by the > primary URL (or perhaps a short-form if we want to offer authors the > option of assigning one). After the paper is accepted by a peer-reviewed > journal, John updates the metadata on the model. When he commits these > changes, the repository sees this and creates a new alias, e.g. at: > http://www.cellml.org/models/citation/doe_2007_1/ > > John makes some further changes to his model post-publication and > commits them. However, by some mechanism (perhaps by the change > metadata?) the repository knows that this is a change which occurred > post-publication by John. > > Mary notices that there was a discrepancy between the model and John's > published paper (assuming that he didn't reference the CellML model in > the paper). She creates a new primary URL containing a copy of John's > published model, at: > http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/ > She gets John to check this. When John agrees, she updates the metadata > on her model to indicate that her version is a more correct version of > John's paper. The repository then updates so that > http://www.cellml.org/models/citation/doe_2007_1/ is a reference to > John's fixed version. > > John merges in Mary's changes to > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ > and continues working on more changes. He starts collaborating with > Mary, so he grants her write access to > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/. 
> > Ming wants to create a derivative of John's paper, so he creates a copy > of the revision referenced from > http://www.cellml.org/models/citation/doe_2007_1/ at > http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/ > and starts working on it (marking up the history in the model metadata). > > As you can see, instead of having a confusing mix of variants and > versions (with versions of variants of versions of variants), having a > single revision forces us to look at the metadata instead, which then is > sufficiently general not to have the problems we have seen. > >>> - It's CellML Code, right? >>> Why not put code in a real code management system, like Subversion? >>> >> Subversion works well for filesystems of code and text data and to >> some extent binary data that we don't really need to query the >> contents of. If this applies well for CellML modelling, then >> subversion is probably a good match. Subversion will bring its own >> complexities when we are dealing with applying security to file >> objects, > It depends whether or not we actually allow direct access to Subversion > by untrusted users. > A simple approach would be to make everyone go through the f
Re: [cellml-discussion] Concerning the CellML Model Repository
On 6/25/07, James Lawson <[EMAIL PROTECTED]> wrote: > What is 'agnositic' and what does it mean here? It's one of those English terms that are being hijacked in some IT circles, e.g. platform agnostic: http://www.e-consultancy.com/knowledge/glossary/20312/platform-agnostic.html I quite like it because in my mind it means not being religiously bound to any particular technical boundary, such as a particular programming language, operating system, etc. Here it means to look at the problem at hand and ignore CellML in your thought space. > > > My inclination is that an implementation using subversion plus some > > subversion hooks will be ok, but we haven't worked out details or done > > any proof of concept for this - which should be agnositic to cellml > > and focussed on how to apply zope+cmf security and workflows to data > > objects stored in subversion repositories.
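As a rough illustration of the "subversion plus some subversion hooks" idea, here is a sketch of a Subversion pre-commit hook in Python enforcing the rule from Andrew's workflow that only a model's owner, and anyone they grant write access to, may commit to it. Assumptions: the repository layout `models/<uuid>/...`, the hard-coded `MODEL_WRITERS` table (a real deployment might query the Zope/CMF side instead), and that each `svnlook changed` line has the path starting at the fifth column. Only `svnlook author -t` and `svnlook changed -t` are invoked.

```python
import subprocess
import sys
from typing import Dict, Iterable, List, Set

# Hypothetical ownership table: model directory UUID -> users allowed to commit.
# Here John's model lists Mary too, as after he grants her write access.
MODEL_WRITERS: Dict[str, Set[str]] = {
    "0ff280ef-dce6-4a42-a275-c9a7d9699096": {"john", "mary"},
}


def violations(author: str, changed_paths: Iterable[str],
               writers: Dict[str, Set[str]]) -> List[str]:
    """Return the changed paths under models/<uuid>/ that author may not touch."""
    bad = []
    for path in changed_paths:
        parts = path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] == "models":
            # Unknown model directories have no writers, so in this sketch new
            # models must be created by some other route (e.g. the web front end).
            if author not in writers.get(parts[1], set()):
                bad.append(path)
    return bad


def svnlook(subcommand: str, repos: str, txn: str) -> str:
    """Run `svnlook <subcommand> -t <txn> <repos>` and return its stdout."""
    result = subprocess.run(
        ["svnlook", subcommand, "-t", txn, repos],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def main(repos: str, txn: str) -> int:
    author = svnlook("author", repos, txn).strip()
    # Assumed format: status columns, then the path starting at column 5.
    changed = [line[4:] for line in svnlook("changed", repos, txn).splitlines()
               if line.strip()]
    bad = violations(author, changed, MODEL_WRITERS)
    if bad:
        sys.stderr.write(f"{author} may not modify: {', '.join(bad)}\n")
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Subversion invokes pre-commit as: pre-commit REPOS-PATH TXN-NAME
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Installed as an executable `hooks/pre-commit`, a non-zero exit rejects the commit and the stderr message is passed back to the committer. Subversion runs hooks with an empty environment, so in practice the full path to `svnlook` may be needed.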
Re: [cellml-discussion] Concerning the CellML Model Repository
Matt wrote: > I think BioModels had every intention of harbouring CellML models too > if we were interested. I'm sure they are listening :-) they already do > > > > On 6/22/07, David Nickerson <[EMAIL PROTECTED]> wrote: >> It might also be worth looking into what the folks over at >> http://www.biomodels.net/ are up to. Given they seem to have curation >> built into their repository and maybe some other features worth looking >> into? >> >> And if we're going to be starting from scratch, there might be some >> value into seeing how the biomodels repository could be extended to >> support CellML? >> >> When you start seeing comments like "BioModels Database ranked first >> data resource for Systems Biology" in Nature Biotechnology, it might be >> a hint that they're doing something right and we should maybe be working >> with them rather than independently. >> >> >> David. >> >> >> Tommy Yu wrote: >>> Hi, >>> >>> I have written down some of my thoughts on how the model repository could >>> be put together. >>> >>> http://www.cellml.org/Members/tommy/repository_redesign.html >>> >>> It is still a pretty rough document. The usage example section gives a >>> rough outline on what I see people might be doing with the repository and >>> how this design could address those issues, which I think it will be of >>> interest to users. It is not an exhaustive list, yet. >>> >>> I must also note the design outlined is quite a drastic departure from what >>> we have now (it will be yet another new repository). However, it is more >>> true to the one envisioned before according to >>> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an >>> addition layer that will assist in pulling content and drawing >>> relationships between models. >>> >>> Feel free to take it apart and/or build on top of it. >>> >>> Cheers, >>> Tommy. 
>> -- >> David Nickerson, PhD >> Research Fellow >> Division of Bioengineering >> Faculty of Engineering >> National University of Singapore >> Email: [EMAIL PROTECTED]
Re: [cellml-discussion] Concerning the CellML Model Repository
What is 'agnositic' and what does it mean here? > My inclination is that an implementation using subversion plus some > subversion hooks will be ok, but we haven't worked out details or done > any proof of concept for this - which should be agnositic to cellml > and focussed on how to apply zope+cmf security and workflows to data > objects stored in subversion repositories.
Re: [cellml-discussion] Concerning the CellML Model Repository | version/variant metadata?
Hmm scrap that, haven't read the other 'Concerning the CellML repository' emails yet. Tommy just said that variants are getting scrapped... James Lawson wrote: > My $0.02 on this is (please forgive me if I get some of the technical > stuff mixed up): > > The current naming scheme is as it translates to the web address is: > author(s)_date_versionXX_variantXX > > I think it should be author(s)_date_variantXX_versionXX instead, since > IMO, one should be thinking in terms of versions of variants, rather > than variants of versions. > > Also, I think that if there were perhaps some metadata that could > pertain to what version and variant a cellml file is, and also some > 'sub'metadata under variant to say what the variant represents, whether > it's a particular cell type or what. > > I realise that metadata isn't supposed to be added to a model for the > sake of a repository or for any non-generalised purpose, but I think > that version/variant metadata would be useful. > E.g. for 1.1 models, a simulator could pick this metadata up. So you > could bring up window in which the software could tell you that, for > example, you are embedding this version of this markov model of an > L-type Ca++ channel, by such and such et al., into a variant 02 - > "epicardial cell" Pandit et al. cardiac cell model, etc. etc. > Another example would be working with CellML 1.1 models in an era where > we have a library of components that people can use. We might have a > GPCR component which has a large number of variants, and it would be > crucial for the simulation/editing programs like PCEnv to know, and be > able to tell the user, which version and variant of each component they > are using. People might want to swap in different variants to see how if > affects their model etc. > > And of course this version/variant metadata would obviously be highly > useful (IMO) for the repository. Maybe subversion could automatically > write this metadata. 
> > What I'm really trying to say is that I think there is justification > for version/variant information to be stored in metadata as well as the > URI naming scheme, since, unless I'm missing something, there is useful > information (both for repositories and simulator software) that can't be > stored in the URI. > > James > >>> - Version/Variant >>> It already clogged up the system. There is no proper revision control >>> mechanism, what we have now is an ad-hoc emulated system. >> I don't think it has clogged the system I just think it has been >> improperly used both by authors and by the user interface. > > Ideally the users and authors shouldn't be presented the option to make > mistakes like this, should they? Most people, I would imagine, don't > care about the versions of a model unless they are actually working > on/with it. > > > This is no >> fault of the authors, there is simply a specification for versioning >> that is missing. The hope is that subversion applies well to this.
Re: [cellml-discussion] Concerning the CellML Model Repository | version/variant metadata?
My $0.02 on this is (please forgive me if I get some of the technical stuff mixed up): The current naming scheme, as it translates to the web address, is: author(s)_date_versionXX_variantXX. I think it should be author(s)_date_variantXX_versionXX instead, since IMO, one should be thinking in terms of versions of variants, rather than variants of versions. Also, I think there could perhaps be some metadata that pertains to what version and variant a cellml file is, and also some 'sub'metadata under variant to say what the variant represents, whether it's a particular cell type or what. I realise that metadata isn't supposed to be added to a model for the sake of a repository or for any non-generalised purpose, but I think that version/variant metadata would be useful. E.g. for 1.1 models, a simulator could pick this metadata up. So you could bring up a window in which the software could tell you that, for example, you are embedding this version of this Markov model of an L-type Ca++ channel, by such and such et al., into a variant 02 - "epicardial cell" Pandit et al. cardiac cell model, etc. etc. Another example would be working with CellML 1.1 models in an era where we have a library of components that people can use. We might have a GPCR component which has a large number of variants, and it would be crucial for the simulation/editing programs like PCEnv to know, and be able to tell the user, which version and variant of each component they are using. People might want to swap in different variants to see how it affects their model etc. And of course this version/variant metadata would obviously be highly useful (IMO) for the repository. Maybe subversion could automatically write this metadata.
What I'm really trying to say is that I think there is justification for version/variant information to be stored in metadata as well as the URI naming scheme, since, unless I'm missing something, there is useful information (both for repositories and simulator software) that can't be stored in the URI. James > >> - Version/Variant >> It already clogged up the system. There is no proper revision control >> mechanism, what we have now is an ad-hoc emulated system. > > I don't think it has clogged the system I just think it has been > improperly used both by authors and by the user interface. Ideally the users and authors shouldn't be presented with the option to make mistakes like this, should they? Most people, I would imagine, don't care about the versions of a model unless they are actually working on/with it. This is no > fault of the authors, there is simply a specification for versioning > that is missing. The hope is that subversion applies well to this.
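To make the proposed reordering concrete, here is a small Python sketch that parses a name in the current `author(s)_date_versionXX_variantXX` form and re-renders it as `author(s)_date_variantXX_versionXX`, so that sorting groups versions under their variant. It assumes a four-digit year for the date and numeric suffixes, and the example names are made up. As James points out, a name can only carry these two numbers, which is part of the argument for also recording version/variant information in metadata.

```python
import re
from dataclasses import dataclass

# Assumed shape of a current-style name: authors_YYYY_versionNN_variantNN.
# The four-digit year and the numeric suffixes are illustrative assumptions.
CURRENT_NAME = re.compile(
    r"^(?P<authors>.+)_(?P<date>\d{4})_version(?P<version>\d+)_variant(?P<variant>\d+)$"
)


@dataclass(frozen=True)
class ModelName:
    authors: str
    date: str
    version: int
    variant: int

    def proposed(self) -> str:
        """Render as authors_date_variantXX_versionXX (versions of variants)."""
        return f"{self.authors}_{self.date}_variant{self.variant:02d}_version{self.version:02d}"

    def sort_key(self) -> tuple:
        # Group by model, then by variant, then by version within each variant.
        return (self.authors, self.date, self.variant, self.version)


def parse_current(name: str) -> ModelName:
    match = CURRENT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an authors_date_versionXX_variantXX name: {name}")
    return ModelName(match["authors"], match["date"], int(match["version"]), int(match["variant"]))


names = [
    "pandit_2001_version02_variant01",
    "pandit_2001_version01_variant02",
    "pandit_2001_version01_variant01",
]
for parsed in sorted(map(parse_current, names), key=ModelName.sort_key):
    print(parsed.proposed())
# pandit_2001_variant01_version01
# pandit_2001_variant01_version02
# pandit_2001_variant02_version01
```

Because the author part is matched greedily and then backtracked, multi-author names such as `noble_varghese_1998_version03_variant02` parse correctly too.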
Re: [cellml-discussion] Concerning the CellML Model Repository
I think BioModels had every intention of harbouring CellML models too if we were interested. I'm sure they are listening :-) On 6/22/07, David Nickerson <[EMAIL PROTECTED]> wrote: > It might also be worth looking into what the folks over at > http://www.biomodels.net/ are up to. They seem to have curation > built into their repository, and maybe some other features worth looking > into. > > And if we're going to be starting from scratch, there might be some > value in seeing how the biomodels repository could be extended to > support CellML? > > When you start seeing comments like "BioModels Database ranked first > data resource for Systems Biology" in Nature Biotechnology, it might be > a hint that they're doing something right and we should maybe be working > with them rather than independently. > > > David. > > > Tommy Yu wrote: > > Hi, > > > > I have written down some of my thoughts on how the model repository could > > be put together. > > > > http://www.cellml.org/Members/tommy/repository_redesign.html > > > > It is still a pretty rough document. The usage example section gives a > > rough outline on what I see people might be doing with the repository and > > how this design could address those issues, which I think will be of > > interest to users. It is not an exhaustive list, yet. > > > > I must also note the design outlined is quite a drastic departure from what > > we have now (it will be yet another new repository). However, it is more > > true to the one envisioned before according to > > http://www.cellml.org/wiki/CellMLModelRepositories, except I have an > > additional layer that will assist in pulling content and drawing > > relationships between models. > > > > Feel free to take it apart and/or build on top of it. > > > > Cheers, > > Tommy.
> -- > David Nickerson, PhD > Research Fellow > Division of Bioengineering > Faculty of Engineering > National University of Singapore > Email: [EMAIL PROTECTED]
Re: [cellml-discussion] Concerning the CellML Model Repository
On 6/22/07, Andrew Miller <[EMAIL PROTECTED]> wrote: > Matt wrote: > >> - Version/Variant > >> It already clogged up the system. There is no proper revision control > >> mechanism, what we have now is an ad-hoc emulated system. > >> > > > > I don't think it has clogged the system I just think it has been > > improperly used both by authors and by the user interface. This is no > > fault of the authors, there is simply a specification for versioning > > that is missing. The hope is that subversion applies well to this. > > > I think that the versioning system itself is the root of the problem, > because it is simultaneously too complicated and too limited. > > In particular: > Branching is inherently a hierarchical process with arbitrary depth, in > the sense that branches can be made from branches to an arbitrary depth. > However, the variant / version system does not really provide the proper > tools to deal with this, because it is limited to two levels (variant > and version) before its utility in tracking what is a derivative of what > is exhausted. > > It is also inadequate because a new model might combine parts of other > models, especially if it is a 1.1 model, and these parts need to be > tracked individually. > > I think that the solution is to simplify down to a single global version > number that is common across the repository or the model (like in > Subversion), and then let either the CellML metadata, or perhaps the > Subversion copy history, describe the way a model has been derived. Sure, so disregarding variants for now, there is nothing stopping this being implemented with the current versioning/naming convention. There's just no specification for proper use. However I think changesets (as well as global versions) apply well to the notion of a workspace, but I'm not certain about the common practice of trunk/branch roots as applied to cellml - perhaps the best practice would be that every workspace would be the trunk root. 
> > I see the following workflow as being both simpler and more general... > > John Doe creates a new model directory which has its primary URL at: > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ > > John now owns this model and is the only one who can change it. John > also gets to decide the visibility of different revisions of the model. Is this changing the model, or the model + metadata? > > John makes several revisions to the model (each of which bumps the > global revision number). There is a URL by which each historic version > can be referred to. > > John then publishes the model in a journal, referring to it by the > primary URL (or perhaps a short-form if we want to offer authors the > option of assigning one). After the paper is accepted by a peer-reviewed > journal, John updates the metadata on the model. When he commits these > changes, the repository sees this and creates a new alias, e.g. at: > http://www.cellml.org/models/citation/doe_2007_1/ > > John makes some further changes to his model post-publication and > commits them. However, by some mechanism (perhaps by the change > metadata?) the repository knows that this is a change which occurred > post-publication by John. > > Mary notices that there was a discrepancy between the model and John's > published paper (assuming that he didn't reference the CellML model in > the paper). She creates a new primary URL containing a copy of John's > published model, at: > http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/ > She gets John to check this. When John agrees, she updates the metadata > on her model to indicate that her version is a more correct version of > John's paper. The repository then updates so that > http://www.cellml.org/models/citation/doe_2007_1/ is a reference to > John's fixed version. > > John merges in Mary's changes to > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ > and continues working on more changes.
He starts collaborating with > Mary, so he grants her write access to > http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/. > > Ming wants to create a derivative of John's paper, so he creates a copy > of the revision referenced from > http://www.cellml.org/models/citation/doe_2007_1/ at > http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/ > and starts working on it (marking up the history in the model metadata). > > As you can see, instead of having a confusing mix of variants and > versions (with versions of variants of versions of variants), having a > single revision forces us to look at the metadata instead, which then is > sufficiently general not to have the problems we have seen. Yep, I reckon variants didn't work out at all and the metadata is a better place for this information. > > >> - It's CellML Code, right? > >> Why not put code in a real code management system, like Subversion? > >> > > > > Subversion works well for filesystems of code and text data and to >
Re: [cellml-discussion] Concerning the CellML Model Repository
It might also be worth looking into what the folks over at http://www.biomodels.net/ are up to. They seem to have curation built into their repository, and maybe some other features worth looking into. And if we're going to be starting from scratch, there might be some value in seeing how the biomodels repository could be extended to support CellML? When you start seeing comments like "BioModels Database ranked first data resource for Systems Biology" in Nature Biotechnology, it might be a hint that they're doing something right and we should maybe be working with them rather than independently. David. Tommy Yu wrote: > Hi, > > I have written down some of my thoughts on how the model repository could be > put together. > > http://www.cellml.org/Members/tommy/repository_redesign.html > > It is still a pretty rough document. The usage example section gives a rough > outline on what I see people might be doing with the repository and how this > design could address those issues, which I think will be of interest to > users. It is not an exhaustive list, yet. > > I must also note the design outlined is quite a drastic departure from what > we have now (it will be yet another new repository). However, it is more > true to the one envisioned before according to > http://www.cellml.org/wiki/CellMLModelRepositories, except I have an additional > layer that will assist in pulling content and drawing relationships between > models. > > Feel free to take it apart and/or build on top of it. > > Cheers, > Tommy. -- David Nickerson, PhD Research Fellow Division of Bioengineering Faculty of Engineering National University of Singapore Email: [EMAIL PROTECTED]
Re: [cellml-discussion] Concerning the CellML Model Repository
> Do we really want to proxy remote repositories? Can we start smaller for now > but keep that in mind? I think this will be an essential feature of the model repository as we move forward. We are trying to present model authors with a common platform for the distribution and archiving of their models as they go through development and publication cycles. At some point we are going to want to provide some assurances to the community in terms of repository accessibility - things like uptime, backups, redundancy, etc. There is also a big question mark over the implications of the current geographical location of the model repository. For example, how will access scale when you start having tens, if not hundreds, of users around the world interacting with the repository on a regular basis? If things run too slow or access is a problem then people simply won't use the system. So while it makes sense to start out with less grand plans, I think any plan on moving the repository forward has to take these issues into account and discuss how they would be addressed in any future repository implementation. David.
Re: [cellml-discussion] Concerning the CellML Model Repository
On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > Matt wrote: > > Hi Tommy, > > > > Can you continue to update/fill out your document as well as begin > > associated proposals with information contained in the replies people > > are submitting. The goal of this process is a scoping document with > > associated content. > > > > It will be done when I am done refining all my thoughts about the threads > here, along with the other thoughts I already have but not written down there. > > > More comments below. > > > > Likewise. > > > On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > >> Matt wrote: > >>> Hi Tommy, > >>> > >>> I found the document seemed to be too far ahead of itself. I also > >>> didn't find any of the pros and cons very compelling because they > >>> don't address specific problems and those problems are not described. > >>> > >>> 1) What are you actually trying to achieve? It would be useful to > >>> describe the parts of the current system that are giving you grief and > >>> look to give you more grief based on the use cases and any axes of > >>> scale. > >>> > >> Starting with what I envisioned. > >> > >> Who is the repository catered for? > >> 1) People who would like to work on models, using it as a place to store > >> work-in-progress models. > >> 2) Reviewers to review models. > >> 3) Website users to browse models. > >> > >> 1) What do the model builders want? > >> - Their own workspace (home directory) > >> - A place to let reviewers review their models > >> - Also to publish their models > >> > >> First point is not addressed by what we have now. Second and third point > >> is quite ad-hoc. Also, version control is very ad-hoc right now. > > > > Each of these points need to be filled out, e.g. what does it mean to > > have a workspace for a CellML modeller?, What are the scenarios and > > workflows for reviewers of CellML models? > > > > Workspace is like a home directory. 
Or are you comfortable with a flat > filesystem where each file is owned by different people are all over the > place. This is about organization according to what the model builders want. I'm more comfortable with the latter; but exactly what that looks like is difficult to know, or perhaps even to predict. Some work on a manifest description and a best practice/hint would be a good start. > > Models are by default private to the owner, but s/he can expose it via the > layer that binds subversion and the database together which manages > permissions. Try and stay away from specific underlying pieces at the moment. The key is the description of the workflow states, transitions, and actions. > Other modelers could import their colleagues' models (provided permissions are > given). CellML import element kind of imports? > > Reviewers simply gets access to a model, a URI to a specific revision of a > model (and associated files, at model builder's options) will be generated > which s/he could use. If reviewer has rights s/he can publish the model to > the public. This should probably be the model workspace; the concept of a single model is a bit vague at the moment unless we define some rule that there will always be a single top-level model. I presume where we are heading is that TTW (through the web), people will be accessing an index.html that processes a manifest file and creates a pretty view of the workspace. > > >> 2,3) Reviewers and website users > - A centralized location to browse models. > - They would like to see how models may relate to each other. > > > > How do models relate to each other? Relations between models come from > > all sorts of data within models, and within any associated metadata > > (so more than just our current cellml metadata specification). It > > would be useful to write out the details of the relationships that are > > important here as these pretty much form the basis of many of the > > queries that will need to be performed.
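No manifest format has been specified, so the following is only a hypothetical illustration of the suggestion above: a small JSON manifest that names a single top-level model and a metadata file sitting next to it, plus a function that turns it into a minimal index.html-style view. The file names, the keys (`workspace`, `files`, `role`, `top_level`, `describes`) and the one-top-level-model rule are all assumptions made for this sketch.

```python
import html
import json

# Hypothetical manifest; nothing like this format has been agreed on.
EXAMPLE_MANIFEST = """
{
  "workspace": "example_workspace",
  "files": [
    {"path": "cell.xml", "role": "model", "top_level": true},
    {"path": "ica_channel.xml", "role": "model"},
    {"path": "cell.rdf", "role": "metadata", "describes": "cell.xml"}
  ]
}
"""


def load_manifest(text: str) -> dict:
    manifest = json.loads(text)
    top = [entry for entry in manifest["files"] if entry.get("top_level")]
    # One possible rule: every workspace names exactly one top-level model.
    if len(top) != 1:
        raise ValueError(f"expected exactly one top-level model, found {len(top)}")
    return manifest


def render_index(manifest: dict) -> str:
    """Build a minimal index.html-style listing of the workspace."""
    rows = []
    for entry in manifest["files"]:
        path = html.escape(entry["path"])
        role = html.escape(entry["role"])
        marker = " (top level)" if entry.get("top_level") else ""
        rows.append(f"<li>{path}: {role}{marker}</li>")
    title = html.escape(manifest["workspace"])
    return f"<h1>{title}</h1>\n<ul>\n" + "\n".join(rows) + "\n</ul>"


print(render_index(load_manifest(EXAMPLE_MANIFEST)))
```

Keeping the metadata file as a separate entry, rather than inside the model, mirrors the idea raised in the thread of metadata sitting next to the model; whether that is the right layout is still an open question.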
> > It will be done. I can see users wanting to know which component of a model > was imported by other models, and finding all other dependency of a > particular model. More will come. > > > > >> First point is already addressed, but second point is definitely not > >> possible as the current repository does not support 1.1. > > > > Why does it not support CellML 1.1? i.e. what is the technology block > > here to extending the current system to support it? > > > > None, aside from the lack of a proper code versioning system in the backend. > With a few changes to the copy/paste code, CellML 1.1 will then be able to be > stored into the repository. I could go ahead and do this, but it will only > further compound the issues we have now. Okay, fine, refactor Model.py and > have new classes inherit from that, but we still lack certain key features, > such as a proper versioning backend. So lack of support for CellML 1.1 is not a reason to rebuild the system, but implementing CellML 1.1 support means pressure on other ugly bits like deali
Re: [cellml-discussion] Concerning the CellML Model Repository
Matt wrote: >> - Version/Variant >> It already clogged up the system. There is no proper revision control >> mechanism, what we have now is an ad-hoc emulated system. >> > > I don't think it has clogged the system I just think it has been > improperly used both by authors and by the user interface. This is no > fault of the authors, there is simply a specification for versioning > that is missing. The hope is that subversion applies well to this. > I think that the versioning system itself is the root of the problem, because it is simultaneously too complicated and too limited. In particular: Branching is inherently a hierarchical process with arbitrary depth, in the sense that branches can be made from branches to an arbitrary depth. However, the variant / version system does not really provide the proper tools to deal with this, because it is limited to two levels (variant and version) before its utility in tracking what is a derivative of what is exhausted. It is also inadequate because a new model might combine parts of other models, especially if it is a 1.1 model, and these parts need to be tracked individually. I think that the solution is to simplify down to a single global version number that is common across the repository or the model (like in Subversion), and then let either the CellML metadata, or perhaps the Subversion copy history, describe the way a model has been derived. I see the following workflow as being both simpler and more general... John Doe creates a new model directory which has its primary URL at: http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ John now owns this model and is the only one who can change it. John also gets to decide the visibility of different revisions of the model. John makes several revisions to the model (each of which bumps the global revision number). There is a URL by which each historic version can be referred to. 
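The workflow above rests on a single revision counter shared by the whole repository, owner-controlled write access and visibility, and a URL for every historic revision. The Python sketch below models just those rules in memory. It is illustrative only: the `rev/<n>/` URL suffix, private-by-default visibility (borrowed from Tommy's description elsewhere in the thread), and all class and method names are hypothetical, and a real backend (Subversion or otherwise) would persist this state rather than keep it in dictionaries.

```python
import uuid
from dataclasses import dataclass, field

BASE_URL = "http://www.cellml.org/models/id/"


@dataclass
class Model:
    model_id: str
    owner: str
    writers: set = field(default_factory=set)
    # global revision number -> [content, publicly visible?]
    revisions: dict = field(default_factory=dict)

    @property
    def primary_url(self) -> str:
        return f"{BASE_URL}{self.model_id}/"


class Repository:
    """Toy store with one revision counter shared by every model."""

    def __init__(self) -> None:
        self.global_revision = 0
        self.models: dict = {}

    def create(self, owner: str) -> Model:
        model = Model(model_id=str(uuid.uuid4()), owner=owner, writers={owner})
        self.models[model.model_id] = model
        return model

    def commit(self, model: Model, user: str, content: str) -> int:
        if user not in model.writers:
            raise PermissionError(f"{user} may not change {model.primary_url}")
        self.global_revision += 1  # any commit to any model bumps the shared counter
        model.revisions[self.global_revision] = [content, False]  # private by default
        return self.global_revision

    def set_visibility(self, model: Model, user: str, revision: int, public: bool) -> None:
        if user != model.owner:
            raise PermissionError("only the owner decides revision visibility")
        model.revisions[revision][1] = public

    def grant_write(self, model: Model, user: str, grantee: str) -> None:
        if user != model.owner:
            raise PermissionError("only the owner can grant write access")
        model.writers.add(grantee)

    def revision_url(self, model: Model, revision: int) -> str:
        # Hypothetical URL form for a historic revision; the thread does not fix one.
        return f"{model.primary_url}rev/{revision}/"


repo = Repository()
john = repo.create("john")
r1 = repo.commit(john, "john", "<model name='doe'/>")
ming = repo.create("ming")
repo.commit(ming, "ming", "<model name='other'/>")
r3 = repo.commit(john, "john", "<model name='doe_fixed'/>")
repo.set_visibility(john, "john", r1, public=True)
print(r1, r3)  # 1 3: the counter is shared, so John's revisions skip 2
```

A consequence of the shared counter is that a model's revision numbers are not consecutive; that is the price of being able to name any state of the whole repository with one number, as Subversion does.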
John then publishes the model in a journal, referring to it by the primary URL (or perhaps a short-form if we want to offer authors the option of assigning one). After the paper is accepted by a peer-reviewed journal, John updates the metadata on the model. When he commits these changes, the repository sees this and creates a new alias, e.g. at: http://www.cellml.org/models/citation/doe_2007_1/ John makes some further changes to his model post-publication and commits them. However, by some mechanism (perhaps by the change metadata?) the repository knows that this is a change which occurred post-publication by John. Mary notices that there was a discrepancy between the model and John's published paper (assuming that he didn't reference the CellML model in the paper). She creates a new primary URL containing a copy of John's published model, at: http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/ She gets John to check this. When John agrees, she updates the metadata on her model to indicate that her version is a more correct version of John's paper. The repository then updates so that http://www.cellml.org/models/citation/doe_2007_1/ is a reference to John's fixed version. John merges in Mary's changes to http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/ and continues working on more changes. He starts collaborating with Mary, so he grants her write access to http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/. Ming wants to create a derivative of John's paper, so he creates a copy of the revision referenced from http://www.cellml.org/models/citation/doe_2007_1/ at http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/ and starts working on it (marking up the history in the model metadata). 
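The citation alias and the derivation trail in this workflow can be sketched the same way. In this illustrative Python sketch, the metadata keys `citation`, `corrects`, `approved_by_author` and `derived_from` are hypothetical stand-ins for whatever CellML metadata terms would actually be used, and short model IDs replace the UUID-based primary URLs. A real repository would also need to verify that the original author actually approved a correction, rather than trust a flag in the corrector's own metadata.

```python
from dataclasses import dataclass, field

CITATION_BASE = "http://www.cellml.org/models/citation/"


@dataclass
class Revision:
    model_id: str
    number: int
    # Hypothetical metadata keys standing in for real CellML metadata terms.
    metadata: dict = field(default_factory=dict)


class AliasIndex:
    """Maps citation aliases to the (model, revision) they currently denote."""

    def __init__(self) -> None:
        self.aliases: dict = {}
        self.revisions: dict = {}

    def on_commit(self, rev: Revision) -> None:
        self.revisions[(rev.model_id, rev.number)] = rev
        citation = rev.metadata.get("citation")
        if citation and citation not in self.aliases:
            # The first revision to declare a citation becomes the alias target.
            self.aliases[citation] = (rev.model_id, rev.number)
        corrected = rev.metadata.get("corrects")
        if corrected in self.aliases and rev.metadata.get("approved_by_author"):
            # An author-approved correction takes over the citation alias.
            self.aliases[corrected] = (rev.model_id, rev.number)

    def resolve(self, alias: str) -> tuple:
        return self.aliases[alias]

    def url(self, alias: str) -> str:
        return f"{CITATION_BASE}{alias}/"

    def lineage(self, rev: Revision) -> list:
        """Follow 'derived_from' links back towards the original revision."""
        chain = [(rev.model_id, rev.number)]
        parent = rev.metadata.get("derived_from")
        while parent is not None and parent not in chain:  # stop on cycles
            chain.append(parent)
            parent = self.revisions[parent].metadata.get("derived_from")
        return chain


index = AliasIndex()
index.on_commit(Revision("john-model", 5, {"citation": "doe_2007_1"}))
index.on_commit(Revision("mary-model", 8, {
    "corrects": "doe_2007_1",
    "approved_by_author": True,
    "derived_from": ("john-model", 5),
}))
print(index.resolve("doe_2007_1"))  # ('mary-model', 8)

ming = Revision("ming-model", 12, {"derived_from": index.resolve("doe_2007_1")})
index.on_commit(ming)
print(index.lineage(ming))  # [('ming-model', 12), ('mary-model', 8), ('john-model', 5)]
```

Here the alias follows the corrected model, so Ming's copy taken from the citation URL automatically starts from Mary's fixed revision, and the lineage is read from metadata rather than from version/variant numbers.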
As you can see, instead of having a confusing mix of variants and versions (with versions of variants of versions of variants), having a single revision forces us to look at the metadata instead, which then is sufficiently general not to have the problems we have seen. >> - It's CellML Code, right? >> Why not put code in a real code management system, like Subversion? >> > > Subversion works well for filesystems of code and text data and to > some extent binary data that we don't really need to query the > contents of. If this applies well for CellML modelling, then > subversion is probably a good match. Subversion will bring its own > complexities when we are dealing with applying security to file > objects, It depends whether or not we actually allow direct access to Subversion by untrusted users. A simple approach would be to make everyone go through the front-end (which might even implement enough methods to let Subversion check out from there anyway). > and security/publishing in general will get even more complex > if we are proxying remote repositories - which we talked about a few > weeks ago. > > Generally, I think the concept of cellml modelling being laid out in a > filesystem and sub
Re: [cellml-discussion] Concerning the CellML Model Repository
Matt wrote: > Hi Tommy, > > Can you continue to update/fill out your document as well as begin > associated proposals with information contained in the replies people > are submitting. The goal of this process is a scoping document with > associated content. > It will be done when I am done refining all my thoughts about the threads here, along with other thoughts I already have but have not yet written down there. > More comments below. > Likewise. > On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote: >> Matt wrote: >>> Hi Tommy, >>> >>> I found the document seemed to be too far ahead of itself. I also >>> didn't find any of the pros and cons very compelling because they >>> don't address specific problems and those problems are not described. >>> >>> 1) What are you actually trying to achieve? It would be useful to >>> describe the parts of the current system that are giving you grief and >>> look to give you more grief based on the use cases and any axes of >>> scale. >>> >> Starting with what I envisioned. >> >> Who is the repository catered for? >> 1) People who would like to work on models, using it as a place to store >> work-in-progress models. >> 2) Reviewers to review models. >> 3) Website users to browse models. >> >> 1) What do the model builders want? >> - Their own workspace (home directory) >> - A place to let reviewers review their models >> - Also to publish their models >> >> First point is not addressed by what we have now. Second and third point is >> quite ad-hoc. Also, version control is very ad-hoc right now. > > Each of these points need to be filled out, e.g. what does it mean to > have a workspace for a CellML modeller?, What are the scenarios and > workflows for reviewers of CellML models? > A workspace is like a home directory. Or are you comfortable with a flat filesystem where files owned by different people are scattered all over the place? This is about organization according to what the model builders want.
Models are private to the owner by default, but s/he can expose them via the layer that binds subversion and the database together and manages permissions. Other modelers could import their colleagues' models (provided permissions are given). Reviewers simply get access to a model: a URI to a specific revision of a model (and associated files, at the model builder's option) will be generated, which s/he can use. If the reviewer has the rights, s/he can publish the model to the public. >> 2,3) Reviewers and website users >> - A centralized location to browse models. >> - They would like to see how models may relate to each other. >> > > How do models relate to each other? Relations between models come from > all sorts of data within models, and within any associated metadata > (so more than just our current cellml metadata specification). It > would be useful to write out the details of the relationships that are > important here as these pretty much form the basis of many of the > queries that will need to be performed. It will be done. I can see users wanting to know which component of a model was imported by other models, and finding all the dependencies of a particular model. More will come. > >> First point is already addressed, but second point is definitely not >> possible as the current repository does not support 1.1. > > Why does it not support CellML 1.1? i.e. what is the technology block > here to extending the current system to support it? > None, aside from the lack of a proper code versioning system in the backend. With a few changes to the copy/paste code, CellML 1.1 models could then be stored in the repository. I could go ahead and do this, but it will only further compound the issues we have now. Okay, fine, refactor Model.py and have new classes inherit from that, but we still lack certain key features, such as a proper versioning backend.
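Tommy's example queries (which components of a model are imported by other models, and everything a model depends on) can be answered from the import elements of CellML 1.1 models alone. Below is one possible sketch using Python's standard `xml.etree.ElementTree`. It assumes each `xlink:href` is a plain file name within one workspace (real hrefs would need resolving against the importing model's location), and the three model files are made-up examples.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

CELLML = "{http://www.cellml.org/cellml/1.1#}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def imported_components(model_xml: str) -> list:
    """Return (href, remote component name) pairs imported by one CellML 1.1 model."""
    root = ET.fromstring(model_xml)
    pairs = []
    for imp in root.iter(f"{CELLML}import"):
        href = imp.get(XLINK_HREF)
        for comp in imp.findall(f"{CELLML}component"):
            pairs.append((href, comp.get("component_ref")))
    return pairs


def build_indexes(models: dict):
    """models maps a file name (assumed to equal its import href) to its XML text."""
    importers = defaultdict(set)   # (file, component) -> files that import it
    depends_on = defaultdict(set)  # file -> files it imports directly
    for name, text in models.items():
        for href, component in imported_components(text):
            importers[(href, component)].add(name)
            depends_on[name].add(href)
    return importers, depends_on


def all_dependencies(name: str, depends_on: dict) -> set:
    """Transitive closure of direct imports, via an iterative depth-first search."""
    seen, stack = set(), [name]
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


MODELS = {
    "cell.xml": """<model xmlns="http://www.cellml.org/cellml/1.1#"
        xmlns:xlink="http://www.w3.org/1999/xlink" name="cell">
      <import xlink:href="ica.xml"><component name="ICaL" component_ref="ica"/></import>
    </model>""",
    "ica.xml": """<model xmlns="http://www.cellml.org/cellml/1.1#"
        xmlns:xlink="http://www.w3.org/1999/xlink" name="ica">
      <import xlink:href="gates.xml"><component name="gate" component_ref="d_gate"/></import>
      <component name="ica"/>
    </model>""",
    "gates.xml": """<model xmlns="http://www.cellml.org/cellml/1.1#" name="gates">
      <component name="d_gate"/>
    </model>""",
}
importers, depends_on = build_indexes(MODELS)
print(sorted(importers[("ica.xml", "ica")]))             # ['cell.xml']
print(sorted(all_dependencies("cell.xml", depends_on)))  # ['gates.xml', 'ica.xml']
```

Relations that live in metadata rather than in imports (for example, "is a corrected version of") would need their own index alongside this one.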
Maybe as an experiment I could drop versions/variants and see how feasible it is to implement certain desired features with just that. However, I still think we need a proper storage backend as the foundation, and need to get that right (decided) before moving forward. I don't want to build a mansion without a solid foundation. >> Issues: >> - Flat file system. >> Sure, using ZCatalog it is possible to emulate users' home directories and >> the like, but it still does not get away from what we have now. > > I don't understand this. What are you aiming for in a home space and > why doesn't the current system support it? > See above. The current system can emulate it, and therefore support it, but again, it lacks a proper versioning backend. >> - Version/Variant >> It already clogged up the system. There is no proper revision control >> mechanism, what we have now is an ad-hoc emulated system. > > I don't think it has clogged the system I just think it has been > improperly used both by authors and by the user interface. This is no > fault of the authors, there is simply a specificatio
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Tommy, Can you continue to update/fill out your document, as well as begin associated proposals with information contained in the replies people are submitting? The goal of this process is a scoping document with associated content. More comments below. On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > Matt wrote: > > Hi Tommy, > > > > I found the document seemed to be too far ahead of itself. I also > > didn't find any of the pros and cons very compelling because they > > don't address specific problems and those problems are not described. > > > > 1) What are you actually trying to achieve? It would be useful to > > describe the parts of the current system that are giving you grief and > > look to give you more grief based on the use cases and any axes of > > scale. > > > > Starting with what I envisioned. > > Who is the repository catered for? > 1) People who would like to work on models, using it as a place to store > work-in-progress models. > 2) Reviewers to review models. > 3) Website users to browse models. > > 1) What do the model builders want? > - Their own workspace (home directory) > - A place to let reviewers review their models > - Also to publish their models > > First point is not addressed by what we have now. Second and third point is > quite ad-hoc. Also, version control is very ad-hoc right now. Each of these points needs to be filled out, e.g. what does it mean to have a workspace for a CellML modeller? What are the scenarios and workflows for reviewers of CellML models? > > 2,3) Reviewers and website users > - A centralized location to browse models. > - They would like to see how models may relate to each other. > How do models relate to each other? Relations between models come from all sorts of data within models, and within any associated metadata (so more than just our current cellml metadata specification).
It would be useful to write out the details of the relationships that are important here, as these pretty much form the basis of many of the queries that will need to be performed. > First point is already addressed, but second point is definitely not possible as the current repository does not support 1.1. Why does it not support CellML 1.1? i.e. what is the technical block to extending the current system to support it? > > Issues: > - Flat file system. > Sure, using ZCatalog it is possible to emulate users' home directories and > the like, but it still does not get away from what we have now. I don't understand this. What are you aiming for in a home space and why doesn't the current system support it? > - Version/Variant > It already clogged up the system. There is no proper revision control mechanism, what we have now is an ad-hoc emulated system. I don't think it has clogged the system; I just think it has been improperly used both by authors and by the user interface. This is no fault of the authors; there is simply a specification for versioning that is missing. The hope is that subversion applies well to this. > - It's CellML Code, right? > Why not put code in a real code management system, like Subversion? Subversion works well for filesystems of code and text data and to some extent binary data that we don't really need to query the contents of. If this applies well to CellML modelling, then subversion is probably a good match. Subversion will bring its own complexities when we are dealing with applying security to file objects, and security/publishing in general will get even more complex if we are proxying remote repositories - which we talked about a few weeks ago. Generally, I think the concept of cellml modelling being laid out in a filesystem and subversion versioning concepts applied to it is good, but untested.
For instance, take a reasonably complex model of Andre's and work out how it will look on the filesystem and what Subversion versioning would result in. While we're on this thread, I don't believe metadata should be treated any differently from model data. Adding special rules for versioning some data and not others is going to complicate the versioning process, and I can't see any compelling reason to do this. Remember that Subversion is versioning file objects, which will contain both metadata and CellML model data. What is important is how and where metadata is stored. Perhaps metadata should be separated into its own document sitting next to the model in the filesystem. My inclination is that an implementation using Subversion plus some Subversion hooks will be OK, but we haven't worked out the details or done any proof of concept for this. Such a proof of concept should be agnostic to CellML and focused on how to apply Zope+CMF security and workflows to data objects stored in Subversion repositories. > - Zope has revision control > Until someone packs the database. Perhaps you should look at http://plone.org/products/plone/roadmap/8 (which is now completed and merged into Plone 3). There are some other add on products -
Re: [cellml-discussion] Concerning the CellML Model Repository
Matt wrote: > Hi Tommy, > > I found the document seemed to be too far ahead of itself. I also > didn't find any of the pros and cons very compelling because they > don't address specific problems and those problems are not described. > > 1) What are you actually trying to achieve? It would be useful to > describe the parts of the current system that are giving you grief and > look to give you more grief based on the use cases and any axes of > scale. > Starting with what I envisioned. Who is the repository for? 1) People who would like to work on models, using it as a place to store work-in-progress models. 2) Reviewers to review models. 3) Website users to browse models. 1) What do the model builders want? - Their own workspace (home directory) - A place to let reviewers review their models - Also to publish their models The first point is not addressed by what we have now. The second and third points are handled in quite an ad hoc way. Also, version control is very ad hoc right now. 2,3) Reviewers and website users - A centralized location to browse models. - They would like to see how models may relate to each other. The first point is already addressed, but the second is definitely not possible, as the current repository does not support CellML 1.1. Issues: - Flat file system: sure, using ZCatalog it is possible to emulate users' home directories and the like, but it still does not get away from what we have now. - Version/Variant: it has already clogged up the system. There is no proper revision control mechanism; what we have now is an ad hoc emulated system. - It's CellML code, right? Why not put code in a real code management system, like Subversion? - Zope has revision control, but only until someone packs the database. - Zope/Plone is also quite slow. - The code we have now cannot get away from its original design flaws. We might as well start from scratch. The major issue is that I cannot see how to get the current repository to support CellML 1.1 models.
Sure, a new archetype can be written and built with ZCatalog and the like. I still find this approach ad hoc, slapped together from semi-mismatched components just to get it working, whereas the obvious solution, a CMS with a database that points to the data (with a front-end written to interface with it), would be much more elegant. How is it ad hoc? For example, there is no "not" query in ZCatalog, which I still have not resolved. There is a product called 'AdvancedQuery' that addresses that, but that means depending on yet more products to get something simple done. There are more issues, but I will end it here. > 2) What are the use cases? An initial set should be extracted from the > current site. You have written out some, but they only covered a small > set of function of the site, especially when it comes to relations > between models or workflow and curation states. Feel free to list some specific examples I have omitted, like Andre and Andrew did. I do agree it is a small set, but I am starting from the basics and moving up from there; it will get quite complicated. > > I understand some of the details that are causing you pain with the > current implementation, but I think the first part of this is to be > charitable to the current system and adequately describe the two > points above. > > Before rethinking the implementation of this site I think the > following need to also be done: > - a specification for assigning a URI to these models (as would be > used by CellML 1.1 imports) I've outlined a few, but more details are to come. > - a specification for how a manifest file is to be constructed, or > some set of rules for interpreting a directory structure of models, > especially in those cases where there are multiple local models used > in imports and we need to point to at least the top level model. > - a suggested solution to the bqs problem. Research existing standards. > I did consider that, and I think OpenURL may suit our needs fairly well.
It is already an established standard for citations, it is widely supported (libraries and citation catalogs use it), it seems to have everything bqs describes, and here is the spec: http://www.niso.org/standards/resources/Z39_88_2004.pdf However, it is specified in XML only, but near the bottom of page 23 of that file it says: > - To support new applications, communities could introduce new XML-based > ContextObject Formats constrained by other syntactic constraint languages > (DTD or RELAX NG, for example) or semantic constraint languages (RDFS or OWL, > for example). Nothing is really stopping us from adapting that standard, aside from having to rewrite/regenerate all the metadata we have now. > Generally: > > Relational databases are useful, but so are the combination of > ZCatalog and Sets. It really depends on the structure of the data and > the queries you want to perform. You should write out a reasonable set > of these in natural language to get the focus right. Maybe a
Re: [cellml-discussion] Concerning the CellML Model Repository
David Nickerson wrote: > Hi Tommy, > > looks like a good starting point for some discussion. Just to help me > think through some of the issues, is there any chance you could add a > usage example illustrating how this system would deal with a model made > from the combination of a bunch of papers (i.e., a single model where > each component defines a new citation). I'm guessing this would be done > by adding each of the components as separate models and then importing > them into a single model? > It depends on how the model is cited. If the creator of the model that binds all the separate models together based his/her model on a published paper, that citation would be used. If not, the model can only reside inside the user's directory, under a filename of his or her choice, and import the other models. Yes, the creator of the model would have to import the components. > Another usage example that might be interesting to look at would be a > model author adding a local CellML 1.1 model hierarchy to a remote > repository and how all the import href's are handled in this case (i.e., > imports throughout the model hierarchy might consist of a mix of > relative, http, and file URLs). > The model repository shouldn't be responsible for users importing from file:// and other URIs that cannot be resolved from the repository. I will create detailed use cases for this, but in the case of http URIs, one option is to check against a pre-approved list of hostnames that models can be imported from. > And another usage example might be the searching for models built using > a specific set of data. It will hopefully become standard practice to > annotate variable values with their source, where the source may be some > data from a different article than the model's publication. > That's using the metadata, right? If the creator of the model annotates components properly (e.g. by attaching a comment to the cmeta:id of some component in some file), it will be searchable (provided that the creator publishes that model).
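Tommy's suggestion of checking http imports against a pre-approved list of hostnames can be sketched in Python. This is a minimal illustration only: the `APPROVED_HOSTS` set and the helper names `classify_import` and `rejected_imports` are hypothetical and not part of any existing CellML tool. Relative hrefs are left for the repository to resolve against the importing file, http(s) hrefs must point at an approved host, and `file://` or any other scheme is rejected.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the thread only proposes "a pre-approved list of hostnames".
APPROVED_HOSTS = {"www.cellml.org"}


def classify_import(href, approved_hosts=APPROVED_HOSTS):
    """Classify a CellML 1.1 import href as 'relative', 'approved' or 'rejected'."""
    parsed = urlparse(href)
    if not parsed.scheme and not parsed.netloc:
        # No scheme and no host: resolved relative to the importing model's location.
        return "relative"
    if parsed.scheme in ("http", "https"):
        # urlparse lower-cases the scheme, and .hostname is lower-cased too.
        return "approved" if parsed.hostname in approved_hosts else "rejected"
    # file:// (and any other scheme) points at something the repository cannot fetch.
    return "rejected"


def rejected_imports(hrefs, approved_hosts=APPROVED_HOSTS):
    """Return the hrefs a submission would have to fix before being accepted."""
    return [h for h in hrefs if classify_import(h, approved_hosts) == "rejected"]


if __name__ == "__main__":
    hrefs = [
        "membrane.xml",
        "../common/units.xml",
        "http://www.cellml.org/models/example.xml",
        "file:///home/alice/models/channel.xml",
        "http://lab.example.org/model.xml",
    ]
    print(rejected_imports(hrefs))
```

Running the script prints the two hrefs that would block a submission under this policy: the `file://` path and the URL on the unapproved `lab.example.org` host.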
Thanks for your input, Tommy. > > Thanks, > David. > > Tommy Yu wrote: >> Hi, >> >> I have written down some of my thoughts on how the model repository could be >> put together. >> >> http://www.cellml.org/Members/tommy/repository_redesign.html >> >> It is still a pretty rough document. The usage example section gives a >> rough outline on what I see people might be doing with the repository and >> how this design could address those issues, which I think it will be of >> interest to users. It is not an exhaustive list, yet. >> >> I must also note the design outlined is quite a drastic departure from what >> we have now (it will be yet another new repository). However, it is more >> true to the one envisioned before according to >> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an >> addition layer that will assist in pulling content and drawing relationships >> between models. >> >> Feel free to take it apart and/or build on top of it. >> >> Cheers, >> Tommy.
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Andrew, A couple of notes: > I don't think it is a bad thing to have a one-way cache of metadata > somewhere for technical / performance reasons (perhaps in a relational > database), but I think that we should replicate data for each model > (perhaps using a deep copy-on-write approach if this is really necessary > to save disk space) rather than changing the metadata for existing > models without changing the version. > > Making changes to metadata require changes to the model will ensure that > no one gets burned by referencing a particular version of a model, only > to find that the metadata in that version has changed on them. > > Your current unversioned, globally shared metadata approach probably > also has security implications. For example, lets say that Alice submits Understood; I did intend the metadata in the RDBMS to be more of a snapshot. Metadata will still be versioned (by revision) in the Subversion repository. The publishing of a model to the public could conceivably be done by someone other than the model creator. Also, in the scenario outlined below, you are correct that a paper referenced by PubMed would be treated somewhat differently. If Charlie were to publish a "fake" paper to the repository, it would result in a new reference anyway: Alice - Paper title (original); Alice, Charlie - Paper title (fake). There is no way to stop users from entering bad data into the system if they are given "admin" rights. Fortunately Charlie wouldn't have those rights, so he wouldn't be able to add a new author to Alice's paper; since he can publish a model, he would only be able to create a new fake paper that he did not write. On the other hand, if he decided to publish his model under the original publication name and then change the reference there, he would still be prevented from doing that, but he would have the option of creating a new fake reference. Again, there is no way to stop users from publishing bad data if they are given the rights.
It is possible to limit where Charlie can publish his paper (i.e. only to reviewers), in which case there would be no visible damage. > a model which references a publication. Now suppose that Charlie wasn't > an author of that paper, but he wants to add his name onto the list of > authors. So he submits a completely different, bogus, model which > includes metadata for the publication, and includes his name. When Bob > downloads Alice's model from the repository, it would then include > Charlie's name as one of the authors (assuming that the publication was > referenced by PubMed ID or DOI or some sort of publication URI. > Particular cases like the one I described might be able to be secured in > an ad hoc fashion such as by checking that the authors are the same, but > the general attack will still pervade this type of approach unless > metadata is associated uniquely with a particular version of a > particular model. If the assertions about the same subject cannot be > identified between models in the database, then having data flow back > from the relational database into the model does not carry any benefit > at all). > > However, I do agree that there is a place for some metadata which can be > changed without creating a new version (which probably is the type of > metadata that you wouldn't include in the CellML file by default). > Curation status and permissions would probably fit in this category, > because although they may be associated with a particular version, they > should not be immutable for a given version. > > 2) I think that there should be a directory for each mathematical model > (which may include several CellML model files, documentation, and so > on), so that a particular version can be downloaded / checked out in its > entirety (with some directory-level manifest describing how to run or > view the model). This suggests that collisions between mathematical > models should be prevented at this level, not at the file level.
Under > this scheme, Mary would find that at usage example 3, she couldn't use > the same directory name as the one John already submitted. > > 3) I think the 'reference by citation' needs some expansion: I think > people referencing models should have the choice to refer to: > => a specific version for which no files will change at all. > => the latest version which aims to reflect the letter of a publication > (updates will only fix mistakes in the model which prevent it from > corresponding to the printed paper). > => the latest version which aims to reflect the results obtained by the > author (updates can fix discrepancies or omissions from the paper that > were in the author's original code, if the author didn't use CellML). > => the latest derivative of the current model developed by the same > author / group, even if it has not yet been peer-reviewed (subject to > permissions constraints). > => the latest derivative of the current model, but with all imports
Re: [cellml-discussion] Concerning the CellML Model Repository
Tommy Yu wrote: > Hi, > > I have written down some of my thoughts on how the model repository could be > put together. > > http://www.cellml.org/Members/tommy/repository_redesign.html > > It is still a pretty rough document. The usage example section gives a rough > outline on what I see people might be doing with the repository and how this > design could address those issues, which I think it will be of interest to > users. It is not an exhaustive list, yet. > > I must also note the design outlined is quite a drastic departure from what > we have now (it will be yet another new repository). However, it is more > true to the one envisioned before according to > http://www.cellml.org/wiki/CellMLModelRepositories, except I have an addition > layer that will assist in pulling content and drawing relationships between > models. > > Feel free to take it apart and/or build on top of it. > Hi Tommy, A few comments: 1) I am still not convinced that metadata should be left unversioned, simply because changes to metadata can be important changes to a model. In some cases, such as changes to simulation metadata, the changes might have a major impact on the final model. I don't think it is a bad thing to have a one-way cache of metadata somewhere for technical / performance reasons (perhaps in a relational database), but I think that we should replicate data for each model (perhaps using a deep copy-on-write approach if this is really necessary to save disk space) rather than changing the metadata for existing models without changing the version. Requiring that changes to metadata be made as changes to the model will ensure that no one gets burned by referencing a particular version of a model, only to find that the metadata in that version has changed on them. Your current unversioned, globally shared metadata approach probably also has security implications. For example, let's say that Alice submits a model which references a publication.
Now suppose that Charlie wasn't an author of that paper, but he wants to add his name to the list of authors. So he submits a completely different, bogus model which includes metadata for the publication, including his name. When Bob downloads Alice's model from the repository, it would then include Charlie's name as one of the authors (assuming that the publication was referenced by PubMed ID, DOI, or some other publication URI). Particular cases like the one I described might be secured in an ad hoc fashion, such as by checking that the authors are the same, but the general attack will still pervade this type of approach unless metadata is associated uniquely with a particular version of a particular model. (If the assertions about the same subject cannot be identified between models in the database, then having data flow back from the relational database into the model does not carry any benefit at all.) However, I do agree that there is a place for some metadata which can be changed without creating a new version (which is probably the type of metadata that you wouldn't include in the CellML file by default). Curation status and permissions would probably fit in this category, because although they may be associated with a particular version, they should not be immutable for a given version. 2) I think that there should be a directory for each mathematical model (which may include several CellML model files, documentation, and so on), so that a particular version can be downloaded / checked out in its entirety (with some directory-level manifest describing how to run or view the model). This suggests that collisions between mathematical models should be prevented at this level, not at the file level. Under this scheme, in usage example 3 Mary would find that she couldn't use the same directory name as the one John had already submitted.
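Andrew's directory-per-model proposal, together with the request elsewhere in the thread for rules to interpret a directory of models when several local models import each other, suggests a simple fallback when no manifest exists: treat any model file that no sibling imports as a top-level model. The Python below is a sketch under that assumption only; the file extensions, the function names and the rule itself are illustrative, not an agreed specification. It reads the `xlink:href` attribute of CellML 1.1 `<import>` elements.

```python
import os
import xml.etree.ElementTree as ET

CELLML_11_NS = "http://www.cellml.org/cellml/1.1#"
XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"


def model_files(directory):
    """List model files in a directory (assumption: .cellml or .xml extensions)."""
    return sorted(n for n in os.listdir(directory) if n.endswith((".cellml", ".xml")))


def import_hrefs(path):
    """Return the xlink:href values of all CellML 1.1 <import> elements in a file."""
    root = ET.parse(path).getroot()
    return [imp.get(HREF) for imp in root.iter(f"{{{CELLML_11_NS}}}import")
            if imp.get(HREF)]


def top_level_models(directory):
    """Hypothetical fallback rule: a model file that no other file in the
    directory imports (via a relative href) is treated as an entry point."""
    models = model_files(directory)
    imported = set()
    for name in models:
        for href in import_hrefs(os.path.join(directory, name)):
            if "://" in href:
                continue  # remote import, not part of this directory
            imported.add(os.path.normpath(href))  # "./units.xml" -> "units.xml"
    return [name for name in models if name not in imported]


def build_manifest(directory):
    """Assemble a minimal manifest dict; a real manifest format is still undecided."""
    return {
        "directory": os.path.basename(os.path.abspath(directory)),
        "models": model_files(directory),
        "top_level": top_level_models(directory),
    }
```

Because the manifest is keyed by directory, a repository following Andrew's scheme would check for name collisions on the `directory` value rather than on individual file names.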
3) I think the 'reference by citation' needs some expansion. People referencing models should have the choice to refer to:
=> a specific version for which no files will change at all.
=> the latest version which aims to reflect the letter of a publication (updates will only fix mistakes in the model which prevent it from corresponding to the printed paper).
=> the latest version which aims to reflect the results obtained by the author (updates can fix discrepancies or omissions from the paper that were in the author's original code, if the author didn't use CellML).
=> the latest derivative of the current model developed by the same author / group, even if it has not yet been peer-reviewed (subject to permissions constraints).
=> the latest derivative of the current model, but with all imports external to the model updated to the latest versions (even if this has not been reviewed by the author). This would be the most frequently updated version, because it could be automatically created without the model author being involved. It would also be possible to sear
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Tommy, I found that the document seemed to be too far ahead of itself. I also didn't find any of the pros and cons very compelling, because they don't address specific problems and those problems are not described. 1) What are you actually trying to achieve? It would be useful to describe the parts of the current system that are giving you grief, and those that look likely to give you more grief given the use cases and any axes of scale. 2) What are the use cases? An initial set should be extracted from the current site. You have written out some, but they only cover a small set of the site's functions, especially when it comes to relations between models or workflow and curation states. I understand some of the details that are causing you pain with the current implementation, but I think the first part of this is to be charitable to the current system and adequately describe the two points above. Before rethinking the implementation of this site, I think the following also need to be done: - a specification for assigning a URI to these models (as would be used by CellML 1.1 imports) - a specification for how a manifest file is to be constructed, or some set of rules for interpreting a directory structure of models, especially in those cases where there are multiple local models used in imports and we need to point to at least the top-level model. - a suggested solution to the bqs problem. Research existing standards. Generally: Relational databases are useful, but so is the combination of ZCatalog and Sets. It really depends on the structure of the data and the queries you want to perform. You should write out a reasonable set of these in natural language to get the focus right. Maybe a proof of concept using various mechanisms is required. The frustration with metadata handling at the moment is the result of some difficulties in the specification for the metadata you are using most, and also of the use of a fairly esoteric system: 4Suite's Versa RDF query interface.
RDQL and SPARQL are better, SQL-like equivalents and certainly have wide acceptance. Subversion offers a nice philosophy of code management, and the guess is that it would apply well to the modeling process. It also offers the potential for building URIs for versioned material, both individual files and whole changesets (which is something we are after). The default WebDAV URI scheme may not be what we want, so it is also worth looking at others; for example, the Trac browser interface to a Subversion repository produces quite nice URIs. Workflow and security as defined and implemented by Zope/CMF/Plone form a very nice model that should be reflected in our workflow and security use cases. We discussed a few weeks ago that if this environment is going to provide the security layer, then there needs to be a relationship between it and the Subversion repository at quite a detailed level. cheers Matt On 6/21/07, Tommy Yu <[EMAIL PROTECTED]> wrote: > Hi, > > I have written down some of my thoughts on how the model repository could be > put together. > > http://www.cellml.org/Members/tommy/repository_redesign.html > > It is still a pretty rough document. The usage example section gives a rough > outline on what I see people might be doing with the repository and how this > design could address those issues, which I think it will be of interest to > users. It is not an exhaustive list, yet. > > I must also note the design outlined is quite a drastic departure from what > we have now (it will be yet another new repository). However, it is more > true to the one envisioned before according to > http://www.cellml.org/wiki/CellMLModelRepositories, except I have an addition > layer that will assist in pulling content and drawing relationships between > models. > > Feel free to take it apart and/or build on top of it. > > Cheers, > Tommy.
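Matt's point that Subversion can give URIs to versioned material, both single files and whole changesets, can be made concrete with a small Python sketch modelled on the URL shapes Trac uses for its repository browser (`browser/<path>?rev=<n>`) and its changeset view (`changeset/<n>`). The base URL and function names are hypothetical, and this is not a proposal for the cellml.org URI scheme. It only shows how omitting the revision gives a volatile URI that tracks the latest version, while adding one pins a stable version, mirroring the choice between "a specific version" and "the latest version" in Andrew's list of reference options.

```python
from urllib.parse import quote, urlencode

# Hypothetical Trac instance sitting in front of the model repository.
BASE_URL = "http://repository.example.org/trac"


def file_uri(path, rev=None, base=BASE_URL):
    """URI for one file in the repository browser.

    rev=None leaves the URI unpinned, so it always shows the latest revision;
    an integer revision pins the URI to that exact version of the file."""
    uri = f"{base}/browser/{quote(path.strip('/'))}"
    if rev is not None:
        uri += "?" + urlencode({"rev": int(rev)})
    return uri


def changeset_uri(rev, base=BASE_URL):
    """URI for a whole changeset, i.e. every file touched by one commit."""
    return f"{base}/changeset/{int(rev)}"


if __name__ == "__main__":
    print(file_uri("models/example/example.cellml"))          # tracks the latest
    print(file_uri("models/example/example.cellml", rev=42))  # pinned to r42
    print(changeset_uri(42))
```

Switching an import from the volatile URI to the pinned one is then just a matter of adding the revision number, which is the kind of behaviour a CellML 1.1 import would need, whatever the final URI layout turns out to be.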
Re: [cellml-discussion] Concerning the CellML Model Repository
Hi Tommy, Looks like a good starting point for some discussion. Just to help me think through some of the issues, is there any chance you could add a usage example illustrating how this system would deal with a model made from the combination of a bunch of papers (i.e., a single model where each component defines a new citation)? I'm guessing this would be done by adding each of the components as separate models and then importing them into a single model? Another usage example that might be interesting to look at would be a model author adding a local CellML 1.1 model hierarchy to a remote repository, and how all the import hrefs are handled in this case (i.e., imports throughout the model hierarchy might consist of a mix of relative, http, and file URLs). And another usage example might be searching for models built using a specific set of data. It will hopefully become standard practice to annotate variable values with their source, where the source may be some data from a different article than the model's publication. Thanks, David. Tommy Yu wrote: > Hi, > > I have written down some of my thoughts on how the model repository could be > put together. > > http://www.cellml.org/Members/tommy/repository_redesign.html > > It is still a pretty rough document. The usage example section gives a rough > outline on what I see people might be doing with the repository and how this > design could address those issues, which I think it will be of interest to > users. It is not an exhaustive list, yet. > > I must also note the design outlined is quite a drastic departure from what > we have now (it will be yet another new repository). However, it is more > true to the one envisioned before according to > http://www.cellml.org/wiki/CellMLModelRepositories, except I have an addition > layer that will assist in pulling content and drawing relationships between > models. > > Feel free to take it apart and/or build on top of it. > > Cheers, > Tommy.
-- 
David Nickerson, PhD
Research Fellow
Division of Bioengineering
Faculty of Engineering
National University of Singapore
Email: [EMAIL PROTECTED]
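David's first usage example, a model assembled by importing components that each come from a different paper, implies that the repository should be able to list every citation a composite model depends on. A minimal Python sketch is a depth-first walk over the import graph that skips models it has already visited, so shared imports and accidental import cycles are handled. It assumes a hypothetical in-memory catalog in which each model records its own citations and the ids of the models it imports; none of these names come from the current repository.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical catalog entry for one model in the repository."""
    citations: list = field(default_factory=list)  # citations attached to this model
    imports: list = field(default_factory=list)    # ids of models it imports


def all_citations(model_id, catalog):
    """Collect the citations of a model and of everything it imports, transitively.

    Depth-first, visiting imports in declared order; each model is visited once,
    so shared imports and import cycles cause neither repeats nor infinite loops."""
    visited, result = set(), []
    stack = [model_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        record = catalog[current]
        for citation in record.citations:
            if citation not in result:
                result.append(citation)
        # Reverse so that the first declared import is popped (visited) first.
        stack.extend(reversed(record.imports))
    return result


if __name__ == "__main__":
    catalog = {
        "combined": ModelRecord(imports=["sodium", "potassium"]),
        "sodium": ModelRecord(citations=["Paper A"], imports=["units"]),
        "potassium": ModelRecord(citations=["Paper B"], imports=["units"]),
        "units": ModelRecord(),
    }
    print(all_citations("combined", catalog))  # ['Paper A', 'Paper B']
```

If the composite model also carried its own publication citation, as in Tommy's reply, that citation would be listed first, because a model's own record is read before its imports.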
[cellml-discussion] Concerning the CellML Model Repository
Hi, I have written down some of my thoughts on how the model repository could be put together. http://www.cellml.org/Members/tommy/repository_redesign.html It is still a pretty rough document. The usage example section gives a rough outline of what I think people might be doing with the repository and how this design could address those issues, which I think will be of interest to users. It is not yet an exhaustive list. I must also note that the design outlined is quite a drastic departure from what we have now (it would be yet another new repository). However, it is truer to the repository envisioned earlier at http://www.cellml.org/wiki/CellMLModelRepositories, except that I have an additional layer that will assist in pulling content and drawing relationships between models. Feel free to take it apart and/or build on top of it. Cheers, Tommy.