Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On 4/1/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

I misunderstand the scope of the property isDescribedBy. I also don't think reverse engineering URIs to obtain meaning is a good practice.

But ... you do not reverse engineer anything.

Though you have to pull apart the URI correctly to discover the key.

You mean to split urn:MyURI:12345 into urn:MyURI and 12345? (I deliberately use the URN form rather than the URL form to avoid confusion.) Yes, we may have to do that in some cases.

Isn't this what you have to do every time? E.g.:

    x bqmodel:isDescribedBy http://www.pubmed.gov/#8983160

or, in the more general case, one of:

    $X $QUALIFIER [$DATATYPE-1#$IDENTIFIER-1, $DATATYPE-2#$IDENTIFIER-2, ... $DATATYPE-k#$IDENTIFIER-n ... $DATATYPE-K#$IDENTIFIER-N]
    $X $QUALIFIER [urn:$DATATYPE-1':$IDENTIFIER-1, urn:$DATATYPE-2':$IDENTIFIER-2, ... urn:$DATATYPE-k':$IDENTIFIER-n ... urn:$DATATYPE-K':$IDENTIFIER-N]

where:

- $DATATYPE-k represents datatype k in URL form,
- $DATATYPE-k' represents datatype k in URN form,
- $IDENTIFIER-n can be any string allowed in URIs,
- $X is a model constituent, and
- $QUALIFIER is the qualifier property of the annotation of constituent $X.

You will always need to pull apart the 'URI' (table 2 of the MIRIAM document) to retrieve the datatype and identifier. I guess I'm not sure why it isn't easier to keep the meaning of datatype and identifier separate within the language context you are using, which is basically RDF. So that instead you would have something like:

    $X $QUALIFIER $DATATYPEINSTANCE
    $DATATYPEINSTANCE isA $DATATYPE
    $DATATYPEINSTANCE hasIdentifier $IDENTIFIER
    $DATATYPEINSTANCE hasPhysicalUrl $URL

$QUALIFIER could be as general as isDescribedBy, or as specific as, for example, isDescribedByPubMedRecord. $QUALIFIER would have domain and range constraints. I see the end result being the same, but the latter method is easier to extend and specialise.
It also remains within the RDF standard, which I think the MIRIAM document should have focused on more, rather than inventing a very specific non-standard way of representing identifiers and datatypes.

The URI IS the meaning. In the English dictionary, there is a word "publication", with a definition. Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".

So you say somewhere in the dictionary that there is a set of things that are Publications and this set is denoted by any URI that starts with http://www.pubmed.gov/ ?

No. Publication is a human notion. We are dealing with software here. http://www.pubmed.gov/ is sufficient to uniquely identify a type of data. What the software does with it is its own business.

Publication is a useful semantic term for a machine to resolve to. Publication could have many representations in machine form; it's just important that the annotation language implies they are all to be equivalent.

I presume there are other URI bases that also mean publication? Something like:

    http://www.pubmed.gov/ isA Publication
    http://not.in.pubmed/ isA Publication

Yes. At the moment, we just have PubMed and DOI; we are adding arXiv.

So you do have a machine interpretation of Publication - yours is a lookup list.

We do not need to relate them to specify that they all deal with publications. It is already done by the bqmodel:isDescribedBy.

No, isDescribedBy has no semantic meaning - there is nothing to say that it explicitly defines a publication in a journal article or a vocabulary term.

How do you extend the mapping of URIs where the URI points to a general identification service that resolves across, for example, different publication indexes/databases? Do you need to ask people to replace this URI (which may actually be usable to return some more RDF) with a new one that uses a separate namespace for each publication index/database?

I think there are maybe two misunderstandings here.
The first one is between the MIRIAM notions of data-type and of resource. MIRIAM URIs describe data using data-types and identifiers. This data can be distributed through various resources. But we do not want to put information about those resources in the models. The life-span of resources is in general pretty short.

Yes, I think we agree on that.

And that brings me to the deeper misunderstanding, which is maybe the cause of all this discussion. The only purpose of MIRIAM annotation is to uniquely identify an annotation, in a perennial way. It is not to implement a semantic web infrastructure where you can go directly from the annotation to the resource pointed to by the annotation.

While there are some semantic web languages that say the identifier for a resource is also the location of the resource, I'm certainly not implying that here. The example of a specific data warehouse URI is simply that if you find some resource that you would like
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On 4/3/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

You will always need to pull apart the 'URI' (table 2 of the MIRIAM document) to retrieve the datatype and identifier.

Well, yes, you have to recognise what belongs to the data-type and what belongs to the identifier. But you do that all the time in RDF anyway. And besides, we do it for you and can return one or more URLs.

In what part of RDF is this done?

I guess I'm not sure why it isn't easier to keep the meaning of datatype and identifier separate within the language context you are using, which is basically RDF. So that instead you would have something like:

    $X $QUALIFIER $DATATYPEINSTANCE
    $DATATYPEINSTANCE isA $DATATYPE
    $DATATYPEINSTANCE hasIdentifier $IDENTIFIER
    $DATATYPEINSTANCE hasPhysicalUrl $URL

Something like:

    species calmodulin is calmodulin_in_uniprot
    calmodulin_in_uniprot isA UniProt_entry
    calmodulin_in_uniprot hasIdentifier P62158
    calmodulin_in_uniprot hasPhysicalUrl http://www.ebi.uniprot.org/entry/P62158

- One cannot recognize UniProt_entry if it is a free string? This is why there are MIRIAM data-types.
- The physical URL not being stable, one cannot store actual URLs in the models themselves.

Nope. I should be clearer about what would be published globally and what should be in a particular model.

Published globally:

    calmodulin_in_uniprot isA UniProt_entry
    calmodulin_in_uniprot hasIdentifier P62158
    calmodulin_in_uniprot hasPhysicalUrl http://www.ebi.uniprot.org/entry/P62158
    UniProt_entry subClassOf DatabaseRecord (that's quite generic)
    hasUniProtEntry subPropertyOf hasDatabaseRecord
    hasDatabaseRecord subPropertyOf isDescribedBy

In the model:

    x hasUniProtEntry calmodulin_in_uniprot

Now one can filter all annotations in a model for database links only, or UniProt links only, etc. It all just works when using RDF Schema.
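As a rough illustration of the filtering described just above, here is a minimal sketch in plain Python. It is not a real RDFS reasoner and is not taken from either the CellML or the MIRIAM tooling: the `SUB_PROPERTY_OF` table is a hand-written stand-in for the globally published schema, and the names `is_subproperty`, `filter_annotations`, `some_publication` and `dc:creator` are hypothetical, introduced only for this example.

```python
# Hand-written stand-in for the published schema:
# property -> its direct super-property (subPropertyOf).
SUB_PROPERTY_OF = {
    "hasUniProtEntry": "hasDatabaseRecord",
    "hasDatabaseRecord": "isDescribedBy",
}


def is_subproperty(prop, ancestor):
    """True if prop is ancestor, or reaches it by following subPropertyOf."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == ancestor:
            return True
        seen.add(prop)  # guards against accidental cycles in the table
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def filter_annotations(triples, ancestor):
    """Keep (subject, property, object) triples whose property specialises ancestor."""
    return [t for t in triples if is_subproperty(t[1], ancestor)]


# Annotations as they might appear in one model (illustrative values only).
model_triples = [
    ("x", "hasUniProtEntry", "calmodulin_in_uniprot"),
    ("x", "isDescribedBy", "some_publication"),
    ("x", "dc:creator", "someone"),
]

database_links = filter_annotations(model_triples, "hasDatabaseRecord")
all_descriptions = filter_annotations(model_triples, "isDescribedBy")
```

Under these assumptions, asking for `hasDatabaseRecord` returns only the UniProt link, while asking for `isDescribedBy` also returns the generic annotation, because the specific properties are declared as its sub-properties.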
So we are down to:

    species calmodulin is calmodulin_in_uniprot
    calmodulin_in_uniprot isA http://www.uniprot.org/
    calmodulin_in_uniprot hasIdentifier P62158

How is it different (in terms of information content and of the computing steps necessary to parse) from:

    species calmodulin is http://www.uniprot.org/#P62158

For the reasons I give above.

We do not need to relate them to specify that they all deal with publications. It is already done by the bqmodel:isDescribedBy.

No, isDescribedBy has no semantic meaning - there is nothing to say that it explicitly defines a publication in a journal article or a vocabulary term.

Of course not, because we do not want to restrict the type of data used to describe the component. This is what I said just above. What do you mean by "isDescribedBy has no semantic meaning"? isDescribedBy means exactly "is described by". How is that not a meaning?

In the RDF world you have not attributed any meaning. It could be used for anything.

What comes after the "by" can be a journal article, a webpage, a song, a poem, a controlled vocabulary term, or a telepathic transmission.

Right, and so if it can be followed by anything then it has no meaning: there is nothing to distinguish it from any property that someone may make up.

So there is no way to determine if some set of URIs are controlled vocab terms and some set are journal articles and some set are experimental result sets?

No. It is up to the user to decide what to do with what. See my ChEBI example before. For some people ChEBI is a controlled vocabulary, for some it is a database of chemical compounds. For me, CAS is a database of chemical compounds; for CellML, it is a bibliographic resource.

But when a person uses it, they will have a context. It may be one or more of those from your list, but it is still useful for them to restrict the intention of the property to something close to what they mean.
I don't particularly want to guess whether someone is intending the DataType to mean 'chemical compound' as opposed to 'bibliographic resource'. For all I know the record that I may be able to locate based on this URI might have value in both domains and I would want to know what they were intending. Of course you'd hope the record was RDF anyway and you were able to point to the record attribute by a full URI.

--
Nicolas LE NOVERE, Computational Neurobiology, EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:[EMAIL PROTECTED]
___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion
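For readers following the "pull apart the URI" point debated in this message, the step can be sketched roughly as below. This is only an illustration under stated assumptions, not the MIRIAM implementation: it assumes the two layouts used in the thread's examples (`<data-type>#<identifier>` for the URL form and `urn:<data-type>:<identifier>` for the URN form), it assumes the URN data-type itself contains no colon, and the function name `split_annotation_uri` is hypothetical.

```python
def split_annotation_uri(uri):
    """Split a combined annotation URI into (data_type, identifier).

    Assumed layouts, based on the examples quoted in this thread:
      URL form: <data-type>#<identifier>
      URN form: urn:<data-type>:<identifier>
    """
    if uri.startswith("urn:"):
        # Split on the first two colons only, so that an identifier which
        # itself contains ":" stays in one piece.
        parts = uri.split(":", 2)
        if len(parts) != 3 or not parts[1] or not parts[2]:
            raise ValueError(f"cannot split URN-form URI: {uri!r}")
        return "urn:" + parts[1], parts[2]

    data_type, sep, identifier = uri.partition("#")
    if not sep or not data_type or not identifier:
        raise ValueError(f"cannot split URL-form URI: {uri!r}")
    return data_type, identifier


pubmed = split_annotation_uri("http://www.pubmed.gov/#8983160")
uniprot = split_annotation_uri("http://www.uniprot.org/#P62158")
urn_form = split_annotation_uri("urn:MyURI:12345")
```

Whether this single split is cheaper or more fragile than reading separate isA and hasIdentifier statements is exactly the disagreement in the thread; the sketch only shows how small the step is when the layout can be relied upon.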
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
Hi Nicolas. Thanks for the in-depth reply.

On 3/31/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

I misunderstand the scope of the property isDescribedBy. I also don't think reverse engineering URIs to obtain meaning is a good practice.

But ... you do not reverse engineer anything.

Though you have to pull apart the URI correctly to discover the key (as well as pulling apart the RDF structure that it's embedded in, which I assume is well defined - only RDF containers of URIs allowed?).

The URI IS the meaning. In the English dictionary, there is a word "publication", with a definition. Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".

So you say somewhere in the dictionary that there is a set of things that are Publications and this set is denoted by any URI that starts with http://www.pubmed.gov/ ?

I presume there are other URI bases that also mean publication? Something like:

    http://www.pubmed.gov/ isA Publication
    http://not.in.pubmed/ isA Publication

How do you extend the mapping of URIs where the URI points to a general identification service that resolves across, for example, different publication indexes/databases? Do you need to ask people to replace this URI (which may actually be usable to return some more RDF) with a new one that uses a separate namespace for each publication index/database?

Is this MIRIAM dictionary considered a global dictionary? Can people maintain their own local ones? Is there a protocol for creating a dictionary that maps URIs (bases or namespaces?) to meaning - e.g. isA Publication - and a way to share this with others?

How do you say one URI means the same as another if the URI scheme changes? It seems you leave this up to the developer to make sure they accommodate both instead of letting RDFS take care of this.

The URI scheme should not change.

Why? There are a number of reasons the URIs (including the namespace) may change, and RDF certainly doesn't suggest they shouldn't.
A more common case though is that more namespaces are added for reasons such as different authority over similar resources, different versions of resources, dividing out a data warehouse into its original providers, or collapsing databases into a warehouse.

In the rare case it changes, it is up to us to provide a deprecation system so that the developer actually does not feel the change at all.

I presume you allow for different base URIs that share a common namespace to identify with different things? E.g. http://www.organisation.org/models and http://www.organisation.org/microarray. How do you say one URI is the same as another in your dictionary?

If you create an element bqs:PubMed_id in your language, rather than having a generic reference scheme, with a type PubMed defined elsewhere.

I don't see what hardcoded means. Using properties that are defined in a shared schema or standard is a pretty basic premise of sharing information using RDF. How do you think RSS or Dublin Core works?

Exactly, and this is why we dumped first CellML metadata. When we started with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and bqs:CAS_id.

- PubMed and Medline are redundant (Medline actually gave up their id. They use PubMed ones now).
- We could not refer to anything that was not in PubMed. This is the case of MANY models.

Why is that?

We could have asked you guys to develop a new version of the CellML metadata spec, with bqs:DOI. How long before we would have asked for another version with bqs:arXiv? bqs:Scopus_id?

That's what I'd expect people to do.

And this is only for bibliography. By the way, Dublin Core does not work that well. I explored extensively the usage of DC and it is an absolute mess. Everybody implements their own usages and rules. Elements and their syntax change wildly from one place to the other. In the end, this is the usual semantic web. Everybody exports the information all right, but barely anybody can read it. I am not even talking about interpreting it.
I think it works well for simple authorship details of a content item.

our bqs namespace reflects OMG's Bibliographic Query Service specification - see http://www.omg.org/docs/dtc/01-04-05.pdf

I know BQS very well. It has been developed in the room next to mine for a project that died before birth (because we massively gave up CORBA). It is NOT a standard. It has been endorsed by the OMG because at that time the EBI was a member, and because Martin Senger was the person writing many of those things (he also wrote the specification of LSID).

I'm not sure I see the point here. It was used because it offered the right concepts for what it was intended to be used for. The RDF Schema is available over the web, and any RDFS aware interpreter can use it to better understand the annotation properties of CellML models. It's shared and available and referenced by models.

The big advantage of externalising the type of metadata is that the scheme is generic.
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
I misunderstand the scope of the property isDescribedBy. I also don't think reverse engineering URIs to obtain meaning is a good practice.

But ... you do not reverse engineer anything. The URI IS the meaning. In the English dictionary, there is a word "publication", with a definition. Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".

How do you say one URI means the same as another if the URI scheme changes? It seems you leave this up to the developer to make sure they accommodate both instead of letting RDFS take care of this.

The URI scheme should not change. In the rare case it changes, it is up to us to provide a deprecation system so that the developer actually does not feel the change at all.

If you create an element bqs:PubMed_id in your language, rather than having a generic reference scheme, with a type PubMed defined elsewhere.

I don't see what hardcoded means. Using properties that are defined in a shared schema or standard is a pretty basic premise of sharing information using RDF. How do you think RSS or Dublin Core works?

Exactly, and this is why we dumped first CellML metadata. When we started with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and bqs:CAS_id.

- PubMed and Medline are redundant (Medline actually gave up their id. They use PubMed ones now).
- We could not refer to anything that was not in PubMed. This is the case of MANY models.

We could have asked you guys to develop a new version of the CellML metadata spec, with bqs:DOI. How long before we would have asked for another version with bqs:arXiv? bqs:Scopus_id? And this is only for bibliography.

By the way, Dublin Core does not work that well. I explored extensively the usage of DC and it is an absolute mess. Everybody implements their own usages and rules. Elements and their syntax change wildly from one place to the other. In the end, this is the usual semantic web. Everybody exports the information all right, but barely anybody can read it. I am not even talking about interpreting it.
our bqs namespace reflects OMG's Bibliographic Query Service specification - see http://www.omg.org/docs/dtc/01-04-05.pdf

I know BQS very well. It has been developed in the room next to mine for a project that died before birth (because we massively gave up CORBA). It is NOT a standard. It has been endorsed by the OMG because at that time the EBI was a member, and because Martin Senger was the person writing many of those things (he also wrote the specification of LSID).

The big advantage of externalising the type of metadata is that the scheme is generic.

What does generic mean? Standardised and used by everyone like a published schema would be?

No. Generic, because it can be applied to any kind of data. You do not need to define a specific scheme for each new data-type. Bibliography is just a type of metadata like any other. No need for special treatment.

What do you mean by the 'type' of metadata?

EC page, PubMed entry, DOI indexed document, UniProt entry, Gene Ontology term etc.

The type of metadata is externalized as soon as it is presented in a Schema and made public and adopted by the community.

That cannot work for several reasons. First, the type of metadata evolves very rapidly. We already have 29 types in MIRIAM resources, but I anticipate that number to grow very rapidly as libSBML3 (which implements the RDF annotation scheme) is adopted by the developers. Second, the relevant metadata varies according to the community. An obvious example is BQS. It used PubMed because it was developed at the EBI by a software engineer who just did not know there was anything other than PubMed in bibliography. Third, who decides what "adopted" and "community" mean? For instance, we have been struggling with that in SBML for years. We are just starting to have a robust model of development, with a balance between democracy and technical soundness. I think we actually have a pretty good system. But it is not trivial.
(In case there is a misunderstanding here: SBML and MIRIAM are separate entities. I am just using SBML as an example.) Finally, when was CellML metadata made public, how was the community consulted, and how was its feedback incorporated in the specification? (I am not even talking about BQS. As I said, this is NOT a community standard. It is a data-model developed by one person for a very specific project.)

I think more I am misunderstanding the range of use isDescribedBy is ok for.

isDescribedBy is a relationship.

are you meaning is-a relation? as in Type?

Yes,

    <model id="EPSP_Edelstein" metaid="_01">
    [...]
    <rdf:Description rdf:about="#_01">
      <bqmodel:isDescribedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
        </rdf:Bag>
      </bqmodel:isDescribedBy>

means: the model contained in the SBML model EPSP_Edelstein is described in the metadata 8983160 of the data-type "http://www.pubmed.gov/".

I don't see this at all. isVersionOf and hasVersion is about versions as in successors where both
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
By all means, step in as much as possible.

Can you explain in more detail or point to explanations of bqmodel:isDescribedBy? Specifically:

- what is its intended meaning?
- when more than one of these is defined on a resource, how is this interpreted? For example: is there some precedence implied somehow?
- how do you determine the kind of reference it is - for example a PubMed URI? You have a datatype for vocab/database IDs in the annotation scheme you described, but I don't see this in the bqmodel:isDescribedBy examples.
- how would you address auxiliary references as opposed to primary references so that a machine interpreting it can make the distinction?

snip

I entirely agree with Melanie, people should be able to pick the resource they want, as long as they uniquely identify it. This is clearly described in the MIRIAM paper.

I'm not sure what benefits one gains from letting people arbitrarily choose what they want to use to identify something with. For example, how do you work out if particular entities in one SBML model match entities in another SBML model? Also, given that most of these resources are controlled vocabularies, there is a lot of room for misunderstanding someone's intention when using their choices of identifiers.

An annotation is formed of three parts:

- The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
- The identifier of the particular information, e.g. 123456789, GO:0001234 ...
- An optional qualifier that describes the relationship between the concept represented by the model component and the concept represented by the particular information.

To help people implement that, we developed MIRIAM resources (http://www.ebi.ac.uk/compneur-srv/miriam/).
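To make the three-part annotation and the model-matching question raised above a little more concrete, here is a small sketch. It is an assumption-laden illustration rather than anything defined by MIRIAM or SBML: the `Annotation` class, the `shared_references` function and the qualifier strings are hypothetical, and treating a shared (data-type, identifier) pair as a candidate match is only one possible policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One annotation: data-type, identifier, and an optional qualifier."""
    data_type: str
    identifier: str
    qualifier: Optional[str] = None


def shared_references(annotations_a, annotations_b):
    """Return the (data_type, identifier) pairs that occur in both lists.

    The qualifier is deliberately ignored here; a stricter comparison
    could require the qualifiers to agree as well.
    """
    keys_a = {(a.data_type, a.identifier) for a in annotations_a}
    keys_b = {(b.data_type, b.identifier) for b in annotations_b}
    return keys_a & keys_b


# A species annotated in two different models (illustrative values only).
species_in_model_1 = [
    Annotation("http://www.uniprot.org/", "P62158", qualifier="is"),
    Annotation("http://www.pubmed.gov/", "8983160", qualifier="isDescribedBy"),
]
species_in_model_2 = [
    Annotation("http://www.uniprot.org/", "P62158"),
]

common = shared_references(species_in_model_1, species_in_model_2)
```

With these values the only shared reference is the UniProt entry. Whether that is enough to say the two species are the same thing depends on the qualifiers and on how each author intended the identifier, which is the concern voiced in the paragraph above.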
If you download a model from BioModels DB in SBML (not in CellML at the moment, for obvious reasons highlighted by the current discussion), you will see something like:

    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>

But on the webpage, there is:

    <b>Publication ID:</b>&nbsp;<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=8983160" target="_blank">8983160</a>

The URL is dynamically generated by MIRIAM webservices. In fact, in the new version of BioModels DB, to be released in the fall, the URL does not point to PubMed anymore, but to the EBI extended Medline, which is more comprehensive. BUT the URI stored in the model is still the SAME.

Similarly for a DOI:

    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="http://www.doi.org/#10.1063/1.1681288"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>

is transformed into:

    <b>Publication ID:</b>&nbsp;<a href="http://dx.doi.org/10.1063/1.1681288" target="_blank">10.1063/1.1681288...</a>

That system is very flexible. You can use any resource listed in MIRIAM resources, and this resource can be extended at will (note that we distribute an XML version of the resource for local use). But it is still robust and expressive.

Cheers,

On Wed, 28 Mar 2007, Melanie Nelson wrote:

Wow, I haven't posted to this list in a long time... But I feel compelled to give a little advice as someone who's spent a lot of time integrating biological information and therefore has made a lot of mistakes!

By all means, have a best practice encouraging people to use the GO cellular_component ontology to describe organelles and cells. You could probably also use the molecular_function ontology for proteins (although this will be messier). However, neither is likely to be complete, i.e., there will be models that reference a biological entity not in the GO ontologies.
Also, there will be cases where the entity the model references is most properly thought of as related in some way (e.g., a subset, a superset, or a sibling) to the GO entity. You can spend ages sorting this sort of thing out and coming up with consistent rules for handling all the relationships. Since you aren't really interested in sorting out this biological mess, you may want to consider letting people choose their own ontology and just reference it. An example of this practice is in the MIAME project: http://www.mged.org/Workgroups/MIAME/miame_1.1.html

About the citations - my memory of this is fuzzy, but I think the original intent was that people should provide the PubMed ID where possible. However, not all journals are indexed in PubMed (for instance, there is a CellML paper published in one that is not), so the model needs to handle full citation info, too. The BQS model handles both, and then some, which is why we chose it.

Hope this is helpful,

Melanie

--- Andrew Miller [EMAIL PROTECTED] wrote:

Matt wrote:

I don't think this is a good idea.

- I think bioentity should be deprecated; it has no intrinsic semantic value. It does, unfortunately, seem to usually target a literal node at the moment. It would
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On Thu, 29 Mar 2007, Matt wrote:

The BioPAX annotation scheme is very fragile and in practice almost unusable.

How is that?

Because the annotation is free-form. I can use UniProt, uniprot, Uni-Prot etc.

That was always going to wash out in practice. I'm not sure a rule for generating the URI is useful in the long run, especially if these databases produce their own URI scheme; then it becomes a less simple task to match other data resources that don't use the BioPAX scheme to the ones that use the originating database's scheme.

I entirely agree. This is why (well, that is one of the reasons) I do not like LSIDs, and why we developed MIRIAM resources: so that everybody uses the same URIs. This is explained in the MIRIAM paper.

--
Nicolas LE NOVERE, Computational Neurobiology, EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
http://www.ebi.ac.uk/~lenov, AIM: nlenovere, MSN: [EMAIL PROTECTED]
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On Thu, 29 Mar 2007, Matt wrote:

Can you explain in more detail or point to explanations of bqmodel:isDescribedBy?

You can find some explanations at: http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers

Note that qualifiers are optional for MIRIAM compliance. I personally think we should always use some qualification, otherwise an annotation becomes very difficult to use except for jumping from webpage to webpage.

Specifically: - what is its intended meaning?

Cf. above. Note that the list of qualifiers is by no means frozen. We are already aware of several gaps (e.g. how do we qualify the relation between a peptide and the gene that encodes it?).

- when more than one of these is defined on a resource, how is this interpreted? For example: is there some precedence implied somehow?

This is up to the tool using the qualifiers. SBML does not allow nested qualifications. There is only an implicit hasVersion if several identical qualifiers are present:

    bqmodel:isDescribedBy toto
    bqmodel:isDescribedBy tata

means "is described by toto" and "is described by tata". In other words, toto or tata describe the component, NOT "toto and tata are necessary to describe the component". On top of that, BioModels DB adds some precedence: http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html But all that is not part of the MIRIAM rules.

- how do you determine the kind of reference it is - for example a PubMed URI? You have a datatype for vocab/database IDs in the annotation scheme you described, but I don't see this in the bqmodel:isDescribedBy examples.

    <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>

"http://www.pubmed.gov/" means the following identifier has to be interpreted as pointing to a piece of PubMed data. http://www.pubmed.gov/ is unique and should not normally change. However, sometimes it may nevertheless change for various reasons: URI too confusing, badly chosen, fusion of two resources etc.
For instance, the old PubMed URI was http://www.ncbi.nlm.nih.gov/PubMed/ It was misleading because it was tied to a particular physical resource at the NCBI. We have a deprecation system in place that allows us to resolve the old URIs and provide the new ones.

- how would you address auxiliary references as opposed to primary references so that a machine interpreting it can make the distinction?

I am not sure I understand that. Like primary and secondary accessions of UniProt?

snip

I entirely agree with Melanie, people should be able to pick the resource they want, as long as they uniquely identify it. This is clearly described in the MIRIAM paper.

I'm not sure what benefits one gains from letting people arbitrarily choose what they want to use to identify something with. For example, how do you work out if particular entities in one SBML model match entities in another SBML model? Also, given that most of these resources are controlled vocabularies, there is a lot of room for misunderstanding someone's intention when using their choices of identifiers.

An annotation is formed of three parts:

- The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
- The identifier of the particular information, e.g. 123456789, GO:0001234 ...
- An optional qualifier that describes the relationship between the concept represented by the model component and the concept represented by the particular information.

To help people implement that, we developed MIRIAM resources (http://www.ebi.ac.uk/compneur-srv/miriam/).
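The deprecation system mentioned earlier in this message, combined with URL generation of the kind shown in the BioModels DB examples in this thread, could be sketched roughly as follows. This is not the MIRIAM web service and does not reproduce its interface: the two lookup tables and the names `current_data_type` and `resolve` are hypothetical. The deprecated PubMed URI and the dx.doi.org pattern come from the thread; the PubMed URL template assumes "&" separators that are missing from the archived copy of the link.

```python
# Deprecated data-type URI -> current data-type URI (single entry, from this thread).
DEPRECATED_DATA_TYPES = {
    "http://www.ncbi.nlm.nih.gov/PubMed/": "http://www.pubmed.gov/",
}

# Current data-type URI -> URL template for one physical resource.
# The PubMed template is an assumption (query separators restored by hand).
URL_TEMPLATES = {
    "http://www.pubmed.gov/": (
        "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids={identifier}"
    ),
    "http://www.doi.org/": "http://dx.doi.org/{identifier}",
}


def current_data_type(data_type):
    """Follow deprecation links until a non-deprecated data-type URI is reached."""
    seen = set()
    while data_type in DEPRECATED_DATA_TYPES and data_type not in seen:
        seen.add(data_type)  # guards against accidental cycles
        data_type = DEPRECATED_DATA_TYPES[data_type]
    return data_type


def resolve(uri):
    """Return (up-to-date annotation URI, browsable URL) for a stored URI."""
    data_type, sep, identifier = uri.partition("#")
    if not sep or not identifier:
        raise ValueError(f"expected <data-type>#<identifier>, got {uri!r}")
    data_type = current_data_type(data_type)
    url = URL_TEMPLATES[data_type].format(identifier=identifier)
    return f"{data_type}#{identifier}", url


old_style = resolve("http://www.ncbi.nlm.nih.gov/PubMed/#8983160")
doi_style = resolve("http://www.doi.org/#10.1063/1.1681288")
```

The point the sketch tries to capture is that the model only ever stores the data-type URI and the identifier; pointing the physical URL at a different resource means editing one template, and models written against a deprecated data-type URI still resolve.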
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
On 3/29/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

On Thu, 29 Mar 2007, Matt wrote:

Can you explain in more detail or point to explanations of bqmodel:isDescribedBy?

You can find some explanations at: http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers

So there is no simple way to determine if this is a reference to a journal article except through interpreting the URI?

Note that qualifiers are optional for MIRIAM compliance. I personally think we should always use some qualification, otherwise an annotation becomes very difficult to use except for jumping from webpage to webpage.

Specifically: - what is its intended meaning?

Cf. above. Note that the list of qualifiers is by no means frozen. We are already aware of several gaps (e.g. how do we qualify the relation between a peptide and the gene that encodes it?).

- when more than one of these is defined on a resource, how is this interpreted? For example: is there some precedence implied somehow?

This is up to the tool using the qualifiers. SBML does not allow nested qualifications. There is only an implicit hasVersion if several identical qualifiers are present:

    bqmodel:isDescribedBy toto
    bqmodel:isDescribedBy tata

means "is described by toto" and "is described by tata". In other words, toto or tata describe the component, NOT "toto and tata are necessary to describe the component". On top of that, BioModels DB adds some precedence: http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html But all that is not part of the MIRIAM rules.

- how do you determine the kind of reference it is - for example a PubMed URI? You have a datatype for vocab/database IDs in the annotation scheme you described, but I don't see this in the bqmodel:isDescribedBy examples.

    <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>

"http://www.pubmed.gov/" means the following identifier has to be interpreted as pointing to a piece of PubMed data. http://www.pubmed.gov/ is unique and should not normally change.
However, it may nevertheless sometimes change for various reasons: the URI is too confusing or badly chosen, two resources are merged, etc. For instance, the old PubMed URI was http://www.ncbi.nlm.nih.gov/PubMed/ It was misleading because it was tied to a particular physical resource at the NCBI. We have a deprecation system in place that resolves the old URIs and provides the new ones.

- how would you address auxiliary references as opposed to primary references so that a machine interpreting it can make the distinction?

I am not sure I understand that. Like primary and secondary accessions of UniProt?

For journal articles, or other publications, being able to identify the primary reference(s) is useful. For database records, it would also be useful to label a group as being the most important (or defining) set, and others as 'helpful'. That was why I suggested that CellML bibliographic referencing separate these two, and that the latter would need to be bound to a reason (a natural language comment would be fine) that described why that reference was made.

<snip>

I entirely agree with Melanie: people should be able to pick the resource they want, as long as they uniquely identify it. This is clearly described in the MIRIAM paper.

I'm not sure what benefits one gains from letting people arbitrarily choose what they want to use to identify something with. For example, how do you work out if particular entities in one SBML model match entities in another SBML model? Also, given that most of these resources are controlled vocabularies, there is a lot of room for misunderstanding someone's intention when using their choices of identifiers.

An annotation is formed of three parts:

The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...

The identifier of the particular information, e.g. 123456789, GO:0001234 ...
An optional qualifier that describes the relationship between the concept represented by the model component and the concept represented by the particular information.

To help people implement that, we developed MIRIAM resources (http://www.ebi.ac.uk/compneur-srv/miriam/). If you download a model from BioModels DB in SBML (not in CellML at the moment, for obvious reasons highlighted by the current discussion), you will see something like:

    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>

But on the webpage, there is:

    <b>Publication ID:</b>&nbsp;<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8983160" target="_blank">8983160</a>

The URL is dynamically generated by MIRIAM webservices. In fact, in the new version of BioModels DB, to be released in the fall, the URL does not point to PubMed anymore, but to the EBI extended Medline, which is more comprehensive. BUT the URI
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
I don't think this is a good idea.

- I think bioentity should be deprecated; it has no intrinsic semantic value.

- If it is used currently, it should be left at its current minimum specification, which is to label and point to other bioinformatics database IDs.

- The problem is not 'biologically related papers' per se, but one of identifying what was the primary publication or publications that motivated a model.

- There is also the case where a single publication that contains a mathematical model is the one and only primary source for the model itself - a rather common case at the moment.

I would prefer that the primary publication(s) be identified as such, which covers the case where there are some models in the repository built from general review papers of biology with no math. I would prefer references to other related publications to be bound explicitly to a comment in the model metadata - there should be a reason identified by the author/editor/reviewer as to why such an association has been made.

As an aside, we also need to determine whether the bqs schema provides enough detail to match publications across metadata instances for different models, and whether we should be complementing bibliographic data with PubMed IDs and the like.

cheers
Matt

On 3/29/07, Andrew Miller [EMAIL PROTECTED] wrote:

Hi,

As discussed at the last CellML meeting, there are some models which reference both the paper about the model, and a reference about the biology. Since there is no way to distinguish between them, this creates problems for CellML metadata processing tools which want to identify the paper about the model (such as the CellML repository). However, it would still be a good thing to include references about the biology / experiments on which a model is based, as well as papers on underlying mathematical techniques (and perhaps earlier papers?)

The CellML Metadata specification already describes a predicate cmeta:bio_entity, and another cmeta:math_problem.
Although the cmeta specification suggests that these be used to provide references to identifiers for the biological entity that a part of the model relates to, and likewise for the mathematical problem, it would also be possible to create a list of references inside the resource targeted by the bio_entity or math_problem predicate. I would therefore suggest that the following be considered best practice:

1) Only refer to the paper about the model from the metadata for the model.
2) Any other papers should be in another resource referred to from the bio_entity and math_problem entities.

Does anyone else have any opinion on this?

Best regards,
Andrew

___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion
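To make the convention proposed in Andrew's message concrete, here is a small sketch using plain (subject, predicate, object) tuples rather than a real RDF library. The predicates cmeta:bio_entity and cmeta:math_problem come from the message; the predicate name bqs:reference, the node names (#model, #bio, #math) and the urn:example: paper identifiers are placeholders assumed for illustration only.

```python
# Hypothetical names: only cmeta:bio_entity and cmeta:math_problem appear
# in the thread; everything else here is an assumption for the sketch.
MODEL = "#model"
REFERENCE = "bqs:reference"
INDIRECT = ("cmeta:bio_entity", "cmeta:math_problem")

triples = [
    # 1) The paper about the model hangs directly off the model.
    (MODEL, REFERENCE, "urn:example:model-paper"),
    # 2) Any other paper hangs off a separate resource reached via
    #    cmeta:bio_entity or cmeta:math_problem.
    (MODEL, "cmeta:bio_entity", "#bio"),
    ("#bio", REFERENCE, "urn:example:biology-paper"),
    (MODEL, "cmeta:math_problem", "#math"),
    ("#math", REFERENCE, "urn:example:methods-paper"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def model_papers(triples, model):
    """Papers referenced directly from the model resource."""
    return objects(triples, model, REFERENCE)


def background_papers(triples, model):
    """Papers referenced from the bio_entity / math_problem resources."""
    papers = []
    for predicate in INDIRECT:
        for resource in objects(triples, model, predicate):
            papers.extend(objects(triples, resource, REFERENCE))
    return papers


print(model_papers(triples, MODEL))
print(background_papers(triples, MODEL))
```

Under this layout a repository tool only needs the direct arcs to find the paper about the model. Note that it does not by itself say why a background paper was cited, which is the point raised in Matt's reply.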
Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?
Melanie! Thanks for your thoughts. You are right about the mess that mapping to different ontologies and vocabs produces. We have been working on trying to integrate explicitly with biopax (http://www.biopax.org/ states and generics proposal - level 2 was too limiting) in the hope that other databases (like GO, reactome, KEGG, DIP, signalling gateway, etc) get dragged along too. Some seem to be.

At the moment, I think the cleanup begins at the repository interface. So long as we validate that the biological annotation is unambiguous, complete, and represented in a way that agrees with one or more of an accepted set of ontologies and vocabs, then I think we might be in a position to perform useful queries. We also have a person doing a PhD here now on the visualisation of CellML models, which relies solely on the biological annotation. It's a pretty good failure point to get nothing in your picture :-)

thanks again
cheers
Matt

On 3/29/07, Melanie Nelson [EMAIL PROTECTED] wrote:

Wow, I haven't posted to this list in a long time... But I feel compelled to give a little advice as someone who's spent a lot of time integrating biological information and therefore has made a lot of mistakes!

By all means, have a best practice encouraging people to use the GO cellular_component ontology to describe organelles and cells. You could probably also use the molecular_function ontology for proteins (although this will be messier). However, neither is likely to be complete, i.e., there will be models that reference a biological entity not in the GO ontologies. Also, there will be cases where the entity the model references is most properly thought of as related in some way (e.g., a subset, a superset, or a sibling) to the GO entity. You can spend ages sorting this sort of thing out and coming up with consistent rules for handling all the relationships.
Since you aren't really interested in sorting out this biological mess, you may want to consider letting people choose their own ontology and just reference it. An example of this practice is in the MIAME project: http://www.mged.org/Workgroups/MIAME/miame_1.1.html

About the citations - my memory of this is fuzzy, but I think the original intent was that people should provide the PubMed ID where possible. However, not all journals are indexed in PubMed (for instance, there is a CellML paper published in one that is not), so the model needs to handle full citation info, too. The BQS model handles both, and then some, which is why we chose it.

Hope this is helpful,
Melanie

--- Andrew Miller [EMAIL PROTECTED] wrote:

Matt wrote:

I don't think this is a good idea.

- I think bioentity should be deprecated; it has no intrinsic semantic value.

It does, unfortunately, seem to usually target a literal node at the moment. It would be nice for this to at least be a resource, which could provide further information about the biological entity (or if we decide not to do that, at least a resource, with a dictionary and a process for adding new words to the dictionary to avoid duplication). It seems that GO (Gene Ontology) has terms for cell types, biological compartments, and so on, which would offer a better way to provide this information. I still think that this metadata is useful, even if the automated interpretation of it is currently difficult.

- If it is used currently, it should be left at its current minimum specification, which is to label and point to other bioinformatics database IDs.

There are three layers of information here:

Layer 1: What biological entity are we describing? (could be answered with a GO term).

Layer 2: What information about that biological entity are we using? (could be answered with a reference to a paper, and perhaps even a reference to raw experimental data).
Layer 3: How was that information translated into a model? (could be answered with a reference to a paper on the model).

Layer 3 is clearly information about the model, and should be described as an arc of the model resource. Layer 1 is described by a literal at the moment. Layer 2 is therefore a gap, which we don't have any proper way to represent now.

- The problem is not 'biologically related papers' per se, but one of identifying what was the primary publication or publications that motivated a model.

The publication which motivated the expression of a model in CellML, or the publication which motivated the creation of the model? Most of the models in the repository were motivated by a paper about a model which was not initially expressed in CellML. However, the way that the metadata specification works now is that the paper which describes the model (not the paper which motivated it) is referenced from the information about the model (not information about the CellML file).

- There is