Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-04-02 Thread Matt
On 4/1/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

   I misunderstand the scope of the property isDescribedBy. I also don't
   think reverse engineering URIs to obtain meaning is a good practice.
 
  But ... you do not reverse engineer anything.
 
  Though you have to pull apart the URI correctly to discover the key.

 You mean to split urn:MyURI:12345 into urn:MyURI and 12345?
 (I deliberately use the URN form rather than a URL to avoid confusion)

 Yes, we may have to do that in some cases.


Isn't this what you have to do every time? E.g.:

x bqmodel:isDescribedBy http://www.pubmed.gov/#8983160

or the more general case:

one of:
$X $QUALIFIER [$DATATYPE-1#$IDENTIFIER-1,
   $DATATYPE-2#$IDENTIFIER-2,
   ... $DATATYPE-k#$IDENTIFIER-n
   ... $DATATYPE-K#$IDENTIFIER-N]

$X $QUALIFIER [urn:$DATATYPE-1':$IDENTIFIER-1,
   urn:$DATATYPE-2':$IDENTIFIER-2,
   ... urn:$DATATYPE-k':$IDENTIFIER-n
   ... urn:$DATATYPE-K':$IDENTIFIER-N]

where
$DATATYPE-k represents datatype k in URL form
and $DATATYPE-k' represents datatype k in URN form
and $IDENTIFIER-n can be any string allowed in URIs
and $X is a model constituent
and $QUALIFIER is the qualifier property of the annotation of constituent $X

You will always need to pull apart the 'URI' (Table 2 of the MIRIAM document)
to retrieve the datatype and identifier.
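As a rough illustration of that "pulling apart" step, here is a minimal Python sketch. The registry is a hypothetical stand-in for the MIRIAM data-type list, and the urn:example: names are placeholders for the $DATATYPE' URN form above, not real MIRIAM URNs. Matching on the longest known data-type prefix, rather than splitting on the first ':' or '#', keeps identifiers such as GO:0001234 or a DOI intact.

```python
from typing import Optional, Tuple

# Hypothetical local registry: data-type URI -> separator before the identifier.
# Real entries would come from MIRIAM Resources; these are illustrative only.
DATATYPES = {
    "http://www.pubmed.gov/": "#",
    "http://www.doi.org/": "#",
    "urn:example:pubmed": ":",
    "urn:example:go": ":",
}


def split_annotation_uri(uri: str) -> Optional[Tuple[str, str]]:
    """Return (data_type, identifier), or None if no known data-type matches.

    The longest registered prefix wins, so identifiers that themselves contain
    ':' or '/' (GO:0001234, 10.1063/1.1681288) are kept whole.
    """
    best = None
    for datatype, sep in DATATYPES.items():
        prefix = datatype + sep
        if uri.startswith(prefix) and len(uri) > len(prefix):
            if best is None or len(datatype) > len(best[0]):
                best = (datatype, uri[len(prefix):])
    return best


for example in ("http://www.pubmed.gov/#8983160",
                "http://www.doi.org/#10.1063/1.1681288",
                "urn:example:go:GO:0001234",
                "http://unknown.org/#42"):
    print(example, "->", split_annotation_uri(example))
```

Either way, a consumer needs the registry to know where the data-type ends; that is the parsing step being debated here.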

I guess I'm not sure why it isn't easier to keep the meaning of
datatype and identifier separate within the language context you are
using - which is basically RDF. So that instead you would have
something like:

$X $QUALIFIER $DATATYPEINSTANCE

$DATATYPEINSTANCE isA $DATATYPE
$DATATYPEINSTANCE hasIdentifier $IDENTIFIER
$DATATYPEINSTANCE hasPhysicalUrl $URL

$QUALIFIER could be as general as isDescribedBy, or as specific as,
for example, isDescribedByPubMedRecord

$QUALIFIER would have domain and range constraints.

I see the end result being the same, but the latter method is easier to
extend and specialise. It also remains within the RDF standard, which I
think the MIRIAM document should have focused on more, rather than
inventing a very specific non-standard way of representing identifiers
and datatypes.
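A minimal Python sketch of the explicit form proposed above, with triples held as plain string tuples rather than in an RDF library. The node label _:ref1 and the data-type name PubMedRecord are hypothetical; isA, hasIdentifier, hasPhysicalUrl and isDescribedByPubMedRecord are the illustrative names from the message. It shows that selecting every PubMed record becomes a lookup on the isA statement, with no URI parsing.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def describe_instance(x: str, qualifier: str, node: str, datatype: str,
                      identifier: str, url: str) -> List[Triple]:
    """$X $QUALIFIER $DATATYPEINSTANCE, plus the statements about the instance."""
    return [
        (x, qualifier, node),
        (node, "isA", datatype),
        (node, "hasIdentifier", identifier),
        (node, "hasPhysicalUrl", url),
    ]


def instances_of(triples: List[Triple], datatype: str) -> Set[str]:
    """All nodes declared to be of the given data-type; no URI parsing needed."""
    return {s for (s, p, o) in triples if p == "isA" and o == datatype}


triples = describe_instance(
    x="model",
    qualifier="isDescribedByPubMedRecord",
    node="_:ref1",             # hypothetical blank-node label
    datatype="PubMedRecord",   # hypothetical data-type name
    identifier="8983160",
    url="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve"
        "&db=pubmed&dopt=Abstract&list_uids=8983160",
)
print(instances_of(triples, "PubMedRecord"))
```

Where each statement is published (inside the model or in a shared, global document) is a separate decision from this structure.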


  The URI IS the meaning. In
  the English dictionary, there is a word publication, with a
  definition.
  Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".
 
  So you say somewhere in the dictionary that there is a set of things
  that are Publications and this set is denoted by any URI that starts
  with http://www.pubmed.gov/ ?

 No. Publication is a human notion. We are dealing with software here.
 http://www.pubmed.gov/ is sufficient to uniquely identify a type of data.
 What the software does with it is its own business.

Publication is a useful semantic term for a machine to resolve to.
Publication could have many representations in machine form; it's
just important that the annotation language implies they are all
equivalent.


  I presume there are other URI bases that
  also mean publication? Something like:
 
  http://www.pubmed.gov/ isA Publication
  http://not.in.pubmed/ isA Publication

 Yes. At the moment, we just have PubMed and DOI; we are adding arXiv.

So you do have a machine interpretation of Publication - yours is a lookup list.


 We do not need to relate them to specify that they all deal with
 publications. It is already done by the bqmodel:isDescribedBy

No, isDescribedBy has no semantic meaning - there is nothing to say
that it explicitly defines a publication in a journal article or a
vocabulary term.


  How do you extend the mapping of URI where the URI points to a general
  identification service that resolves across, for example, different
  publication indexes/databases. Do you need to ask people to replace
  this URI (which may actually be usable to return some more RDF) with a
  new one that uses a separate namespace for each publication
  index/databases?

 I think there are maybe two misunderstandings here. The first one is
 between the MIRIAM notions of data-type and of resource. MIRIAM URIs
 describe data using data-type and identifiers. This data can be
 distributed through various resources. But we do not want to put
 information about those resources in the models. The life-span of
 resources is in general pretty short.

Yes, I think we agree on that.


 And that brings me to the deeper misunderstanding, which is maybe the cause
 of all this discussion. The only purpose of MIRIAM annotation is to
 uniquely identify an annotation, in a perennial way. It is not to
 implement a semantic web infrastructure where you can go directly from the
 annotation to the resource pointed by the annotation.

While there are some semantic web languages that say the identifier
for a resource is also the location of the resource - I'm certainly
not implying that here. The example of a specific data warehouse uri
is simply that if you find some resource that you would like 

Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-04-02 Thread Matt
On 4/3/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:
  You will always need to pull apart the 'URI' (Table 2 of the MIRIAM document)
  to retrieve the datatype and identifier.

 Well, yes, you have to recognise what belongs to the data-type and what
 belongs to the identifier. But you do that all the time in RDF anyway. And
 besides, we do it for you and can return one or more URLs.

In what part of RDF is this done?


  I guess I'm not sure why it isn't easier to keep the meaning of
  datatype and identifier separate within the language context you are
  using - which is basically RDF. So that instead you would have
  something like:
 
  $X $QUALIFIER $DATATYPEINSTANCE
 
  $DATATYPEINSTANCE isA $DATATYPE
  $DATATYPEINSTANCE hasIdentifier $IDENTIFIER
  $DATATYPEINSTANCE hasPhysicalUrl $URL

 Something like:

 species calmodulin is calmodulin_in_uniprot
 calmodulin_in_uniprot isA UniProt_entry
 calmodulin_in_uniprot hasIdentifier P62158
 calmodulin_in_uniprot hasPhysicalUrl http://www.ebi.uniprot.org/entry/P62158

 - How can one recognize UniProt_entry if it is a free string? This is why
 there are MIRIAM data-types.

 - The physical URL being not stable, one cannot store actual URLs in the
 models themselves.

Nope. I should be clearer about what would be published globally and
what should be in a particular model.

Published globally:

calmodulin_in_uniprot isA UniProt_entry
calmodulin_in_uniprot hasIdentifier P62158
calmodulin_in_uniprot hasPhysicalUrl http://www.ebi.uniprot.org/entry/P62158

UniProt_entry subClassOf DatabaseRecord  (that's quite generic)
hasUniProtEntry subPropertyOf hasDatabaseRecord
hasDatabaseRecord subPropertyOf isDescribedBy

In the model:

x hasUniProtEntry calmodulin_in_uniprot


Now one can filter all annotations in a model for database links only,
or uniprot links only etc. It all just works when using RDF Schema.
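The filtering described above can be sketched in a few lines of Python. This is a hand-rolled subset of RDFS entailment (only the reflexive, transitive closure of subPropertyOf), not a full RDFS reasoner, and it uses bare names instead of full property URIs. The second model statement and some_web_page are hypothetical additions, included for contrast.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Published globally (schema), following the example above.
SCHEMA: List[Triple] = [
    ("UniProt_entry", "subClassOf", "DatabaseRecord"),
    ("hasUniProtEntry", "subPropertyOf", "hasDatabaseRecord"),
    ("hasDatabaseRecord", "subPropertyOf", "isDescribedBy"),
]

# In the model. The second statement is a hypothetical extra annotation.
MODEL: List[Triple] = [
    ("x", "hasUniProtEntry", "calmodulin_in_uniprot"),
    ("x", "isDescribedBy", "some_web_page"),
]


def subproperties(schema: List[Triple], prop: str) -> Set[str]:
    """prop plus every property below it via subPropertyOf (reflexive, transitive)."""
    children: Dict[str, Set[str]] = {}
    for s, p, o in schema:
        if p == "subPropertyOf":
            children.setdefault(o, set()).add(s)
    result, stack = {prop}, [prop]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def annotations_via(model: List[Triple], schema: List[Triple],
                    prop: str) -> List[Triple]:
    """Model statements whose property is prop or one of its sub-properties."""
    props = subproperties(schema, prop)
    return [t for t in model if t[1] in props]


print(annotations_via(MODEL, SCHEMA, "hasDatabaseRecord"))  # database links only
print(annotations_via(MODEL, SCHEMA, "isDescribedBy"))      # everything
```

The model itself only carries the specific property; the broader categories come from the shared schema.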



 So we are down to:

 species calmodulin is calmodulin_in_uniprot
 calmodulin_in_uniprot isA http://www.uniprot.org/
 calmodulin_in_uniprot hasIdentifier P62158

 How is it different (in terms of information content and of computing steps
 necessary to parse) from:

 species calmodulin is http://www.uniprot.org/#P62158


For the reasons I give above.

  We do not need to relate them to specify that they all deal with
  publications. It is already done by the bqmodel:isDescribedBy
 
  No, isDescribedBy has no semantic meaning - there is nothing to say
  that it explicitly defines a publication in a journal article or a
  vocabulary term.

 Of course not, because we do not want to restrict the type of data used to
 describe the component. This is what I said just above. What do you mean
 by "isDescribedBy has no semantic meaning"?

 isDescribedBy means exactly "is described by". How is that not a meaning?

In the RDF world you have not attributed any meaning. It could be used
for anything.



 What comes after the "by" can be a journal article, a webpage, a song, a
 poem, a control vocabulary term, or a telepathic transmission.

Right, and so if it can be followed by anything then it has no meaning:
there is nothing to distinguish it from any property that someone
may make up.


  So there is no way to determine if some set of URIs are controlled
  vocab terms and some set are journal articles and some set are
  experimental result sets?

 No. It is up to the user to decide what to do with what. See my ChEBI
 example before. For some people ChEBI is a controlled vocabulary, for some
 it is a database of chemical compounds. For me, CAS is a database of
 chemical compounds; for CellML, it is a bibliographic resource.

But when a person uses it, they will have a context. It may be one or
more of those from your list, but it is still useful for them to
restrict the intention of the property to something close to what they
mean. I don't particularly want to guess whether someone is intending
the DataType to mean 'chemical compound' as opposed to 'bibliographic
resource'. For all I know the record that I may be able to locate
based on this URI might have value in both domains and I would want to
know what they were intending. Of course you'd hope the record was RDF
anyway and you were able to point to the record attribute by a full
URI.



 --
 Nicolas LE NOVERE,  Computational Neurobiology,
 EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
 Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
 http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:[EMAIL PROTECTED]

 ___
 cellml-discussion mailing list
 cellml-discussion@cellml.org
 http://www.cellml.org/mailman/listinfo/cellml-discussion



Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-04-01 Thread Matt
Hi Nicolas. Thanks for the in-depth reply.

On 3/31/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:

  I misunderstand the scope of the property isDescribedBy. I also don't
  think reverse engineering URIs to obtain meaning is a good practice.

 But ... you do not reverse engineer anything.

Though you have to pull apart the URI correctly to discover the key.
(as well as pulling apart the RDF structure that it's embedded in,
which I assume is well defined - only RDF containers of URIs
allowed?).

 The URI IS the meaning. In
 the English dictionary, there is a word publication, with a definition.
 Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".

So you say somewhere in the dictionary that there is a set of things
that are Publications and this set is denoted by any URI that starts
with http://www.pubmed.gov/ ? I presume there are other URI bases that
also mean publication? Something like:

http://www.pubmed.gov/ isA Publication
http://not.in.pubmed/ isA Publication

How do you extend the mapping of URI where the URI points to a general
identification service that resolves across, for example, different
publication indexes/databases. Do you need to ask people to replace
this URI (which may actually be usable to return some more RDF) with a
new one that uses a separate namespace for each publication
index/databases?

Is this MIRIAM dictionary considered a global dictionary? Can people
maintain their own local ones? Is there a protocol for creating a
dictionary that maps URI (bases or namespaces?) to meaning - e.g. isA
Publication - and a way to share this with others?


  How do you say one URI means the same as another if the URI scheme
  changes? It seems you leave this up to the developer to make sure they
  accommodate both instead of letting RDFS take care of this.

 The URI scheme should not change.

Why? There are a number of reasons the URIs (including the namespace)
may change and RDF certainly doesn't suggest they shouldn't. A more
common case though is that more namespaces are added for reasons such
as different authority over similar resources, different versions of
resources, dividing out a data warehouse into its original providers,
or collapsing databases into a warehouse.

 In the rare case it changes, it is up to
 us to provide a deprecation system so that the developer actually does not
 feel the change at all.

I presume you allow for different base URIs that share a common
namespace to identify with different things? e.g.
http://www.organisation.org/models and
http://www.organisation.org/microarray

How do you say one URI is the same as another in your dictionary?


  If you create an element bqs:PubMed_id in your language, rather than
  having a generic reference scheme, with a type PubMed defined elsewhere.
 
  I don't see what hardcoded means. Using properties that are defined in
  a shared schema or standard is a pretty basic premise of sharing
  information using RDF. How do you think RSS or Dublin Core works?

 Exactly, and this is why we dumped the first CellML metadata. When we started
 with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and bqs:CAS_id

 - PubMed and Medline are redundant (Medline actually gave up their id.
 They use PubMed ones now)

 - We could not refer to anything that was not in PubMed. This is the case
 of MANY models.

Why is that?


 We could have asked you guys to develop a new version of the CellML metadata
 spec, with bqs:DOI. How long before we would ask for another version with
 bqs:arXiv? bqs:Scopus_id?


That's what I'd expect people to do.

 And this is only for bibliography.

 By the way, Dublin Core does not work that well. I explored extensively
 the usage of DC, and it is an absolute mess. Everybody implements their own
 usages and rules. Elements and their syntax change wildly from one place
 to the other. In the end, this is the usual semantic web. Everybody exports
 the information alright, but barely anybody can read it. I am not even
 talking about interpreting it.

I think it works well for simple authorship details of a content item.


  our bqs namespace reflects OMG's Bibliographic Query Service
  specification - see http://www.omg.org/docs/dtc/01-04-05.pdf

 I know BQS very well. It has been developed in the room next to mine for a
 project that died before birth (because we massively gave up CORBA). It is
 NOT a standard. It has been endorsed by the OMG because at that time the
 EBI was a member, and because Martin Senger was the person writing many of
 those things (he also wrote the specification of LSID)

I'm not sure I see the point here. It was used because it offered the
right concepts for what it was intended to be used for. The RDF Schema
is available over the web, and any RDFS aware interpreter can use it
to better understand the annotation properties of CellML models.
It's shared and available and referenced by models.


   The big advantage of externalising the type of metadata is that the
   scheme is generic.
 
  

Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-31 Thread Nicolas Le Novere

 I misunderstand the scope of the property isDescribedBy. I also don't
 think reverse engineering URIs to obtain meaning is a good practice.

But ... you do not reverse engineer anything. The URI IS the meaning. In
the English dictionary, there is a word publication, with a definition.
Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".

 How do you say one URI means the same as another if the URI scheme
 changes? It seems you leave this up to the developer to make sure they
 accommodate both instead of letting RDFS take care of this.

The URI scheme should not change. In the rare case it changes, it is up to
us to provide a deprecation system so that the developer actually does not
feel the change at all.

 If you create an element bqs:PubMed_id in your language, rather than
 having a generic reference scheme, with a type PubMed defined elsewhere.

 I don't see what hardcoded means. Using properties that are defined in
 a shared schema or standard is a pretty basic premise of sharing
 information using RDF. How do you think RSS or dublin core works?

Exactly, and this is why we dumped the first CellML metadata. When we started
with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and bqs:CAS_id

- PubMed and Medline are redundant (Medline actually gave up their id.
They use PubMed ones now)

- We could not refer to anything that was not in PubMed. This is the case
of MANY models.

We could have asked you guys to develop a new version of the CellML metadata
spec, with bqs:DOI. How long before we would ask for another version with
bqs:arXiv? bqs:Scopus_id?

And this is only for bibliography.

By the way, Dublin Core does not work that well. I explored extensively
the usage of DC, and it is an absolute mess. Everybody implements their own
usages and rules. Elements and their syntax change wildly from one place
to the other. In the end, this is the usual semantic web. Everybody exports
the information alright, but barely anybody can read it. I am not even
talking about interpreting it.

 our bqs namespace reflects OMG's Bibliographic Query Service
 specification - see http://www.omg.org/docs/dtc/01-04-05.pdf

I know BQS very well. It has been developed in the room next to mine for a
project that died before birth (because we massively gave up CORBA). It is
NOT a standard. It has been endorsed by the OMG because at that time the
EBI was a member, and because Martin Senger was the person writing many of
those things (he also wrote the specification of LSID)

  The big advantage of externalising the type of metadata is that the
  scheme is generic.

 What does generic mean? Standardised and used by everyone like a
 published schema would be?

No. Generic, because it can be applied to any kind of data. You do not
need to define specific scheme for each new data-type. Bibliography is
just a type of metadata like any other. No need for special treatment.

  What do you mean by the 'type' of metadata?

 EC page, PubMed entry, DOI indexed document, UniProt entry, Gene
 Ontology term etc.

 The type of metadata is externalized as soon as it is presented in a
 Schema and made public and adopted by the community.

That cannot work for several reasons.

First, the type of metadata evolves very rapidly. We already have 29 types
in MIRIAM Resources, but I anticipate that number will grow very rapidly as
libSBML3 (which implements the RDF annotation scheme) is adopted by the
developers.

Second, the relevant metadata varies according to the community. An
obvious example is BQS. It used PubMed because it was developed at the EBI
by a software engineer who just did not know there was anything else than
PubMed in bibliography.

Third, who decides what "adopted" and "community" mean? For instance, we
have been struggling with that in SBML for years. We are just starting to
have a robust model of development, with a balance between democracy and
technical soundness. I think we actually have a pretty good system. But it
is not trivial. (In case there is a misunderstanding here: SBML and MIRIAM
are separate entities. I am just using SBML as an example).

Finally, when was CellML metadata made public, how was the community
consulted, and how was its feedback incorporated into the specification? (I
am not even talking about BQS. As I said this is NOT a community standard.
It is a data-model developed by one person for a very specific project).

  I think more I am misunderstanding the range of use isDescribedBy is
 ok for.

 isDescribedBy is a relationship.

 are you meaning is-a relation? as in Type?

Yes,

<model id="EPSP_Edelstein" metaid="_01">
[...]
<rdf:Description rdf:about="#_01">
  <bqmodel:isDescribedBy>
    <rdf:Bag>
      <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
    </rdf:Bag>
  </bqmodel:isDescribedBy>

means:

the model contained in the SBML model EPSP_Edelstein is described in the
metadata 8983160 of the data-type "http://www.pubmed.gov/".
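To show how a tool might read that annotation, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It assumes the snippet is wrapped in an rdf:RDF element with namespace declarations; the bqmodel namespace URI below is an assumption about the BioModels.net model-qualifier namespace and should be checked against the qualifier documentation. This is not a general RDF/XML parser: it only handles the Description / qualifier / Bag / li pattern shown above.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQMODEL = "http://biomodels.net/model-qualifiers/"  # assumed namespace URI

SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:bqmodel="{BQMODEL}">
  <rdf:Description rdf:about="#_01">
    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>
  </rdf:Description>
</rdf:RDF>
"""


def read_annotations(xml_text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (about, qualifier URI, resource) for each rdf:li under a qualifier."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        for qualifier in desc:
            # ElementTree tags look like "{namespace}localname".
            ns, _, local = qualifier.tag[1:].partition("}")
            for li in qualifier.iter(f"{{{RDF}}}li"):
                yield about, ns + local, li.get(f"{{{RDF}}}resource")


for triple in read_annotations(SNIPPET):
    print(triple)
```

The resulting triple still carries the data-type only inside the resource URI, which is the point under discussion.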

 I don't see this at all. isVersionOf and hasVersion is about
 versions as in successors where both 

Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-29 Thread Matt
By all means, step in as much as possible.

Can you explain in more detail or point to explanations of
bqmodel:isDescribedBy?

Specifically:
- what is its intended meaning?
- when more than one of these is defined on a resource, how is this
interpreted? For example: is there some precedence implied somehow?
- how do you determine the kind of reference it is - for example a
pubmed uri? You have a datatype for vocab/database IDs in the
annotation scheme you described, but I don't see this in the
bqmodel:isDescribedBy examples.
- how would you address auxiliary references as opposed to primary
references so that a machine interpreting it can make the distinction?


snip

 I entirely agree with Melanie, people should be able to pick the
 resource they want, as far as they uniquely identify it. This is
 clearly described in the MIRIAM paper.

I'm not sure what benefits one gains from letting people arbitrarily
choose what they want to use to identify something with. For example,
how do you work out whether particular entities in one SBML model match
entities in another SBML model?

Also, given that most of these resources are controlled vocabularies,
there is a lot of room for misunderstanding someone's intention when
using their choices of identifiers.



 An annotation is formed of
 three parts:

 The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...

 The identifier of the particular information, e.g. 123456789, GO:0001234 ...

 An optional qualifier that describes the relationship between the concept
 represented by the model component and the concept represented by the 
 particular information.

 To help people implement that, we developed MIRIAM resources
 (http://www.ebi.ac.uk/compneur-srv/miriam/).

 If you download a model from BioModels DB in SBML (not in CellML at
 the moment, for obvious reasons highlighted by the current
 discussion), you will see something like:

 <bqmodel:isDescribedBy>
  <rdf:Bag>
   <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
  </rdf:Bag>
 </bqmodel:isDescribedBy>

 But on the webpage, there is:

 <b>Publication ID:</b>&nbsp;<a
 href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8983160"
 target="_blank">8983160</a>

 The URL is dynamically generated by MIRIAM web services. In fact, in the
 new version of BioModels DB, to be released in the fall, the URL does
 not point to PubMed anymore, but to the EBI extended Medline, more
 comprehensive. BUT the URI stored in the model is still the SAME.
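The separation between the stable URI stored in the model and the resolvable URL generated from it can be sketched in Python as follows. The template table is a hypothetical local copy; the two templates are taken from the links in this message, and the example.org template is a made-up stand-in for a new provider. Only the table changes when a provider moves; the annotation URI does not. This is not the MIRIAM web-service API.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical resolution table: data-type URI -> URL template.
URL_TEMPLATES: Dict[str, str] = {
    "http://www.pubmed.gov/":
        "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve"
        "&db=pubmed&dopt=Abstract&list_uids={id}",
    "http://www.doi.org/": "http://dx.doi.org/{id}",
}


def to_url(uri: str, templates: Dict[str, str] = URL_TEMPLATES) -> Optional[str]:
    """Turn a 'datatype#identifier' URI into a physical URL, or None if unknown."""
    datatype, sep, identifier = uri.partition("#")
    template = templates.get(datatype)
    if not sep or template is None:
        return None
    # Percent-encode the identifier, keeping '/' so DOIs stay readable.
    return template.format(id=quote(identifier, safe="/"))


uri = "http://www.pubmed.gov/#8983160"  # what the model stores
print(to_url(uri))

# A provider change only touches the table (example.org is a placeholder).
moved = {**URL_TEMPLATES, "http://www.pubmed.gov/": "http://example.org/medline/{id}"}
print(to_url(uri, moved))
```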

 Similarly for a DOI:

 <bqmodel:isDescribedBy>
  <rdf:Bag>
   <rdf:li rdf:resource="http://www.doi.org/#10.1063/1.1681288"/>
  </rdf:Bag>
 </bqmodel:isDescribedBy>

 is transformed into:

 <b>Publication ID:</b>&nbsp;<a href="http://dx.doi.org/10.1063/1.1681288"
 target="_blank">10.1063/1.1681288...</a>

 That system is very flexible. You can use any resource listed in
 MIRIAM resources, and this resource can be extended at will (note that
 we distribute an XML version of the resource for local use). But it is
 still robust and expressive.

 Cheers,

 On Wed, 28 Mar 2007, Melanie Nelson wrote:

  Wow, I haven't posted to this list in a long time...
  But I feel compelled to give a little advice as
  someone who's spent a lot of time integrating
  biological information and therefore has made a lot of
  mistakes!
 
  By all means, have a best practice encouraging people
  to use the GO cellular_component ontology to describe
  organelles and cells. You could probably also use the
  molecular_function ontology for proteins (although
  this will be messier). However, neither is likely to
  be complete, i.e., there will be models that
  reference a biological entity not in the GO
  ontologies. Also, there will be cases where the entity
  the model references is most properly thought of as
  related in some way (e.g., a subset, a superset, or a
  sibling) to the GO entity. You can spend ages
  sorting this sort of thing out and coming up with
  consistent rules for handling all the relationships.
 
 
  Since you aren't really interested in sorting out this
  biological mess, you may want to consider letting
  people choose their own ontology and just reference
  it.
  An example of this practice is in the MIAME project:
  http://www.mged.org/Workgroups/MIAME/miame_1.1.html
 
  About the citations- my memory of this is fuzzy, but I
  think the original intent was that people should
  provide the PubMed ID where possible. However, not all
  journals are indexed in PubMed (for instance, there is
  a CellML paper published in one that is not), so the
  model needs to handle full citation info, too. The BQS
  model handles both, and then some, which is why we
  chose it.
 
  Hope this is helpful,
  Melanie
 
 
  --- Andrew Miller [EMAIL PROTECTED] wrote:
 
  Matt wrote:
  I don't think this is a good idea.
 
  - I think bioentity should be deprecated; it has
  no intrinsic semantic value.
 
  It does, unfortunately, seem to usually target a
  literal node at the
  moment. It would 

Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-29 Thread Nicolas Le Novere
On Thu, 29 Mar 2007, Matt  wrote:

 The BioPAX annotation scheme is very fragile and in practice almost
 unusable.

 How is that?

Because the annotation is free-form. I can use UniProt, uniprot, Uni-Prot etc.

 That was always going to wash out in practice. I'm not sure a rule for
 generating the URI is useful in the long run, especially if these
 databases produce their own URI schemes, then it becomes a less simple
 task to match other data resources that don't use the biopax scheme to
 the ones that use the originating databases scheme.

I entirely agree. This is why (well, that is one of the reasons) I do
not like LSIDs, and why we developed MIRIAM resources. So that
everybody uses the same URIs. This is explained in the MIRIAM paper.



Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-29 Thread Nicolas Le Novere
On Thu, 29 Mar 2007, Matt  wrote:

 Can you explain in more detail or point to explanations of
 bqmodel:isDescribedBy?

You can find some explanations at:

http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers

Note that qualifiers are optional for MIRIAM compliance. I personally
think we should always use some qualification, otherwise an annotation
becomes very difficult to use except for jumping from webpage to
webpage.

 Specifically:
 - what is its intended meaning?

Cf. above. Note that the list of qualifiers is by no means frozen. We
are already aware of several gaps (e.g. how do we qualify the relation
between a peptide and the gene that encodes it?)

 - when more than one of these is defined on a resource, how is this
 interpreted? For example: is there some precedence implied somehow?

This is up to the tool using the qualifiers. SBML does not allow
nested qualifications. There is only an implicit hasVersion if several
identical qualifiers are present:

bqmodel:isDescribedBy toto
bqmodel:isDescribedBy tata

means "is described by toto" and "is described by tata". In other words,
toto or tata describes the component.

NOT "toto and tata are necessary to describe the component".
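A small Python sketch of that reading, assuming annotations are available as (component, qualifier, resource) tuples; this representation is hypothetical and not an SBML or libSBML API. Repeated identical qualifiers are grouped into a set of alternatives, and any single member is enough to satisfy the qualifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (component, qualifier)


def group_alternatives(annotations: Iterable[Tuple[str, str, str]]) -> Dict[Key, Set[str]]:
    """Collect resources that share the same component and qualifier.

    Each set is a disjunction: any one member describes the component;
    the members are not jointly required.
    """
    groups: Dict[Key, Set[str]] = defaultdict(set)
    for component, qualifier, resource in annotations:
        groups[(component, qualifier)].add(resource)
    return dict(groups)


def satisfies(groups: Dict[Key, Set[str]], component: str, qualifier: str,
              resource: str) -> bool:
    """True if this one resource alone fulfils the qualifier for the component."""
    return resource in groups.get((component, qualifier), set())


groups = group_alternatives([
    ("component", "bqmodel:isDescribedBy", "toto"),
    ("component", "bqmodel:isDescribedBy", "tata"),
])
print(satisfies(groups, "component", "bqmodel:isDescribedBy", "toto"))
```

Any precedence between toto and tata, such as the one BioModels DB adds, would be a tool-level policy layered on top.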

On top of that, BioModels DB adds some precedence:
http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html

But all that is not part of MIRIAM rules.

 - how do you determine the kind of reference it is - for example a
 pubmed uri? You have a datatype for vocab/database IDs in the
 annotation scheme you described, but I don't see this in the
 bqmodel:isDescribedBy examples.

<rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>

http://www.pubmed.gov/ means the following identifier has to be interpreted
as pointing to data in PubMed.

http://www.pubmed.gov/ is unique and should not normally
change. However, it may sometimes nevertheless change for various
reasons: a URI that is too confusing or badly chosen, the merger of two
resources, etc. For instance, the old PubMed URI was
http://www.ncbi.nlm.nih.gov/PubMed/
It was misleading because it was tied to a particular physical resource at
the NCBI.

We have a deprecation system in place that allows us to resolve the
old URIs and provide the new ones.
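The deprecation step can be pictured as a prefix rewrite. Below is a minimal Python sketch, assuming a hypothetical local table that maps deprecated data-type URIs to their replacements; the single entry mirrors the PubMed example above, and the old-style annotation URI in the usage line is illustrative. Chains of deprecations are followed, with a step limit as a guard against cycles. This is not the actual MIRIAM deprecation service.

```python
from typing import Dict

# Hypothetical deprecation table: old data-type URI -> current data-type URI.
DEPRECATED: Dict[str, str] = {
    "http://www.ncbi.nlm.nih.gov/PubMed/": "http://www.pubmed.gov/",
}


def current_uri(uri: str, deprecated: Dict[str, str] = DEPRECATED,
                max_steps: int = 10) -> str:
    """Rewrite deprecated data-type prefixes until the URI is current.

    Chains needing more than max_steps rewrites are treated as errors.
    """
    for _ in range(max_steps):
        for old, new in deprecated.items():
            if uri.startswith(old):
                uri = new + uri[len(old):]
                break
        else:
            return uri  # no deprecated prefix left
    raise ValueError("deprecation chain too long or cyclic: " + uri)


print(current_uri("http://www.ncbi.nlm.nih.gov/PubMed/#8983160"))
```

Because the rewrite happens when annotations are read, models that still carry the old URI keep working.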


 - how would you address auxiliary references as opposed to primary
 references so that a machine interpreting it can make the distinction?

I am not sure I understand that. Like primary and secondary accessions of 
UniProt?



Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-29 Thread Matt
On 3/29/07, Nicolas Le Novere [EMAIL PROTECTED] wrote:
 On Thu, 29 Mar 2007, Matt  wrote:

  Can you explain in more detail or point to explanations of
  bqmodel:isDescribedBy?

 You can find some explanations at:

 http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers


So there is no simple way to determine if this is a reference to a
journal article except through interpreting the URI?



 Note that qualifiers are optional for MIRIAM compliance. I personally
 think we should always use some qualification; otherwise an annotation
 becomes very difficult to use except for jumping from webpage to
 webpage.

  Specifically:
  - what is its intended meaning?

 Cf. above. Note that the list of qualifiers is by no means frozen. We
 are already aware of several gaps (e.g. how do we qualify the relation
 between a peptide and the gene that encodes it?)

  - when more than one of these is defined on a resource, how is this
  interpreted? For example: is there some precedence implied somehow?

 This is up to the tool using the qualifiers. SBML does not allow
 nested qualifications. There is only an implicit hasVersion if several
 identical qualifiers are present:

 bqmodel:isDescribedBy toto
 bqmodel:isDescribedBy tata

 means "is described by toto" and "is described by tata". In other words,
 toto or tata describes the component.

 It does NOT mean that toto and tata are both necessary to describe the component.
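A tool applying this reading could treat repeated identical qualifiers as a disjunction. The minimal sketch below assumes the annotations have already been flattened into hypothetical (component, qualifier, resource) triples. `group_by_qualifier` and `is_satisfied` are illustrative names; the key point is the use of `any` rather than `all`.

```python
from collections import defaultdict

# Hypothetical flattened annotations: (component, qualifier, resource).
triples = [
    ("#comp", "bqmodel:isDescribedBy", "toto"),
    ("#comp", "bqmodel:isDescribedBy", "tata"),
]


def group_by_qualifier(triples):
    """Collect, per (component, qualifier), the set of alternative resources."""
    groups = defaultdict(set)
    for component, qualifier, resource in triples:
        groups[(component, qualifier)].add(resource)
    return groups


def is_satisfied(alternatives: set[str], available: set[str]) -> bool:
    """Repeated identical qualifiers are read as OR: any one resource suffices."""
    return any(resource in available for resource in alternatives)


alts = group_by_qualifier(triples)[("#comp", "bqmodel:isDescribedBy")]
print(is_satisfied(alts, {"tata"}))  # True: tata alone describes the component
```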

 On top of that, BioModels DB adds some precedence:
 http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html

 But all that is not part of MIRIAM rules.

  - how do you determine the kind of reference it is - for example a
  pubmed uri? You have a datatype for vocab/database IDs in the
  annotation scheme you described, but I don't see this in the
  bqmodel:isDescribedBy examples.

 <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>

 http://www.pubmed.gov/ means that the following identifier has to be
 interpreted as pointing to a PubMed record.

 http://www.pubmed.gov/ is unique and should not normally
 change. However, it may nevertheless change for various
 reasons: a URI that is confusing or badly chosen, the fusion of two
 resources, etc. For instance, the old PubMed URI was
 http://www.ncbi.nlm.nih.gov/PubMed/
 It was misleading because it was tied to a particular physical resource at
 the NCBI.

 We have a deprecation system in place that allows us to resolve the
 old URIs and provide the new ones.
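The deprecation idea can be sketched as a lookup from deprecated data-type URIs to their replacements, applied before any resolution. This is an illustration, not MIRIAM's actual mechanism. It assumes old URIs use the same `datatype#identifier` form, and `DEPRECATED`/`upgrade_uri` are hypothetical names.

```python
# Hypothetical deprecation table: deprecated data-type URI -> replacement.
# Only the PubMed pair comes from this thread; real data would come from MIRIAM.
DEPRECATED = {
    "http://www.ncbi.nlm.nih.gov/PubMed/": "http://www.pubmed.gov/",
}


def upgrade_uri(uri: str) -> str:
    """Rewrite an annotation URI whose data-type part has been deprecated."""
    datatype, sep, identifier = uri.partition("#")
    seen = set()
    # Follow chains of deprecations, guarding against accidental cycles.
    while datatype in DEPRECATED and datatype not in seen:
        seen.add(datatype)
        datatype = DEPRECATED[datatype]
    return f"{datatype}{sep}{identifier}"


print(upgrade_uri("http://www.ncbi.nlm.nih.gov/PubMed/#8983160"))
# http://www.pubmed.gov/#8983160
```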


  - how would you address auxiliary references as opposed to primary
  references so that a machine interpreting it can make the distinction?

 I am not sure I understand that. Like primary and secondary accessions of 
 UniProt?

For journal articles, or other publications, being able to
identify the primary reference(s) is useful. For database records, it
would also be useful to label a group as being the most important (or
defining) set, and others as 'helpful'. That is why I suggested that
CellML bibliographic referencing separate these two, and that the
latter be bound to a reason (a natural language comment
would be fine) that describes why that reference was made.
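The sketch below shows one way the primary/auxiliary split, with a mandatory reason for auxiliary references, could be represented and checked. `Reference` and `validate` are hypothetical names and are not part of the CellML metadata specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    uri: str                      # e.g. a MIRIAM-style URI such as a PubMed one
    primary: bool                 # True for the defining reference(s)
    reason: Optional[str] = None  # natural-language reason; needed if auxiliary


def validate(refs: list[Reference]) -> list[str]:
    """List problems: no primary reference, or auxiliary refs without a reason."""
    problems = []
    if not any(r.primary for r in refs):
        problems.append("no primary reference")
    for r in refs:
        if not r.primary and not (r.reason and r.reason.strip()):
            problems.append(f"auxiliary reference {r.uri} has no reason")
    return problems


refs = [
    Reference("http://www.pubmed.gov/#8983160", primary=True),
    Reference("http://www.doi.org/#10.1063/1.1681288", primary=False),
]
print(validate(refs))
```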


 
  snip
 
  I entirely agree with Melanie: people should be able to pick the
  resource they want, as long as they uniquely identify it. This is
  clearly described in the MIRIAM paper.
 
  I'm not sure what benefits one gains from letting people arbitrarily
  choose what they use to identify something. For example,
  how do you work out if particular entities in one SBML model match
  entities in another SBML model?
 
  Also, given that most of these resources are controlled vocabularies,
  there is a lot of room for misunderstanding someone's intention when
  using their choices of identifiers.
 
 
 
  An annotation is formed of
  three parts:
 
  The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
 
  The identifier of the particular information, e.g. 123456789, GO:0001234 
  ...
 
  An optional qualifier that describes the relationship between the concept
  represented by the model component and the concept represented by the
  particular information.
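The three parts above map naturally onto a small record type. This is a sketch that assumes the `datatype#identifier` URI form used in the BioModels DB examples; `Annotation` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    datatype: str                    # data-type URI, e.g. "http://www.pubmed.gov/"
    identifier: str                  # e.g. "8983160" or "GO:0001234"
    qualifier: Optional[str] = None  # e.g. "bqmodel:isDescribedBy"; optional

    @property
    def uri(self) -> str:
        """Join data-type and identifier into the 'datatype#identifier' form."""
        return f"{self.datatype}#{self.identifier}"


a = Annotation("http://www.pubmed.gov/", "8983160", "bqmodel:isDescribedBy")
print(a.uri)  # http://www.pubmed.gov/#8983160
```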
 

Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-28 Thread Matt
I don't think this is a good idea.

- I think bioentity should be deprecated; it has no intrinsic semantic value.
- If it is used currently, it should be left as its current minimum
specification which is to label and point to other bioinformatics
database IDs.
- The problem is not 'biologically related papers' per se, but one of
identifying what was the primary publication or publications that
motivated a model.
- There is also the case where a single publication that contains a
mathematical model is the one and only primary source for the model
itself - a rather common case at the moment.

I would prefer that the primary publication(s) be identified as such,
which covers the case where there are some models in the repository
built from general review papers of biology with no math.

I would prefer references to other related publications to be bound
explicitly to a comment in the model metadata - there should be a
reason identified by the author/editor/reviewer as to why there has
been such an association made.

As an aside, we also need to determine whether the BQS schema provides
enough detail to match publications across metadata instances for
different models, and whether we should be complementing bibliographic
data with PubMed IDs and the like.

cheers
Matt




On 3/29/07, Andrew Miller [EMAIL PROTECTED] wrote:
 Hi,

 As discussed at the last CellML meeting, there are some models which
 reference both the paper about the model, and a reference about the
 biology. Since there is no way to distinguish between them, this creates
 problems for CellML metadata processing tools which want to identify the
 paper about the model (such as the CellML repository). However, it would
 still be a good thing to include references about the biology /
 experiments on which a model is based, as well as papers on underlying
 mathematical techniques (and perhaps earlier papers?)

 The CellML Metadata specification already describes a predicate
 cmeta:bio_entity, and another cmeta:math_problem. Although the cmeta
 specification suggests that these be used to provide references to
 identifiers for the biological entity a part of the model relates to,
 and likewise for the mathematical problem, it would also be possible to
 create a list of references inside the resource targeted by the
 bio_entity or math_problem predicate.

 I would therefore suggest that the following be considered best practice:
 1) Only refer to the paper about the model from the metadata for the model.
 2) Any other papers should be in another resource referred to from the
 bio_entity and math_problem entities.
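Andrew's proposal can be illustrated with a toy triple list. In it, the model paper is the only reference attached directly to the model, while the other papers hang off the `cmeta:bio_entity` and `cmeta:math_problem` resources. Apart from those two predicates, the node names and the `bqs:reference` predicate are placeholders, not taken from the specification.

```python
# Toy RDF-like triples; "bqs:reference" and the node names are placeholders.
TRIPLES = [
    ("#model", "bqs:reference", "#model_paper"),
    ("#model", "cmeta:bio_entity", "#bio"),
    ("#bio", "bqs:reference", "#biology_paper"),
    ("#model", "cmeta:math_problem", "#math"),
    ("#math", "bqs:reference", "#numerics_paper"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def model_papers(model):
    """Under the proposed practice, only direct references describe the model."""
    return objects(model, "bqs:reference")


def supporting_papers(model):
    """Other papers are reached via the bio_entity / math_problem resources."""
    papers = []
    for predicate in ("cmeta:bio_entity", "cmeta:math_problem"):
        for node in objects(model, predicate):
            papers.extend(objects(node, "bqs:reference"))
    return papers


print(model_papers("#model"))       # ['#model_paper']
print(supporting_papers("#model"))  # ['#biology_paper', '#numerics_paper']
```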

 Does anyone else have any opinion on this?

 Best regards,
 Andrew

 ___
 cellml-discussion mailing list
 cellml-discussion@cellml.org
 http://www.cellml.org/mailman/listinfo/cellml-discussion



Re: [cellml-discussion] Biological and other non-model citations in CellML metadata?

2007-03-28 Thread Matt
Melanie!

Thanks for your thoughts. You are right about the mess that mapping to
different ontologies and vocabs produces. We have been working on
trying to integrate explicitly with biopax (http://www.biopax.org/
states and generics proposal - level 2 was too limiting) in the hope
that other databases (like GO, reactome, KEGG, DIP, signalling
gateway, etc) get dragged along too. Some seem to be.

At the moment, I think the cleanup begins at the repository interface.
So long as we validate that the biological annotation is unambiguous,
complete, and represented in a way that agrees with one or more of an
accepted set of ontologies and vocabs, then I think we might be in a
position to perform useful queries.
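The sketch below covers the simplest part of that validation: checking that each annotation URI carries an identifier and uses an accepted data-type. It assumes URIs of the `datatype#identifier` form. The `ACCEPTED` set (including its Gene Ontology entry) and `check_annotation` are hypothetical; a real list would come from repository curation policy or MIRIAM resources.

```python
# Hypothetical accepted data-type URIs; not an official list.
ACCEPTED = {
    "http://www.pubmed.gov/",
    "http://www.doi.org/",
    "http://www.geneontology.org/",
}


def check_annotation(uri: str) -> list[str]:
    """Return the problems found with one annotation URI (empty if none)."""
    datatype, sep, identifier = uri.partition("#")
    problems = []
    if not sep or not identifier:
        problems.append("missing identifier")
    if datatype not in ACCEPTED:
        problems.append(f"unaccepted data-type {datatype}")
    return problems


print(check_annotation("http://www.pubmed.gov/#8983160"))  # []
print(check_annotation("http://example.org/#1"))
```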

We also have a person doing a PhD here now on the visualisation of
CellML models which relies solely on the biological annotation.  It's
a pretty good failure point to get nothing in your picture :-)

thanks again
cheers
Matt

On 3/29/07, Melanie Nelson [EMAIL PROTECTED] wrote:
 Wow, I haven't posted to this list in a long time...
 But I feel compelled to give a little advice as
 someone who's spent a lot of time integrating
 biological information and therefore has made a lot of
 mistakes!

 By all means, have a best practice encouraging people
 to use the GO cellular_component ontology to describe
 organelles and cells. You could probably also use the
 molecular_function ontology for proteins (although
 this will be messier). However, neither is likely to
 be complete, i.e., there will be models that
 reference a biological entity not in the GO
 ontologies. Also, there will be cases where the entity
 the model references is most properly thought of as
 related in some way (e.g., a subset, a superset, or a
 sibling) to the GO entity. You can spend ages
 sorting this sort of thing out and coming up with
 consistent rules for handling all the relationships.


 Since you aren't really interested in sorting out this
 biological mess, you may want to consider letting
 people choose their own ontology and just reference
 it.
 An example of this practice is in the MIAME project:
 http://www.mged.org/Workgroups/MIAME/miame_1.1.html

 About the citations- my memory of this is fuzzy, but I
 think the original intent was that people should
 provide the PubMed ID where possible. However, not all
 journals are indexed in PubMed (for instance, there is
 a CellML paper published in one that is not), so the
 model needs to handle full citation info, too. The BQS
 model handles both, and then some, which is why we
 chose it.
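The rule of using a PubMed ID where possible and a full citation otherwise might be sketched as follows. This does not model BQS itself. `Citation` and `citation_key` are hypothetical, and the title and journal values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    title: str
    journal: str
    year: int
    pubmed_id: Optional[str] = None  # absent for journals PubMed does not index


def citation_key(c: Citation) -> str:
    """Prefer the PubMed ID; otherwise fall back to the full citation details."""
    if c.pubmed_id:
        return f"http://www.pubmed.gov/#{c.pubmed_id}"
    return f"{c.journal} ({c.year}): {c.title}"


print(citation_key(Citation("Example title", "Example Journal", 2007, "8983160")))
print(citation_key(Citation("Example title", "Example Journal", 2007)))
```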

 Hope this is helpful,
 Melanie


 --- Andrew Miller [EMAIL PROTECTED] wrote:

  Matt wrote:
   I don't think this is a good idea.
  
   - I think bioentity should be deprecated; it has no intrinsic semantic value.
  
  It does, unfortunately, seem to usually target a literal node at the
  moment. It would be nice for this to at least be a resource, which could
  provide further information about the biological entity (or if we decide
  not to do that, at least a resource, with a dictionary and a process for
  adding new words to the dictionary to avoid duplication).
 
  It seems that GO (Gene Ontology) has terms for cell types, biological
  compartments, and so on, which would offer a better way to provide this
  information.
 
  I still think that this metadata is useful, even if the automated
  interpretation of it is currently difficult.
   - If it is used currently, it should be left as its current minimum
   specification which is to label and point to other bioinformatics
   database IDs.
  
  There are three layers of information here:
  Layer 1: What biological entity are we describing? (could be answered
  with a GO term).
  Layer 2: What information about that biological entity are we using?
  (could be answered with a reference to a paper, and perhaps even a
  reference to raw experimental data).
  Layer 3: How was that information translated into a model? (could be
  answered with a reference to a paper on the model).
 
  Layer 3 is clearly information about the model, and should be described
  as an arc of the model resource.
  Layer 1 is described by a literal at the moment.
 
  Layer 2 is therefore a gap, which we don't have any proper way to
  represent now.
   - The problem is not 'biologically related papers' per se, but one of
   identifying what was the primary publication or publications that
   motivated a model.
  
  The publication which motivated the expression of a model in CellML, or
  the publication which motivated the creation of the model? Most of the
  models in the repository were motivated by a paper about a model which
  was not initially expressed in CellML. However, the way that the
  metadata specification works now is that the paper which describes the
  model (not the paper which motivated it) is referenced from the
  information about the model (not information about the CellML file).
   - There is