By all means - versioning is crucial - and all knowledge maps/
association files/annotations referencing nodes in an ontology MUST
include the version number.
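To make that requirement concrete, here is a minimal sketch in Python
of an annotation record that cannot be stored without a version. The
class, the field names, and the IDs ("gene:0001", "EX:0000001",
"2006-06-01") are all made up for illustration; they do not reflect
the GO association file format or any other real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One association between an annotated object and an ontology node."""
    subject_id: str        # the thing being annotated (hypothetical ID)
    term_id: str           # the ontology node being cited (hypothetical ID)
    ontology_version: str  # the release in which term_id was resolved


def validate(annotation: Annotation) -> Annotation:
    """Reject any annotation that cites a node without naming a version."""
    if not annotation.ontology_version.strip():
        raise ValueError(f"{annotation.term_id}: missing ontology version")
    return annotation


# A well-formed record: the citation is pinned to a specific release.
example = validate(Annotation("gene:0001", "EX:0000001", "2006-06-01"))
```

With the version stored next to the term ID, a later reader can always
tell which graph the citation was made against.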
For an example of how biomed. ontology curators deal with the issue
of versioning, see the Gene Ontology Consortium web site pages
describing their SOP on this issue:
http://www.geneontology.org/GO.usage.shtml#obsoletions
see "Obsoleting Terms" and "Merges, Splits, Movements"
All of the OBO Foundry ontologies are set up in a source control
system, have an official "release" policy, and associated mailing-
lists to request changes/corrections and announce new releases.
Generally, as is the case with evolving software API & format specs,
new versions are backward compatible - e.g., annotations citing terms/
concepts/entities/nodes as they existed in a previous graph can be
resolved in the more recent versions. However, since we are talking
about "theories of reality" here - and, as many have pointed out, our
descriptions of reality often evolve in non-monotonic ways - the
mapping across versions from a node in one version to the
"equivalent" node(s) in other versions may be a far from trivial
process. Sometimes the mappings can be given using DL rules; others
simply require a deterministic look-up table.
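As a rough illustration of the look-up table case, the Python sketch
below resolves a term cited in an older version to its equivalent
node(s) in a newer one. Everything in it is an assumption made for the
example: the version labels (v1, v2, v3), the "EX:" term IDs, the table
contents, and the rule that a term with no table entry carries over
unchanged. It is not how GO or any OBO ontology actually records
merges, splits, or obsoletions.

```python
from typing import Dict, List, Tuple

# (version, term_id) -> replacement term IDs in the NEXT version.
# An empty list means the term was made obsolete with no replacement.
MigrationTable = Dict[Tuple[str, str], List[str]]

VERSION_ORDER = ["v1", "v2", "v3"]  # hypothetical release sequence

MIGRATIONS: MigrationTable = {
    ("v1", "EX:0000002"): ["EX:0000005"],                # merged into 5
    ("v1", "EX:0000003"): ["EX:0000006", "EX:0000007"],  # split in two
    ("v1", "EX:0000004"): [],                            # obsoleted
    ("v2", "EX:0000005"): ["EX:0000009"],                # moved again
}


def resolve(term_id: str, from_version: str, to_version: str,
            table: MigrationTable = MIGRATIONS,
            order: List[str] = VERSION_ORDER) -> List[str]:
    """Walk the table one release at a time, oldest to newest.

    Terms with no entry for a given release are assumed unchanged.
    """
    start, end = order.index(from_version), order.index(to_version)
    if start > end:
        raise ValueError("can only migrate forward in this sketch")
    current = [term_id]
    for version in order[start:end]:
        migrated: List[str] = []
        for term in current:
            for replacement in table.get((version, term), [term]):
                if replacement not in migrated:
                    migrated.append(replacement)
        current = migrated
    return current


print(resolve("EX:0000002", "v1", "v3"))  # merged, then moved
print(resolve("EX:0000003", "v1", "v3"))  # split into two nodes
print(resolve("EX:0000004", "v1", "v2"))  # obsoleted, nothing to cite
```

A split returns more than one candidate node and an obsoletion returns
none, which is exactly where a curator has to step in and where the
look-up table alone stops being sufficient.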
The curation process of deciding how to "migrate" nodes as changes/
corrections are required can be quite complex, as can be seen when
you review what GO curators are required to do to keep knowledge maps/
association files current when they originally referenced nodes in
prior versions of GO (see refs above).
I realize this may sound hideously complex, very labor intensive, and
"fragile", but the process actually works.
Here, too, I think it's important to remember what the original
requirements are for a given knowledge resource. In the case of GO,
the core curation process has focussed on mapping occurrences of
specific biomolecular and subcellular entities as they occur in the
literature. A significant portion of the GO curation process still
revolves around explicitly tracking entity occurrences in the
literature.
Of course, a whole slew of powerful tools and valuable research has
grown around GO - especially as its formal specificity has improved
over the last 5 years or so - many of which are designed to use GO to
organize/pool/analyze primary research data, as opposed to focusing
on its "representation" in the literature. I think this is where
ontology practice is most likely to provide the greatest benefit in
the coming decade - as applied to primary data repositories. It is
here, too, where Semantic Web technologies are most likely to be
relevant and provide a powerful, flexible formalism for representing
semantic info associated with scientific observations - with explicit
links to various knowledge resources across the formal semantic
spectrum (from flat term lists through to thorough, computable, and
relatively complete theories of reality).
The following two threads of activity in biomedical KR are related,
yet distinct, and it is important to understand them as such:
1) KR applied to existing descriptions of research data:
From repositories of primary data such as GENBANK and GEO on
through the highly reduced representations found in the STM
literature. Analysis of the semantic and lexical content of the
latter has been going on since the 1940s & 1950s (at least in the
info/library science fields) and more recently (since the 1960s) in
the converging C.S./Linguistics fields (e.g., Comp. Linguistics and
Info Retrieval). Only in the last 15 years have ontologies played any
significant role in these pursuits. Textpresso - the text mining
framework recommended by the model organism database consortium
(http://www.gmod.org/home & http://www.textpresso.org/) - is a good
example of this approach coming from the bioinformatics community,
but there are other examples using much more powerful Comp.
Linguistic techniques.
2) Use in creating NEW descriptions of primary data:
Here, ontologies along with SW tech and other KR tools (such as the
Topic Maps Reference Model (TMRM) Jack Park and his colleagues at SRI
are working on) and C.S. techniques for federating inter-related
data repositories can be combined to transform our ability to compute
across large swaths of data. In this case, the first digital
representation of research data derives from a formally sound,
computable framework. It is this latter approach, combined with the
armamentarium of informatics tools accumulated over the last 30 years
from various fields, that will bring the bulk of biomedical
researchers forward from the still 19th-century approach of forcing
all contributions to the evolving biomed knowledge base to pass
through a human brain for knowledge extraction to one where human
cognitive capacity is truly being augmented via automation (in the
sense espoused by Doug Engelbart and Vannevar Bush) and all new
scientific descriptions can be automatically analyzed in the context
of all relevant prior knowledge. I consider this transformation very
much like the one that has taken place over the last 30 years to
augment our tools for observation (automated, high throughput
sequencing; molecular imaging and all forms of microscopy;
microarrays; etc.).
I think some of the disagreement/confusion on the topic of the
accuracy and effectiveness of biomedical ontologies derives from
collapsing these two approaches to KR, which though highly inter-
related, bring with them distinct approaches, limits, caveats, and
capabilities.
Just my $0.02.
Cheers,
Bill
On Jun 13, 2006, at 10:00 PM, kc28 wrote:
This brings up an interesting issue -- how ontological evolution
would impact mapping or integration of overlapping ontologies. I
believe it's quite a research challenge. We might need to
incorporate the notion of versioning into the ontological
structure. For example, what versions of the protein classes/
instances can be mapped between two ontologies. Just my two-cent
thought.
Cheers,
-Kei
John Rumble wrote:
An unwritten rule about higher level ontologies is that they
reflect our knowledge today, not tomorrow. As knowledge evolves,
the upper level ontologies, especially, must also evolve. The
example of the concept "protein" is very apropos here. We can view
it from functional, structural, integrative angles, and I am sure
there are a bunch more. Then think about how our "concept" of a
protein in each of those views has evolved over the last 10 years,
20 years, 75 years. The problem is evident.
At whatever level an ontology is developed, someone smarter or
with more insight or standing on the shoulders of giants will use
that ontology as a building block for a new and better higher
level view of nature. We have not reached the end of science yet.
In my days of leading similar standards developments, some of the
best progress we made was when we banned discussions of (1) higher-
level ontologies (though we called them something else back in
those old days) and (2) acronyms.
For those of you who have requested more references on my
previous e-mail about experiment description, it will have to wait
a few more days. Unfortunately bioinformatics has not solved my
kidney stone issues, which severely limit my ability to pull the
requested information together.
John
Dr. John Rumble
Technical Director
Information International Associates
Oak Ridge TN
www.infointl.com <http://www.infointl.com>
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
301 963 7903 (Home Office)
301 502 5729 (Cell)
865 298 1251 (Oak Ridge Office)
Bill Bug
Senior Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - [EMAIL PROTECTED]