Re: scientific publishing task force update

William Bug Wed, 14 Jun 2006 08:04:48 -0700

By all means - versioning is crucial - and all knowledge maps/association files/annotations referencing nodes in an ontology MUST include the version number.

For an example of how biomed. ontology curators deal with the issue of versioning, see the Gene Ontology Consortium web site pages describing their SOP on this issue:


http://www.geneontology.org/GO.usage.shtml#obsoletions
        see "Obsoleting Terms" and "Merges, Splits, Movements"

All of the OBO Foundry ontologies are kept in a source control system, have an official "release" policy, and have associated mailing lists for requesting changes/corrections and announcing new releases.

Generally, as is the case with evolving software API & format specs, new versions are backward compatible - e.g., annotations citing terms/concepts/entities/nodes as they existed in a previous graph can be resolved in the more recent versions. However, since we are talking about "theories of reality" here - and, as many have pointed out, our descriptions of reality evolve in often non-monotonic ways - mapping a node in one version to the "equivalent" node(s) in other versions may be a far from trivial process. Sometimes the mappings can be given using DL rules; others simply require a deterministic look-up table.
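To make the look-up-table case concrete, here is a minimal Python (3.9+) sketch of carrying a version-pinned term reference forward into a later release. Everything in it is hypothetical: the `TermRef` type, the `EX:` identifiers, the release labels, and the `NEXT_RELEASE`/`MAPPING` tables are illustrations, not GO's or any OBO tool's actual format. It assumes each annotation records the release it was made against, and that any term missing from the table carries over unchanged to the next release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermRef:
    """An ontology node reference pinned to the release it was made against."""
    term_id: str
    version: str


# Hypothetical release chain: each release points to the one after it.
NEXT_RELEASE = {"2006-05": "2006-06", "2006-06": "2006-07"}

# Hypothetical look-up table: (release, term) -> equivalent term(s) in the
# next release. A list with several entries models a split; several keys
# pointing at the same target model a merge.
MAPPING = {
    ("2006-05", "EX:0000002"): ["EX:0000010", "EX:0000011"],  # split
    ("2006-06", "EX:0000010"): ["EX:0000020"],                # merge
    ("2006-06", "EX:0000011"): ["EX:0000020"],                # merge
}


def resolve(ref: TermRef, target_version: str) -> set[TermRef]:
    """Carry a reference forward, one release at a time, to target_version."""
    current, version = {ref}, ref.version
    while version != target_version:
        if version not in NEXT_RELEASE:
            raise ValueError(f"no forward path from {ref.version} to {target_version}")
        nxt = NEXT_RELEASE[version]
        # Terms absent from the table are assumed unchanged in the next release.
        current = {
            TermRef(new_id, nxt)
            for r in current
            for new_id in MAPPING.get((version, r.term_id), [r.term_id])
        }
        version = nxt
    return current


print(resolve(TermRef("EX:0000002", "2006-05"), "2006-07"))
# {TermRef(term_id='EX:0000020', version='2006-07')}
```

Keeping one small table per release step and chaining the steps means each table only has to describe one round of curation changes. The cost is that a split followed by a merge (as with `EX:0000002` here) fans out and back in, so the result has to be a set of nodes rather than a single one.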

The curation process of deciding how to "migrate" nodes as changes/corrections are required can be quite complex, as you can see when you review what GO curators must do to keep knowledge maps/association files current when those files originally referenced nodes in prior versions of GO (see refs above).
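The migration step can be sketched the same way. This minimal Python (3.9+) sketch is loosely modeled on the OBO format's `is_obsolete`, `replaced_by`, and `consider` tags; the `Term` class, the `migrate()` function, the gene names, and the `EX:` identifiers are all hypothetical, and the rule it applies (a single `replaced_by` target is migrated automatically, anything else goes to a curator) is a simplifying assumption, not GO's actual SOP.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Simplified view of one term in the new release (hypothetical)."""
    term_id: str
    is_obsolete: bool = False
    replaced_by: list[str] = field(default_factory=list)
    consider: list[str] = field(default_factory=list)


def migrate(annotations, terms):
    """Split (gene, term_id) annotations into ones that can be carried
    forward automatically and ones that need a curator's decision."""
    migrated, review = [], []
    for gene, term_id in annotations:
        term = terms.get(term_id)
        if term is None:
            # Unknown ID: nothing to go on, so a human has to look at it.
            review.append((gene, term_id, []))
        elif not term.is_obsolete:
            migrated.append((gene, term_id))
        elif len(term.replaced_by) == 1:
            # Exactly one replacement: assumed safe to swap automatically.
            migrated.append((gene, term.replaced_by[0]))
        else:
            # Obsolete with no single replacement: offer the candidates.
            review.append((gene, term_id, term.consider or term.replaced_by))
    return migrated, review


terms = {
    "EX:0000001": Term("EX:0000001"),
    "EX:0000002": Term("EX:0000002", is_obsolete=True, replaced_by=["EX:0000001"]),
    "EX:0000003": Term("EX:0000003", is_obsolete=True,
                       consider=["EX:0000004", "EX:0000005"]),
}
annotations = [("geneA", "EX:0000001"), ("geneB", "EX:0000002"), ("geneC", "EX:0000003")]
migrated, review = migrate(annotations, terms)
print(migrated)  # [('geneA', 'EX:0000001'), ('geneB', 'EX:0000001')]
print(review)    # [('geneC', 'EX:0000003', ['EX:0000004', 'EX:0000005'])]
```

The point of the two output lists is that only part of the migration can be automated; the `review` list is where the labor-intensive curation described above actually happens.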

I realize this may sound hideously complex, very labor-intensive, and "fragile", but the process actually works.

Here, too, I think it's important to remember what the original requirements are for a given knowledge resource. In the case of GO, the core curation process has focused on mapping occurrences of specific biomolecular and subcellular entities as they occur in the literature. A significant portion of the GO curation process still revolves around explicitly tracking entity occurrences in the literature.

Of course, a whole slew of powerful tools and valuable research has grown up around GO, especially as its formal specificity has improved over the last 5 years or so; many of these tools are designed to use GO to organize/pool/analyze primary research data, as opposed to focusing on its "representation" in the literature. I think this is where ontology practice is most likely to provide the greatest benefit in the coming decade - as applied to primary data repositories. It is here, too, where Semantic Web technologies are most likely to be relevant and to provide a powerful, flexible formalism for representing semantic info associated with scientific observations - with explicit links to various knowledge resources across the formal semantic spectrum (from flat term lists through to thorough, computable, and relatively complete theories of reality).

The following two threads of activity in biomedical KR are important to understand as related, yet distinct:


        1) KR applied to existing descriptions of research data:

This thread runs from repositories of primary data such as GenBank and GEO on through the highly reduced representations found in the STM literature. Analysis of the semantic and lexical content of the latter has been going on since the 1940s & 1950s (at least in the info/library science fields) and, more recently (since the 1960s), in the converging C.S./Linguistics fields (e.g., Comp. Linguistics and Info Retrieval). Only in the last 15 years have ontologies played any significant role in these pursuits. TextPresso - the text mining framework recommended by the model organism database consortium (http://www.gmod.org/home & http://www.textpresso.org/) - is a good example of this approach coming from the bioinformatics community, but there are other examples using much more powerful Comp. Linguistics techniques.


        2) Use in creating NEW descriptions of primary data:

Here, ontologies, along with SW tech and other KR tools (such as the Topic Maps Reference Model (TMRM) Jack Park and his colleagues at SRI are working on) and C.S. techniques for federating inter-related data repositories, can be combined to transform our ability to compute across large swaths of data. In this case, the first digital representation of research data derives from a formally sound, computable framework. It is this latter approach, combined with the armamentarium of informatics tools accumulated over the last 30 years from various fields, that will bring the bulk of biomedical researchers forward from the still 19th-century approach, in which every contribution to the evolving biomed knowledge base must pass through a human brain for knowledge extraction, to one where human cognitive capacity is truly being augmented via automation (in the sense espoused by Doug Engelbart and Vannevar Bush) and all new scientific descriptions can be automatically analyzed in the context of all relevant prior knowledge. I consider this transformation very much like the one that has taken place over the last 30 years to augment our tools for observation (automated, high-throughput sequencing; molecular imaging and all forms of microscopy; microarrays; etc.).

I think some of the disagreement/confusion on the topic of the accuracy and effectiveness of biomedical ontologies derives from collapsing these two approaches to KR, which, though highly inter-related, bring with them distinct approaches, limits, caveats, and capabilities.


Just my $0.02.

Cheers,
Bill

On Jun 13, 2006, at 10:00 PM, kc28 wrote:

This brings up an interesting issue -- how ontological evolution would impact mapping or integration of overlapping ontologies. I believe it's quite a research challenge. We might need to incorporate the notion of versioning into the ontological structure. For example, what versions of the protein classes/instances can be mapped between two ontologies. Just my two-cent thought.
Cheers,

-Kei

John Rumble wrote:
An unwritten rule about higher-level ontologies is that they reflect our knowledge today, not tomorrow. As knowledge evolves, the upper-level ontologies, especially, must also evolve. The example of the concept "protein" is very apropos here. We can view it from functional, structural, and integrative angles, and I am sure there are a bunch more. Then think about how our "concept" of a protein in each of those views has evolved over the last 10 years, 20 years, 75 years. The problem is evident.

At whatever level an ontology is developed, someone smarter, or with more insight, or standing on the shoulders of giants will use that ontology as a building block for a new and better higher-level view of nature. We have not reached the end of science yet.

In my days of leading similar standards developments, some of the best progress we made was when we banned discussions of (1) higher-level ontologies (though we called them something else back in those old days) and (2) acronyms.

For those of you who have requested more references on my previous e-mail about experiment description, it will have to wait a few more days. Unfortunately, bioinformatics has not solved my kidney stone issues, which severely limit my ability to pull the requested information together.
 John
 Dr. John Rumble
Technical Director
Information International Associates
Oak Ridge TN
www.infointl.com
[EMAIL PROTECTED]
[EMAIL PROTECTED]
301 963 7903 (Home Office)
301 502 5729 (Cell)
865 298 1251 (Oak Ridge Office)


Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - [EMAIL PROTECTED]

