RE: My task from last week: Semantic free identifiers

Michel_Dumontier Mon, 20 Jun 2011 18:38:27 -0700


From: Sivaram Arabandi, MD [mailto:[email protected]]
Sent: Monday, June 20, 2011 8:42 PM
To: Andrea Splendiani
Cc: Michel_Dumontier; Chime Ogbuji; andrea splendiani (RRes-Roth); 
Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers


On Jun 20, 2011, at 7:47 PM, Andrea Splendiani wrote:


Let's be precise.
I think everybody here would agree that having opaque unique identifiers is a 
sensible policy for individuals more often than not.
To keep the analogy with relational databases, the issue is not whether IDs 
should be opaque or not, but whether table and column names should be opaque or 
not.


I think we're in complete agreement here - and that's why we specify the human 
readable label using rdfs:label (and assign the language tag, if desired).
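
A minimal sketch of that pattern, assuming a made-up term URI under 
example.org and a hypothetical helper named label_triple (neither comes from 
this thread). It only writes N-Triples lines that attach language-tagged 
rdfs:label values to an opaque identifier:

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_triple(subject_uri, text, lang=None):
    """Return one N-Triples line attaching an rdfs:label to subject_uri."""
    # Escape backslashes first, then double quotes, as N-Triples requires.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    literal = f'"{escaped}"'
    if lang:
        literal += f"@{lang}"  # optional language tag, e.g. "en"
    return f"<{subject_uri}> <{RDFS_LABEL}> {literal} ."


# Hypothetical opaque term URI; the readable names live only in the labels.
term = "http://example.org/term/T_000123"
print(label_triple(term, "protein", "en"))
print(label_triple(term, "proteina", "it"))
```

The identifier carries no meaning of its own; each language gets its own label 
on the same URI.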

m.

+1
very well put.


Thinking about ontology terms, there are good reasons why these should be 
"codes" rather than one-word definitions (at the very least, to avoid the 
temptation). Whether they 'should' be codes is a tradeoff. I guess 
it depends on the domain: it makes sense for OBO, less for DBpedia.
What I originally found a bit too much is the idea that 'everything' should 
necessarily have an opaque id. We can live with rdf:type, perhaps obo:partOf, 
and so on...

ciao,
Andrea




Il giorno 20/giu/2011, alle ore 23.38, Michel_Dumontier ha scritto:


It's exactly the same reason why we have tables with incremental primary keys 
or have social security numbers for people and ISBNs for books.  The 
identifier is meant to identify one thing, and should not clash with other 
things having similar or identical names. What that thing is, is up to you. But you 
don't need a fancy algorithm to generate them in order to ensure uniqueness.  
In creating RDF data (for Bio2RDF), we're often put in the position of having 
to create unique identifiers (so as to avoid unreliable blank nodes), and we 
sometimes have no other alternative but to hash 3-8 values to get that (and to 
ensure we'll generate the same identifier in the future).  Having a guaranteed 
primary key is definitely good for change management.
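
A rough sketch of the hashing approach, under stated assumptions: the choice 
of SHA-256, the example.org namespace, the function name hashed_uri and the 
sample field values are all illustrative, not what Bio2RDF actually does. The 
point is only that the same ordered values always give the same identifier:

```python
import hashlib

BASE = "http://example.org/resource/"  # hypothetical namespace


def hashed_uri(*values, base=BASE):
    """Build a deterministic URI from an ordered tuple of field values."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    payload = "".join(f"{len(v)}:{v}" for v in map(str, values))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Truncate to 32 hex characters to keep the URI short (an assumption).
    return base + digest[:32]


# Made-up record fields standing in for the "3-8 values" of a source row.
first = hashed_uri("drugbank", "DB00001", "target", 1)
again = hashed_uri("drugbank", "DB00001", "target", 1)
print(first == again)  # True: regenerating the data yields the same URI
```

Field order matters here, so the values have to be supplied in the same order 
on every run.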

However, if you're quite sure that your system will never generate the same 
identifier (EVER EVER EVER) for another entity, then go ahead and use labels in 
your URIs.  But if you expect some churn in the meantime (as will happen with 
domain ontologies - see 'Protein' for BioPAX as an example), then you may want 
to investigate a more principled approach. There are many cases in SIO where I 
changed the label - to be more accurate with respect to the definition or just to 
conform to a new label syntax. Had I tied the identifier to the label, this 
would have caused some cognitive dissonance, and been a pain for users to update.
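
To illustrate the churn problem, here is a small sketch. The namespace, the 
term code T_000010, the label_uri helper and the relabelling itself are 
invented for the example and are not taken from SIO or BioPAX:

```python
import re

BASE = "http://example.org/onto/"  # hypothetical namespace


def label_uri(label):
    """Derive a URI from a label by collapsing non-alphanumerics to '_'."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return BASE + slug


def opaque_uri(code):
    """Build a URI from an opaque code that never depends on the label."""
    return BASE + code


labels = {"T_000010": "protein"}            # term code -> current label
old_label_uri = label_uri(labels["T_000010"])
old_opaque_uri = opaque_uri("T_000010")

labels["T_000010"] = "polypeptide chain"    # hypothetical relabelling

print(label_uri(labels["T_000010"]) == old_label_uri)  # False: links break
print(opaque_uri("T_000010") == old_opaque_uri)        # True: links survive
```

With the label-derived scheme every dataset that referenced the old URI needs 
updating; with the opaque code only the label changes.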

m.

From: [email protected] [mailto:[email protected]] On Behalf Of Sivaram Arabandi, MD
Sent: Monday, June 20, 2011 3:56 PM
To: Chime Ogbuji
Cc: Andrea Splendiani; Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers

I couldn't "agree" more with Andrea and Chime on this one. And would like to 
see some good reason(s) for us to continue to be burdened by them.
The standard answer - 'tooling can help in managing the readability aspects' 
has been heard several times, and yet everyone seems to pass around 'raw RDF or 
SPARQL snippets with readable URIs' - these would surely be absolutely 
unreadable if we were to use totally opaque identifiers.

I recently had a discussion on this topic with Michel (during Semtech) and this 
exact line of thinking that Mark alluded to in his email came up:
          "though I guess, for them, "partOf" *is* opaque... so...??  Perhaps 
that argument is somewhat spurious??"

--Sivaram
____________________________
Sivaram Arabandi, MD, MS
Ph:  216.374.2883

http://ontolog.cim3.net/cgi-bin/wiki.pl?SivaramArabandi
http://www.linkedin.com/pub/sivaram-arabandi/1/9ab/92a



On Jun 20, 2011, at 3:34 PM, Chime Ogbuji wrote:



On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:


Hi,
sorry to jump on this thread like this...

To be honest, I'm kind of concerned by the insistence on semantic-opaque
identifiers.

I am as well and I have been for some time.

I understand the reason for them,

Actually, I would be interested in hearing the reason for them enumerated, 
because I have had a hard time imagining what could possibly offset the 
(significant) impact on readability that it has on biomedical ontologies.  The 
barrier is already high for non-logicians and non-semantic web aficionados to 
use biomedical ontologies.  Why set it any higher?

-- Chime



Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004
[email protected]




