On 25 Mar 2009, at 10:41, Phillip Lord wrote:
"Michel_Dumontier" <[email protected]> writes:
And I'm trying to explain that there is no pragmatic reason to make
explicit the distinction between a biomolecule (and what we know about
it) and a database record (and what we know about the biomolecule)
unless they are actually different. It just complicates things in a
wholly unnecessary way.
I've given a clear example. Where two databases exist, with two records,
which appear to be referring to the same (class of) molecules.
[snip]
This is the key example.
But there's the other key example, where one record exists which
appears to refer to multiple entities (either by ambiguity or
by composition). This is a generalization of your point about the
ill-definedness of the very idea of a gene.
To paraphrase you (I think), introducing a resource in the latter case
takes you from 1 mapping problem to 2 mapping problems.
This is why the Boothian line is quite naive. If it's just the
case that you have 1 (or more) records and a clear relationship
between the record(s) and the object described by the record, then it
may (or may not!, but often will) make sense to distinguish them and
name each, esp. for the purpose of entity reconciliation, record
reconciliation, entity exploration, etc.
However, if you are forced to do so without a clear purpose, then you
just add more noise to the overall system. You are likely to make
brute errors and you are likely to make choices that conflict with
those motivated by different applications.
This is why clear empirical data is important. It's perfectly
possible to do harm (in aggregate) by following a rule intended to
produce good.
Cheers,
Bijan.