Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)

Patrick Durusau Tue, 23 Aug 2011 06:50:10 -0700

John

On 8/23/2011 9:05 AM, John Erickson wrote:

This is an important discussion that (I believe) foreshadows how
canonical identifiers are managed moving forward.


Both CAS and DUNS numbers are good examples. Consider the challenge
of linking EPA data; it's easy to create a list of toxic chemicals
that are common across many EPA datasets. Based on those chemical
names, it's possible to further find (in most cases) references in
DBPedia and other sources, such as PubChem:

* ACETALDEHYDE
* http://dbpedia.org/page/Acetaldehyde
* http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=177
* etc...

Now, add to this a sensible agency-rooted URI design and a
DBPedia-like infrastructure and one has a very powerful hub that
strengthens the Linked Data ecosystem. It would arguably be stronger
if CAS identifiers were also (somehow) included, but even the bits of
linking shown above change the value proposition of traditional
proprietary naming schemes...

Quite so, and I did not mean to imply otherwise. Yes, gathering government agency URI identifiers for toxic chemicals is a value-add proposition.

I am curious whether you find that different offices within agencies use the same URIs. Or did they have other identifiers in their records prior to the URIs?

That is, will the URIs map to the identifiers used in EPA datasets, for example?

Despite its obvious value, I don't agree that the project "change[s] the value proposition of traditional proprietary naming schemes..."

Mostly because it does not address the *prior* use of other identifiers in the published literature. However convenient it may be to pretend that we are starting off fresh, in fact we are not, in any information system.

The fact remains that even if we switched (miraculously) today to all new URI identifiers, we will be accessing literature using prior identifiers for a very long time. I suspect hundreds of years.

BTW, who bridges between the new URI schemes and the CAS identifiers? For searching traditional literature?
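
To make the bridging question concrete, here is a minimal sketch of what such a bridge might look like. It is only an illustration: the table contents, the names BRIDGE, is_valid_cas, uris_for_cas and cas_for_uri are all assumptions made up for this example, not anything EPA, CAS, or Instance Hub provides. The one entry pairs the CAS Registry Number for acetaldehyde (75-07-0) with the two URLs listed above, and the check-digit test is the standard CAS one (the last digit equals the weighted sum of the preceding digits, weights counted from the right, modulo 10).

```python
import re

# Hypothetical bridge table: CAS Registry Number -> URIs that name the same
# substance. A real bridge would be maintained by someone with authority
# over both identifier sets; this single entry is for illustration only.
BRIDGE = {
    "75-07-0": [
        "http://dbpedia.org/page/Acetaldehyde",
        "http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=177",
    ],
}

# CAS numbers have the shape NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if the string is well formed and its check digit matches."""
    match = CAS_PATTERN.match(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def uris_for_cas(cas):
    """Look up the URIs recorded for a CAS number (empty list if unknown)."""
    if not is_valid_cas(cas):
        raise ValueError("not a valid CAS number: %r" % cas)
    return list(BRIDGE.get(cas, []))


def cas_for_uri(uri):
    """Reverse lookup: find the CAS number recorded for a URI, if any."""
    for cas, uris in BRIDGE.items():
        if uri in uris:
            return cas
    return None
```

The reverse lookup is the direction that matters for searching the traditional literature: starting from a new URI, one recovers the prior identifier under which sixty years of publications are indexed. The sketch says nothing about who is entitled to build or publish such a table, which is the open question.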

John
PS: At TWC we are about to go live with a registry called "Instance
Hub" that will demonstrate the association of agency-based URI schemes
--- think EPA, HHS, DOE, USDA, etc --- with instance data over which
the agency has some authority or interest...More very soon!

Looking forward to it!

Hope you are having a great day!

Patrick


On Tue, Aug 23, 2011 at 8:31 AM, Patrick Durusau<[email protected]>  wrote:

David,

On 8/22/2011 9:55 PM, David Booth wrote:

On Mon, 2011-08-22 at 20:27 -0400, Patrick Durusau wrote:
[ . . . ]

The use of CAS identifiers supports searching across vast domains of
*existing* literature. Not all, but most of it for the last 60 or so
years.

That is non-trivial and should not be lightly discarded.

BTW, your objection is that "non-licensed systems" cannot use CAS
identifiers? Are these commercial systems that are charging their
customers? Why would you think such systems should be able to take
information created by others?

Using the information associated with an identifier is one thing; using
the identifier itself is another.  I'm sure the CAS numbers have added
non-trivial value that should not be ignored.  But their business model
needs to change.  It is ludicrous in this web era to prohibit the use of
the identifiers themselves.

If there is one principle we have learned from the web, it is the enormous
value and importance of freely usable universal identifiers.  URIs rule!
http://urisrule.org/

:)

Well, I won't take the bait on URIs, ;-), but will note that re-use of
identifiers of a sort was addressed quite a few years ago.

See: Feist Publications, Inc., v. Rural Telephone Service Co., 499 U.S. 340
(1991) or follow this link:

http://en.wikipedia.org/wiki/Feist_v._Rural

The circumstances with CAS numbers are slightly different because to get
access to the full set of CAS numbers I suspect you have to sign a licensing
agreement on re-use, which makes it a matter of *contract* law and not
copyright.

Perhaps they should increase the limits beyond 10,000 identifiers, but the
only people who want the whole monty, as it were, are potential commercial
competitors.

The people who publish the periodical "Brain" for example at $10,000 a year.
Why should I want the complete set of identifiers to be freely available to
help them?

Personally I think given the head start that the CAS maintainers have on the
literature, etc., that different models for use of the identifiers might
suit their purposes just as well. Universal identifiers change over time and
my concern is with the least semantic friction and not as much with how we
get there.

Hope you are having a great day!

Patrick




--
Patrick Durusau
[email protected]
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau


