I think it would depend on the scale and the domain you're aiming for. If 
you're trying to represent ALL people in the world, then few solutions would 
scale (at least until there's a universal personal ID or URI). But you could 
construct a (mostly) unique ID, and prefix that with a base URI for your 
system. For instance, you could use a base URI + email addresses (e.g., 
http://my.base.uri/John.Doe_AT_gmail.com). You'd have to ensure your data 
couldn't be accessed by spammers in this case, of course. Facebook or Twitter 
home pages (or both) might also work.
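
As a rough sketch of the email-based scheme, something like the following Python would do. The function names are made up for illustration, and the hashed variant is an extra option I'm assuming you might want for keeping addresses away from spammers, not part of the scheme described above.

```python
import hashlib
from urllib.parse import quote

BASE_URI = "http://my.base.uri/"  # placeholder base URI from the example above


def mint_person_uri(email: str, base_uri: str = BASE_URI) -> str:
    """Append an obfuscated email address to the base URI."""
    local_id = email.strip().replace("@", "_AT_")
    # Percent-encode anything that is not safe inside a URI path segment.
    return base_uri.rstrip("/") + "/" + quote(local_id, safe="")


def mint_hashed_person_uri(email: str, base_uri: str = BASE_URI) -> str:
    """Assumed variant: expose only a SHA-1 digest of the mailto: address."""
    mailto = "mailto:" + email.strip()
    digest = hashlib.sha1(mailto.encode("utf-8")).hexdigest()
    return base_uri.rstrip("/") + "/" + digest


print(mint_person_uri("John.Doe@gmail.com"))
# prints http://my.base.uri/John.Doe_AT_gmail.com
```

The first form stays readable; the second trades readability for not publishing the address itself.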

While such an ID might not link to a canonical representation, the simple fact is that 
there is no global canonical representation for individuals, much less a global 
registry.

In my case, my domains are either for internal use (employees of a company), 
major public figures (politicians, etc.), or individuals in a particular 
business (e.g., medical). For the employees, the employee ID can provide a 
unique representation. Public figures are relatively easy: use the URL of 
their Wikipedia page. Particular businesses might have some sort of internal ID 
(this varies, of course).
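
To make that concrete, here is a minimal sketch of the two simpler cases in Python. The "employee/" path segment, the base URI, and the function names are assumptions for illustration only; the Wikipedia helper just applies the usual convention of replacing spaces in a page title with underscores.

```python
from urllib.parse import quote

COMPANY_BASE = "http://my.base.uri/"  # hypothetical base URI for the internal graph


def employee_uri(employee_id: str) -> str:
    """Internal use: the employee ID is already unique within the company."""
    return COMPANY_BASE + "employee/" + quote(str(employee_id), safe="")


def public_figure_uri(wikipedia_title: str) -> str:
    """Public figures: reuse the URL of the person's English Wikipedia page."""
    title = wikipedia_title.strip().replace(" ", "_")
    # Keep parentheses readable, since page titles often use them to disambiguate.
    return "https://en.wikipedia.org/wiki/" + quote(title, safe="()")


print(employee_uri("E1042"))
print(public_figure_uri("Ada Lovelace"))
```

A business-specific ID would follow the same pattern as the employee case, with whatever identifier that business already maintains.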

-----Original Message-----
From: David Moss [mailto:[email protected]] 
Sent: Wednesday, August 14, 2013 7:03 PM
To: [email protected]
Subject: Naming entities

This is a fairly basic question, but how do others go about naming entities in 
an RDF graph?

The semantic web evangelists are keen on URIs that mean something, i.e. 
<http://admoss.info/David_Moss>.

This sounds great but in practice it doesn't scale. There are many people named 
David Moss in the world.

It is possible to have URIs such as <http://admoss.info/David_Moss1> 
<http://admoss.info/David_Moss2> ... <http://admoss.info/David_Moss249>, but 
differentiating between them is not something a human reader can do. It also becomes 
problematic in tracking the highest number of each entity name so additions can 
be made to the graph.

I first tried using blank nodes as entity identifiers but they are no good for 
the purpose as searching is difficult and they are not supposed to be used 
outside the environment in which they are created. They are supposed to be 
internal only references for convenience of the machine. They are also the 
antithesis of human readable.

I currently maintain a next_id entity in my graph, and I use and update its 
value to obtain entity names, ending up with <http://admoss.info/person22>, 
<http://admoss.info/organisation23> and <http://admoss.info/Building24> etc.

This is not exactly human readable, but I can't think of any naming policy that 
maintains the dream of human readable identifiers yet scales.

How are others addressing this issue?



