Chris,
I'll try to answer some of the questions.
On 28 Jul 2008, at 13:18, Chris Wallace wrote:
Thanks John for this resource - It inspires me to help my students
to do a similar data collection exercise in Bristol!
A few things puzzle me though, probably as a newcomer to this field.
I'm in the process of RDFing our faculty data so these issues are
taxing me too.
1) The resource URI, e.g. http://www.johngoodwin.me.uk/pubs/id/pub1,
is not human-readable. Is this considered to be a problem? For
example, DBpedia would, I think, be less valuable with
system-generated resource ids, even though natural resource ids
require a mechanism for disambiguation.
Human-readable unique identifiers are nice, but they are the exception.
It's true that DBpedia would be less valuable without the human-readable
IDs, but DBpedia piggybacks on Wikipedia's identifier scheme, which
is maintained by an army of volunteers. At the end of the day,
uniqueness is more important than human readability. If the unique
identifiers in your original data source are not human-readable, and
you don't have the resources to curate a new identifier scheme, then
using a numeric scheme is better than not publishing the data at all...
2) The pub name has been reformatted into catalogue order, but pub
names are proper nouns and I'd be laughed at if I asked the way to
"Alexandra, The". Perhaps both forms could be included with a
different tag for the catalog format if it is not computable from
the natural name.
I don't see why pub names are different from movie names, artist
names, or book names, all of which can often be found reformatted in
this way.
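On whether the catalogue form is computable from the natural name: for the simple case of a leading English article, it is. A minimal sketch in Python, assuming only "The", "A" and "An" need handling (the function names and the article list are made up for illustration and are not part of John's data or vocabulary):

```python
# Leading articles that catalogue order moves to the end.
# This English-only list is an assumption for illustration.
ARTICLES = ("The", "A", "An")


def to_catalogue_order(name):
    """Turn 'The Alexandra' into 'Alexandra, The'.

    Names without a leading article are returned unchanged.
    """
    first, _, rest = name.partition(" ")
    if first in ARTICLES and rest:
        return f"{rest}, {first}"
    return name


def to_natural_order(name):
    """Turn 'Alexandra, The' back into 'The Alexandra'.

    Names without a trailing ', <article>' are returned unchanged.
    """
    head, sep, tail = name.rpartition(", ")
    if sep and tail in ARTICLES:
        return f"{tail} {head}"
    return name


catalogue = to_catalogue_order("The Alexandra")
natural = to_natural_order(catalogue)
```

Real names will have edge cases this does not cover (other languages, names that legitimately end in ", The"), which is one argument for publishing both forms rather than computing one from the other.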
3) Why have both rdfs:label and pub:name since they seem to have
the same content?
Generic RDF tools (which do not know about the pub vocabulary) often
use rdfs:label for display/headline purposes. So if your
domain-specific vocabulary has its own naming property, it might be a
good idea to add both. In an ideal world, John would declare pub:name a
subproperty
of rdfs:label, and the tools would infer the rdfs:label value... But
most clients don't do that yet.
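To make the subproperty idea concrete, here is a rough sketch in plain Python of what an inferring client would do. Triples are modelled as tuples of strings; the pub namespace URI and the literal value are placeholders I made up, since I don't know what John's vocabulary actually uses. The rule applied is the standard RDFS one: if (s p o) holds and p is an rdfs:subPropertyOf q, then (s q o) holds too.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Hypothetical namespace for the pub vocabulary (an assumption).
PUB_NAME = "http://example.org/pub#name"
PUB1 = "http://www.johngoodwin.me.uk/pubs/id/pub1"


def infer_subproperty_triples(triples):
    """Return the input triples plus everything entailed by
    rdfs:subPropertyOf, repeating the rule until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # All (subproperty, superproperty) pairs currently known.
        sub_props = {(s, o) for (s, p, o) in inferred
                     if p == RDFS_SUBPROPERTY_OF}
        for (s, p, o) in list(inferred):
            for (sub, sup) in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


data = {
    # The one schema statement the vocabulary would need to publish.
    (PUB_NAME, RDFS_SUBPROPERTY_OF, RDFS_LABEL),
    # Illustrative instance data; the literal is a placeholder.
    (PUB1, PUB_NAME, "Alexandra, The"),
}

closure = infer_subproperty_triples(data)
```

After this runs, closure also contains the triple with rdfs:label for pub1, even though only pub:name was stated. Until clients do this routinely, stating both triples explicitly remains the pragmatic choice.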
4) I feel uncomfortable with the non-uniform representation of the
address - partly with domain-specific tags pub:street and
pub:postcode, partly with a company-specific (and non-humanly
decipherable) URI. I know that this is a can of worms, e.g.
http://xml.coverpages.org/namesAndAddresses.html#eccma
and I can’t find a suitable address vocabulary but this mixture
doesn’t look very satisfactory.
If only we could finally agree on *one* vCard-in-RDF vocabulary...
5) pub:dateSurveyed: isn’t this just the date at which the
description was authored (if not when it was entered into this
format), i.e. dc:date?
dc:date could mean many things: when the pub was surveyed, when the
RDF document was published, when the pub was opened... Using
pub:dateSurveyed makes the meaning clear to the user of the data.
Best,
Richard
6) Generally, these seem such general properties of any place that
I'm surprised that any local vocabulary is needed at all, given that
no data is actually domain specific (like a list of beers served).
This case study seems a great example of the issues in vocabulary
and resource reuse. It would be interesting to compare the different
solutions which different analysts would use to represent this
data. Perhaps something like it would be a good exercise for the
Oxford VoCamp?
Chris
Chris Wallace
Senior Lecturer
Department of Information Science and Digital Media
University of the West of England, Bristol