Chris,

I'll try to answer some of the questions.

On 28 Jul 2008, at 13:18, Chris Wallace wrote:

Thanks John for this resource - It inspires me to help my students to do a similar data collection exercise in Bristol!

A few things puzzle me though, probably as a newcomer to this field. I'm in the process of RDFing our faculty data so these issues are taxing me too.

1) The resource URI eg. http://www.johngoodwin.me.uk/pubs/id/pub1

is not humanly readable. Is this considered to be a problem? For example, DBpedia would, I think, be less valuable with system-generated resource ids, even though natural resource ids require a mechanism for disambiguation.

Human-readable unique identifiers are nice, but they are the exception. It's true that DBpedia would be less valuable without its human-readable IDs, but DBpedia piggybacks on Wikipedia's identifier scheme, which is maintained by an army of volunteers. At the end of the day, uniqueness is more important than human readability. If the unique identifiers in your original data source are not human-readable, and you don't have the resources to curate a new identifier scheme, then using a numeric scheme is better than not publishing the data at all...
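
To make the trade-off concrete, here is a rough Python sketch of the two approaches. The base URI and the function names are made up for illustration and are not taken from John's data. The numeric scheme is unique by construction; the readable scheme needs a disambiguation step, and the suffix a pub receives depends on the order in which the records are processed, which is the kind of thing that ends up needing curation.

```python
import re
import unicodedata

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/pubs/id/"


def numeric_uri(record_id: int) -> str:
    """Mint a URI from the source database key: opaque, but unique for free."""
    return f"{BASE}pub{record_id}"


def slugify(name: str) -> str:
    """Reduce a name to lowercase ASCII words joined by hyphens."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def readable_uris(names):
    """Mint slug-based URIs, appending -2, -3, ... to repeated names.

    This is only a sketch: the suffix depends on input order, and a
    slug such as "red-lion-2" could still clash with a pub whose real
    name is "Red Lion 2".
    """
    seen = {}
    uris = []
    for name in names:
        slug = slugify(name)
        seen[slug] = seen.get(slug, 0) + 1
        suffix = "" if seen[slug] == 1 else f"-{seen[slug]}"
        uris.append(f"{BASE}{slug}{suffix}")
    return uris
```

Calling readable_uris(["The Red Lion", "The Alexandra", "The Red Lion"]) gives the second Red Lion a "-2" suffix, so someone still has to decide which pub gets which identifier and keep that stable over time.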

2) The pub name has been reformatted to catalogue order, but pub names are proper nouns and I'd be laughed at if I asked the way to "Alexandra, The". Perhaps both forms could be included, with a different tag for the catalogue format if it is not computable from the natural name.

I don't see why pub names are different from movie names, artist names, or book names, all of which can often be found reformatted in this way.
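
On whether the catalogue form is computable from the natural name: for the simple case of a leading article it is, in both directions. The following is a minimal Python sketch under that assumption; the function names and the list of articles are my own, and it does not attempt to handle names where a leading "A" or "The" is not really an article.

```python
# Articles that get moved to the end in catalogue order (assumed list).
ARTICLES = ("The", "A", "An")


def to_catalogue_order(name: str) -> str:
    """Move a leading article to the end: 'The Alexandra' -> 'Alexandra, The'."""
    for article in ARTICLES:
        prefix = article + " "
        if name.startswith(prefix) and len(name) > len(prefix):
            return f"{name[len(prefix):]}, {article}"
    return name


def to_natural_order(name: str) -> str:
    """Move a trailing article back to the front: 'Alexandra, The' -> 'The Alexandra'."""
    for article in ARTICLES:
        suffix = ", " + article
        if name.endswith(suffix):
            return f"{article} {name[:-len(suffix)]}"
    return name
```

Names without a leading article pass through unchanged, so a consumer who wants the natural form could apply to_natural_order to every value.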

3) Why have both rdfs:label and pub:name since they seem to have the same content?

Generic RDF tools (which do not know about the pub vocabulary) often use rdfs:label for display/headline purposes. So if your domain-specific vocabulary has its own name property, it might be a good idea to add both. In an ideal world, John would declare pub:name a subproperty of rdfs:label, and the tools would infer the rdfs:label value... But most clients don't do that yet.
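
To illustrate what that inference amounts to, here is a small self-contained Python sketch of the RDFS subproperty rule, applied to triples held as plain tuples. It is not how any particular RDF toolkit does it. The pub:name URI below is a placeholder, since the real namespace URI is not given in this thread, and the literal is simplified to a bare string.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Placeholder for pub:name; the actual namespace is an assumption.
PUB_NAME = "http://example.org/pub#name"
PUB1 = "http://www.johngoodwin.me.uk/pubs/id/pub1"


def infer_subproperties(triples):
    """Return the triples plus everything entailed by rdfs:subPropertyOf.

    If (p, rdfs:subPropertyOf, q) and (s, p, o) are present, add (s, q, o).
    The loop repeats until nothing new is added, so chains of
    subproperties are followed as well.
    """
    result = set(triples)
    supers = {}
    for s, p, o in result:
        if p == RDFS_SUBPROPERTY_OF:
            supers.setdefault(s, set()).add(o)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            for super_property in supers.get(p, ()):
                if (s, super_property, o) not in result:
                    result.add((s, super_property, o))
                    changed = True
    return result


data = {
    (PUB_NAME, RDFS_SUBPROPERTY_OF, RDFS_LABEL),
    (PUB1, PUB_NAME, "Alexandra, The"),
}
inferred = infer_subproperties(data)
```

With the subproperty declaration in place, inferred contains the rdfs:label triple for pub1 even though only pub:name was published. A client that does not run this step sees only pub:name, which is why stating both explicitly is the pragmatic choice for now.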

4) I feel uncomfortable with the non-uniform representation of the address - partly with domain-specific tags pub:street and pub:postcode, partly with a company-specific (and non-humanly decipherable) URI. I know that this is a can of worms, e.g. http://xml.coverpages.org/namesAndAddresses.html#eccma and I can’t find a suitable address vocabulary, but this mixture doesn’t look very satisfactory.

If only we could finally agree on *one* vCard-in-RDF vocabulary...

5) pub:dateSurveyed: isn’t this just the date at which the description was authored (if not when it was entered into this format) i.e. dc:date

dc:date could mean many things: when the pub was surveyed, when the RDF document was published, when the pub was opened... Using pub:dateSurveyed makes the meaning clear to the user of the data.

Best,
Richard



6) Generally, these seem such general properties of any place that I'm surprised that any local vocabulary is needed at all, given that no data is actually domain-specific (like a list of beers served).

This case study seems a great example of the issues in vocabulary and resource reuse. It would be interesting to compare the different solutions which different analysts would use to represent this data. Perhaps something like it would be a good exercise for the Oxford VoCamp?

Chris


Chris Wallace
Senior Lecturer
Department of Information Science and Digital Media
University of the West of England, Bristol


