I notice lines in the dbpedia dumps that look like

<http://dbpedia.org/resource/Boston%2C_MA> 
<http://dbpedia.org/property/redirect> 
<http://dbpedia.org/resource/Boston> .

     Note the URL-encoded comma (%2C = ",").
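
     For reference, the round trip between the two spellings can be checked with Python's standard urllib.parse. This only illustrates percent-encoding in general; it is not a claim about how dbpedia generates its URIs.

```python
from urllib.parse import quote, unquote

encoded = "Boston%2C_MA"

# unquote() turns %2C back into a literal comma.
decoded = unquote(encoded)
print(decoded)

# quote() leaves letters, digits, "_.-~" (and "/" by default) alone,
# so only the comma is escaped again.
reencoded = quote(decoded)
print(reencoded)
```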

     Anyhow,  if I go to

http://dbpedia.org/page/Boston%2C_MA

     I see two redirects [one of which unescapes the comma] and 
ultimately end up at

http://dbpedia.org/page/Boston
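
     The same redirect chain could presumably be followed offline from the dump. A minimal sketch, assuming each triple sits on a single line in the dump file (the example above is wrapped by my mail client) and using two helper names of my own, load_redirects and resolve:

```python
import re

# Matches a triple whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")
REDIRECT = "http://dbpedia.org/property/redirect"

def load_redirects(lines):
    """Build a {source URI: target URI} map from redirect triples."""
    redirects = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == REDIRECT:
            redirects[match.group(1)] = match.group(3)
    return redirects

def resolve(uri, redirects):
    """Follow redirects to the final URI, stopping if a cycle appears."""
    seen = set()
    while uri in redirects and uri not in seen:
        seen.add(uri)
        uri = redirects[uri]
    return uri

dump = [
    "<http://dbpedia.org/resource/Boston%2C_MA> "
    "<http://dbpedia.org/property/redirect> "
    "<http://dbpedia.org/resource/Boston> .",
]
redirects = load_redirects(dump)
print(resolve("http://dbpedia.org/resource/Boston%2C_MA", redirects))
```

     Note that the map is keyed on the exact string from the dump, so the %2C spelling has to be used for the lookup.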

     If I go to Wikipedia

http://wikipedia.org/page/Boston%2C_MA

     I get redirected to

http://wikipedia.org/page/Boston,_MA

     which,  oddly,  displays the same content as "Boston" [rather than 
301 redirecting...]

     When I do

  curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/Boston.xml

      I see stuff like

<rdf:Description
    rdf:about="http://dbpedia.org/resource/Harvey_Mason%2C_Jr.">
  <dbpedia-owl:birthPlace
      xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
      rdf:resource="http://dbpedia.org/resource/Boston"/>
</rdf:Description>

     Now if I run the SPARQL query

select ?Predicate where {<http://dbpedia.org/resource/Harvey_Mason,_Jr.> 
?Predicate <http://dbpedia.org/resource/Boston> }

     I get nothing,  but if I run

select ?Predicate where 
{<http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> ?Predicate 
<http://dbpedia.org/resource/Boston> }

     I get

http://dbpedia.org/ontology/birthPlace

     So it looks like the %-encoded URI is the "real URI" in dbpedia.  
Obviously I ought to keep it around in case I want to run a SPARQL query 
now and then.  Also,  dbpedia encodes wikipedia URLs the same way:

<http://en.wikipedia.org/wiki/Harvey_Mason%2C_Jr.> 
<http://xmlns.com/foaf/0.1/primaryTopic> 
<http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> .
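
To make the two queries above easy to repeat, here is a sketch that builds the request URLs. It assumes the public endpoint lives at http://dbpedia.org/sparql and takes the query in a "query" parameter, as the SPARQL protocol describes; predicate_query and request_url are names I made up. It only constructs the URLs and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint location

def predicate_query(subject_uri, object_uri):
    """SPARQL text asking which predicates link subject to object."""
    return "select ?Predicate where { <%s> ?Predicate <%s> }" % (
        subject_uri, object_uri)

def request_url(query):
    """Wrap the query text in a GET URL for the endpoint."""
    # urlencode() escapes the query for transport: a literal "," becomes
    # %2C and a literal "%" becomes %25, so the two subject spellings
    # stay distinct on the wire.
    return ENDPOINT + "?" + urlencode({"query": query})

boston = "http://dbpedia.org/resource/Boston"
plain = "http://dbpedia.org/resource/Harvey_Mason,_Jr."
escaped = "http://dbpedia.org/resource/Harvey_Mason%2C_Jr."

url_plain = request_url(predicate_query(plain, boston))
url_escaped = request_url(predicate_query(escaped, boston))
print(url_plain)
print(url_escaped)
```

The second URL carries "%252C", which the server decodes back to the literal "%2C" that dbpedia stores.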

------

I took a look at some standards docs and found:

http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference

I see that we encode the text as UTF-8 octets,  and any octets that 
aren't permitted US-ASCII characters get %-encoded.  However,  the spec 
also says that

*"Note:* Because of the risk of confusion between RDF URI references 
that would be equivalent if derefenced, the use of %-escaped characters 
in RDF URI references is strongly discouraged. "
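
As I read it, the encoding step amounts to something like the sketch below. It is a simplification: it treats every US-ASCII octet as permitted and escapes only octets of 0x80 and above, which is narrower than what the spec allows for. The function name escape_non_ascii and the "Zürich" example are mine, chosen only for illustration.

```python
def escape_non_ascii(uri_ref):
    """%-escape every UTF-8 octet outside US-ASCII; leave ASCII untouched."""
    return "".join(
        chr(octet) if octet < 0x80 else "%%%02X" % octet
        for octet in uri_ref.encode("utf-8")
    )

# A non-ASCII character becomes one %XX escape per UTF-8 octet.
print(escape_non_ascii("http://dbpedia.org/resource/Zürich"))

# Both comma spellings are pure ASCII, so neither is changed and they
# remain two different strings: this is the confusion the note warns about.
with_comma = escape_non_ascii("http://dbpedia.org/resource/Boston,_MA")
with_escape = escape_non_ascii("http://dbpedia.org/resource/Boston%2C_MA")
print(with_comma == with_escape)
```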

------

Now the problem I've got with the Ookaboo API is that I know people are 
going to punch in

http://wikipedia.org/page/Boston,_MA

and I need to turn this into the right dbpedia URL.  My plan for dealing 
with this is to

(i) store the exact URI I get out of dbpedia,
(ii) always give people the exact URI out of dbpedia (if I publish RDFa 
or JSON data),
(iii) give the same URI for wikipedia that dbpedia gives (in HTML,  
RDFa,  etc.)
(iv) if I get a query,  apply the same canonicalization rules that 
dbpedia uses...

Which raises the question of what exactly those rules are.  What are they?
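
Until I know the real rules, one stopgap would be to sidestep them: index every exact URI from the dump under its fully decoded title and decode incoming URLs the same way before looking them up. A sketch of that idea follows; title_key and UriIndex are hypothetical names, and the approach assumes the title is the last path segment, so it would mishandle titles that contain "/" or "?".

```python
from urllib.parse import unquote, urlsplit

def title_key(uri):
    """Decoded last path segment, with spaces folded to underscores."""
    segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    return unquote(segment).replace(" ", "_")

class UriIndex:
    """Maps a decoded title to the exact URI stored from the dump."""

    def __init__(self):
        self._by_key = {}

    def add(self, exact_uri):
        # The stored value is kept byte for byte as dbpedia published it.
        self._by_key[title_key(exact_uri)] = exact_uri

    def resolve(self, user_url):
        # Returns None when no stored URI has the same decoded title.
        return self._by_key.get(title_key(user_url))

index = UriIndex()
index.add("http://dbpedia.org/resource/Boston%2C_MA")

print(index.resolve("http://wikipedia.org/page/Boston,_MA"))
print(index.resolve("http://wikipedia.org/page/Boston%2C_MA"))
```

This never has to re-encode anything, so it returns the exact dbpedia URI per (i) and (ii) regardless of which characters dbpedia chooses to escape.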


_______________________________________________
Wikitech-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
