I'm preparing course material about querying DBpedia from a web page using
Firefox and Greasemonkey, unpacking the payload received, and patching the
information into the page. My sample SPARQL query retrieves the state flowers
of the U.S. states; the query is listed on the Meow meow meow
blog at
http://www.craigethomas.com/blog/2009/02/anatomy-of-a-sparql-query-part-1-select/
Unpacking the payload is complicated by structural irregularities that I
cannot predict. I was wondering if someone could suggest an explanation, or
point me to explanatory documentation that I could give my students.
Most of the states have a predictable XML payload that is structured like this:
<result>
<binding name="state">
<uri>http://dbpedia.org/resource/Mississippi</uri>
</binding>
<binding name="flower">
<uri>http://dbpedia.org/resource/Magnolia_Blossom</uri>
</binding>
</result>
But West Virginia's state flower is structured as a literal with an embedded
HTML tag:
<literal xml:lang="en">Rhododendron<br>(''Rhododendron
maximum'')</literal>
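One way to cope with a binding that may hold either a <uri> or a <literal> is to read the child element's tag before using its text. The following is only a minimal sketch, written in Python for illustration rather than as a Greasemonkey script. It assumes the payload follows the W3C SPARQL Query Results XML Format (including its namespace, which the snippets above omit), that the embedded <br> arrives escaped as &lt;br&gt;, and that the West Virginia resource URI is as shown. The names unpack and clean_literal are hypothetical helpers, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Assumed sample payload. The <br> inside the literal is written as
# &lt;br&gt;, as it must be for the document to be well-formed XML.
# The West_Virginia URI is an assumption made for this example.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <results>
    <result>
      <binding name="state"><uri>http://dbpedia.org/resource/Mississippi</uri></binding>
      <binding name="flower"><uri>http://dbpedia.org/resource/Magnolia_Blossom</uri></binding>
    </result>
    <result>
      <binding name="state"><uri>http://dbpedia.org/resource/West_Virginia</uri></binding>
      <binding name="flower"><literal xml:lang="en">Rhododendron&lt;br&gt;(''Rhododendron maximum'')</literal></binding>
    </result>
  </results>
</sparql>"""


def unpack(xml_text):
    """Return a list of {variable: (kind, value)} dicts, one per <result>."""
    rows = []
    root = ET.fromstring(xml_text)
    for result in root.iterfind(".//sr:result", NS):
        row = {}
        for binding in result.iterfind("sr:binding", NS):
            child = binding[0]  # the <uri>, <literal>, or <bnode> element
            kind = child.tag.split("}")[-1]  # drop the "{namespace}" prefix
            row[binding.get("name")] = (kind, child.text or "")
        rows.append(row)
    return rows


def clean_literal(text):
    """Replace embedded HTML tags with a space and drop wiki '' markers."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = text.replace("''", "")
    return " ".join(text.split())


rows = unpack(SAMPLE)
for row in rows:
    kind, value = row["flower"]
    print(kind, clean_literal(value) if kind == "literal" else value)
```

Keeping the kind alongside the value lets the page-patching code decide later whether to build a link (for a uri) or show plain text (for a literal).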
And Florida's state flower listing contains percent-encoded characters:
<uri>http://dbpedia.org/resource/Orange_%28fruit%29</uri>
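The %28 and %29 sequences are the URI percent-encodings of "(" and ")". A minimal sketch of turning such a resource URI into a display label, again in Python for illustration: label_from_uri is a hypothetical helper, and treating underscores as spaces is an assumption about how the resource names are formed.

```python
from urllib.parse import unquote


def label_from_uri(uri):
    """Derive a readable label from the last path segment of a resource URI."""
    local_name = uri.rsplit("/", 1)[-1]  # e.g. "Orange_%28fruit%29"
    # Decode percent-escapes, then treat underscores as spaces (assumption).
    return unquote(local_name).replace("_", " ")


print(label_from_uri("http://dbpedia.org/resource/Orange_%28fruit%29"))
```

In a Greasemonkey script, the built-in decodeURIComponent function performs the same decoding step.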
There is also the general problem of multiple listings. For example,
California is listed with the California_Poppy twice.
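Whatever the cause of the repeated rows, they can be filtered on the client before the page is patched. A minimal sketch, assuming each result has already been reduced to a (state, flower) pair of strings; dedupe is a hypothetical helper. Adding DISTINCT to the SELECT clause asks the endpoint to drop identical rows, though whether that removes the duplicates here depends on why they appear.

```python
def dedupe(pairs):
    """Return the pairs in their original order with exact repeats removed."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


pairs = [
    ("California", "California_Poppy"),
    ("Mississippi", "Magnolia_Blossom"),
    ("California", "California_Poppy"),
]
print(dedupe(pairs))
```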
What is an explanation for these structural irregularities?
Thanks, Terry
Terrence Brooks
Information School
University of Washington
Voice: 206 543-2646
Fax: 206 616-3152
Web: http://faculty.washington.edu/tabrooks/