Hi,
I would just like to point out that other datasets have the same URI/IRI
issue. We experienced the same problem with the ACM SKOS taxonomy
(http://dl.acm.org/ccs/skos), for example.
However, we should by no means treat it as either a Jena issue or a dataset
issue, but rather as an inconsistency between related specifications [1]:
"The set of RDF terms defined in RDF Concepts and Abstract Syntax
includes RDF URI references while SPARQL terms include IRIs. RDF URI
references containing "<", ">", '"' (double quote), space, "{", "}", "|",
"\", "^", and "`" are not IRIs. The behavior of a SPARQL query against RDF
statements composed of such RDF URI references is not defined."
For example, behavior is undefined when a SPARQL query that conforms to its
specification is executed against an RDF dataset that also conforms to its
specification but contains resources with a space in the resource name.
Still, it would be very nice if Jena could tolerate the situation somehow. I
am sure there is no fully correct solution to the problem at the Jena level,
but a workaround in the form of a URI encoder/decoder might be handy here.
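A minimal sketch of such an encoder/decoder, assuming it runs as a
preprocessing step outside Jena (written in Python here for brevity). It
percent-encodes only the characters listed in the quoted spec text; the
function names are hypothetical and not part of any Jena API.

```python
from urllib.parse import quote, unquote

# Characters listed in the quoted SPARQL text: legal in RDF URI references,
# but not in IRIs.
NON_IRI_CHARS = frozenset('<>"{}|\\^` ')


def encode_non_iri_chars(uri_ref: str) -> str:
    """Percent-encode only the non-IRI characters; leave everything else as-is."""
    return "".join(quote(ch, safe="") if ch in NON_IRI_CHARS else ch for ch in uri_ref)


def decode_non_iri_chars(iri: str) -> str:
    """Undo encode_non_iri_chars.

    This is an exact inverse only if the original reference contained no '%'
    characters of its own; otherwise pre-existing escapes are decoded too.
    """
    return unquote(iri)
```

Applied to the whitespace IRI from the log, `http://rpggeek.com/rpg/426/Changeling: The Dreaming`
becomes `http://rpggeek.com/rpg/426/Changeling:%20The%20Dreaming`. The encoded
IRI is a different identifier from the original, so queries have to use the
encoded form (or run user input through the same function).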
Thanks,
Milorad Tosic
[1] http://www.w3.org/TR/rdf-sparql-query/#QSynIRI
>________________________________
> From: Jomari Peterson <[email protected]>
>To: [email protected]
>Sent: Wednesday, December 12, 2012 6:12 AM
>Subject: Fuseki Error Leading to 500 Server Error?
>
>Good Day,
> My name is Jomari Peterson and I am relatively new to Semantic
>Web applications. My expertise is actually in process management and
>strategy. However, I am trying to learn to use Jena very quickly for a
>project I want to develop as a demonstration. I have done most of the
>tutorial work from the Jena site using a portion of the RDF data downloaded
>from the BaseKB website. This data was derived from Freebase's data dumps.
>I am really relying on combining work that others have done and the
>examples they have set.
>
> Before I get to my problem, I would like to thank everyone who has
>made this possible for me. I appreciate the documentation that is out
>there. I have been taking notes as I gather information from the site and
>other sources, so that once I feel I have gained the basic knowledge, I can
>pass them along to help future developers. At this point, I am able to
>upload, query and manipulate data from BaseKB. This is mainly because its
>files are small and split into separate files, which keeps things
>manageable. They are in N-Triples syntax. I downloaded Freebase's recently
>released RDF data dump and wanted to use it, since it comes directly from
>the source. I held off on using it until I had taught myself about TDB and
>Fuseki, since the file is over 30GB.
> After testing my BaseKB files, I was able to query and manipulate the
>data. However, when I tried to upload the Freebase RDF data dump, I
>received the following output:
>
>22:57:16 INFO Fuseki :: [7] POST http://localhost:3030/ds/upload
>22:57:16 INFO Fuseki :: [7] Upload: Filename: freebase-rdf-2012-11-27-15-46.ttl, Content-Type=application/octet-stream, Charset=null => Turtle
>
>22:57:18 WARN Fuseki :: [line: 100091, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/イヴ・サン=ローラン> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
>22:57:18 WARN Fuseki :: [line: 100091, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/イヴ・サン=ローラン> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
>22:57:19 WARN Fuseki :: [line: 182783, col: 33] Language not valid: es-419
>22:57:19 WARN Fuseki :: [line: 182874, col: 24] Language not valid: es-419
>22:57:19 WARN Fuseki :: [line: 230804, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/エシュ=シュル=アルゼット> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
>22:57:19 WARN Fuseki :: [line: 230804, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/エシュ=シュル=アルゼット> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
>22:57:20 WARN Fuseki :: [line: 263095, col: 33] Language not valid: es-419
>22:57:20 WARN Fuseki :: [line: 263271, col: 24] Language not valid: es-419
>22:57:20 WARN Fuseki :: [line: 291130, col: 54] Bad IRI: <http://rpggeek.com/rpg/426/Changeling: The Dreaming> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
>22:57:20 WARN Fuseki :: [line: 298926, col: 36] Bad IRI: <http://http:urbis.com> Code: 0/ILLEGAL_CHARACTER in PORT: The character violates the grammar rules for URIs/IRIs.
>22:57:22 WARN Fuseki :: [line: 320172, col: 55] Bad IRI: <http://pt.wikipedia.org/wiki/Estudo_Transcendental_Nº12> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
>22:57:22 WARN Fuseki :: [line: 320172, col: 55] Bad IRI: <http://pt.wikipedia.org/wiki/Estudo_Transcendental_Nº12> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
>22:57:22 WARN Fuseki :: [line: 331805, col: 47] Bad IRI: <http://www.skygate-int.com/ (defunct)> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
>22:57:22 WARN Fuseki :: [line: 334838, col: 55] Bad IRI: <http://de.wikipedia.org/wiki/Éclairs_sur_l’Au-delà_…> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
>22:57:22 WARN Fuseki :: [line: 334838, col: 55] Bad IRI: <http://de.wikipedia.org/wiki/Éclairs_sur_l’Au-delà_…> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
>
>22:57:22 ERROR Fuseki :: [line: 338972, col: 114] Broken IRI (bad character: '<'): http://www.nhs.uk/Services/Hospitals/Overview/DefaultView.aspx?id
>22:57:22 INFO Fuseki :: [7] 500 Server Error
>
>I learned about URIs in the course of my work, and from my reading I
>assume that IRIs are the internationalized extension of URIs that allows
>additional characters. I don't know whether the ViolationCodes section
>would have helped, but it is currently not available on the "Support for
>Internationalised Resource Identifiers in Jena" page on the Apache site.
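As far as I can tell from the log, the NOT_NFKC and COMPATIBILITY_CHARACTER
lines are warnings about Unicode normalization, and Fuseki carried on past
them; the upload only failed at the ERROR line. The sketch below, using only
Python's standard library, checks whether an IRI changes under NFKC and maps
an IRI to an ASCII-only URI by percent-encoding its non-ASCII characters as
UTF-8, in the spirit of RFC 3987, section 3.1. The helper names are
hypothetical, and Jena's IRI checker applies many more rules than these.

```python
import unicodedata
from urllib.parse import quote


def changes_under_nfkc(iri: str) -> bool:
    """True if NFKC normalization would alter the IRI.

    This approximates the condition behind the NOT_NFKC warnings; it is not
    a reimplementation of Jena's checker.
    """
    return unicodedata.normalize("NFKC", iri) != iri


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    ASCII characters, including any existing %XX escapes, are left unchanged.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)
```

For the Portuguese Wikipedia IRI in the log, the culprit appears to be `º`
(U+00BA, masculine ordinal indicator), a compatibility character that NFKC
folds to a plain `o`; `iri_to_uri` turns it into `%C2%BA`.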
>
>*Now the questions: Is there a way to skip these kinds of errors and
>continue with the rest of the file? Would it be better to load it into
>TDB first? Probably outside the scope of this work, but how could I go
>about fixing or deleting these broken IRIs in such a big file? I
>appreciate your time and help.*
>
>(Freebase users told me there are about 7,800 errors of this kind in
>this data dump, so advice on how to delete or skip them would be
>appreciated. The BaseKB dump does not have this issue, but from what its
>owner told me, it is spread across 1,000+ separate files.)
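One way to skip the bad lines is to filter the dump before uploading it. A
minimal sketch, assuming one complete triple per line (as in N-Triples) and
no multi-line `"""` literals: any line with a `<...>` token containing a
character the SPARQL spec excludes from IRIs is written to stderr instead of
the output file. The script and function names are hypothetical, and the
check is a heuristic that can miss some malformed lines.

```python
import re
import sys
from typing import Callable, Iterable, Iterator

# Double-quoted literal, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\.)*"')
# Anything between < and > is treated as an IRI token.
IRI_TOKEN = re.compile(r"<([^>]*)>")
# Non-IRI characters from the SPARQL spec ('>' cannot occur inside a token
# matched by IRI_TOKEN), plus tab.
BAD_IRI_CHARS = set('<"{}|\\^` \t')


def has_bad_iri(line: str) -> bool:
    """Heuristic: True if any <...> token outside string literals has a bad character."""
    # Blank out literals first so markup such as "<b>" inside a string is ignored.
    without_literals = STRING_LITERAL.sub('""', line)
    return any(BAD_IRI_CHARS & set(m.group(1)) for m in IRI_TOKEN.finditer(without_literals))


def filter_lines(lines: Iterable[str], on_reject: Callable[[str], object]) -> Iterator[str]:
    """Yield good lines and pass bad ones to on_reject, one line at a time."""
    for line in lines:
        if has_bad_iri(line):
            on_reject(line)
        else:
            yield line


def main(src_path: str, dst_path: str) -> None:
    # Streams the file, so memory use stays flat even for a 30GB dump.
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_lines(src, sys.stderr.write):
            dst.write(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like
`python filter_bad_iris.py freebase-rdf-2012-11-27-15-46.ttl clean.ttl 2> rejected.ttl`.
This is stricter than Fuseki: it also drops the whitespace IRIs that only
produced warnings, so remove the space from `BAD_IRI_CHARS` if you would
rather keep those.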
>
>
>I manually gave the Freebase data dump a .ttl extension because their
>documentation states that it is in Turtle syntax. The beginning of the
>data looks like this:
>
>@prefix ns: <http://rdf.freebase.com/ns/>.
>@prefix key: <http://rdf.freebase.com/key/>.
>@prefix owl: <http://www.w3.org/2002/07/owl#>.
>@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
>@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
>@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
>
>ns:american_football.football_historical_roster_position.number ns:type.object.name "Number"@en.
>ns:american_football.football_historical_roster_position.number ns:type.property.unique true.
>ns:american_football.football_historical_roster_position.number ns:type.object.type ns:type.property.
>ns:american_football.football_historical_roster_position.number rdfs:label "Number"@en.
>ns:american_football.football_historical_roster_position.number ns:type.property.expected_type ns:type.int.
>ns:american_football.football_historical_roster_position.number ns:type.property.schema ns:american_football.football_historical_roster_position.
>ns:american_football.football_historical_roster_position.number rdf:type owl:FunctionalProperty.
>ns:american_football.football_historical_roster_position.number rdfs:domain ns:american_football.football_historical_roster_position.
>ns:american_football.football_historical_roster_position.number rdfs:range ns:type.int.
>ns:american_football.football_player.footballdb_id ns:type.property.expected_type ns:type.enumeration.
>ns:american_football.football_player.footballdb_id ns:type.object.type ns:type.property.
>ns:american_football.football_player.footballdb_id ns:type.property.unique true.
>ns:american_football.football_player.footballdb_id ns:type.property.schema ns:american_football.football_player.
>ns:american_football.football_player.footballdb_id rdfs:label "footballdb ID"@en.
>ns:american_football.football_player.footballdb_id ns:type.object.name "footballdb ID"@en.
>ns:american_football.football_player.footballdb_id rdf:type owl:FunctionalProperty.
>ns:american_football.football_player.footballdb_id rdfs:domain ns:american_football.football_player.
>ns:american_football.football_player.footballdb_id rdfs:range ns:type.enumeration.
>ns:astronomy.astronomical_observatory.discoveries ns:type.property.expected_type
>
>--
>Jomari Peterson
>"Creating the Context for Miracles"
>707-373-1093