On 12/12/12 08:41, Milorad Tosic wrote:
Hi,

I would just like to point out that other datasets have the same URI/IRI
issue. We experienced the same problem with the ACM SKOS taxonomy
(http://dl.acm.org/ccs/skos), for example.

However, we should by no means regard it as either a Jena issue or a dataset
issue, but rather as an inconsistency between related specifications [1]:
"The set of RDF terms defined in RDF Concepts and Abstract Syntax
    includes RDF URI references while SPARQL terms include IRIs. RDF URI
    references containing "<", ">", '"' (double quote), space, "{", "}",
    "|", "\", "^", and "`" are not IRIs. The behavior of a SPARQL query
    against RDF statements composed of such RDF URI references is not
    defined."

For example, the behavior is undefined when (specification-conforming)
SPARQL queries are executed against a (specification-conforming) RDF dataset
containing resources with a space in the resource name.

However, it would be very nice if Jena could tolerate the situation somehow. I
am sure there is no fully correct solution to the problem at the Jena level,
but a workaround in the form of a URI decoder/encoder might be handy here.

It does: WARN is only a warning, and processing continues for the non-normalized URIs and for the URIs containing a space.

Use of < or > in a URI is more of a problem. It will break serialization to N-Triples or Turtle when the data is returned as the result of a query or a GSP (Graph Store Protocol) operation.
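A small sketch of why this breaks round-tripping: N-Triples and Turtle write an IRI as <...>, so an IRI that itself contains < or > cannot be read back unambiguously. The writer and reader below are simplified, hypothetical helpers (the reader ignores \u escapes), and the example IRI is made up for illustration.

```python
import re

# Simplified N-Triples/Turtle IRIREF token (ignores \u escapes): '<', then any
# characters except controls, space and <>"{}|^`\ , then '>'.
IRIREF = re.compile(r'<([^\x00-\x20<>"{}|^`\\]*)>')


def write_iri(iri: str) -> str:
    """Naive writer: wrap the IRI in angle brackets without checking it."""
    return f"<{iri}>"


def read_first_iri(text: str):
    """Return the contents of the first well-formed IRIREF token, or None."""
    m = IRIREF.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    good = "http://example.org/ok"
    bad = "http://example.org/page?id=<b>"  # hypothetical IRI containing '<' and '>'
    for iri in (good, bad):
        line = write_iri(iri)
        print(line, "reads back as", read_first_iri(line))
```

The well-formed IRI survives the round trip; the second one reads back as just `b`, so the original resource identity is silently lost.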

Accepting that data, with a warning or not, is just delaying the problem until later when it is much harder to sort out.
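One way to deal with the problem before loading, along the lines of the URI encoder Milorad suggested, is to percent-encode just the offending characters. This is a hedged sketch, not a Jena feature: the result is a *different* IRI from the one in the source data, and it does nothing for NFKC warnings or structurally broken IRIs such as <http://http:urbis.com>. The function name is hypothetical.

```python
# Characters listed in the SPARQL spec as allowed in RDF URI references but
# not in IRIs. All are ASCII, so each maps to a single %XX byte in UTF-8.
NON_IRI_CHARS = set('<>"{}|\\^` ')


def encode_non_iri_chars(uri_ref: str) -> str:
    """Percent-encode only the non-IRI characters; leave everything else,
    including non-ASCII letters and existing %XX escapes, unchanged."""
    return "".join(f"%{ord(c):02X}" if c in NON_IRI_CHARS else c for c in uri_ref)


if __name__ == "__main__":
    for uri in ("http://rpggeek.com/rpg/426/Changeling: The Dreaming",
                "http://www.skygate-int.com/ (defunct)"):
        print(encode_non_iri_chars(uri))
```

Leaving non-ASCII characters alone keeps the values as IRIs rather than converting them wholesale into ASCII-only URIs.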

FYI: RDF 1.1 will drop "RDF URI reference" and use "IRI".

        Andy


Thanks,
Milorad Tosic


[1] http://www.w3.org/TR/rdf-sparql-query/#QSynIRI




________________________________
From: Jomari Peterson <[email protected]>
To: [email protected]
Sent: Wednesday, December 12, 2012 6:12 AM
Subject: Fuseki Error Leading to 500 Server Error?

Good Day,
            My name is Jomari Peterson and I am relatively new to Semantic
Web applications. My expertise is actually in process management and
strategy. However, I am trying to learn how to use Jena very quickly for a
project that I want to develop as a demonstration. I have done the
majority of the tutorial work from the Jena site using a portion of the RDF
data downloaded from BaseKB's website. This data was modified from
Freebase's data dumps. I am relying heavily on amalgamating work that has
been done and the examples set by others.

          This leads me to my problem today. However, first, I would like to
thank everyone who has made this possible for me. I appreciate the
documentation that is out there. I have been taking notes as I have gleaned
information from the site and other sources, so once I feel I have gained
the base knowledge, I can pass them along to assist future developers. At
this point, I am able to upload, query and manipulate data from BaseKB.
This is mainly because its files are smaller and the data is split across
separate files, which makes things more manageable. They are in N-Triples
syntax. I downloaded Freebase's recently released RDF data dump and wanted
to use it, since it comes directly from the source. I wanted to hold off on
using it until I had taught myself about TDB and Fuseki, since the file is
over 30GB.
          After testing my BaseKB files, I was able to query the data and
manipulate it. However, when I tried to upload the Freebase RDF data dump, I
received the following output.

22:57:16 INFO  Fuseki               :: [7] POST http://localhost:3030/ds/upload
22:57:16 INFO  Fuseki               :: [7] Upload: Filename: freebase-rdf-2012-11-27-15-46.ttl, Content-Type=application/octet-stream, Charset=null => Turtle

22:57:18 WARN  Fuseki               :: [line: 100091, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/イヴ・サン=ローラン> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:18 WARN  Fuseki               :: [line: 100091, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/イヴ・サン=ローラン> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
22:57:19 WARN  Fuseki               :: [line: 182783, col: 33] Language not valid: es-419
22:57:19 WARN  Fuseki               :: [line: 182874, col: 24] Language not valid: es-419
22:57:19 WARN  Fuseki               :: [line: 230804, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/エシュ=シュル=アルゼット> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:19 WARN  Fuseki               :: [line: 230804, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/エシュ=シュル=アルゼット> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
22:57:20 WARN  Fuseki               :: [line: 263095, col: 33] Language not valid: es-419
22:57:20 WARN  Fuseki               :: [line: 263271, col: 24] Language not valid: es-419
22:57:20 WARN  Fuseki               :: [line: 291130, col: 54] Bad IRI: <http://rpggeek.com/rpg/426/Changeling: The Dreaming> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
22:57:20 WARN  Fuseki               :: [line: 298926, col: 36] Bad IRI: <http://http:urbis.com> Code: 0/ILLEGAL_CHARACTER in PORT: The character violates the grammar rules for URIs/IRIs.
22:57:22 WARN  Fuseki               :: [line: 320172, col: 55] Bad IRI: <http://pt.wikipedia.org/wiki/Estudo_Transcendental_Nº12> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:22 WARN  Fuseki               :: [line: 320172, col: 55] Bad IRI: <http://pt.wikipedia.org/wiki/Estudo_Transcendental_Nº12> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
22:57:22 WARN  Fuseki               :: [line: 331805, col: 47] Bad IRI: <http://www.skygate-int.com/ (defunct)> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
22:57:22 WARN  Fuseki               :: [line: 334838, col: 55] Bad IRI: <http://de.wikipedia.org/wiki/Éclairs_sur_l’Au-delà_…> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:22 WARN  Fuseki               :: [line: 334838, col: 55] Bad IRI: <http://de.wikipedia.org/wiki/Éclairs_sur_l’Au-delà_…> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO

22:57:22 ERROR Fuseki               :: [line: 338972, col: 114] Broken IRI (bad character: '<'): http://www.nhs.uk/Services/Hospitals/Overview/DefaultView.aspx?id
22:57:22 INFO  Fuseki               :: [7] 500 Server Error

I did learn about URIs in the process of my work, and from my reading I
assume that IRIs are the internationalized extension of URIs, allowing
additional characters. I don't know whether the ViolationCodes section
would have helped, but it is currently not available on the "Support for
Internationalised Resource Identifiers in Jena" page on the Apache site.
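Several of the warnings in the log (code 47/NOT_NFKC) say the IRI is not in Unicode Normalization Form KC. The sketch below shows what that test means using Python's standard `unicodedata` module; it illustrates the idea and is not Jena's implementation. It assumes the "º" in the Portuguese IRI is U+00BA MASCULINE ORDINAL INDICATOR, which NFKC folds to a plain "o", and that the "…" in the German IRI is U+2026, which NFKC expands to "...".

```python
import unicodedata


def is_nfkc(s: str) -> bool:
    """True if s is unchanged by Unicode NFKC normalization."""
    return unicodedata.normalize("NFKC", s) == s


if __name__ == "__main__":
    samples = [
        "http://pt.wikipedia.org/wiki/Estudo_Transcendental_N\u00ba12",  # U+00BA
        "http://de.wikipedia.org/wiki/\u00c9clairs_sur_l\u2019Au-del\u00e0_\u2026",  # U+2026
        "http://rdf.freebase.com/ns/type.object.name",
    ]
    for iri in samples:
        print(is_nfkc(iri), iri, "->", unicodedata.normalize("NFKC", iri))
```

On Python 3.8 and later, `unicodedata.is_normalized("NFKC", s)` performs the same check. As the log shows, these are warnings only; the upload failed on the later ERROR line.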

*Now the question(s): Is there a way to skip these types of errors and
continue with the rest of the dataset/file? Would it be better to try
loading it into TDB first? Probably outside the scope of this list, but how
could I go about fixing or deleting this broken IRI character in such a big
file? I appreciate your time and help.*
(Freebase users informed me there are about 7800 of these types of errors
in this data dump, so advice on what I would need to do to delete or skip
them would be appreciated. The BaseKB dump does not have this issue, but
according to its owner it is spread across 1000+ separate files.)
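For the "delete or skip" question, one option is to pre-filter the file in a single streaming pass before loading, setting suspect lines aside for later repair rather than losing them. This is a heuristic sketch under stated assumptions: each line holds one complete triple (as in the sample below), there are no multi-line """ literals, and outside string literals "<" and ">" appear only as IRI delimiters. It only catches IRIs that fail the basic IRIREF character rules (like the space and "<" cases in the log); it does not catch NFKC issues or <http://http:urbis.com>. The script name and file names are hypothetical.

```python
import re
import sys

# Simplified Turtle IRIREF (no \u escapes) and short "..." string literal.
IRIREF = re.compile(r'<[^\x00-\x20<>"{}|^`\\]*>')
STRING = re.compile(r'"(?:[^"\\\n]|\\.)*"')


def has_broken_iri(line: str) -> bool:
    """Heuristic: outside string literals, '<' and '>' should only appear as
    IRI delimiters, so any left over after removing well-formed IRIs means
    an IRI on this line is malformed (e.g. contains a space or '<')."""
    rest = STRING.sub('""', line)   # literals may legitimately contain '<'
    rest = IRIREF.sub("", rest)     # drop well-formed IRIs
    return "<" in rest or ">" in rest


def filter_lines(src, dst, rejects):
    """Copy lines from src to dst, diverting suspect lines to rejects."""
    kept = dropped = 0
    for line in src:
        if has_broken_iri(line):
            rejects.write(line)
            dropped += 1
        else:
            dst.write(line)
            kept += 1
    return kept, dropped


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python filter_bad_iris.py in.ttl clean.ttl rejects.ttl
    src_path, dst_path, rej_path = sys.argv[1:]
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst, \
         open(rej_path, "w", encoding="utf-8") as rej:
        kept, dropped = filter_lines(src, dst, rej)
    print(f"kept {kept} lines, diverted {dropped} to {rej_path}", file=sys.stderr)
```

Because it reads one line at a time, memory use stays flat even for a 30GB file, and the rejects file can later be repaired (for example by percent-encoding) and loaded separately.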


I manually gave the Freebase data dump a .ttl extension because their
documentation states that it is in Turtle syntax. The beginning of the data
looks like this:

@prefix ns: <http://rdf.freebase.com/ns/>.
@prefix key: <http://rdf.freebase.com/key/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

ns:american_football.football_historical_roster_position.number    ns:type.object.name    "Number"@en.
ns:american_football.football_historical_roster_position.number    ns:type.property.unique    true.
ns:american_football.football_historical_roster_position.number    ns:type.object.type    ns:type.property.
ns:american_football.football_historical_roster_position.number    rdfs:label    "Number"@en.
ns:american_football.football_historical_roster_position.number    ns:type.property.expected_type    ns:type.int.
ns:american_football.football_historical_roster_position.number    ns:type.property.schema    ns:american_football.football_historical_roster_position.
ns:american_football.football_historical_roster_position.number    rdf:type    owl:FunctionalProperty.
ns:american_football.football_historical_roster_position.number    rdfs:domain    ns:american_football.football_historical_roster_position.
ns:american_football.football_historical_roster_position.number    rdfs:range    ns:type.int.
ns:american_football.football_player.footballdb_id    ns:type.property.expected_type    ns:type.enumeration.
ns:american_football.football_player.footballdb_id    ns:type.object.type    ns:type.property.
ns:american_football.football_player.footballdb_id    ns:type.property.unique    true.
ns:american_football.football_player.footballdb_id    ns:type.property.schema    ns:american_football.football_player.
ns:american_football.football_player.footballdb_id    rdfs:label    "footballdb ID"@en.
ns:american_football.football_player.footballdb_id    ns:type.object.name    "footballdb ID"@en.
ns:american_football.football_player.footballdb_id    rdf:type    owl:FunctionalProperty.
ns:american_football.football_player.footballdb_id    rdfs:domain    ns:american_football.football_player.
ns:american_football.football_player.footballdb_id    rdfs:range    ns:type.enumeration.
ns:astronomy.astronomical_observatory.discoveries    ns:type.property.expected_type
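For readers new to Turtle: each prefixed name in the sample (such as ns:type.object.name) is shorthand for a full IRI built from the matching @prefix declaration. The sketch below shows that expansion with hypothetical helper names; it is simplified (it ignores backslash escapes in local names and relative IRIs) and is not a Turtle parser.

```python
import re

# Matches a Turtle prefix declaration such as "@prefix ns: <http://rdf.freebase.com/ns/>."
PREFIX_DECL = re.compile(r'@prefix\s+(\w*):\s*<([^>]*)>\s*\.')


def parse_prefixes(lines):
    """Collect prefix -> namespace IRI from @prefix lines."""
    prefixes = {}
    for line in lines:
        m = PREFIX_DECL.match(line.strip())
        if m:
            prefixes[m.group(1)] = m.group(2)
    return prefixes


def expand(pname: str, prefixes: dict) -> str:
    """Expand a prefixed name like 'ns:type.object.name' to a full IRI."""
    prefix, sep, local = pname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no @prefix declared for {pname!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    header = [
        "@prefix ns: <http://rdf.freebase.com/ns/>.",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.",
    ]
    p = parse_prefixes(header)
    print(expand("ns:type.object.name", p))
    print(expand("rdfs:label", p))
```

The problem IRIs in the log, by contrast, are written out in full as <...>, which is where the spaces and stray "<" characters slip in.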






--
Jomari Peterson
"Creating the Context for Miracles"
707-373-1093



