Good day,

My name is Jomari Peterson, and I am relatively new to Semantic Web
applications. My expertise is actually in process management and strategy.
However, I am trying to learn Jena very quickly for a project that I want
to develop as a demonstration. I have done the majority of the tutorial
work from the Jena site using a portion of the RDF data downloaded from
BaseKB's website, which is a modified version of Freebase's data dumps. I
am relying heavily on combining work that has already been done and on the
examples set by others.

This leads me to my problem today. First, however, I would like to thank
everyone who has made this possible for me. I appreciate the documentation
that is out there. I have been taking notes as I glean information from the
site and other sources so that, once I feel I have the base knowledge, I
can pass them along to assist future developers.

At this point, I am able to upload, query and manipulate data from BaseKB.
This is primarily because the data is divided into separate, smaller files
in N-Triples syntax, which makes things more manageable. I also downloaded
Freebase's recently released RDF data dump and wanted to use it, since it
comes directly from the source. I waited to use it until I had taught
myself about TDB and Fuseki, since the file is over 30GB.

After testing with my BaseKB files, I was able to query and manipulate the
data. However, when I tried to upload the Freebase RDF data dump, I
received the following output.

<div class="mydiv" style="border:1px #000 solid"><textarea
style="width:100%;height:120px;border:2px solid black;padding:4px;">
22:57:16 INFO  Fuseki               :: [7] POST http://localhost:3030/ds/upload
22:57:16 INFO  Fuseki               :: [7] Upload: Filename: freebase-rdf-2012-11-27-15-46.ttl, Content-Type=application/octet-stream, Charset=null => Turtle

22:57:18 WARN  Fuseki               :: [line: 100091, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/イヴ・サン=ローラン> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:18 WARN  Fuseki               :: [line: 100091, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/イヴ・サン=ローラン> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
22:57:19 WARN  Fuseki               :: [line: 182783, col: 33] Language not valid: es-419
22:57:19 WARN  Fuseki               :: [line: 182874, col: 24] Language not valid: es-419
22:57:19 WARN  Fuseki               :: [line: 230804, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/エシュ=シュル=アルゼット> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:19 WARN  Fuseki               :: [line: 230804, col: 54] Bad IRI: <http://ja.wikipedia.org/wiki/エシュ=シュル=アルゼット> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
22:57:20 WARN  Fuseki               :: [line: 263095, col: 33] Language not valid: es-419
22:57:20 WARN  Fuseki               :: [line: 263271, col: 24] Language not valid: es-419
22:57:20 WARN  Fuseki               :: [line: 291130, col: 54] Bad IRI: <http://rpggeek.com/rpg/426/Changeling: The Dreaming> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
22:57:20 WARN  Fuseki               :: [line: 298926, col: 36] Bad IRI: <http://http:urbis.com> Code: 0/ILLEGAL_CHARACTER in PORT: The character violates the grammar rules for URIs/IRIs.
22:57:22 WARN  Fuseki               :: [line: 320172, col: 55] Bad IRI: <http://pt.wikipedia.org/wiki/Estudo_Transcendental_Nº12> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:22 WARN  Fuseki               :: [line: 320172, col: 55] Bad IRI: <http://pt.wikipedia.org/wiki/Estudo_Transcendental_Nº12> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
22:57:22 WARN  Fuseki               :: [line: 331805, col: 47] Bad IRI: <http://www.skygate-int.com/ (defunct)> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
22:57:22 WARN  Fuseki               :: [line: 334838, col: 55] Bad IRI: <http://de.wikipedia.org/wiki/Éclairs_sur_l’Au-delà_…> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
22:57:22 WARN  Fuseki               :: [line: 334838, col: 55] Bad IRI: <http://de.wikipedia.org/wiki/Éclairs_sur_l’Au-delà_…> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO

22:57:22 ERROR Fuseki               :: [line: 338972, col: 114] Broken IRI (bad character: '<'): http://www.nhs.uk/Services/Hospitals/Overview/DefaultView.aspx?id
22:57:22 INFO  Fuseki               :: [7] 500 Server Error
</textarea></div>

I learned about URIs in the course of this work, and from my reading I
assume that IRIs are the internationalized extension of URIs that allows
additional characters. I don't know whether the ViolationCodes section
would have helped, but it is currently not available on the "Support for
Internationalised Resource Identifiers in Jena" page on the Apache site.
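
As a point of reference for the NOT_NFKC warnings in the log above, here is
a minimal Python sketch of what "not in Unicode Normal Form KC" means. It
uses only Python's standard unicodedata module and is not how Jena itself
performs the check; the helper names is_nfkc and nfkc_changes are made up
for illustration.

```python
import unicodedata


def is_nfkc(text):
    """Return True if text is unchanged by NFKC normalization."""
    return unicodedata.normalize("NFKC", text) == text


def nfkc_changes(text):
    """List (original, normalized) pairs for characters NFKC would rewrite.

    This is a per-character check, so it is only a rough diagnostic: it
    does not report combining sequences that NFKC would compose.
    """
    changes = []
    for ch in text:
        normalized = unicodedata.normalize("NFKC", ch)
        if normalized != ch:
            changes.append((ch, normalized))
    return changes


if __name__ == "__main__":
    # U+FF1D is the fullwidth equals sign seen in the ja.wikipedia.org IRIs,
    # U+00BA is the ordinal indicator in "N\u00ba12", U+2026 is the ellipsis.
    for sample in ("\uFF1D", "N\u00ba12", "\u2026"):
        print(repr(sample), is_nfkc(sample), nfkc_changes(sample))
```

In this sketch the fullwidth equals sign normalizes to a plain "=", the
ordinal indicator to "o", and the ellipsis to three periods, which would be
consistent with those IRIs being reported as warnings rather than errors.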

*Now the question(s): Is there a way to skip these types of errors and
continue loading the rest of the file? Would it be better to try to load it
into TDB first? And, probably outside the scope of this work: how could I
go about fixing or deleting the broken IRIs in such a big file? I
appreciate your time and help.*
(Freebase users informed me that there are about 7800 errors of this type
in this data dump, so advice on how to delete or skip them would be
appreciated. The BaseKB dump does not have this issue, but according to its
owner it is spread across 1000+ different files.)
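
To make the "fixing/deleting" question concrete, here is a rough Python
sketch of a line filter that drops lines whose IRIs contain characters the
Turtle grammar forbids inside <...>. It is only a sketch under
assumptions: it assumes the dump holds one triple per line (the sample
further down suggests this, but I have not verified it for the whole file),
the names has_broken_iri and filter_file are made up, and it does nothing
about the NFKC or language-tag warnings.

```python
import re

# Characters the Turtle grammar does not allow inside <...>:
# U+0000-U+0020 (controls and space) plus < > " { } | ^ ` and backslash.
ILLEGAL_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')
# A double-quoted string literal, allowing backslash escapes.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# Everything between a '<' and the next '>'.
IRI_TOKEN = re.compile(r'<([^>]*)>')


def has_broken_iri(line):
    """Return True if the line seems to contain a syntactically bad IRI."""
    # Blank out string literals so that '<' or '>' inside text is ignored.
    bare = LITERAL.sub('""', line)
    # An IRI with a stray '<' leaves the angle brackets unbalanced.
    if bare.count("<") != bare.count(">"):
        return True
    return any(ILLEGAL_IN_IRI.search(iri) for iri in IRI_TOKEN.findall(bare))


def filter_file(src_path, dst_path):
    """Copy src to dst line by line, skipping lines with broken IRIs.

    Streams the input, so memory use does not grow with file size.
    Returns a (kept, dropped) pair of line counts.
    """
    kept = dropped = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if has_broken_iri(line):
                dropped += 1
            else:
                dst.write(line)
                kept += 1
    return kept, dropped
```

Something like filter_file("freebase-rdf-2012-11-27-15-46.ttl",
"freebase-clean.ttl") would write a filtered copy (the output filename is
just a placeholder). Dropping whole lines loses those triples, so whether
that is acceptable depends on the use case.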


I manually gave the Freebase data dump a .ttl extension because the
Freebase documentation states that it is in Turtle syntax. The beginning of
the data looks like this:

<div class="mydiv" style="border:1px #000 solid"><textarea
style="width:100%;height:120px;border:2px solid black;padding:4px;">
@prefix ns: <http://rdf.freebase.com/ns/>.
@prefix key: <http://rdf.freebase.com/key/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

ns:american_football.football_historical_roster_position.number    ns:type.object.name    "Number"@en.
ns:american_football.football_historical_roster_position.number    ns:type.property.unique    true.
ns:american_football.football_historical_roster_position.number    ns:type.object.type    ns:type.property.
ns:american_football.football_historical_roster_position.number    rdfs:label    "Number"@en.
ns:american_football.football_historical_roster_position.number    ns:type.property.expected_type    ns:type.int.
ns:american_football.football_historical_roster_position.number    ns:type.property.schema    ns:american_football.football_historical_roster_position.
ns:american_football.football_historical_roster_position.number    rdf:type    owl:FunctionalProperty.
ns:american_football.football_historical_roster_position.number    rdfs:domain    ns:american_football.football_historical_roster_position.
ns:american_football.football_historical_roster_position.number    rdfs:range    ns:type.int.
ns:american_football.football_player.footballdb_id    ns:type.property.expected_type    ns:type.enumeration.
ns:american_football.football_player.footballdb_id    ns:type.object.type    ns:type.property.
ns:american_football.football_player.footballdb_id    ns:type.property.unique    true.
ns:american_football.football_player.footballdb_id    ns:type.property.schema    ns:american_football.football_player.
ns:american_football.football_player.footballdb_id    rdfs:label    "footballdb ID"@en.
ns:american_football.football_player.footballdb_id    ns:type.object.name    "footballdb ID"@en.
ns:american_football.football_player.footballdb_id    rdf:type    owl:FunctionalProperty.
ns:american_football.football_player.footballdb_id    rdfs:domain    ns:american_football.football_player.
ns:american_football.football_player.footballdb_id    rdfs:range    ns:type.enumeration.
ns:astronomy.astronomical_observatory.discoveries    ns:type.property.expected_type</textarea></div>






-- 
Jomari Peterson
"Creating the Context for Miracles"
707-373-1093
