Hi Andy,

Here are the triples from the neighborhood of line 270608. I tried to find the error but couldn't. Do you see one, by chance? I printed the line number on the left of each triple just in case, e.g. "line num: 270591-".

------------------------------
line num: 270591- ns:m.01gqn1 ns:type.object.type ns:organization.organization.
line num: 270592- ns:m.01gqn1 ns:type.object.key "/wikipedia/en/Worker$0027s_Party_of_Brazil".
line num: 270593- ns:m.01gqn1 ns:type.object.name "Workers' Party"@en.
line num: 270594- ns:m.01gqn1 ns:type.object.name "Partido dos Trabalhadores"@ca.
line num: 270595- ns:m.01gqn1 ns:type.object.key "/wikipedia/en/Partido_dos_Trabalhadores".
line num: 270596- ns:m.01gqn1 ns:type.object.name "Partido dos Trabalhadores"@de.
line num: 270597- ns:m.01gqn1 ns:type.object.key "/wikipedia/pt_title/Partido_dos_Trabalhadores".
line num: 270598- ns:m.01gqn1 ns:type.object.key "/wikipedia/it_title/Partito_dei_Lavoratori_$0028Brasile$0029".
line num: 270599- ns:m.01gqn1 ns:type.object.key "/wikipedia/ja_title/$52B4$50CD$8005$515A_$0028$30D6$30E9$30B8$30EB$0029".
line num: 270600- ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.president ns:m.02ql58w.
line num: 270601- ns:m.01gqn1 key:wikipedia.fr_id "742582".
line num: 270602- ns:m.01gqn1 key:wikipedia.ja_id "1747452".
line num: 270603- ns:m.01gqn1 ns:type.object.key "/wikipedia/es_id/281246".
line num: 270604- ns:m.01gqn1 ns:type.object.key "/wikipedia/en/Brazilian_Worker$0027s_Party".
line num: 270605- ns:m.01gqn1 ns:common.topic.topical_webpage < http://tse.gov.br/internet/partidos/partidos_politicos/pt.htm>.
line num: 270606- ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.number 13.
line num: 270607- ns:m.01gqn1 ns:type.object.key "/en/workers_party".
line num: 270608- ns:m.01gqn1 ns:type.object.key "/wikipedia/en/Brazilian_Workers_Party".
line num: 270609- ns:m.01gqn1 ns:type.object.type ns:government.political_party.
line num: 270610- ns:m.01gqn1 ns:common.topic.image ns:m.03s8rh8.
line num: 270611- ns:m.01gqn1 ns:common.topic.official_website < http://www.pt.org.br/>.
line num: 270612- ns:m.01gqn1 ns:type.object.type ns:business.employer.
line num: 270613- ns:m.01gqn1 ns:type.object.key "/wikipedia/ru_id/1230551".
line num: 270614- ns:m.01gqn1 ns:type.object.key "/wikipedia/en_id/224622".
line num: 270615- ns:m.01gqn1 ns:type.object.type ns:base.braziliangovt.brazilian_political_party.
line num: 270616- ns:m.01gqn1 ns:type.object.name "Partido de los Trabajadores"@es.
line num: 270617- ns:m.01gqn1 ns:type.object.key "/wikipedia/ja/$52B4$50CD$8005$515A_$0028$30D6$30E9$30B8$30EB$0029".
line num: 270618- ns:m.01gqn1 ns:common.topic.article ns:m.01gqn9.
line num: 270619- ns:m.01gqn1 ns:common.topic.webpage ns:m.04yvgzd.
line num: 270620- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://de.wikipedia.org/wiki/Partido_dos_Trabalhadores>.
line num: 270621- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://fr.wikipedia.org/wiki/Parti_des_travailleurs_(Brésil)>.
line num: 270622- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://it.wikipedia.org/wiki/index.html?curid=616917>.
line num: 270623- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://ru.wikipedia.org/wiki/index.html?curid=1230551>.
line num: 270624- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://ja.wikipedia.org/wiki/index.html?curid=1747452>.
line num: 270625- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://de.wikipedia.org/wiki/index.html?curid=408018>.
line num: 270626- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://ru.wikipedia.org/wiki/ПР°Ñ€Ñ‚иÑ?_трудÑ?щихÑ?Ñ?_(БразилиÑ?)>.
line num: 270627- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://es.wikipedia.org/wiki/Partido_de_los_Trabajadores_(Brasil)>.
line num: 270628- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://es.wikipedia.org/wiki/index.html?curid=281246>.
line num: 270629- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage < http://it.wikipedia.org/wiki/Partito_dei_Lavoratori_(Brasile)>.
----------------------------

Thank you!

With Regards,
Abhishek S

On Fri, Dec 28, 2012 at 1:15 AM, Andy Seaborne <[email protected]> wrote:

> On 27/12/12 19:17, Abhishek Shivkumar wrote:
>
>> Thanks Andy. It runs for a few lines and then throws an error like the one
>> below, saying that the triples are not terminated by a DOT. I am assuming
>> the file doesn't end every triple with a ".". Is there a workaround for this?
>>
>> I read somewhere on the mailing list that this has been fixed in SVN.
>> Anyway, I downloaded Jena from
>> http://www.apache.org/dist/jena/
>
> 1/ The warning for es-419 is fixed.
>
> 2/
> [[
> The Bad IRI:
> <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code: 4/UNWISE_CHARACTER in PATH: The character
> matches no grammar rules of URIs/IRIs. These characters are permitted in
> RDF URI References, XML system identifiers, and XML Schema anyURIs.
> ]]
>
> Can't have " in URIs. The ₧ looks like an ISO-8859-1/UTF-8 encoding error.
>
> 3/ The file has bad syntax - you need to look at line 270608 or so and fix
> it up. It's a bug in the freebase data.
>
> Shame they haven't fixed it - it was wrong previously as well. It may be
> something like unmatched quotes, or an encoding error, so it may look right
> but isn't.
>
> What are the lines around 270608?
>
> (Another good reason for parsing data before loading!)
>
> Andy
>
>> Thanks much!
>>
>> 00:39:13 INFO loader :: Add: 150,000 triples (Batch: 40,950 / Avg: 23,648)
>> 00:39:14 INFO loader :: Add: 200,000 triples (Batch: 43,898 / Avg: 26,730)
>> 00:39:14 WARN riot :: [line: 209572, col: 54] Bad IRI: <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
>> 00:39:14 WARN riot :: [line: 219452, col: 33] Language not valid: es-419
>> 00:39:14 WARN riot :: [line: 219560, col: 24] Language not valid: es-419
>> 00:39:15 INFO loader :: Add: 250,000 triples (Batch: 43,975 / Avg: 29,005)
>> 00:39:15 ERROR riot :: [line: 270608, col: 1 ] Triples not terminated by DOT
>> Exception in thread "main" org.openjena.riot.RiotException: [line: 270608, col: 1 ] Triples not terminated by DOT
>>     at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
>>     at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
>>     at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
>>     at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
>>     at org.openjena.riot.lang.LangEngine.expect(LangEngine.java:147)
>>     at org.openjena.riot.lang.LangEngine.expectOrEOF(LangEngine.java:138)
>>     at org.openjena.riot.lang.LangTurtle.expectEndOfTriples(LangTurtle.java:57)
>>     at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:285)
>>     at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:223)
>>     at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
>>     at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
>>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
>>     at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:97)
>>     at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:83)
>>     at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:56)
>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadTriples$(BulkLoader.java:139)
>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDefaultGraph(BulkLoader.java:87)
>>     at com.hp.hpl.jena.tdb.TDBLoader.loadDefaultGraph$(TDBLoader.java:261)
>>     at com.hp.hpl.jena.tdb.TDBLoader.loadGraph$(TDBLoader.java:244)
>>     at com.hp.hpl.jena.tdb.TDBLoader.loadGraph(TDBLoader.java:177)
>>     at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:112)
>>     at tdb.tdbloader.loadDefaultGraph(tdbloader.java:150)
>>     at tdb.tdbloader.exec(tdbloader.java:116)
>>     at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
>>     at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>>     at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>>     at tdb.tdbloader.main(tdbloader.java:53)
>>
>> Thank you!
>>
>> With Regards,
>> Abhishek S
>>
>> On Fri, Dec 28, 2012 at 12:35 AM, Andy Seaborne <[email protected]> wrote:
>>
>>> On 27/12/12 18:42, Abhishek Shivkumar wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to load a large (55 GB!) RDF file into Jena TDB for
>>>> SPARQL querying later. A snapshot of the file is at the end of this email.
>>>>
>>>> When I use TDBLoader from the command line with the following command:
>>>>
>>>> c:\JENA\apache-jena-2.7.4\apache-jena-2.7.4\bat>tdbloader.bat -loc test "C:\freebase-rdf-2012-12-09-00-00"
>>>
>>> The TDB loader has no clue, via file extension, as to the syntax. The
>>> default is n-quads/n-triples.
>>>
>>> But it's turtle, hence a syntax error.
>>>
>>> So either:
>>>
>>> 1/ Run "riotcmd.turtle FILE > data.nt"
>>>
>>> This is preferred because:
>>> A/ It checks the file is valid before loading.
>>> B/ The NT loads faster.
>>>
>>> 2/ Rename the file to "something.ttl"
>>>
>>> Andy
>>>
>>>> I get this error:
>>>>
>>>> 23:40:30 INFO loader :: -- Start triples data phase
>>>> 23:40:30 INFO loader :: ** Load empty triples table
>>>> 23:40:30 INFO loader :: -- Start quads data phase
>>>> 23:40:30 INFO loader :: ** Load empty quads table
>>>> 23:40:30 INFO loader :: Load: C:\Users\IBM_ADMIN\My Documents\down\freebase-rdf-2012-12-09-00-00\freebase-rdf-2012-12-09-00-00 -- 2012/12/27 23:40:30 IST
>>>> 23:40:30 ERROR riot :: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
>>>> Exception in thread "main" org.openjena.riot.RiotException: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
>>>>     at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
>>>>     at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
>>>>     at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
>>>>     at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
>>>>     at org.openjena.riot.lang.LangNTuple.checkIRIOrBNode(LangNTuple.java:107)
>>>>     at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:84)
>>>>     at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:34)
>>>>     at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:69)
>>>>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
>>>>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:134)
>>>>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:121)
>>>>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:107)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:160)
>>>>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:121)
>>>>     at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:283)
>>>>     at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:196)
>>>>     at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:75)
>>>>     at tdb.tdbloader.loadQuads(tdbloader.java:163)
>>>>     at tdb.tdbloader.exec(tdbloader.java:122)
>>>>     at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
>>>>     at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>>>>     at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>>>>     at tdb.tdbloader.main(tdbloader.java:53)
>>>>
>>>> I need help in understanding this error and how to solve it. Is there a
>>>> problem with the input file?
>>>>
>>>> @prefix ns: <http://rdf.freebase.com/ns/>.
>>>> @prefix key: <http://rdf.freebase.com/key/>.
>>>> @prefix owl: <http://www.w3.org/2002/07/owl#>.
>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
>>>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
>>>>
>>>> ns:m.012rkqx ns:type.object.type ns:common.topic.
>>>> ns:m.012rkqx ns:type.object.name "High Fidelity"@en.
>>>> ns:m.012rkqx ns:type.object.type ns:music.single.
>>>> ns:m.012rkqx ns:type.object.key ns:authority.musicbrainz.name.TRACK3987054.
>>>> ns:m.012rkqx ns:type.object.type ns:music.recording.
>>>> ns:m.012rkqx key:authority.musicbrainz "258c45bd-4437-4580-8988-b3f3be975f9c".
>>>> ns:m.012rkqx key:authority.musicbrainz.name "TRACK3987054".
>>>> ns:m.012rkqx rdf:label "High Fidelity"@en.
>>>> ns:m.012rkqx rdf:type ns:common.topic.
>>>> ns:m.012rkqx rdf:type ns:music.single.
>>>> ns:m.012rkqx rdf:type ns:music.recording.
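
A possible follow-up to the suggestion in this thread that the problem near line 270608 may be unmatched quotes or an encoding error that looks right on screen: the Python sketch below (not part of Jena) re-reads the lines around a given line number and flags anything that looks off. It assumes one triple per line, as in the excerpts above. The names `check_line` and `scan_window` and the 20-line radius are hypothetical choices for illustration, and the checks are rough heuristics rather than a Turtle parser, so a clean result does not prove the lines are valid.

```python
import re


def check_line(text):
    """Return reasons why a one-triple-per-line Turtle line looks suspicious."""
    problems = []
    stripped = text.rstrip("\r\n")
    # Heuristic: in this dump each non-blank line should end with a DOT.
    if stripped.strip() and not stripped.rstrip().endswith("."):
        problems.append("no terminating DOT")
    # Drop backslash escapes (\" and \\) first, then count the remaining quotes.
    without_escapes = re.sub(r"\\.", "", stripped)
    if without_escapes.count('"') % 2 != 0:
        problems.append("unbalanced double quotes")
    # Control characters (other than tab) can hide in text that looks fine.
    if any(ord(ch) < 0x20 and ch != "\t" for ch in stripped):
        problems.append("control character")
    # U+FFFD appears when bytes could not be decoded as UTF-8
    # (or if the data already contained that character).
    if "\ufffd" in stripped:
        problems.append("undecodable bytes or literal U+FFFD")
    return problems


def scan_window(path, center, radius=20):
    """Check lines center-radius .. center+radius of the file at `path`."""
    first, last = center - radius, center + radius
    findings = []
    # newline="\n" splits on "\n" only, so a stray "\r" stays inside its
    # line (and is flagged above) instead of being counted as a line break.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as handle:
        for number, text in enumerate(handle, start=1):
            if number < first:
                continue
            if number > last:
                break
            problems = check_line(text)
            if problems:
                findings.append((number, problems, text.rstrip("\n")))
    return findings
```

Calling `scan_window(path_to_dump, 270608)` returns a list of `(line number, problems, text)` tuples; printing each text with `repr()` makes hidden characters visible. The file is read sequentially, so only the first 270,628 lines of the 55 GB dump are touched.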
