On 05/09/12 08:33, Stefan Scheffler wrote:
Hey. I'm not sure if tdbloader3 can handle rdf, or maybe there is a
special option for it.
I think tdbloader3 is N-Triples/N-Quads only (... checks the source code
... yes).
The others can load any format, but they pick the parser from the file
extension, so they will not know that content.rdf.u8 is RDF/XML.
1/ Try tdbloader with the file renamed to end in .rdf
2/ Better: run "riot" on the files first to validate them and convert to
N-Triples, keep the N-Triples output and load those.
Much better to "check then load" than have a large load crash due to bad
data.
Parsing of complex formats like RDF/XML slows the bulk loader down.
3/ Only use tdbloader3 if you think you need to - it needs a lot of
tuning to get performance to match tdbloader (or tdbloader2 on Linux).
tdbloader is simplest and is usually the same sort of speed as the
others up to, say, 50M triples, often up to 100M.
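A rough sketch of option 2 ("check then load") as a shell function. The helper name check_then_load and the file names are made up for illustration. It assumes the Jena scripts riot and tdbloader are on the PATH, that riot accepts --syntax=rdfxml to force the input syntax, that it writes N-Triples to stdout, and that it exits non-zero on a parse error; check these against your Jena version.

```shell
#!/bin/sh
# Sketch only: convert with riot first, load only if parsing succeeded.
# check_then_load is a hypothetical helper, not part of Jena.
#
# usage: check_then_load SOURCE_FILE NTRIPLES_OUT TDB_DIR
check_then_load() {
    src=$1
    nt=$2
    db=$3

    # Name the syntax explicitly, because the ".u8" extension does not
    # identify the file as RDF/XML. Assumption: riot prints N-Triples
    # on stdout and returns non-zero if the input is bad.
    if ! riot --syntax=rdfxml "$src" > "$nt"; then
        echo "riot reported errors in $src; not loading" >&2
        return 1
    fi

    # Load the checked N-Triples file, which is kept for later reloads.
    tdbloader --loc "$db" "$nt"
}
```

For the data in this thread the call might look like: check_then_load content.rdf.u8 content.nt odp-rdf-tdb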
Andy
Because
org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:72)
handles the N-Triples format.
Regards
Stefan
On 05.09.2012 09:11, Phani Sajja wrote:
Hi all,
I want to load the ODP-RDF data into a dataset using the
*tdbloader3* utility.
Later I want to query the dataset using SPARQL. From command line I
launched the command
* tdbloader3 --loc ~/development/odp-rdf/odp-rdf-tdb/
~/development/odp-rdf/content/content.rdf.u8*
*content.rdf.u8* is the file which is downloaded from
http://www.dmoz.org/rdf.html
I am getting the exception
12:09:12 INFO tdbloader3 :: Load: /home/phani/development/odp-rdf/content/content.rdf.u8 -- 2012/09/05 12:09:12 IST
12:09:12 ERROR riot :: [line: 4, col: 3 ] Triple not terminated by DOT: [IRI:Topic r:id=""]
Exception in thread "main" org.openjena.riot.RiotException: [line: 4, col: 3 ] Triple not terminated by DOT: [IRI:Topic r:id=""]
at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:125)
at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
at org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:72)
at org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:33)
at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:69)
at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
at tdb.tdbloader3.exec(tdbloader3.java:209)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at tdb.tdbloader3.main(tdbloader3.java:108)
I even renamed the file to *content.rdf*, and the same exception appeared.
The first few lines of the RDF file are:
<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<!-- Generated at 2012-08-19 00:04:22 EST from DMOZ 2.0 -->
<Topic r:id="">
<catid>1</catid>
</Topic>
<Topic r:id="Top/Arts">
<catid>381773</catid>
</Topic>
<Topic r:id="Top/Arts/Animation">
<catid>423945</catid>
<link1 r:resource="http://www.awn.com/"></link1>
<link r:resource="http://animation.about.com/"></link>
<link r:resource="http://www.toonhound.com/"></link>
<link r:resource="http://enculturation.gmu.edu/2_1/pisters.html"></link>
<link r:resource="http://www.digitalmediafx.com/Features/animationhistory.html"></link>
<link r:resource="http://www-viz.tamu.edu/courses/viza615/97spring/pjames/history/main.html"></link>
<link r:resource="http://www.spark-online.com/august00/media/romano.html"></link>
<link r:resource="http://www.animated-divots.net/"></link>
</Topic>
<ExternalPage about="http://www.awn.com/">
<d:Title>Animation World Network</d:Title>
<d:Description>Provides information resources to the international
animation community. Features include searchable database archives,
monthly magazine, web animation guide, the Animation Village, discussion
forums and other useful resources.</d:Description>
<priority>1</priority>
<topic>Top/Arts/Animation</topic>
</ExternalPage>
............
What is the problem??