On 08/08/14 05:02, Deyan Chen wrote:
Hi Andy,
The BaseKB dumps are derived from the Freebase dump, and their data
format is N-Triples RDF.
So I uncompressed each BaseKB dump, gave it the extension '.nt',
and then loaded it into TDB.
But the tdbloader reports the following error:
15:19:22 ERROR riot :: [line: 309035, col: 135] Illegal object: [INTEGER:5281023]
org.apache.jena.riot.RiotException: [line: 309035, col: 135] Illegal object: [INTEGER:5281023]
...
I then printed the triple at that line:
<http://www.neusoft.com/ontologies/2013/6/medicine#m.07_71>
<http://www.neusoft.com/ontologies/2013/6/medicine#medicine.drug.pubchem>
5281023
.
It seems that tdbloader can't determine the type of the object.
There are many other triples like this one.
I then changed the extension from '.nt' to '.n3' and reloaded the
dumps.
This time tdbloader loaded all the dumps into the TDB store without
reporting any errors,
and I can query all the triples from the TDB store.
But I don't understand why tdbloader no longer reports these errors
when the extension is '.n3'.
Thank you very much.
Deyan Chen
Hi there,
An integer written as a bare 5281023 isn't legal N-Triples, although it
is legal Turtle (and N3). That is why the load succeeds with '.n3'.
In N-Triples it must be written as
"5281023"^^<http://www.w3.org/2001/XMLSchema#integer>
with no alternative short form.
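
If you want to keep the files as N-Triples, the bare integers have to
be rewritten into that long form. A rough sketch of such a fix-up
follows; it is not part of Jena. The function name and the regular
expression are my own assumptions, and it only handles lines of the
shape <subject> <predicate> integer . as in the triple above.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical pattern: matches "<s> <p> 123 ." where the object is a bare integer.
# Blank-node subjects and bare decimals/doubles are deliberately not handled.
BARE_INT = re.compile(r'^(\s*<[^>]*>\s+<[^>]*>\s+)([+-]?\d+)(\s*\.\s*)$')


def fix_bare_integer(line: str) -> str:
    """Rewrite a bare integer object as an xsd:integer typed literal.

    Lines that do not match the pattern are returned unchanged.
    """
    m = BARE_INT.match(line)
    if m is None:
        return line
    return f'{m.group(1)}"{m.group(2)}"^^<{XSD_INTEGER}>{m.group(3)}'
```

Applied line by line to a dump, this leaves already-legal triples
untouched. Check the result with the parser before loading.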
It is a good idea to parse the data to check it before loading; run
"riot --validate" on the files.
If the files are compressed with gzip, you can use them directly. RIOT
looks for the file extension .gz, adds a decompressor, strips the .gz,
and then looks at the next file extension to determine the syntax.
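
To illustrate the extension handling just described, here is a small
sketch of the idea. It is not RIOT's code: the function name, the
mapping table, and the file names in the usage note are assumptions
made up for the example.

```python
from pathlib import PurePath

# Illustrative table only; this is not RIOT's actual registry of syntaxes.
SYNTAX_BY_EXTENSION = {".nt": "N-Triples", ".ttl": "Turtle", ".n3": "N3"}


def guess_syntax(filename: str):
    """Return (syntax name or None, whether the file looks gzip-compressed)."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    if compressed:
        # Strip the .gz so the next extension decides the syntax.
        suffixes = suffixes[:-1]
    syntax = SYNTAX_BY_EXTENSION.get(suffixes[-1]) if suffixes else None
    return syntax, compressed
```

So a hypothetical "basekb.nt.gz" would be treated as compressed
N-Triples, and "basekb.n3" as uncompressed N3.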
Generally, don't use N3; use Turtle, which is a W3C standard. The
details of N3 vary between descriptions and implementations. Turtle is
more rigorously defined, and its syntax details, such as prefixed
names, align with SPARQL.
Jena treats N3 as Turtle. There is more to N3 than just the data
format (N3 formulae, for example).
Andy