On 08/08/14 09:26, Deyan Chen wrote:
Hi Andy,

Thank you for your answer.

The syntax of the Freebase dumps is indeed N-Triples, but I don't
know why they include these illegal (non-N-Triples) triples.

Let BaseKB know - they try to produce clean data.


BaseKB splits the Freebase dump into 1024 files and compresses them with
gzip, but these files don't have any file extension.

From the tdbloader tests and your answer, can I conclude that legal
N-Triples is also legal Turtle or N3, but not vice versa?

Yes. N-Triples is a subset of Turtle.


Deyan Chen

On 2014-08-08 15:21, Andy Seaborne wrote:
On 08/08/14 05:02, Deyan Chen wrote:
Hi Andy,

The BaseKB dumps come from the Freebase dump, and their data format is
N-Triples RDF.
So I uncompress each BaseKB dump, give it the extension '.nt',
and then load it into TDB.

But the tdbloader reports the following error:

15:19:22 ERROR riot                 :: [line: 309035, col: 135] Illegal
object: [INTEGER:5281023]
org.apache.jena.riot.RiotException: [line: 309035, col: 135] Illegal
object: [INTEGER:5281023]
...


I then printed the offending triple:

<http://www.neusoft.com/ontologies/2013/6/medicine#m.07_71>
<http://www.neusoft.com/ontologies/2013/6/medicine#medicine.drug.pubchem>
5281023
.

It seems that tdbloader can't determine the type of the object.
There are many other triples like this.

I then changed the extension from '.nt' to '.n3' and reloaded the
dumps.
This time tdbloader loaded all the dumps into the TDB store without
reporting any errors, and I can query all the triples from the store.

But I don't know why tdbloader no longer reports these errors when
the extension is '.n3'.

Thank you very much.

Deyan Chen

Hi there,

An integer written as 5281023 isn't legal N-Triples, though it is legal
Turtle (and N3).

In N-Triples it's

 "5281023"^^<http://www.w3.org/2001/XMLSchema#integer>

with no alternative short form.
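
As a rough illustration of that rewrite, here is a minimal Python sketch that turns a bare integer object into the full typed-literal form. The function name fix_bare_integer and the regular expression are assumptions made for this example, not part of Jena or RIOT. It only handles one triple per line with an IRI subject and predicate; anything else is passed through unchanged.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Matches a one-line triple whose object is a bare integer,
# e.g. `<s> <p> 5281023 .` (IRI subject and predicate only).
_BARE_INT = re.compile(r'^(\s*<[^>]*>\s+<[^>]*>\s+)([+-]?\d+)(\s*\.\s*)$')


def fix_bare_integer(line: str) -> str:
    """Rewrite a bare integer object as an explicit xsd:integer literal.

    Hypothetical helper for illustration; lines that do not match the
    simple pattern above are returned untouched.
    """
    m = _BARE_INT.match(line)
    if m is None:
        return line
    return f'{m.group(1)}"{m.group(2)}"^^<{XSD_INTEGER}>{m.group(3)}'
```

Running each line of a dump through such a function and then validating the result would be one way to clean the data, though fixing it at the source is preferable.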

It is a good idea to parse data to check before loading; call "riot
--validate".
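
If the check is to be scripted over many dump files, a small wrapper along these lines might do. It is only a sketch: it assumes the riot script is on the PATH, the function names are made up for this example, and it simply hands back whatever exit status the command gives rather than interpreting it.

```python
import shutil
import subprocess


def riot_validate_command(paths):
    """Build the argument list for `riot --validate` over the given files."""
    return ["riot", "--validate", *paths]


def validate(paths):
    """Run riot over the files and return its exit status.

    Assumes `riot` is installed and on the PATH; parse errors are
    reported by riot itself on its own output streams.
    """
    if shutil.which("riot") is None:
        raise FileNotFoundError("riot not found on PATH")
    return subprocess.run(riot_validate_command(paths)).returncode
```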

If the files are compressed with gzip, you can use those directly.
RIOT looks for the file extension .gz, adds a decompressor, strips the
.gz, then looks at the next file extension to get the syntax type.
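
The extension handling described above can be pictured with the following Python sketch. It is not RIOT's code; the function guess_syntax and the small extension table are assumptions for illustration, and the table covers only the extensions mentioned in this thread plus .ttl.

```python
from pathlib import Path

# Illustrative mapping only; RIOT knows more syntaxes than these.
SYNTAX_BY_EXT = {".nt": "N-Triples", ".ttl": "Turtle", ".n3": "N3"}


def guess_syntax(filename):
    """Return (is_gzipped, syntax_name_or_None) for a file name.

    A trailing .gz is stripped first, then the next extension
    decides the syntax.
    """
    suffixes = Path(filename).suffixes
    compressed = bool(suffixes) and suffixes[-1].lower() == ".gz"
    if compressed:
        suffixes = suffixes[:-1]
    ext = suffixes[-1].lower() if suffixes else ""
    return compressed, SYNTAX_BY_EXT.get(ext)
```

Under this scheme a file named dump.nt.gz would be treated as gzipped N-Triples, while a file with no extension at all gives no syntax hint, which is why the BaseKB files need renaming.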

Generally, don't use N3; use Turtle, which is a W3C standard. There
is some variation in the details of N3. Turtle is more rigorously
defined, and its syntax details, such as prefix names, align with SPARQL.
Jena treats N3 as Turtle. There is more to N3 than just the data
format (such as N3 formulae).

    Andy


