On 13.06.2012 15:55, Andy Seaborne wrote:
On 13/06/12 14:19, Damian Steer wrote:
On 13/06/12 14:03, Stefan Scheffler wrote:
Hello, I need to import large N-Triples files (DBpedia) into a TDB store.
The problem is that many of the triples are not valid (e.g. a
missing '<' or invalid characters), which leads to an exception that
aborts the import. I just want to skip them and continue, so that
all valid triples are in the TDB store at the end.
Is there an easy way to do that? I tried to rewrite
ARQ, but this is very complex.
With friendly regards,
Stefan Scheffler
You'd be much better off finding an n-triple parser that kept going
and also spat out (working) n-triples for piping to TDB. I can't see
an option like that in the riot command line.
There isn't such an option - there could be (if someone wants to
contribute a patch).
This is a typical ETL situation - you're going to have to clean those
triples (which were presumably not written by an RDF tool). Do you
want to lose them or fix them?
Checking before loading is always a good idea, especially for data from
outside and from other tools. When I receive TTL or RDF/XML, I parse it to NT,
which means it's then checked. Then I load the data.
Andy
Hi Andy,
At the moment I just want to skip the invalid triples (later they should
be stored and maybe fixed, if that's possible).
The main goal is to have an import process which runs automatically and
doesn't stop on every failure it finds.
The moment of checking doesn't matter (atm ;)). It can happen before or
during the import (but I used the second strategy on Sesame).
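
For the "check before the import" variant, a minimal sketch of a line filter
might look like the following. It is not part of Jena or riot. The script
name, the function name filter_ntriples and the regular expressions are
assumptions made for illustration, and the pattern is a deliberately
simplified approximation of the N-Triples grammar, so it can wrongly accept
or reject unusual lines. Valid-looking lines go to one stream, everything
else to a rejects stream so it can be stored and fixed later.

```python
import re
import sys

# Simplified building blocks (assumption: NOT the full N-Triples grammar).
IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
BNODE = r'_:[A-Za-z0-9][A-Za-z0-9_.-]*'
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"'
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')

# subject predicate object '.', one triple per line
TRIPLE = re.compile(
    r'^\s*(?:%s|%s)\s+%s\s+(?:%s|%s|%s)\s*\.\s*$'
    % (IRI, BNODE, IRI, IRI, BNODE, LITERAL)
)


def filter_ntriples(lines, good, bad):
    """Copy plausible triples to `good`, everything else to `bad`.

    Blank lines and '#' comment lines are dropped silently.
    Returns a (kept, skipped) pair of counts.
    """
    kept = skipped = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        if TRIPLE.match(stripped):
            good.write(stripped + '\n')
            kept += 1
        else:
            bad.write(stripped + '\n')
            skipped += 1
    return kept, skipped


# Run as a pipe filter only when a rejects path is given on the command line.
if __name__ == '__main__' and len(sys.argv) == 2:
    with open(sys.argv[1], 'w', encoding='utf-8') as rejects:
        kept, skipped = filter_ntriples(sys.stdin, sys.stdout, rejects)
    print('kept %d, skipped %d' % (kept, skipped), file=sys.stderr)
```

Assuming the script is saved as filter_nt.py (a hypothetical name), it could
be run as `python filter_nt.py rejects.nt < dump.nt > clean.nt`, and clean.nt
then loaded into TDB as usual. Because the check is only approximate, the
loader may still reject some lines that pass this filter.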
Thanks Stefan
--
Stefan Scheffler
Avantgarde Labs GbR
Löbauer Straße 19, 01099 Dresden
Telefon: + 49 (0) 351 21590834
Email: [email protected]