Spaces in URIs are particularly problematic; even if you can get them into the data, using the data will likely break.

When ingesting data from somewhere else, it is good to check it before loading and fix it as needed.

    riot --check file....
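The riot check above is the real validation. Purely to illustrate the kind of problem it reports in this thread, here is a rough Python sketch that scans an N-Quads file for IRIs containing a space and prints the line and column. It assumes a simplified lexer (string literals, `<...>` IRIs and comments only), not a full N-Quads parser, and it catches nothing except spaces. The name `find_spaces_in_iris` is made up for this example.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator, Tuple

# One token: a string literal (with backslash escapes), an IRI in angle
# brackets, or a comment. Matching literals as whole tokens keeps a '<' inside
# a literal from being taken for an IRI. A rough lexer, not an N-Quads parser.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<[^>]*>|#.*')


def find_spaces_in_iris(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, iri) for each IRI containing a space; both numbers are 1-based."""
    for lineno, line in enumerate(lines, start=1):
        for m in TOKEN.finditer(line):
            tok = m.group()
            if tok.startswith("#"):
                break  # the rest of the line is a comment
            if tok.startswith("<") and " " in tok:
                # column of the first space inside the IRI
                yield lineno, m.start() + tok.index(" ") + 1, tok[1:-1]


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    # handle both plain and gzipped dumps such as lov.nq.gz
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for lineno, col, iri in find_spaces_in_iris(f):
            print(f"[line: {lineno}, col: {col}] space in IRI: <{iri}>")
```

Run it with the dump's path as the only argument. It reports every offending line instead of stopping at the first one, which helps when deciding how much repair a file needs.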

     Andy

http://lov.okfn.org/lov.nq.gz is only 749,810 quads, so tdbloader2 is overkill. Use tdbloader. tdbloader2 only gives an advantage for much larger data (100 million+ quads, and even then it is not always faster).
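To see which side of that line a dump falls on before loading it, one can simply count statements. A minimal Python sketch, assuming the file is N-Quads (one statement per line), so counting non-blank, non-comment lines approximates the quad count; malformed lines are counted too. The 100 million threshold is the rule of thumb from the message above, and the function names are hypothetical.

```python
import gzip
from typing import Iterable

# Rule of thumb from the advice above; even past it tdbloader2 is not always faster.
LARGE_LOAD = 100_000_000


def count_statements(lines: Iterable[str]) -> int:
    """N-Quads has one statement per line: count non-blank, non-comment lines."""
    return sum(1 for line in lines if line.strip() and not line.lstrip().startswith("#"))


def count_statements_gz(path: str) -> int:
    """Count statements in a gzipped N-Quads dump without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_statements(f)


def suggest_loader(n_statements: int) -> str:
    return "tdbloader2 (worth trying)" if n_statements >= LARGE_LOAD else "tdbloader"
```

For example, `suggest_loader(count_statements_gz("lov.nq.gz"))` would, at roughly 750 thousand quads, point to plain tdbloader.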

On 07/04/17 13:17, Martynas Jusevičius wrote:
This question comes up regularly: http://markmail.org/message/seqiw74hhdx2u64j

On Fri, Apr 7, 2017 at 2:10 PM, Laura Morales wrote:
I'm trying to import the LOV dump [1] into Fuseki using tdbloader2. Unfortunately, some quads are "broken" in the sense that they're not well-formed, for example this one:

    ERROR [line: 203556, col: 152] Bad character in IRI (space): <http://securitytoolbox.appspot.com/MASO#Objectif[space]...>
    org.apache.jena.riot.RiotException: [line: 203556, col: 152] Bad character in IRI (space): <http://securitytoolbox.appspot.com/MASO#Objectif[space]...>

Is there an option to tell tdbloader2 to simply ignore these quads (or show a warning) and keep going, instead of raising an exception and halting?
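Whether or not tdbloader2 has such a switch, a similar effect can be had by filtering the file before loading. The Python sketch below assumes the only defect to catch is a space inside an IRI. It splits lines into those to load and those to set aside, and warns about each rejected line rather than discarding it silently, so the rejects can still be inspected or repaired. `split_quads` and `has_space_in_iri` are illustrative names, not part of Jena.

```python
import re
import sys
from typing import Iterable, List, Tuple

# A rough lexer: a string literal, an IRI in angle brackets, or a comment
# (not a full N-Quads parser).
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<[^>]*>|#.*')


def has_space_in_iri(line: str) -> bool:
    for m in TOKEN.finditer(line):
        tok = m.group()
        if tok.startswith("#"):
            return False  # comment: nothing after this is data
        if tok.startswith("<") and " " in tok:
            return True
    return False


def split_quads(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return (kept, rejected); each rejected entry is (1-based line number, line)."""
    kept: List[str] = []
    rejected: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if has_space_in_iri(line):
            print(f"WARNING line {lineno}: space in IRI, line set aside", file=sys.stderr)
            rejected.append((lineno, line))
        else:
            kept.append(line)
    return kept, rejected
```

The `kept` lines could then be written to a new file and loaded with tdbloader, with `rejected` saved alongside for review. Other errors riot would report are not detected by this sketch.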

-----------------

The bigger problem is really the spaces.

Last but not least, I think a utility that builds databases for Fuseki should not 'encourage' users to throw away this or that triple/quad line just because they want the load to run to the end. It is clear where that leads: there is no longer any logic in what you are doing...

I have usually had this problem with downloaded DBpedia files:

long_abstracts_en.nt
long_abstracts_en_uris_de.nt

I repaired each line in an editor until our utility accepted it, and if I couldn't guess what the problem was in an object string, I wrote 'not readable' for it...

Yes, that is how I did it; maybe I was an idiot...
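Where editing by hand is too slow, one mechanical repair for the IRI case is to percent-encode the spaces, since %20 is how a space is written inside an IRI. The Python sketch below does that and leaves string literals and whole-line comments alone. Note the trade-off: `...Objectif de x` becomes `...Objectif%20de%20x`, which is a different identifier from what the publisher wrote, and broken object strings (the DBpedia case above) are not touched at all. `encode_iri_spaces` is a hypothetical helper.

```python
import re

# A string literal (with backslash escapes) or an IRI in angle brackets.
# Literals are matched as whole tokens so their contents are never rewritten.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<[^>]*>')


def encode_iri_spaces(line: str) -> str:
    """Percent-encode spaces inside <...> IRIs; literals and comment lines are left as-is."""
    if line.lstrip().startswith("#"):
        return line  # whole-line comment

    def fix(m: "re.Match[str]") -> str:
        tok = m.group()
        # only IRIs are rewritten; a literal is returned unchanged
        return tok.replace(" ", "%20") if tok.startswith("<") else tok

    return TOKEN.sub(fix, line)
```

Applied line by line to a dump, this keeps every statement instead of dropping it, at the cost of changing the affected IRIs.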

baran

