Thanks Andy,
> 2/ Better: run "riot" on the files first to validate them and convert to
> N-Triples, keep the N-Triples output and load those.
>
> Much better to "check then load" than have a large load crash due to bad
> data.
>
> Parsing of complex formats like RDF/XML slows the bulk loader down.
>
>
I followed the steps above:
1. Validated the RDF/XML and converted it to N-Triples using the
*rdfparse* command line tool
2. Loaded the N-Triples output into TDB using the *tdbloader* command line tool
Command: *tdbloader* --loc ~/development/odp-rdf/ content.n3
Loading finished with three types of warnings:
- {W107} Bad URI:
- {W131} String not in Unicode Normal Form C:
- {W121} String is not legal in XML 1.1;
After loading, it reports:
Completed: 22,389,276 triples loaded in 4,309.30 seconds [Rate: 5,195.57
per second]
I then counted the triples using this SPARQL query:
SELECT (count(*) AS ?count) { ?s ?p ?o }
Triple count = 21669903
Does tdbloader skip the triples that produce warnings?
Why do the two triple counts differ?
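
The gap between the two numbers is 22,389,276 - 21,669,903 = 719,373 triples. One thing I could check on my side, as a rough sketch only, is whether the N-Triples file contains repeated statements, since a store holds each distinct triple once. The script below is not part of Jena; the function name count_lines is my own, and it assumes one triple per line and treats two lines as the same triple only if their text is identical after trimming whitespace, so it can undercount duplicates that are written differently. It stores a 16-byte hash per line rather than the line itself to keep memory use manageable on a file of this size.

```python
import hashlib
import sys


def count_lines(path):
    """Return (total, distinct) counts of triple lines in an N-Triples file.

    Blank lines and comment lines starting with '#' are ignored.
    Assumption: identical triples appear as textually identical lines.
    """
    total = 0
    seen = set()
    with open(path, "rb") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith(b"#"):
                continue
            total += 1
            # Keep a short digest instead of the full line to save memory.
            seen.add(hashlib.blake2b(line, digest_size=16).digest())
    return total, len(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, distinct = count_lines(sys.argv[1])
    print(f"total={total} distinct={distinct} duplicates={total - distinct}")
```

Run as "python count_lines.py content.n3" (the script file name is arbitrary). If the reported duplicates are close to 719,373, that would suggest the difference comes from repeated triples rather than from triples dropped because of warnings.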
--
Regards
Phani. S