Thanks Andy,

> 2/ Better: run "riot" on the files first to validate them and convert to
> N-Triples, keep the N-Triples output and load those.
>
> Much better to "check then load" than have a large load crash due to bad
> data.
>
> Parsing of complex formats like RDF/XML slows the bulk loader down.
>
>
I followed the steps above:
1. Validate the RDF/XML and convert it to N-Triples using the *rdfparse* command-line tool
2. Load the N-Triples output into TDB using the *tdbloader* command-line tool

Command: *tdbloader* --loc ~/development/odp-rdf/ content.n3

Loading finished with three types of warnings:

   - {W107} Bad URI:
   - {W131} String not in Unicode Normal Form C:
   - {W121} String is not legal in XML 1.1;

After loading, it reports:
Completed: 22,389,276 triples loaded in 4,309.30 seconds [Rate: 5,195.57
per second]

I then counted the triples using a SPARQL query:

SELECT (count(*) AS ?count) { ?s ?p ?o }

Triple count = 21669903

Does tdbloader skip the triples that produce warnings?

Why is the number of triples in the store different from the number reported as loaded?
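
The gap between the two figures is 22,389,276 - 21,669,903 = 719,373 triples. One possible explanation, which is only an assumption and not something established in this thread, is that the input contains duplicate triples: an RDF graph is a set, so a repeated triple is stored once even if the loader counts it each time it is read. A minimal Python sketch to check this on the N-Triples file, treating each non-blank, non-comment line as one triple (the `count_triples` helper is hypothetical and not part of Jena):

```python
def count_triples(lines):
    """Return (total, distinct) statement counts for N-Triples lines.

    Assumption: one triple per line, and identical triples are written
    identically, so comparing stripped lines is enough to spot duplicates.
    """
    total = 0
    seen = set()
    for line in lines:
        stmt = line.strip()
        # Skip blank lines and comment lines.
        if not stmt or stmt.startswith("#"):
            continue
        total += 1
        seen.add(stmt)
    return total, len(seen)


# Tiny illustrative sample (made-up data): three statements, one repeated.
sample = [
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "",
    "# a comment line",
    "<http://example.org/a> <http://example.org/p> <http://example.org/c> .",
]
total, distinct = count_triples(sample)
print(total, distinct)  # prints "3 2"
```

To run it on the real file, pass an open file handle instead of `sample`, for example `count_triples(open("content.n3", encoding="utf-8"))`. The sketch keeps every distinct statement in memory, so a file with over 22 million lines needs a correspondingly large amount of RAM. If `total - distinct` comes out close to 719,373, duplicates would account for the difference.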


-- 
Regards
Phani. S
