Re: Report on loading wikidata

Laura Morales Tue, 12 Dec 2017 02:13:46 -0800

> I hacked (i.e. no checking/setup/params) the data/index scripts to create
> s, p, o folders soft-linked to three separate devices, moved in the
> respective .dat and .idn files, hard-linked back to data-triples.tmp,
> and ran the three triple indexes in parallel. sort was run with parallel 8
> and an 8GB buffer. It built the three indexes in the time taken to build one.
> 
> As an aside, there are duplicate entries in the data-triples.tmp file. Is
> this by design? If you sort data-triples.tmp | uniq > it returns a smaller
> file, and I've checked visually and there are duplicate entries...
> 
> I'll tidy the script and make it available if anyone wants to perform a
> tweaked load, only really useful for large datasets.


Yes please, very useful. I feel like these improvements should be documented on 
the Jena website too so they don't get buried in a mailing list.
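
On the duplicates question, something like the sketch below might give a
count instead of a visual check. It assumes data-triples.tmp holds one
triple per line; count_duplicates is a hypothetical helper name of mine,
not part of the Jena scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jena): print how many lines in a file
# are exact repeats of another line.
# Assumption: the file holds one triple per line.
count_duplicates() {
    file=$1
    # Total number of lines in the file.
    total=$(wc -l < "$file")
    # Number of distinct lines; LC_ALL=C forces a byte-wise comparison
    # so the result does not depend on the current locale.
    unique=$(LC_ALL=C sort -u "$file" | wc -l)
    echo $((total - unique))
}
```

Calling `count_duplicates data-triples.tmp` prints 0 if every line is
distinct, and otherwise the number of redundant lines.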
