> I hacked (i.e. no checking/setup/params) the data/index scripts to create
> s, p, o folders on soft linked three separate devices and moved in the
> respective .dat and .idn files, hard linked back to the data-triples.tmp.
> and ran the three triple indexes in parallel. sort was parallel 8 and
> buffer 8GB. It built the three indexes in the time taken to build one.
>
> As an aside there are duplicate entries in the data-triples.tmp file, is
> this by design? if you sort data-triples.tmp | uniq
> it returns a smaller
> file and I've checked visually and there are duplicate entries...
>
> I'll tidy the script and make it available if anyone wants to perform a
> tweaked load, only really useful for large datasets.

Yes please, very useful. I feel like these improvements should be documented on the Jena website too so they don't get buried in a mailing list.
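
On the duplicates question, it might help to put a number on it rather than comparing file sizes. Below is a minimal sketch, not anything from the loader scripts: it assumes data-triples.tmp holds one triple per line and that the lines have already been sorted, so duplicates sit next to each other. The function name `count_adjacent_duplicates` and the sample lines are made up for illustration.

```python
from itertools import groupby
from typing import Iterable, Tuple


def count_adjacent_duplicates(sorted_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_lines, duplicate_lines) for input that is already sorted.

    Assumption: one triple per line, and identical lines are adjacent
    (e.g. the input is the output of `sort data-triples.tmp`).
    """
    total = 0
    duplicates = 0
    # groupby collapses each run of identical adjacent lines into one group.
    for _, group in groupby(sorted_lines):
        run_length = sum(1 for _ in group)
        total += run_length
        # Every line after the first in a run is a duplicate.
        duplicates += run_length - 1
    return total, duplicates


# Hypothetical sample data standing in for sorted lines of the file.
sample = ["a b c", "a b c", "a b d", "a b e", "a b e", "a b e"]
total, dups = count_adjacent_duplicates(sample)
print(total, dups)
```

To run it over the real file, the same function could be passed `sys.stdin` with the sorted output piped in, which streams the data so the file never has to fit in memory.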
