Hi Markus,
A few comments inline.
Markus Neumann wrote on 12.09.2018 at 18:08:
1. Inserting data:
It takes more than one hour to upload one month of measurements
(a 7.5 GB .ttl file, ~16 million triples) using the data-upload web
interface of Fuseki.
Is there a way to do this faster?
One thing you may want to investigate is using Fuseki with an HDT
backend. Then you could create the HDT files offline and, once they are
built, replace the old files (you may need to restart Fuseki).
I've not actually done this myself though. The HDT support is in the
hdt-jena project which is not in the Apache Jena codebase. Its
maintenance status is a bit unclear.
2. Updating data:
We get new model runs 5 times per day. This is data for the next 10
days, that needs to be updated every time.
My idea was to create a named graph "forecast" that holds the latest
version of this data.
Every time a new model run arrives, I create a new temporary graph to upload the
data to. Once this is finished, I move the temporary graph to "forecast".
This seems to do the work twice, as it takes 1 hour for the upload and 1
hour for the move.
Why do you need to move the temporary graph? The PUT operation is atomic
- the data being loaded will only be visible to queries after the whole
operation is complete.
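
To illustrate, here is a rough sketch in Python of replacing the "forecast" graph directly with a single Graph Store Protocol PUT, with no temporary graph. I have not run this against your setup: the dataset URL, the graph URI and the function names are made up for the example, and it assumes the dataset exposes the default read-write graph store service under "data".

```python
import urllib.parse
import urllib.request


def build_graph_put(dataset_url, graph_uri, body, length=None):
    """Build a Graph Store Protocol PUT that replaces one named graph.

    dataset_url and graph_uri are hypothetical values supplied by the caller;
    the "/data" service name assumes Fuseki's default service configuration.
    """
    # The target graph is named indirectly through the ?graph= parameter.
    query = urllib.parse.urlencode({"graph": graph_uri})
    headers = {"Content-Type": "text/turtle; charset=utf-8"}
    if length is not None:
        # Needed when body is an open file and you want to avoid chunked transfer.
        headers["Content-Length"] = str(length)
    return urllib.request.Request(
        url=f"{dataset_url}/data?{query}",
        data=body,
        headers=headers,
        method="PUT",
    )


def replace_graph(dataset_url, graph_uri, turtle_path):
    """Send the PUT; returns the HTTP status code of the response."""
    with open(turtle_path, "rb") as f:
        # Passing the file object streams it instead of loading it into memory.
        req = build_graph_put(dataset_url, graph_uri, f)
        with urllib.request.urlopen(req) as resp:
            return resp.status
```

You would call it as replace_graph("http://localhost:3030/ds", "http://example.org/graph/forecast", "forecast.ttl") each time a new model run arrives. PUT replaces the whole content of the named graph, so there is no separate delete or move step.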
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi