Hi Markus,

A few comments inline.

Markus Neumann wrote on 12.09.2018 at 18:08:
1. Inserting data:
        It takes more than one hour to upload the measurements of a month (7.5 GB .ttl file, ~16 million triples) using the data upload web interface of Fuseki.
        Is there a way to do this faster?

One thing you may want to investigate is using Fuseki with an HDT backend. Then you could create the HDT files offline and, when the loading is done, replace the old files with the new ones (you may need to restart Fuseki). I have not actually done this myself, though. HDT support is provided by the hdt-jena project, which is not part of the Apache Jena codebase, and its maintenance status is a bit unclear.

2. Updating data:
        We get new model runs 5 times per day. Each run contains data for the next 10 days, which needs to be updated every time.
        My idea was to create a named graph "forecast" that holds the latest version of this data.
        Every time a new model run arrives, I create a new temporary graph and upload the data to it. Once this is finished, I move the temporary graph to "forecast".
        This seems to do the work twice, as it takes 1 hour for the upload and 1 hour for the move.

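Why do you need to move the temporary graph at all? The PUT operation is atomic: the data being loaded becomes visible to queries only after the whole operation has completed. So you could PUT the new data directly into the "forecast" graph, and queries would keep seeing the previous version until the upload finishes.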
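For illustration, a minimal sketch of the direct approach in Python, using only the standard library. It assumes a Fuseki dataset named `ds` whose SPARQL 1.1 Graph Store Protocol service is at `http://localhost:3030/ds/data`, and a graph IRI `http://example.org/graph/forecast`; these values and the helper names are hypothetical, so adjust them to your setup. A single HTTP PUT replaces the whole content of the named graph, and the Turtle file is streamed rather than read into memory.

```python
import os
import urllib.parse
import urllib.request

# Hypothetical values: adjust the dataset name and graph IRI to your own setup.
ENDPOINT = "http://localhost:3030/ds/data"            # Graph Store Protocol service of dataset "ds"
FORECAST_GRAPH = "http://example.org/graph/forecast"  # IRI of the "forecast" named graph


def graph_url(endpoint: str, graph_iri: str) -> str:
    """Address a named graph with the ?graph=<iri> query parameter (indirect identification)."""
    return endpoint + "?" + urllib.parse.urlencode({"graph": graph_iri})


def build_put_request(endpoint: str, graph_iri: str, body, size: int) -> urllib.request.Request:
    """Build a PUT request that replaces the whole content of graph_iri with a Turtle body."""
    return urllib.request.Request(
        graph_url(endpoint, graph_iri),
        data=body,
        method="PUT",
        headers={"Content-Type": "text/turtle", "Content-Length": str(size)},
    )


def put_graph(endpoint: str, graph_iri: str, ttl_path: str) -> int:
    """Upload a Turtle file as the new content of graph_iri and return the HTTP status."""
    size = os.path.getsize(ttl_path)
    # Passing the open file as the body makes urllib send it in blocks,
    # instead of reading the whole multi-GB file into memory first.
    with open(ttl_path, "rb") as body:
        request = build_put_request(endpoint, graph_iri, body, size)
        # urlopen raises urllib.error.HTTPError if the server rejects the upload.
        with urllib.request.urlopen(request) as response:
            return response.status
```

Usage would be something like `put_graph(ENDPOINT, FORECAST_GRAPH, "forecast.ttl")` for each new model run (the file name is hypothetical). Whether this is faster than the web interface for a 7.5 GB file is untested here; the point is that it needs one load per run instead of a load plus a move.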

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi