Hi,
We are running a Fuseki server that will eventually hold about 2.2 * 10^9
triples of meteorological data.
I currently run it with "-Xmx80g" on a 128 GB server. The database is TDB2
on a 900 GB SSD.
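For reference, I start the server roughly like this (heap size as above;
the database path and dataset name are simplified placeholders):

    # start Fuseki with the TDB2 database on the SSD
    JVM_ARGS="-Xmx80g" ./fuseki-server --tdb2 --loc=/ssd/meteo-tdb2 /meteo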
Now I face several performance issues:
1. Inserting data:
It takes more than one hour to upload one month of measurements
(a 7.5 GB .ttl file, ~16 million triples), using the data-upload
web interface of Fuseki.
Is there a way to do this faster?
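For reference, the web-interface upload corresponds roughly to a plain
HTTP POST against the dataset's data endpoint (dataset and file names
below are examples):

    # POST one month of Turtle into the default graph
    curl -X POST \
         -H 'Content-Type: text/turtle' \
         --data-binary @measurements-2018-09.ttl \
         'http://localhost:3030/meteo/data'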
2. Updating data:
We get new model runs 5 times per day. Each run contains data for the
next 10 days, which needs to be replaced every time.
My idea was to create a named graph "forecast" that holds the latest
version of this data.
Every time a new model run arrives, I create a new temporary graph to
upload the data to. Once this is finished, I move the temporary graph to
"forecast".
This seems to do the work twice, as it takes 1 hour for the upload and
1 hour for the move.
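The move itself is a single SPARQL 1.1 Update, sent roughly like this
(graph IRIs are placeholders for the ones we actually use):

    # replace the current forecast graph with the freshly uploaded one
    curl -X POST --data-urlencode \
         'update=MOVE SILENT <http://example.org/graphs/forecast-tmp>
                 TO <http://example.org/graphs/forecast>' \
         'http://localhost:3030/meteo/update'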
Our data consists of the following:
Locations (total 1607 -> 16070 triples):
mm-locations:8500015 a mm:Location ;
a geosparql:Geometry ;
owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
mm:station_name "Basel SBB GB Ost" ;
mm:abbreviation "BSGO" ;
mm:didok_id 8500015 ;
geo:lat 47.54259 ;
geo:long 7.61574 ;
mm:elevation 273 .
Parameters (total 14 -> 56 triples):
mm-parameters:t_2m:C a mm:Parameter ;
rdfs:label "t_2m:C" ;
dcterms:description "Air temperature at 2m above ground in degree Celsius"@en ;
mm:unit_symbol "˚C" .
Measurements (this is the huge bulk. Per day: 14 * 1607 * 48 ~ 1 million
measurements, at 5 triples each -> ~5 million triples per day):
mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
mm:location mm-locations:8500015 ;
mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
mm:value 15.1 ;
mm:parameter mm-parameters:t_2m:C .
I would really appreciate it if someone could give me some advice on how
to handle these tasks, or point out things I could do to optimize the
organization of the data.
Many thanks and kind regards
Markus Neumann