Hi,
We are running a Fuseki server that will eventually hold about 2.2 * 10^9
triples of meteorological data.
I currently run it with "-Xmx80g" on a 128 GB server. The database is TDB2
on a 900 GB SSD.
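For reference, I start the server roughly like this (heap size as above;
the database path and dataset name are simplified placeholders):

    # start Fuseki with the TDB2 database on the SSD
    JVM_ARGS="-Xmx80g" ./fuseki-server --tdb2 --loc=/ssd/meteo-tdb2 /meteo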
Now I face several performance issues:
1. Inserting data:
It takes more than one hour to upload one month of measurements
(a 7.5 GB .ttl file, ~16 million triples), using the data-upload
web interface of Fuseki.
Is there a way to do this faster?
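For reference, the web-interface upload corresponds roughly to a plain
HTTP POST against the dataset's data endpoint (dataset and file names
below are examples):

    # POST one month of Turtle into the default graph
    curl -X POST \
         -H 'Content-Type: text/turtle' \
         --data-binary @measurements-2018-09.ttl \
         'http://localhost:3030/meteo/data'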
2. Updating data:
We get new model runs 5 times per day. Each run contains data for the
next 10 days, which needs to be replaced every time.
My idea was to create a named graph "forecast" that holds the latest
version of this data.
Every time a new model run arrives, I create a new temporary graph to
upload the data to. Once this is finished, I move the temporary graph to
"forecast".
This seems to do the work twice, as it takes 1 hour for the upload and
1 hour for the move.
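The move itself is a single SPARQL 1.1 Update, sent roughly like this
(graph IRIs are placeholders for the ones we actually use):

    # replace the current forecast graph with the freshly uploaded one
    curl -X POST --data-urlencode \
         'update=MOVE SILENT <http://example.org/graphs/forecast-tmp>
                 TO <http://example.org/graphs/forecast>' \
         'http://localhost:3030/meteo/update'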
Our data consists of the following:
Locations (total 1607 -> 16070 triples):
mm-locations:8500015 a mm:Location ;
a geosparql:Geometry ;
owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
mm:station_name "Basel SBB GB Ost" ;
mm:abbreviation "BSGO" ;
mm:didok_id 8500015 ;
geo:lat 47.54259 ;
geo:long 7.61574 ;
mm:elevation 273 .
Parameters (total 14 -> 56 triples):
mm-parameters:t_2m:C a mm:Parameter ;
rdfs:label "t_2m:C" ;
dcterms:description "Air temperature at 2m above ground in degree Celsius"@en ;
mm:unit_symbol "˚C" .
Measurements (this is the huge bulk. Per day: 14 * 1607 * 48 ~ 1 million
measurements, at 5 triples each -> ~5 million triples per day):
mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
mm:location mm-locations:8500015 ;
mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
mm:value 15.1 ;
mm:parameter mm-parameters:t_2m:C .
I would really appreciate it if someone could give me some advice on how
to handle these tasks, or point out things I could do to optimize the
organization of the data.
Many thanks and kind regards
Markus Neumann