On 12/09/18 16:08, Markus Neumann wrote:
Hi,

we are running a Fuseki server that will hold about 2.2 * 10^9 triples of 
meteorological data eventually.
I currently run it with "-Xmx80GB" on a 128GB Server. The database is TDB2 on a 
900GB SSD.

Not sure if this is mentioned later in the thread (I'm in catch-up mode), but for TDB/TDB2, a lot of the workspace isn't in the heap, it's the OS file system cache, so a bigger Java heap can actually slow things down.

    Andy


Now I face several performance issues:
1. Inserting data:
        It takes more than one hour to upload one month of measurements
(a 7.5GB .ttl file, ~16 million triples) via the data-upload web interface
of Fuseki.
        Is there a way to do this faster?
2. Updating data:
        We get new model runs 5 times per day. Each run covers the next 10
days, and that data needs to be replaced every time.
        My idea was to create a named graph "forecast" that holds the latest
version of this data.
        Every time a new model run arrives, I create a new temporary graph and
upload the data to it. Once the upload is finished, I move the temporary graph
to "forecast".
        This seems to do the work twice: it takes 1 hour for the upload and
another 1 hour for the move.
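On point 2: instead of moving ~16 million triples every run (which TDB2 has to copy), one cheaper pattern is to alternate between two fixed graph names and record which one is "current" in a small pointer graph, so the "swap" touches only a couple of triples. A minimal sketch, assuming a Fuseki dataset at http://localhost:3030/meteo and hypothetical graph URIs (none of these names come from the original mail):

```python
# Sketch of a graph-swap pattern for the 5-times-daily forecast update.
# All URIs below are illustrative placeholders, not the poster's real ones.

FORECAST_A = "http://example.org/graph/forecast-a"
FORECAST_B = "http://example.org/graph/forecast-b"
POINTER_GRAPH = "http://example.org/graph/meta"

def swap_update(new_current: str) -> str:
    """Build a SPARQL Update that repoints the 'current forecast' marker.

    Only the pointer triple changes; the bulk data stays where it was
    loaded, so no hour-long MOVE is needed.
    """
    return (
        f"WITH <{POINTER_GRAPH}> "
        "DELETE { <urn:forecast> <urn:current> ?old } "
        f"INSERT {{ <urn:forecast> <urn:current> <{new_current}> }} "
        "WHERE { OPTIONAL { <urn:forecast> <urn:current> ?old } }"
    )

def post_update(endpoint: str, update: str) -> None:
    """POST an update to Fuseki's SPARQL Update endpoint (not run here)."""
    import urllib.request
    req = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
    )
    urllib.request.urlopen(req)

update = swap_update(FORECAST_B)
print(update)
# In production you would then call, e.g.:
#   post_update("http://localhost:3030/meteo/update", update)
```

Queries would first look up the current graph URI in the pointer graph, then query that graph; the trade-off is one extra (cheap) lookup per query in exchange for avoiding the one-hour copy. For the initial bulk load itself (point 1), loading offline with the TDB2 command-line bulk loader while the server is stopped is generally much faster than the web upload form.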

Our data consists of the following:

Locations (total 1607 -> 16070 triples):
mm-locations:8500015 a mm:Location ;
     a geosparql:Geometry ;
     owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
     geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral 
;
     mm:station_name "Basel SBB GB Ost" ;
     mm:abbreviation "BSGO" ;
     mm:didok_id 8500015 ;
     geo:lat 47.54259 ;
     geo:long 7.61574 ;
     mm:elevation 273 .

Parameters (total 14 -> 56 triples):
mm-parameters:t_2m:C a mm:Parameter ;
     rdfs:label "t_2m:C" ;
     dcterms:description "Air temperature at 2m above ground in degree 
Celsius"@en ;
     mm:unit_symbol "˚C" .

Measurements (this is the bulk of the data. Per day: 14 * 1607 * 48 ~ 1 million
measurements -> 5 million triples per day):
mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
     mm:location mm-locations:8500015 ;
     mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
     mm:value 15.1 ;
     mm:parameter mm-parameters:t_2m:C .
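The sizing above can be checked with a quick back-of-envelope calculation; every number here is taken from the mail itself (nothing is measured on the server):

```python
# Per-day volume: 14 parameters x 1607 locations x 48 timesteps,
# with 5 triples per mm:Measurement resource (type, location,
# validdate, value, parameter -- as in the Turtle example above).
parameters = 14
locations = 1607
timesteps_per_day = 48          # one measurement every 30 minutes

measurements_per_day = parameters * locations * timesteps_per_day
triples_per_day = measurements_per_day * 5

print(measurements_per_day)     # ~1.08 million measurements/day
print(triples_per_day)          # ~5.4 million triples/day

# At that rate, the 2.2e9-triple target corresponds to roughly:
print(2.2e9 / triples_per_day)  # days of data
```

This puts the stated 2.2 * 10^9 triples at roughly 400 days of measurements, which is useful context for judging how long the current one-hour-per-month upload rate would take to backfill.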

I would really appreciate it if someone could give me some advice on how to
handle these tasks, or point out things I could do to optimize the organization
of the data.

Many thanks and kind regards
Markus Neumann