Markus, comments inline:
> On 12/09/2018, 16:09, "Markus Neumann" <mneum...@meteomatics.com> wrote:
>
> Hi,
>
> we are running a Fuseki server that will eventually hold about 2.2 * 10^9 triples of meteorological data. I currently run it with "-Xmx80GB" on a 128GB server. The database is TDB2 on a 900GB SSD. Now I face several performance issues:
>
> 1. Inserting data:
> It takes more than one hour to upload the measurements of a month (7.5GB .ttl file, ~16 Mio triples) using the data-upload web interface of Fuseki. Is there a way to do this faster?

At a minimum, try GZipping the file and uploading it in GZipped form to reduce the amount of data transferred over the network. It is possible that your bottleneck here is actually network upload bandwidth rather than anything in Jena itself. I would expect GZip to substantially reduce the file size and hopefully improve your load times.

Secondly, TDB is typically reported to achieve load speeds of up to around 200k triples/second, although that is for offline bulk loads with SSDs. Even if we assume you could achieve only 25k triples/second, that would suggest a theoretical load time of approximately 11 minutes (16 Mio triples / 25k triples/second ≈ 640 seconds). If you can set up your system so that the TDB database is written to an SSD, that will improve your performance to some extent.

Thirdly, TDB uses multiple-reader/single-writer (MRSW) concurrency, so if you have a lot of reads happening while trying to upload (a write operation), the write will be forced to wait for active readers to finish before proceeding, which may introduce some delays.

So yes, I think you should be able to get faster load times.

> 2. Updating data:
> We get new model runs 5 times per day. This is data for the next 10 days, which needs to be updated every time. My idea was to create a named graph "forecast" that holds the latest version of this data. Every time a new model run arrives, I create a new temporary graph to upload the data to. Once this is finished, I move the temporary graph to "forecast".
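As a sketch of the GZip approach: compress the dump locally, then POST it with a `Content-Encoding: gzip` header so the server decompresses it on arrival. The file name, dataset URL, and graph name below are assumptions for illustration; substitute your own. Whether the server decodes gzip request bodies depends on your Fuseki/Jetty configuration, so the upload step is shown commented out.

```shell
# Create a tiny stand-in Turtle file and compress it
# (in practice this would be your real monthly dump).
cat > measurements.ttl <<'EOF'
@prefix mm: <http://example.org/mm#> .
mm:m1 a mm:Measurement ; mm:value 15.1 .
EOF
gzip -kf measurements.ttl   # keeps measurements.ttl, writes measurements.ttl.gz

# Upload the compressed file to the dataset's Graph Store Protocol endpoint
# (URL and graph URI are placeholders):
# curl -X POST 'http://localhost:3030/ds/data?graph=urn:graph:tmp' \
#      -H 'Content-Type: text/turtle' \
#      -H 'Content-Encoding: gzip' \
#      --data-binary @measurements.ttl.gz

# For large initial loads, the offline bulk loader is usually much faster
# than HTTP upload, but requires Fuseki to be stopped:
# tdb2.tdbloader --loc /path/to/tdb2-database measurements.ttl.gz
```

For a 7.5GB Turtle file of repetitive measurement data, compression alone should cut the bytes on the wire considerably, which helps most if network bandwidth is the real bottleneck.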
> This seems to do the work twice, as it takes 1 hour for the upload and 1 hour for the move.

Yes, this is exactly what happens. The database that backs Fuseki, TDB, is a quad store, so it stores each triple as a quad of GSPO, where G is the graph name. So when you move the temporary graph it has to copy all the quads from the source graph to the target graph and then delete the source graph.

Rob

> Our data consists of the following:
>
> Locations (total 1607 -> 16070 triples):
>
> mm-locations:8500015 a mm:Location ;
>     a geosparql:Geometry ;
>     owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
>     geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
>     mm:station_name "Basel SBB GB Ost" ;
>     mm:abbreviation "BSGO" ;
>     mm:didok_id 8500015 ;
>     geo:lat 47.54259 ;
>     geo:long 7.61574 ;
>     mm:elevation 273 .
>
> Parameters (total 14 -> 56 triples):
>
> mm-parameters:t_2m:C a mm:Parameter ;
>     rdfs:label "t_2m:C" ;
>     dcterms:description "Air temperature at 2m above ground in degree Celsius"@en ;
>     mm:unit_symbol "˚C" .
>
> Measurements (that is the huge bunch; per day: 14 * 1607 * 48 ~ 1 Mio measurements -> 5 Mio triples per day):
>
> mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
>     mm:location mm-locations:8500015 ;
>     mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
>     mm:value 15.1 ;
>     mm:parameter mm-parameters:t_2m:C .
>
> I would really appreciate it if someone could give me some advice on how to handle these tasks or point out things I could do to optimize the organization of the data.
>
> Many thanks and kind regards
> Markus Neumann
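Since MOVE copies every quad (a 10-day forecast at ~5 Mio triples/day is on the order of 50 Mio triples) and then drops the source, one way to pay the write cost only once is to skip the temporary graph: clear "forecast" and load the new model run straight into it, accepting a short window where readers see an empty or partial graph. A minimal sketch with curl, where the dataset URL and graph URI are assumptions:

```shell
FUSEKI='http://localhost:3030/ds'              # assumed dataset URL
FORECAST='http://example.org/graph/forecast'   # assumed graph name

# DROP + direct load writes each quad once, instead of load + MOVE
# writing it twice (once into the temporary graph, once into "forecast").
UPDATE="DROP SILENT GRAPH <$FORECAST>"
echo "$UPDATE"

# Against a live server this would be:
# curl -X POST "$FUSEKI/update" --data-urlencode "update=$UPDATE"
# curl -X POST "$FUSEKI/data?graph=$FORECAST" \
#      -H 'Content-Type: text/turtle' -H 'Content-Encoding: gzip' \
#      --data-binary @forecast.ttl.gz
```

If that visibility gap is unacceptable, the staged-upload-then-MOVE pattern is the price of atomically swapping the graph in a quad store; the copy cost is inherent to how MOVE works over GSPO quads.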