Hi Andy, thanks for pointing that out. What would you recommend as a heap size?
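
For context, the server is currently started more or less like below (paths and the service name are placeholders, not the real setup). If I follow your point about the OS file system cache, the idea would be to shrink the heap and leave the rest of the 128GB to the page cache, where TDB2 does most of its work:

    # Current launch: almost all RAM goes to the JVM heap
    # (sketch; paths and service name are placeholders):
    JVM_ARGS="-Xmx80G" ./fuseki-server --tdb2 --loc=/ssd/tdb2 /met

    # Much smaller heap, leaving ~100GB of RAM free for the
    # OS page cache to hold the memory-mapped TDB2 files:
    JVM_ARGS="-Xmx8G" ./fuseki-server --tdb2 --loc=/ssd/tdb2 /met
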
> On 14.09.2018 at 15:04, Andy Seaborne <[email protected]> wrote:
> 
> On 12/09/18 16:08, Markus Neumann wrote:
>> Hi,
>> we are running a Fuseki server that will eventually hold about 2.2 * 10^9
>> triples of meteorological data.
>> I currently run it with "-Xmx80GB" on a 128GB server. The database is TDB2
>> on a 900GB SSD.
> 
> Not sure if this is mentioned later in the thread (I'm in catch-up mode) but
> for TDB/TDB2, a lot of the workspace isn't in the heap, it's the OS file
> system cache, so a bigger Java heap can actually slow things down.
> 
>     Andy
> 
>> Now I face several performance issues:
>> 
>> 1. Inserting data:
>> It takes more than one hour to upload the measurements of a month
>> (a 7.5GB .ttl file, ~16 million triples) using the data-upload web
>> interface of Fuseki.
>> Is there a way to do this faster?
>> 
>> 2. Updating data:
>> We get new model runs 5 times per day. This is data for the next 10
>> days that needs to be updated every time.
>> My idea was to create a named graph "forecast" that holds the latest
>> version of this data.
>> Every time a new model run arrives, I create a new temporary graph to
>> upload the data to. Once this is finished, I move the temporary graph to
>> "forecast".
>> This seems to do the work twice, as it takes 1 hour for the upload and
>> 1 hour for the move.
>> 
>> Our data consists of the following:
>> 
>> Locations (total 1607 -> 16070 triples):
>> 
>> mm-locations:8500015 a mm:Location ;
>>     a geosparql:Geometry ;
>>     owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
>>     geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
>>     mm:station_name "Basel SBB GB Ost" ;
>>     mm:abbreviation "BSGO" ;
>>     mm:didok_id 8500015 ;
>>     geo:lat 47.54259 ;
>>     geo:long 7.61574 ;
>>     mm:elevation 273 .
>> 
>> Parameters (total 14 -> 56 triples):
>> 
>> mm-parameters:t_2m:C a mm:Parameter ;
>>     rdfs:label "t_2m:C" ;
>>     dcterms:description "Air temperature at 2m above ground in degree Celsius"@en ;
>>     mm:unit_symbol "˚C" .
>> 
>> Measurements (that is the huge bunch. Per day: 14 * 1607 * 48 ~ 1 million
>> measurements -> 5 million triples per day):
>> 
>> mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
>>     mm:location mm-locations:8500015 ;
>>     mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
>>     mm:value 15.1 ;
>>     mm:parameter mm-parameters:t_2m:C .
>> 
>> I would really appreciate it if someone could give me some advice on how
>> to handle these tasks, or point out things I could do to optimize the
>> organization of the data.
>> 
>> Many thanks and kind regards
>> Markus Neumann
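
On question 1 above (faster inserts): one thing I am going to try is loading offline with the TDB2 bulk loader and only then starting Fuseki on top of the resulting database, instead of going through the upload web interface. A sketch, with placeholder paths:

    # Stop Fuseki first: the TDB2 database must not be in use.
    # Bulk-load the month's Turtle file straight into the database:
    ./tdb2.tdbloader --loc=/ssd/tdb2 measurements-2018-09.ttl

    # Then start Fuseki against the pre-loaded database:
    JVM_ARGS="-Xmx8G" ./fuseki-server --tdb2 --loc=/ssd/tdb2 /met
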
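On question 2, for reference, the temp-graph/forecast swap currently amounts to a SPARQL Update like the one below, sent to the service's update endpoint (graph names and endpoint are placeholders for what we actually use):

    # After the new model run is uploaded into the temporary graph,
    # swap it in. MOVE replaces the contents of the target graph with
    # the source graph and then removes the source:
    curl -X POST http://localhost:3030/met/update \
         --data-urlencode 'update=MOVE GRAPH <http://example.org/graph/tmp>
                                  TO GRAPH <http://example.org/graph/forecast>'

It is this MOVE that takes the second hour, since it appears to rewrite all the triples rather than just renaming the graph.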
