Hi Andy,

Thanks for pointing that out.
What would you recommend as a heap size?

> On 14.09.2018 at 15:04, Andy Seaborne <[email protected]> wrote:
> 
> 
> 
> On 12/09/18 16:08, Markus Neumann wrote:
>> Hi,
>> we are running a Fuseki server that will hold about 2.2 * 10^9 triples of 
>> meteorological data eventually.
>> I currently run it with "-Xmx80GB" on a 128GB Server. The database is TDB2 
>> on a 900GB SSD.
> 
> Not sure if this is mentioned later in the thread (I'm in catch-up mode), but 
> for TDB/TDB2 a lot of the workspace isn't in the heap; it's in the OS file 
> system cache, so a bigger Java heap can actually slow things down.
> 
>    Andy
> 
>> Now I face several performance issues:
>> 1. Inserting data:
>>      It takes more than one hour to upload the measurements of a month 
>> (a 7.5GB .ttl file, ~16 million triples) using the data-upload web 
>> interface of Fuseki.
>>      Is there a way to do this faster?
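>>      (For reference, the web-interface upload amounts to an HTTP POST of 
>> the Turtle file against the dataset; roughly the following, where host, 
>> port, dataset name, graph URI and file name are placeholders for our real 
>> ones:
>>          curl -X POST -H 'Content-Type: text/turtle' \
>>               --data-binary @measurements-2018-09.ttl \
>>               'http://localhost:3030/ds/data?graph=http://example.org/graph/measurements'
>> )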
>> 2. Updating data:
>>      We get new model runs 5 times per day. This is data for the next 10 
>> days, which needs to be updated every time.
>>      My idea was to create a named graph "forecast" that holds the latest 
>> version of this data.
>>      Every time a new model run arrives, I create a new temporary graph to 
>> upload the data to. Once this is finished, I move the temporary graph to 
>> "forecast".
>>      This seems to do the work twice, as it takes 1 hour for the upload 
>> and 1 hour for the move.
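>>      The swap itself is a single SPARQL Update (the graph names here are 
>> shortened placeholders for the real URIs):
>>          MOVE GRAPH <http://example.org/graph/forecast-tmp>
>>            TO GRAPH <http://example.org/graph/forecast>
>>      MOVE replaces the target graph with the source graph's contents and 
>> then drops the source, which seems to be where the second hour goes.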
>> Our data consists of the following:
>> Locations (total 1607 -> 16070 triples):
>> mm-locations:8500015 a mm:Location ;
>>     a geosparql:Geometry ;
>>     owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
>>     geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
>>     mm:station_name "Basel SBB GB Ost" ;
>>     mm:abbreviation "BSGO" ;
>>     mm:didok_id 8500015 ;
>>     geo:lat 47.54259 ;
>>     geo:long 7.61574 ;
>>     mm:elevation 273 .
>> Parameters (total 14 -> 56 triples):
>> mm-parameters:t_2m:C a mm:Parameter ;
>>     rdfs:label "t_2m:C" ;
>>     dcterms:description "Air temperature at 2m above ground in degree 
>> Celsius"@en ;
>>     mm:unit_symbol "˚C" .
>> Measurements (that is the huge bunch. Per day: 14 * 1607 * 48 ~ 1 million 
>> measurements -> 5 million triples per day):
>> mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
>>     mm:location mm-locations:8500015 ;
>>     mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
>>     mm:value 15.1 ;
>>     mm:parameter mm-parameters:t_2m:C .
>> I would really appreciate it if someone could give me some advice on how 
>> to handle these tasks, or point out things I could do to optimize the 
>> organization of the data.
>> Many thanks and kind regards
>> Markus Neumann
