Do you make the data endpoint publicly available?

1. Did you try the tdbloader? What version of TDB2 do you use?
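For the slow monthly upload, the bulk loader is usually much faster than posting a 7.5GB file through the Fuseki web interface. A minimal sketch — the paths and database location are placeholders for your setup, Fuseki must be stopped while the loader runs, and the `--loader=parallel` strategy is only available in newer Jena releases:

```shell
# Assumed install and data paths -- adjust to your environment.
export JENA_HOME=/opt/apache-jena
export PATH="$JENA_HOME/bin:$PATH"

# Bulk-load directly into the TDB2 database on disk (not via HTTP).
# Stop Fuseki first: the loader needs exclusive access to the database.
tdb2.tdbloader --loc /data/tdb2/meteo measurements-2018-09.ttl

# On a recent Jena release you can also pick a loader strategy, e.g.:
#   tdb2.tdbloader --loc /data/tdb2/meteo --loader=parallel measurements-2018-09.ttl
```

Restart Fuseki pointing at the same `--loc` afterwards; the loaded data is then served as usual.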
2. There are many ways to improve your response time here. What does a typical query look like? Do you make use of the spatial indexer? Also, Andy has a work in progress for more granular updates that might be of interest to your effort as well: "High Availability Apache Jena Fuseki" https://afs.github.io/rdf-delta/ha-fuseki.html

On Wed, Sep 12, 2018 at 4:09 PM Markus Neumann <[email protected]> wrote:

> Hi,
>
> we are running a Fuseki server that will eventually hold about 2.2 * 10^9
> triples of meteorological data.
> I currently run it with "-Xmx80GB" on a 128GB server. The database is TDB2
> on a 900GB SSD.
>
> Now I face several performance issues:
>
> 1. Inserting data:
> It takes more than one hour to upload the measurements of a month
> (a 7.5GB .ttl file, ~16 million triples) using the data-upload web
> interface of Fuseki.
> Is there a way to do this faster?
>
> 2. Updating data:
> We get new model runs 5 times per day. This is data for the next 10 days
> that needs to be updated every time.
> My idea was to create a named graph "forecast" that holds the latest
> version of this data.
> Every time a new model run arrives, I create a new temporary graph to
> upload the data to. Once this is finished, I move the temporary graph to
> "forecast".
> This seems to do the work twice, as it takes 1 hour for the upload and
> 1 hour for the move.
>
> Our data consists of the following:
>
> Locations (total 1607 -> 16070 triples):
>
> mm-locations:8500015 a mm:Location ;
>     a geosparql:Geometry ;
>     owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
>     geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
>     mm:station_name "Basel SBB GB Ost" ;
>     mm:abbreviation "BSGO" ;
>     mm:didok_id 8500015 ;
>     geo:lat 47.54259 ;
>     geo:long 7.61574 ;
>     mm:elevation 273 .
> Parameters (total 14 -> 56 triples):
>
> mm-parameters:t_2m:C a mm:Parameter ;
>     rdfs:label "t_2m:C" ;
>     dcterms:description "Air temperature at 2m above ground in degree Celsius"@en ;
>     mm:unit_symbol "˚C" .
>
> Measurements (that is the huge bunch; per day: 14 * 1607 * 48 ~ 1 million
> measurements -> 5 million triples per day):
>
> mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
>     mm:location mm-locations:8500015 ;
>     mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
>     mm:value 15.1 ;
>     mm:parameter mm-parameters:t_2m:C .
>
> I would really appreciate it if someone could give me some advice on how
> to handle these tasks, or point out things I could do to optimize the
> organization of the data.
>
> Many thanks and kind regards
> Markus Neumann

--

---
Marco Neumann
KONA
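P.S. On the forecast swap in the quoted point 2: the temporary-graph workflow can be done server-side with a single SPARQL Update `MOVE`, so the data is transferred over HTTP only once. A sketch, assuming a dataset served at `http://localhost:3030/ds` with the default Fuseki endpoints and hypothetical graph names — adjust all of these to your deployment:

```shell
# 1. Upload the new model run into a temporary named graph via the
#    Graph Store Protocol endpoint (one HTTP transfer of the data).
curl -X POST --data-binary @forecast-run.ttl \
     -H 'Content-Type: text/turtle' \
     'http://localhost:3030/ds/data?graph=urn:graph:forecast-tmp'

# 2. Replace the live graph server-side: MOVE drops the target graph,
#    then renames the source into its place -- no second upload.
curl -X POST --data-urlencode \
     'update=MOVE <urn:graph:forecast-tmp> TO <urn:graph:forecast>' \
     'http://localhost:3030/ds/update'
```

Note the `MOVE` still does work inside the store, but it avoids re-parsing and re-transferring the Turtle file, which is where most of the second hour likely goes.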
