Hi Marco,

as this is a project for a customer, I'm afraid we can't make the data public.

1. I'm running Fuseki 3.8.0 with the following configuration:

@prefix :          <http://base/#> .
@prefix rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb2:      <http://jena.apache.org/2016/tdb#> .
@prefix ja:        <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki:    <http://jena.apache.org/fuseki#> .
@prefix spatial:   <http://jena.apache.org/spatial#> .
@prefix geo:       <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .

:service_tdb_all a fuseki:Service ;
    rdfs:label "TDB2 mm" ;
    fuseki:dataset :spatial_dataset ;
    fuseki:name "mm" ;
    fuseki:serviceQuery "query" , "sparql" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" .

:spatial_dataset a spatial:SpatialDataset ;
    spatial:dataset :tdb_dataset_readwrite ;
    spatial:index <#indexLucene> .

<#indexLucene> a spatial:SpatialIndexLucene ;
    # spatial:directory <file:Lucene> ;
    spatial:directory "mem" ;
    spatial:definition <#definition> .

<#definition> a spatial:EntityDefinition ;
    spatial:entityField "uri" ;
    spatial:geoField "geo" ;
    # custom geo predicates for 1) latitude/longitude format
    spatial:hasSpatialPredicatePairs (
        [ spatial:latitude geo:lat ; spatial:longitude geo:long ]
    ) ;
    # custom geo predicates for 2) Well Known Text (WKT) literal
    spatial:hasWKTPredicates (geosparql:asWKT) ;
    # custom SpatialContextFactory for 2) Well Known Text (WKT) literal
    # spatial:spatialContextFactory "com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    spatial:spatialContextFactory "org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory" .

:tdb_dataset_readwrite a tdb2:DatasetTDB2 ;
    tdb2:location "/srv/linked_data_store/fuseki-server/run/databases/mm" .

I've been through the Fuseki documentation several times, but I still find it a bit confusing. I would highly appreciate it if you could point me to other resources.

I have not found the tdbloader in the Fuseki repo. For now I use a small shell script that wraps curl to upload the data:

#!/bin/sh
# usage: $0 FILE [GRAPH]
# POST a file to the Fuseki dataset, optionally into a named graph
if [ -n "$2" ]
then
    ADD="?graph=http://rdf.meteomatics.com/mm/graphs/$2"
fi
curl --basic -u user:password -X POST -F "filename=@$1" "localhost:3030/mm/data${ADD}"
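If I understand it correctly, the TDB2 bulk loader ships with the apache-jena distribution rather than with Fuseki itself, which would explain why I couldn't find it. Something along these lines should be much faster than uploading over HTTP, pointed at the TDB2 location from the config above (just a sketch: the file name is an example, and Fuseki has to be stopped while the offline loader runs):

# offline bulk load, run while the server is down
apache-jena-3.8.0/bin/tdb2.tdbloader \
    --loc /srv/linked_data_store/fuseki-server/run/databases/mm \
    measurements-2018-09.ttl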
2. Our customer has not specified a default use case yet, as the whole RDF concept is about as new to them as it is to me. I suppose it will be something like "Find all locations in a certain radius that have nice weather next Saturday".
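As a sketch of what I imagine a typical query will look like, using the spatial index via the jena-spatial withinCircle property function against the query endpoint (the coordinates are the Basel station from my first mail below; the 50 km radius is made up):

curl --basic -u user:password localhost:3030/mm/query \
    --data-urlencode 'query=
      PREFIX spatial: <http://jena.apache.org/spatial#>
      SELECT ?loc WHERE {
        # stations within 50 km of Basel: (lat lon radius units)
        ?loc spatial:withinCircle (47.5426 7.6157 50.0 "km") .
      }'

Joining ?loc to the measurements via mm:location and then filtering on mm:parameter and mm:value would narrow it down to "nice weather".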
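Regarding the updates: the graph swap I described in my first mail boils down to a single SPARQL 1.1 MOVE request against the update endpoint, roughly like this (the temporary graph name here is just an example):

curl --basic -u user:password localhost:3030/mm/update \
    --data-urlencode 'update=
      MOVE <http://rdf.meteomatics.com/mm/graphs/tmp-run>
        TO <http://rdf.meteomatics.com/mm/graphs/forecast>'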
I just took a glance at the ha-fuseki page and will give it a try later.

Many thanks for your time
Best
Markus

> On 13.09.2018 at 10:00, Marco Neumann <[email protected]> wrote:
>
> Do you make the data endpoint publicly available?
>
> 1. Did you try the tdbloader? What version of TDB2 do you use?
>
> 2. There are many ways to improve your response time here. What does a typical query look like? Do you make use of the spatial indexer?
>
> And Andy has a work in progress here for more granular updates that might be of interest to your effort as well: "High Availability Apache Jena Fuseki"
>
> https://afs.github.io/rdf-delta/ha-fuseki.html
>
>
> On Wed, Sep 12, 2018 at 4:09 PM Markus Neumann <[email protected]> wrote:
>
>> Hi,
>>
>> we are running a Fuseki server that will eventually hold about 2.2 * 10^9 triples of meteorological data.
>> I currently run it with "-Xmx80GB" on a 128GB server. The database is TDB2 on a 900GB SSD.
>>
>> Now I face several performance issues:
>>
>> 1. Inserting data:
>> It takes more than one hour to upload the measurements of a month (a 7.5GB .ttl file, ~16 million triples) using the data-upload web interface of Fuseki.
>> Is there a way to do this faster?
>>
>> 2. Updating data:
>> We get new model runs 5 times per day. This is data for the next 10 days that needs to be updated every time.
>> My idea was to create a named graph "forecast" that holds the latest version of this data.
>> Every time a new model run arrives, I create a new temporary graph to upload the data to. Once this is finished, I move the temporary graph to "forecast".
>> This seems to do the work twice, as it takes 1 hour for the upload and 1 hour for the move.
>>
>> Our data consists of the following:
>>
>> Locations (total 1607 -> 16070 triples):
>>
>> mm-locations:8500015 a mm:Location ;
>>     a geosparql:Geometry ;
>>     owl:sameAs <http://lod.opentransportdata.swiss/didok/8500015> ;
>>     geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral ;
>>     mm:station_name "Basel SBB GB Ost" ;
>>     mm:abbreviation "BSGO" ;
>>     mm:didok_id 8500015 ;
>>     geo:lat 47.54259 ;
>>     geo:long 7.61574 ;
>>     mm:elevation 273 .
>>
>> Parameters (total 14 -> 56 triples):
>>
>> mm-parameters:t_2m:C a mm:Parameter ;
>>     rdfs:label "t_2m:C" ;
>>     dcterms:description "Air temperature at 2m above ground in degree Celsius"@en ;
>>     mm:unit_symbol "˚C" .
>>
>> Measurements (that is the huge bunch; per day: 14 * 1607 * 48 ~ 1 million measurements -> 5 million triples per day):
>>
>> mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
>>     mm:location mm-locations:8500015 ;
>>     mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
>>     mm:value 15.1 ;
>>     mm:parameter mm-parameters:t_2m:C .
>>
>> I would really appreciate it if someone could give me some advice on how to handle these tasks, or point out things I could do to optimize the organization of the data.
>>
>> Many thanks and kind regards
>> Markus Neumann
>
>
> --
>
> ---
> Marco Neumann
> KONA
