Hi,

I've been using Jena GeoSPARQL together with a dockerised copy of the spatial
indexer (https://github.com/zazuko/spatial-indexer; thanks guys, very
useful!). Jena GeoSPARQL with a spatial index has been running fine for
smaller datasets.

I have a spatial dataset of around 160 GB of N-Quads (uncompressed); at a
guess, about 5% of the triples are geometry literals. Running the spatial
indexer over it produces a spatial index of close to 4 GB. I've been unable
to start a Jena GeoSPARQL instance for this dataset: I get out-of-memory
errors on startup, and I've tried several different heap values. The
infrastructure I've used is the largest AWS Fargate task available,
4 vCPU and 30 GB RAM.
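One thing worth ruling out when the heap is set for a container is whether the value actually reaches the JVM. On recent JDKs, if no -Xmx is given, the default maximum heap is typically only a quarter of the container's memory. Below is a minimal sketch, assuming a hypothetical standalone class (HeapCheck is not part of Jena or Fuseki), that reports what the JVM sees using the standard java.lang.Runtime methods.

```java
// HeapCheck.java: hypothetical helper, not part of Jena or Fuseki.
// Compile with `javac HeapCheck.java`, then run it in the same container
// image with the same JVM options used for Fuseki (e.g. `java -Xmx24g HeapCheck`,
// where 24g is only an example value) to confirm the heap setting is applied.
public class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently committed (reserved) by the JVM, in MiB. */
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Max heap (MiB):       " + maxHeapMiB());
        System.out.println("Committed heap (MiB): " + committedHeapMiB());
    }
}
```

If "Max heap" comes out far below the intended value, the heap option is not being passed through to the JVM that runs Fuseki.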

Could anyone hazard a guess at the infrastructure sizing this dataset would
need in order to run, and/or suggest changes to the configuration (attached
below) that might allow it to start?

Thanks

@prefix :       <http://base/#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .
@prefix geosparql: <http://jena.apache.org/geosparql#> .

tdb2:GraphTDB  rdfs:subClassOf  ja:Model .

ja:ModelRDFS  rdfs:subClassOf  ja:Model .

ja:RDFDatasetSink  rdfs:subClassOf  ja:RDFDataset .

<http://jena.hpl.hp.com/2008/tdb#DatasetTDB>
        rdfs:subClassOf  ja:RDFDataset .

tdb2:GraphTDB2  rdfs:subClassOf  ja:Model .

<http://jena.apache.org/text#TextDataset>
        rdfs:subClassOf  ja:RDFDataset .

ja:RDFDatasetZero  rdfs:subClassOf  ja:RDFDataset .

:service_tdb_all  rdf:type            fuseki:Service ;
        rdfs:label                    "TDB2 mydb" ;
        fuseki:dataset                <#mydb> ;
        fuseki:name                   "mydb" ;
        fuseki:serviceQuery           "query" , "" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore "data" ;
        fuseki:serviceUpdate          "" , "update" .

ja:ViewGraph  rdfs:subClassOf  ja:Model .

ja:GraphRDFS  rdfs:subClassOf  ja:Model .

tdb2:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .

<http://jena.hpl.hp.com/2008/tdb#GraphTDB>
        rdfs:subClassOf  ja:Model .

ja:DatasetTxnMem  rdfs:subClassOf  ja:RDFDataset .

tdb2:DatasetTDB2  rdfs:subClassOf  ja:RDFDataset .

ja:RDFDatasetOne  rdfs:subClassOf  ja:RDFDataset .

ja:MemoryDataset  rdfs:subClassOf  ja:RDFDataset .

<#mydb> rdf:type geosparql:geosparqlDataset ;
  geosparql:spatialIndexFile "/fuseki/databases/mydb/spatial.index";

  # some GeoSPARQL settings
  geosparql:inference            false ;
  geosparql:queryRewrite         false ;
  geosparql:indexEnabled         true ;
  geosparql:applyDefaultGeometry false ;

  # 3 item lists: [Geometry Literal, Geometry Transform, Query Rewrite]
  geosparql:indexSizes           "-1,-1,-1" ;       # Default - unlimited.
  geosparql:indexExpires         "5000,5000,5000" ; # Default - time in milliseconds.

  geosparql:dataset :tdb_dataset_readwrite ;
  .

:tdb_dataset_readwrite
        rdf:type       tdb2:DatasetTDB2 ;
        tdb2:unionDefaultGraph true ;
        tdb2:location  "/fuseki/databases/mydb" .

ja:DatasetRDFS  rdfs:subClassOf  ja:RDFDataset .

<http://jena.apache.org/geosparql#geosparqlDataset>
        rdfs:subClassOf  ja:RDFDataset .
