Hi, I've been using Jena GeoSPARQL, and a dockerised copy of the spatial indexer (https://github.com/zazuko/spatial-indexer - thanks guys, very useful!). I've been running Jena GeoSPARQL fine with a spatial index for smaller datasets.
I have a spatial dataset of around 160 GB of N-Quads (uncompressed), of which, at a guess, 5% of the triples are geometry literals. The Jena spatial indexer generates a spatial index of close to 4 GB for it. I've been unable to start a Jena GeoSPARQL instance for this dataset: I get out-of-memory errors on startup, and I've tried different heap values. The infrastructure I've used is the largest AWS Fargate task available, 4 vCPU and 30 GB RAM. Could anyone hazard a guess as to what infrastructure sizing would be required for this dataset to run, and/or what changes I could make to the configuration (attached) that might allow it to start? Thanks
@prefix :          <http://base/#> .
@prefix fuseki:    <http://jena.apache.org/fuseki#> .
@prefix ja:        <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb2:      <http://jena.apache.org/2016/tdb#> .
@prefix geosparql: <http://jena.apache.org/geosparql#> .

tdb2:GraphTDB rdfs:subClassOf ja:Model .
ja:ModelRDFS rdfs:subClassOf ja:Model .
ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
<http://jena.hpl.hp.com/2008/tdb#DatasetTDB> rdfs:subClassOf ja:RDFDataset .
tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
<http://jena.apache.org/text#TextDataset> rdfs:subClassOf ja:RDFDataset .
ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .

:service_tdb_all rdf:type fuseki:Service ;
    rdfs:label "TDB2 mydb" ;
    fuseki:dataset <#mydb> ;
    fuseki:name "mydb" ;
    fuseki:serviceQuery "query" , "" , "sparql" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate "" , "update" .

ja:ViewGraph rdfs:subClassOf ja:Model .
ja:GraphRDFS rdfs:subClassOf ja:Model .
tdb2:DatasetTDB rdfs:subClassOf ja:RDFDataset .
<http://jena.hpl.hp.com/2008/tdb#GraphTDB> rdfs:subClassOf ja:Model .
ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .

<#mydb> rdf:type geosparql:geosparqlDataset ;
    geosparql:spatialIndexFile "/fuseki/databases/mydb/spatial.index";
    # some GeoSPARQL settings
    geosparql:inference false ;
    geosparql:queryRewrite false ;
    geosparql:indexEnabled true ;
    geosparql:applyDefaultGeometry false ;
    # 3 item lists: [Geometry Literal, Geometry Transform, Query Rewrite]
    geosparql:indexSizes "-1,-1,-1" ;             # Default - unlimited.
    geosparql:indexExpires "5000,5000,5000" ;     # Default - time in milliseconds.
    geosparql:dataset :tdb_dataset_readwrite ;
    .

:tdb_dataset_readwrite rdf:type tdb2:DatasetTDB2 ;
    tdb2:unionDefaultGraph true ;
    tdb2:location "/fuseki/databases/mydb" .

ja:DatasetRDFS rdfs:subClassOf ja:RDFDataset .
<http://jena.apache.org/geosparql#geosparqlDataset> rdfs:subClassOf ja:RDFDataset .
