Thanks - I’ll look into that, Stephen. I’d been trying it in 50k-triple chunks but the performance isn’t great: it takes about 1 min per request.
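
For anyone following the thread, here is a minimal sketch of the batched delete with a limited subselect that Andy suggested. It is an illustration under assumptions, not a tested recipe: the function names are hypothetical, and the endpoint paths in the closing comments (/stagingdb/update and /stagingdb/query) assume Fuseki's default service names for the stagingdb dataset seen in the log. The loop takes its two network operations as parameters so it can be dry-run without a server.

```python
import json
import urllib.parse
import urllib.request


def build_batch_delete(graph_uri, batch_size):
    """Build a SPARQL Update that deletes at most batch_size triples from one graph.

    The LIMIT sits on a subselect because SPARQL Update does not accept a
    LIMIT directly on DELETE ... WHERE.
    """
    return (
        f"DELETE {{ GRAPH <{graph_uri}> {{ ?s ?p ?o }} }}\n"
        f"WHERE {{\n"
        f"  {{ SELECT ?s ?p ?o WHERE {{ GRAPH <{graph_uri}> {{ ?s ?p ?o }} }}\n"
        f"    LIMIT {batch_size} }}\n"
        f"}}"
    )


def post_update(endpoint, update, timeout=300.0):
    """POST an update as a form-encoded 'update' parameter (SPARQL 1.1 Protocol)."""
    data = urllib.parse.urlencode({"update": update}).encode("utf-8")
    request = urllib.request.Request(endpoint, data=data, method="POST")
    # urlopen raises HTTPError on 4xx/5xx, so a failed batch stops the loop.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def graph_has_triples(endpoint, graph_uri, timeout=60.0):
    """Run an ASK query to check whether the graph still contains any triple."""
    query = f"ASK {{ GRAPH <{graph_uri}> {{ ?s ?p ?o }} }}"
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["boolean"]


def delete_graph_in_batches(graph_uri, batch_size, run_update, has_triples,
                            max_batches=10000):
    """Issue the batch delete repeatedly until the graph is empty.

    run_update(update_string) and has_triples(graph_uri) are injected callables.
    Returns the number of batches issued; max_batches is a safety stop.
    """
    update = build_batch_delete(graph_uri, batch_size)
    batches = 0
    while batches < max_batches and has_triples(graph_uri):
        run_update(update)
        batches += 1
    return batches


# Example wiring against a live server (endpoint paths are assumptions):
#   base = "http://127.0.0.1:3030/stagingdb"
#   delete_graph_in_batches(
#       "http://example.com/graph", 50000,
#       run_update=lambda u: post_update(base + "/update", u),
#       has_triples=lambda g: graph_has_triples(base + "/query", g),
#   )
```

Each batch is its own request and therefore its own transaction, which is what keeps the uncommitted state small. The cost is one extra ASK round trip per batch.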

> On 16 Jul 2015, at 20:37, Stephen Allen <[email protected]> wrote:
>
> Hi Ric,
>
> You could try setting two properties for your dataset:
>
> <#yourdatasetname> rdf:type tdb:DatasetTDB ;
>     ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ;
>                  ja:cxtValue "mapped" ] ;
>     ja:context [ ja:cxtName "arq:spillToDiskThreshold" ;
>                  ja:cxtValue 10000 ] .
>
> The first one will use a temporary memory mapped file for storing
> uncommitted TDB blocks. By default the blocks are stored in heap memory.
> The second option will cause temporary bindings generated by the update
> engine during the delete operation to be written out to a temporary file
> on disk after the specified threshold is passed. For the first option, you
> can also use "direct", which will use process heap instead of JVM heap.
>
> Both of these options should reduce the heap usage and maybe get you to
> what you are looking for. Try just the first option (memory mapped blocks)
> first and see if that does it, since the second option will likely reduce
> performance a bit.
>
> But Andy's suggestion of breaking up the query with limited subselects
> should be working for you, since it also limits the size of the heap.
>
> -Stephen
>
>
> On Wed, Jul 15, 2015 at 7:04 AM, Ric Roberts <[email protected]> wrote:
>
>> Hello again. I’ve tried upgrading to Fuseki 1.1.2, and it now gives a heap
>> space error after only 6 minutes (instead of 60-odd).
>>
>> There are a few hundred graphs in the database, but most of them are small
>> (apart from the one I’m trying to delete).
>>
>> Does this mean that using the Graph Protocol is a no-go? I’ll try the
>> batched deletes...
>>
>>
>>> On 13 Jul 2015, at 22:51, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 13/07/15 21:31, Andy Seaborne wrote:
>>>> Hi Ric,
>>>>
>>>> Could you please try Fuseki 1.1.2 or Fuseki 2.0.0?
>>>>
>>>> How many datasets does the server host?
>>>>
>>>> 1.0.1 was Jan 2014 and IIRC this area has changed, especially DELETE of
>>>> a graph with the Graph Store Protocol. However, if this is just due to
>>>> transaction overheads (it's not immediately clear whether it is or is
>>>> not), then DELETE {} WHERE { SELECT {...} LIMIT } is the way to go for
>>>> an immediate solution.
>>>>
>>>> TDB1 (i.e. the Jena code) is a bit memory hungry for transactions.
>>>>
>>>> TDB2 is not memory bound but it isn't in the Jena codebase. It has been
>>>> tested with 100 million triple loads in a single Fuseki2 upload.
>>>>
>>>> See
>>>> http://www.sparql.org/validate/update
>>>
>>> That's the service point.
>>>
>>> http://www.sparql.org/update-validator.html
>>>
>>> is the HTML form.
>>>
>>>> for checking syntax.
>>>>
>>>> Andy
>>>>
>>>> On 13/07/15 18:59, Ric Roberts wrote:
>>>>> Hi. I’m having problems deleting a moderately large graph from a
>>>>> jena-fuseki-1.0.1 database.
>>>>>
>>>>> The graph contains approximately 60 million triples, and the database
>>>>> contains about 70 million triples in total.
>>>>>
>>>>> I’ve started Fuseki with a 16G heap (JVM_ARGS=${JVM_ARGS:--Xmx16000M}).
>>>>> The server has 32G RAM.
>>>>>
>>>>> When I issue the DELETE command over HTTP, I see this in the Fuseki
>>>>> log:
>>>>>
>>>>> 16:12:03 INFO [24] DELETE
>>>>> http://127.0.0.1:3030/stagingdb/data?graph=http://example.com/graph
>>>>> 17:10:40 WARN [24] RC = 500 : Java heap space
>>>>> 17:10:40 INFO [24] 500 Java heap space (3,517.614 s)
>>>>>
>>>>> i.e. it takes about an hour, and then 500s with an error about heap
>>>>> space.
>>>>>
>>>>> I’ve also tried DROP and CLEAR SPARQL update statements but they
>>>>> time out with our default endpoint timeout of 30s.
>>>>>
>>>>> I’ve also tried deleting 1000 triples at a time, from the graph by
>>>>> issuing a sparql update statement like this:
>>>>>
>>>>> DELETE {
>>>>>   GRAPH <http://example.com/graph>
>>>>>   { ?s ?p ?o }
>>>>> }
>>>>> WHERE {
>>>>>   GRAPH <http://example.com/graph>
>>>>>   { ?s ?p ?o }
>>>>> }
>>>>> LIMIT 1000
>>>>>
>>>>> … but this times out too (which surprised me, as I only asked it to
>>>>> find and DELETE 1000 triples).
>>>>>
>>>>> What is the recommended way to delete this graph - I need to replace
>>>>> its contents fairly urgently on a production system. We loaded it by
>>>>> loading 10,000 triples at a time, which worked fine, but I’m having
>>>>> trouble deleting its current contents first.
>>>>>
>>>>> Any pointers appreciated.
>>>>> Thanks, Ric.
