Re: SPAM-LOW: Slow clear graph

Rob @ DNR Thu, 14 Nov 2024 03:31:26 -0800

Details matter here, e.g. what storage layer is in use?  How big is the graph
being deleted?  How many other graphs (and triples) are in the server as a
whole?  You mention a curl request, so can we assume Fuseki?  Are there other
secondary indices involved, e.g. Jena Text?


---

Most Jena storage, i.e. TDB/TDB2, is quad-oriented behind the scenes, so when
you issue a CLEAR GRAPH <uri> (or a DROP GRAPH <uri>), what happens internally
is that the store must go through each index and delete every quad that has the
relevant <uri> in its graph position.  For indexes where the graph comes later
in the key order, e.g. SPOG, these quads can be scattered across the entire
index, affecting many blocks on disk, which means the whole index has to be
read.
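To make the scan cost concrete, here is a toy Python model.  It is an illustration only, not TDB's code.  Quads are tuples of integer IDs standing in for node IDs, each index is a plain sorted list chopped into hypothetical 64-entry blocks, and the data sizes are made up.  The model counts how much of each index must be read to find one graph's quads.  With the graph as the leading key (a graph-first order such as GSPO) the matches form one contiguous range found by binary search.  With SPOG every entry has to be examined.

```python
import bisect
import random

BLOCK_SIZE = 64  # hypothetical number of index entries per on-disk block
FIELD = {"G": 0, "S": 1, "P": 2, "O": 3}  # quads are generated as (g, s, p, o)


def make_quads(n=20_000, graphs=10, seed=42):
    """Random quads of integer IDs, standing in for a store's node IDs."""
    rng = random.Random(seed)
    return {
        (rng.randrange(graphs), rng.randrange(1_000), rng.randrange(20), rng.randrange(1_000))
        for _ in range(n)
    }


def build_index(quads, order):
    """Sort the quads by the given key order, e.g. "GSPO" or "SPOG"."""
    return sorted(tuple(q[FIELD[c]] for c in order) for q in quads)


def find_graph_cost(index, order, g):
    """Return (entries_read, blocks_holding_matches) for locating graph g."""
    gpos = order.index("G")
    if gpos == 0:
        # Graph is the leading key: its quads form one contiguous range,
        # located with two binary searches, so only that range is read.
        lo = bisect.bisect_left(index, (g,))
        hi = bisect.bisect_left(index, (g + 1,))
        hits = range(lo, hi)
        entries_read = hi - lo
    else:
        # Graph is not a prefix of the key: every entry must be examined.
        hits = [i for i, key in enumerate(index) if key[gpos] == g]
        entries_read = len(index)
    blocks = {i // BLOCK_SIZE for i in hits}
    return entries_read, len(blocks)


if __name__ == "__main__":
    quads = make_quads()
    for order in ("GSPO", "SPOG"):
        index = build_index(quads, order)
        total_blocks = -(-len(index) // BLOCK_SIZE)  # ceiling division
        read, touched = find_graph_cost(index, order, g=3)
        print(f"{order}: read {read} of {len(index)} entries, "
              f"matches in {touched} of {total_blocks} blocks")
```

With 10 graphs of roughly equal size, the quads of one graph sit in a few dozen blocks of the GSPO-ordered list but in nearly every block of the SPOG-ordered one, so any deletion there has to visit practically the whole index.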

For TDB2, which uses copy-on-write data structures, this may also end up
effectively rewriting every single block in the index, which for large
datasets can take an exceedingly long time.
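A similar toy model can count how many blocks a copy-on-write B+tree would write when a set of leaf blocks changes.  Each changed leaf is copied, and so is every ancestor up to the root, because a parent must be updated to point at the new child.  The fanout, the leaf counts, and the assumption that each block is copied at most once per write transaction are illustrative, not TDB2's actual parameters or implementation.

```python
def cow_blocks_written(touched_leaves, num_leaves, fanout):
    """Count blocks a copy-on-write B+tree writes when the given leaves change.

    Simplified model (an assumption, not TDB2's implementation): every changed
    leaf is written as a new copy, and so is each ancestor on its path to the
    root, since a parent must point at the new child. A block shared by
    several changed paths is copied only once.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    written = 0
    level = set(touched_leaves)  # distinct changed nodes at the current level
    width = num_leaves           # total number of nodes at the current level
    while True:
        written += len(level)
        if width <= 1:           # reached the root
            break
        level = {i // fanout for i in level}  # parents of the changed nodes
        width = -(-width // fanout)           # ceiling division: nodes one level up
    return written


if __name__ == "__main__":
    leaves, fanout = 10_000, 100  # hypothetical index size and fanout
    # Graph stored contiguously (graph-first index): 1,000 adjacent leaves change.
    print(cow_blocks_written(range(4_000, 5_000), leaves, fanout))
    # Graph scattered so that every leaf holds at least one of its quads (e.g. SPOG).
    print(cow_blocks_written(range(leaves), leaves, fanout))
```

In this model, a graph whose quads fill one contiguous run of 1,000 leaves costs 1,011 block writes, whereas a graph with at least one quad in every leaf forces all 10,101 blocks of the tree, i.e. the entire index, to be rewritten.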

If secondary indices are involved, e.g. Jena Text, then it may also have to
make the corresponding delete requests to those indices.

---

So my guess would be that, if you looked at your server's resource consumption
while the CLEAR GRAPH is running, you would see a lot of disk IO?

Rob


From: Mikael Pesonen <mikael.peso...@lingsoft.fi>
Date: Thursday, 14 November 2024 at 09:21
To: users@jena.apache.org <users@jena.apache.org>
Subject: SPAM-LOW: Slow clear graph
The curl command has now been running for over 24 hours with Jena. What could
cause that? Shouldn't clearing a graph always take just a few seconds? Isn't
it a cheap operation?
