Re: transactions and docProducers

Andy Seaborne Sun, 10 Jan 2016 02:20:42 -0800

On 08/01/16 15:34, Chris Dollin wrote:

Dear All


(Not sure if this is really an @dev or @users question)

When Fuseki handles a query (or update), is that query
(or update) handled by a single thread or might it
be handled by multiple threads over the lifetime of
the query (or update)?


Single thread per Fuseki request.

What you seem to be relying on is that the update changes are all handled by a single thread per transaction, which is true, although for any part that will touch the text index, query and update are both single-threaded.

From experience, just remember to remove the thread local (as well as nulling it out) each transaction, otherwise there is memory growth. It's not bad in Fuseki, threads come from a Jetty-managed pool; but the pool does not seem to guarantee that it only reuses a fixed number of threads and that it isn't deleting and creating new threads, especially under load. That makes the number of ThreadLocals grow.
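
As a rough illustration of that clean-up, here is a minimal sketch in plain Java with no Jena dependency. The class BatchState and its methods add and finish are hypothetical names made up for this example; they are not part of Jena or of TextDocProducerBatch. The assumption is that finish() is called once at the end of every transaction, on the thread that did the work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-thread batch state; names are illustrative only.
final class BatchState {
    // Each thread sees its own list, created lazily on first access.
    private static final ThreadLocal<List<String>> PENDING =
            ThreadLocal.withInitial(ArrayList::new);

    // Record one pending change for the calling thread.
    static void add(String change) {
        PENDING.get().add(change);
    }

    // Return a copy of what the calling thread accumulated, then drop
    // the thread-local entry so a pooled thread does not keep it alive.
    static List<String> finish() {
        try {
            return new ArrayList<>(PENDING.get());
        } finally {
            PENDING.remove();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        add("quad-1");
        add("quad-2");

        // A second thread has its own, separate state.
        Thread other = new Thread(() -> {
            add("quad-from-other-thread");
            System.out.println("other thread: " + finish());
        });
        other.start();
        other.join();

        System.out.println("main thread: " + finish());
        System.out.println("after clean-up: " + finish());
    }
}
```

Putting remove() in a finally block means the entry is dropped even if building the result throws, so the state cannot leak into the next request served by the same pooled thread.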


        Andy

[1]
You are using TDB for the triplestore.

I ask because

* we have a TextDocProducer implementation called
   TextDocProducerBatch. It (hence) follows the
   DatasetChanges interface, tracking adds and
   removes and updating a Lucene index.

* The "Batch" part is because it accumulates
   quads with the same subject and, when the subject
   changes, makes a single Entity for the subject
   rather than entities for each quad.

* The accumulating quads are held in a data structure

* It's possible that read queries are running in
   parallel with updates. The read queries also
   go through the TextDocProducerBatch. To prevent
   the read query performing operations on the update
   state [1] we're holding the state as a thread-local
   variable.

* This is only sound if all the TextDocProducer(Batch)
   operations for a given query (or update) are handled
   by a single transaction. Which seems plausible but I
   can't point to anything that actually says so.

* So: is it the case?

* An alternative I considered was, given that there can
   be at most one concurrent write transaction, to only
   perform the batch-and-update-index operations when
   inside a write transaction. However, starting from
   a TextDocProducerBatch, which is initialised with just
   a TextIndex[Lucene] and a DatasetGraph[Transaction],
   there doesn't seem any way to find out what the current
   transaction is; you can find out that you are (or are
   not) *in* a transaction but not whether it's a read
   or write [2].

* Have I missed something?
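
The batching described in the points above could be sketched roughly as follows. This is plain Java with hypothetical names (SubjectBatcher, add, flush) and strings standing in for quads and entities; it is not the actual TextDocProducerBatch code and does not use the Jena API. It assumes changes for the same subject arrive consecutively and that the caller flushes once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch: group consecutive changes by subject and emit
// one batch per subject instead of one item per change.
final class SubjectBatcher {
    private final BiConsumer<String, List<String>> sink;
    private final List<String> pending = new ArrayList<>();
    private String currentSubject = null;

    // The sink receives (subject, values) once per batch, for example
    // to build a single index entry for that subject.
    SubjectBatcher(BiConsumer<String, List<String>> sink) {
        this.sink = sink;
    }

    // Accumulate a value; a change of subject flushes the previous batch.
    void add(String subject, String value) {
        if (currentSubject != null && !currentSubject.equals(subject)) {
            flush();
        }
        currentSubject = subject;
        pending.add(value);
    }

    // Emit whatever has been accumulated and reset the state.
    void flush() {
        if (currentSubject == null) {
            return;
        }
        sink.accept(currentSubject, new ArrayList<>(pending));
        pending.clear();
        currentSubject = null;
    }

    public static void main(String[] args) {
        SubjectBatcher batcher = new SubjectBatcher(
                (subject, values) -> System.out.println(subject + " -> " + values));
        batcher.add("s1", "a");
        batcher.add("s1", "b");
        batcher.add("s2", "c");
        batcher.flush(); // the last subject is only emitted on an explicit flush
    }
}
```

An instance like this holds mutable state, which is why sharing one between a reader and a writer goes wrong; keeping one instance per thread, or per write transaction, avoids that.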

Chris

[1] An actual problem that happened

[2] Yes, we could have a divergent version of Jena with
     patches to access the transaction, but then we end
     up using SNAPSHOT versions of Jena and gnashing teeth.
