On 08/01/16 15:34, Chris Dollin wrote:
Dear All
(Not sure if this is really an @dev or @users question)
When Fuseki handles a query (or update), is that query
(or update) handled by a single thread or might it
be handled by multiple threads over the lifetime of
the query (or update)?
Single thread per Fuseki request.
What you seem to be relying on is that the update changes are all
handled by a single thread per transaction, which is true, although for
any part that will touch the text index, query and update are both
single-threaded.
From experience, just remember to call remove() on the ThreadLocal (as well as
nulling it out) at the end of each transaction, otherwise memory grows. It's
not bad in Fuseki, where threads come from a Jetty-managed pool, but the pool
does not seem to guarantee that it reuses only a fixed set of threads rather
than deleting and creating new ones, especially under load. That makes the
number of ThreadLocals grow.
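A minimal sketch of that advice in plain Java, using only java.lang.ThreadLocal. The PerTxnState class and its begin/add/finish methods are hypothetical names for illustration, not Jena or Fuseki API. The point is the finally block: remove() deletes the current thread's entry for the ThreadLocal, whereas set(null) leaves an entry (with a null value) in that thread's map, which matters when pooled threads outlive the transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for state accumulated during one transaction on the
// current thread (illustration only; not Jena or Fuseki API).
class PerTxnState {
    private static final ThreadLocal<List<String>> STATE = new ThreadLocal<>();

    // Start of a transaction: install fresh state for this thread only.
    static void begin() {
        STATE.set(new ArrayList<>());
    }

    // Record an item in the current thread's state.
    static void add(String item) {
        List<String> s = STATE.get();
        if (s == null) {
            throw new IllegalStateException("begin() was not called on this thread");
        }
        s.add(item);
    }

    // True if the current thread has state installed.
    static boolean hasState() {
        return STATE.get() != null;
    }

    // End of a transaction: return what was accumulated, then remove the
    // thread's entry entirely instead of only setting it to null.
    static List<String> finish() {
        try {
            List<String> s = STATE.get();
            return s == null ? List.of() : s;
        } finally {
            STATE.remove();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        begin();
        add("quad-1");
        // A concurrent reader on another thread does not see this state.
        Thread reader = new Thread(() -> System.out.println("reader sees state: " + hasState()));
        reader.start();
        reader.join();
        System.out.println("finished with: " + finish());
    }
}
```

Because each Fuseki request runs on a single thread, state installed by begin() is invisible to a read query running concurrently on another thread; that is the isolation a thread-local approach relies on.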
Andy
[1] You are using TDB for the triplestore.
I ask because
* we have a TextDocProducer implementation called
TextDocProducerBatch. It (hence) follows the
DatasetChanges interface, tracking adds and
removes and updating a Lucene index.
* The "Batch" part is because it accumulates
quads with the same subject and, when the subject
changes, makes a single Entity for the subject
rather than entities for each quad.
* The accumulating quads are held in a data structure.
* It's possible that read queries are running in
parallel with updates. The read queries also
go through the TextDocProducerBatch. To prevent
the read query performing operations on the update
state [1] we're holding the state as a thread-local
variable.
* This is only sound if all the TextDocProducer(Batch)
  operations for a given query (or update) are handled
  by a single thread. That seems plausible, but I
  can't point to anything that actually says so.
* So: is it the case?
* An alternative I considered was, given that there can
  be at most one concurrent write transaction, to
  perform the batch-and-update-index operations only when
  inside a write transaction. However, starting from
  a TextDocProducerBatch, which is initialised with just
  a TextIndex[Lucene] and a DatasetGraph[Transaction],
  there doesn't seem to be any way to find out what the
  current transaction is; you can find out that you are (or
  are not) *in* a transaction, but not whether it's a read
  or a write [2].
* Have I missed something?
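The subject batching described in the list above can be sketched as follows, assuming (as the scheme does) that quads for the same subject arrive consecutively. The SubjectBatcher class, its nested Quad record and the Consumer that stands in for "build one Entity and add it to the Lucene index" are hypothetical, not the actual TextDocProducerBatch or jena-text API. State is kept in ordinary instance fields here; in the design above it would be held per thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of subject batching; not the jena-text API.
class SubjectBatcher {
    // Minimal stand-in for a quad: only the fields this sketch needs.
    record Quad(String graph, String subject, String predicate, String object) {}

    private final Consumer<List<Quad>> emit; // stands in for "make an Entity, index it"
    private final List<Quad> pending = new ArrayList<>();
    private String currentSubject = null;

    SubjectBatcher(Consumer<List<Quad>> emit) {
        this.emit = emit;
    }

    // Called once per added quad; quads for one subject are assumed consecutive.
    void add(Quad q) {
        if (currentSubject != null && !currentSubject.equals(q.subject())) {
            flush(); // subject changed, so the previous group is complete
        }
        currentSubject = q.subject();
        pending.add(q);
    }

    // Emit the pending group, if any. Must also be called once at the end of
    // the update, or the last subject's quads are never emitted.
    void flush() {
        if (!pending.isEmpty()) {
            emit.accept(List.copyOf(pending));
            pending.clear();
        }
        currentSubject = null;
    }

    public static void main(String[] args) {
        List<List<Quad>> entities = new ArrayList<>();
        SubjectBatcher batcher = new SubjectBatcher(entities::add);
        batcher.add(new Quad("g", "s1", "p1", "a"));
        batcher.add(new Quad("g", "s1", "p2", "b"));
        batcher.add(new Quad("g", "s2", "p1", "c"));
        batcher.flush();
        // One group for s1 (two quads) and one for s2 (one quad).
        System.out.println(entities.size() + " entities");
    }
}
```

The explicit final flush() is the easy part to miss: without it the last subject's quads are never indexed, so a real producer would need to run it when the stream of changes ends.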
Chris
[1] An actual problem that happened
[2] Yes, we could have a divergent version of Jena with
patches to access the transaction, but then we end
up using SNAPSHOT versions of Jena and gnashing teeth.