Dear All
(Not sure if this is really an @dev or @users question)
When Fuseki handles a query (or update), is that query
(or update) handled by a single thread or might it
be handled by multiple threads over the lifetime of
the query (or update)?
I ask because
* we have a TextDocProducer implementation called
TextDocProducerBatch. It (hence) follows the
DatasetChanges interface, tracking adds and
removes and updating a Lucene index.
* The "Batch" part is because it accumulates
quads with the same subject and, when the subject
changes, makes a single Entity for the subject
rather than entities for each quad.
* The accumulating quads are held in a data structure.
* It's possible that read queries are running in
parallel with updates. The read queries also
go through the TextDocProducerBatch. To prevent
the read query performing operations on the update
state [1] we're holding the state as a thread-local
variable.
* This is only sound if all the TextDocProducer(Batch)
operations for a given query (or update) are handled
by a single thread. Which seems plausible but I
can't point to anything that actually says so.
* So: is it the case?
* An alternative I considered was, given that there can
be at most one concurrent write transaction, to only
perform the batch-and-update-index operations when
inside a write transaction. However, starting from
a TextDocProducerBatch, which is initialised with just
a TextIndex[Lucene] and a DatasetGraph[Transaction],
there doesn't seem to be any way to find out what the current
transaction is; you can find out that you are (or are
not) *in* a transaction but not whether it's a read
or write [2].
* Have I missed something?
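
To make the thread-local arrangement concrete, here is a minimal sketch of the
idea, not the actual TextDocProducerBatch. All names in it
(ThreadLocalBatchSketch, Entity, State, add, finish, indexed) are hypothetical,
subjects and fields are plain strings rather than quads, and a list stands in
for the Lucene index. It only behaves correctly if every call for a given
query (or update) arrives on the same thread, which is exactly the assumption
being asked about.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the batching state; not the real TextDocProducerBatch. */
public class ThreadLocalBatchSketch {

    /** One flushed batch: a subject plus the fields accumulated for it. */
    public static final class Entity {
        public final String subject;
        public final List<String> fields;

        Entity(String subject, List<String> fields) {
            this.subject = subject;
            this.fields = fields;
        }
    }

    /** Accumulation state; each thread sees its own instance. */
    private static final class State {
        String currentSubject;
        List<String> pending = new ArrayList<>();
    }

    // Per-thread state, so a read running on another thread never touches
    // the batch an update is part-way through building.
    private final ThreadLocal<State> state = ThreadLocal.withInitial(State::new);

    // Stand-in for the Lucene index (assumption: the real index handles its own locking).
    private final List<Entity> index = new ArrayList<>();

    /** Record one (subject, field) pair; flush first if the subject has changed. */
    public void add(String subject, String field) {
        State s = state.get();
        if (s.currentSubject != null && !s.currentSubject.equals(subject)) {
            flush(s);
        }
        s.currentSubject = subject;
        s.pending.add(field);
    }

    /** Flush whatever is pending for this thread and drop its state. */
    public void finish() {
        flush(state.get());
        state.remove();
    }

    private void flush(State s) {
        if (s.currentSubject != null && !s.pending.isEmpty()) {
            synchronized (index) {
                index.add(new Entity(s.currentSubject, s.pending));
            }
        }
        s.currentSubject = null;
        s.pending = new ArrayList<>();
    }

    /** Snapshot of everything written to the stand-in index so far. */
    public List<Entity> indexed() {
        synchronized (index) {
            return new ArrayList<>(index);
        }
    }
}
```

If a query's calls were spread across threads, each thread would see only a
fragment of the batch and finish() on one thread would leave the others'
pending quads unflushed, hence the question.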
Chris
[1] An actual problem that happened
[2] Yes, we could have a divergent version of Jena with
patches to access the transaction, but then we end
up using SNAPSHOT versions of Jena and gnashing teeth.