Dear All

(Not sure if this is really an @dev or @users question)

When Fuseki handles a query (or update), is that query
(or update) handled by a single thread or might it
be handled by multiple threads over the lifetime of
the query (or update)?

I ask because

* we have a TextDocProducer implementation called
  TextDocProducerBatch. It (hence) follows the
  DatasetChanges interface, tracking adds and
  removes and updating a Lucene index.

* The "Batch" part is because it accumulates
  quads with the same subject and, when the subject
  changes, makes a single Entity for the subject
  rather than entities for each quad.

* The accumulating quads are held in a data structure.

* It's possible that read queries are running in
  parallel with updates. The read queries also
  go through the TextDocProducerBatch. To prevent
  the read query performing operations on the update
  state [1] we're holding the state as a thread-local
  variable.

* This is only sound if all the TextDocProducer(Batch)
  operations for a given query (or update) are handled
  by a single thread. Which seems plausible but I
  can't point to anything that actually says so.

* So: is it the case?

* An alternative I considered was, given that there can
  be at most one concurrent write transaction, to only
  perform the batch-and-update-index operations when
  inside a write transaction. However, starting from
  a TextDocProducerBatch, which is initialised with just
  a TextIndex[Lucene] and a DatasetGraph[Transaction],
  there doesn't seem to be any way to find out what the current
  transaction is; you can find out that you are (or are
  not) *in* a transaction but not whether it's a read
  or write [2].

* Have I missed something?
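
To make the thread-local arrangement described above concrete, here is a
minimal sketch in plain Java. It is not our actual TextDocProducerBatch and
uses no Jena classes: SubjectBatcher, add, finish and flushed are made-up
names, and plain strings stand in for quads and index entities. It only
illustrates the pattern of one batch state per thread, flushed when the
subject changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a batching change listener; NOT Jena's API.
final class SubjectBatcher {

    // Per-thread batch state: the subject being accumulated and its values.
    private static final class State {
        String subject;
        final List<String> values = new ArrayList<>();
    }

    // Each thread sees its own State, so a reader thread cannot touch
    // the state being built up by an updating thread.
    private final ThreadLocal<State> state = ThreadLocal.withInitial(State::new);

    // Stand-in for the index: one entry per flushed subject.
    // Shared between threads, so access is synchronized.
    private final List<String> flushed = new ArrayList<>();

    // Called once per added "quad" (here just subject + value).
    void add(String subject, String value) {
        State s = state.get();
        if (s.subject != null && !s.subject.equals(subject)) {
            flush(s); // subject changed: emit one entity for the old subject
        }
        s.subject = subject;
        s.values.add(value);
    }

    // Called at the end of the request to emit any pending batch.
    void finish() {
        State s = state.get();
        if (s.subject != null) {
            flush(s);
        }
        state.remove(); // avoid leaking state on pooled threads
    }

    private void flush(State s) {
        synchronized (flushed) {
            flushed.add(s.subject + "=" + s.values);
        }
        s.subject = null;
        s.values.clear();
    }

    // Snapshot of everything flushed so far.
    List<String> flushed() {
        synchronized (flushed) {
            return new ArrayList<>(flushed);
        }
    }
}
```

In this sketch, if a request moved from one thread to another part-way
through, the second thread would start from an empty State and the first
thread's pending batch would be stranded, which is exactly the case the
question above is about.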

Chris

[1] This is an actual problem that happened.

[2] Yes, we could have a divergent version of Jena with
    patches to access the transaction, but then we end
    up using SNAPSHOT versions of Jena and gnashing teeth.
