afs commented on PR #3753:
URL: https://github.com/apache/jena/pull/3753#issuecomment-3904732930

   `StageGeneratorGeneric` needs your patch - thanks (took v2, and put `if ( 
input.hasNext )` at the start).
   
   PR revised.
   
   > I occasionally got failing query cancel tests on a 14 core machine - 
before and with your PR.
   
   What sort of machine is that?
   
   I have a Dell PC with `12th Gen Intel® Core™ i7-12700 × 20` - 4 P cores, 16 
E cores. Each P core has its own L2 cache and groups of 4 E cores have an L2 
cache. 
   
   I have seen these errors once in several months of a lot of Jena builds 
while removing the deprecated code and other cleaning up for Jena6. The CI on 
Jenkins and github has it quite often when the build systems are busy.
   
   ---
   
   Looks like you have found the cause of both the `jena-integeration-tests` 
and `jena-geosparql` test failures!
   
   It's `QueryIterAbortable` for `jena-integeration-tests` which inherits from 
`QueryIterPlainWrapper`.
   
   However ... :smile: ... this shows another problem.
   It's not just missing the iterator and failing to close it. The code can 
concurrently manipulate the iterator.
   
   `requestCancel` is on another thread.
   The `synchronized` makes `close` / `requestCancel` safe. 
   
   `hasNextBinding` / `moveToNextBinding` / `close` can manipulate the iterator 
safely as they are all on the same thread.
   What Rust might call the owning thread.
   
   `requestCancel` / `hasNextBinding` / `moveToNextBinding` is a mixture of 
threads. While closing due to cancellation, `*NextBinding` may be active and 
there is no protection.
   
   In Rust, this code pattern would not even compile!
   
   So I think using `synchronized` will fix the build crashes but a complete 
fix needs to only perform the `close` on the query thread. Adding 
`synchronized` to `*NextBinding` is very heavy.
   
   As using `synchroized` for `close` / `requestCancel` is step forward, let's 
put it in and see how the CI reacts. In parallel,
   I'll raise an issue to discuss what to do with a permanent fix which I fear 
may impact other iterators. One solution is a volatile so the cancel signal is 
put onto the query execution thread.
   
   I'm currently at the stage of wondering why the whole system does not crash 
more often! This is a common state when working on concurrency bugs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to