Re: Jena 3.17, TDB performance and behaviour with read-only transaction

Andy Seaborne Sat, 10 Apr 2021 07:04:14 -0700

Marco,

I've reproduced it on my system.


Recorded as:
https://issues.apache.org/jira/browse/JENA-2086

Not sure what is going on but it does not look right (it has been a while since I looked at that code).


    Thanks
    Andy


On 08/04/2021 23:44, Zak Mc Kracken wrote:

Hi Andy,

thank you for your quick reply.

On 08/04/2021 18:59, Andy Seaborne wrote:
Now I'm seeing performance problems with the TDB used in read-only transactions, as explained by the documentation:
https://github.com/Rothamsted/rdf2pg/blob/44f2bd16b27a6f13f447d1070f6abcea45f3d492/rdf2pg-core/src/main/java/uk/ac/rothamsted/kg/rdf2pg/pgmaker/support/rdf/RdfDataManager.java#L153
Is there an outer transaction?
Not in the case at issue. I wrote it that way because it's a generic utility, which I happened to use in nested transactions.
(BTW there is "Txn.executeRead" to do this "if in transaction" pattern.)
Thanks, I've rewritten everything with this approach:
https://github.com/Rothamsted/rdf2pg/blob/d240d22a4f237297ae931aaccf8a4db10e3d19c3/rdf2pg-core/src/main/java/uk/ac/rothamsted/kg/rdf2pg/pgmaker/support/rdf/RdfDataManager.java#L155
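For readers unfamiliar with the "if in transaction" pattern mentioned above, here is a minimal sketch of the idea. It is not Jena's code: SimpleTransactional and CountingStore are hypothetical stand-ins invented for illustration, so that the example runs without Jena on the classpath. The helper joins an already open transaction, and otherwise opens and closes its own read transaction.

```java
import java.util.function.Supplier;

class ReadTxnPattern {
    /** Hypothetical stand-in for a transactional store; NOT Jena's Transactional interface. */
    interface SimpleTransactional {
        boolean isInTransaction();
        void beginRead();
        void end();
    }

    /** Runs the action in the caller's transaction if one is open, otherwise in a fresh read transaction. */
    static <T> T calculateRead(SimpleTransactional store, Supplier<T> action) {
        if (store.isInTransaction()) {
            // An outer transaction exists: it owns begin/end, so just run the action.
            return action.get();
        }
        store.beginRead();
        try {
            return action.get();
        } finally {
            // Always release the transaction, even if the action throws.
            store.end();
        }
    }

    /** Toy store that only counts begin/end calls, for demonstration. */
    static class CountingStore implements SimpleTransactional {
        private boolean active = false;
        int begins = 0;
        int ends = 0;

        public boolean isInTransaction() { return active; }
        public void beginRead() { active = true; begins++; }
        public void end() { active = false; ends++; }
    }
}
```

With this shape, a nested call inside an outer read costs no additional begin/end pair, which is why a generic utility can be used both standalone and inside a larger transaction.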
Now the code is more readable, but it didn't become much faster, and Journal.sync() is still taking a lot of time. Inside it, the method that consumes most of the time is sun.nio.ch.FileChannelImpl.force().
As you can see, the approach is: begin RO transaction, query, end transaction, all done in parallel threads (8 to 32, depending on the underlying system).
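To judge whether FileChannel.force() is expensive on a given disk independently of Jena, one could time it directly. The following is a rough sketch using only the JDK; the class name ForceTiming and the one-byte write per round are arbitrary choices for illustration, and the numbers it produces depend heavily on the file system and hardware.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ForceTiming {
    /** Returns the average time in nanoseconds of force(true) after a small write, over the given rounds (rounds must be > 0). */
    static long averageForceNanos(int rounds) throws IOException {
        Path tmp = Files.createTempFile("force-timing", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            long total = 0;
            for (int i = 0; i < rounds; i++) {
                // Dirty the file so that force() has something to flush.
                ch.write(ByteBuffer.wrap(new byte[] {1}));
                long t0 = System.nanoTime();
                ch.force(true); // flush content and metadata to the storage device
                total += System.nanoTime() - t0;
            }
            return total / rounds;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If each call takes milliseconds, then many threads each forcing a sync at the end of a transaction would plausibly account for the delays described here.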
Using VisualVM, I see the threads running the code above often go in the "monitor" state, ie, they wait for a Java synchronized object to be
Did you happen to notice on which object they are "synchronized" on?
I can't find a profiler that is able to show me that; however, by exclusion, I see synchronized sections are met by the Journal only, namely, by FileChannelImpl.
freed up, most of the time they wait 1-3 seconds for that. While it's hard to know where exactly this happens, I commented out all actions around and left only the above TDB reading, and then they block each other more often.
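When a profiler does not show which object threads are blocked on, the JDK's own thread management API can report it. The sketch below is an assumption about how one might do this, not something taken from the thread: the class name BlockedThreadReport is made up, while ThreadMXBean, ThreadInfo.getLockName() and ThreadInfo.getLockOwnerName() are standard java.lang.management calls.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class BlockedThreadReport {
    /** Returns one line per thread currently in BLOCKED state, naming the contended lock and the thread holding it. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // The lock name and owner are available without requesting locked monitors/synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }
}
```

Calling blockedThreads() periodically from a background thread while the workload runs should show the class of the contended monitor. A jstack thread dump gives similar information from outside the process.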
Moreover, VisualVM also allows me to see that the threads spend a lot of time with
*org.apache.jena.dboe.transaction.txn.Transaction.end()*,
drilling down into the latter, I can see that *org.apache.jena.dboe.transaction.txn.journal.Journal.sync()* is the method consuming most of the time.
That does not sound right.

Do you have a call trace for this?
Please, see the attachment, hope it helps.
In TDB2, a RO operation should be purely read-only. The writer pays for everything (unlike TDB1).
I can't get why the sync() above takes so much time. I'm using TDB2
Furthermore, it used to be much faster with past Jena versions (with the same code): https://github.com/Rothamsted/graphdb-benchmarks#test-results
The version information isn't jumping out of that page.

Which version of Jena?
(and was it TDB1 or TDB2 at the time?)
Sorry. Currently I'm using TDB2 with Jena 3.17. Recently, I've upgraded from 3.14 and before this, I remember it was working with good performance. That benchmark I linked was done in 2018 with Jena 3.9.0 (I've reconstructed that from the git commits, not 100% sure, but, let's say 95%).
Thanks again,
Marco.
