Questions regarding http://jena.apache.org/documentation/tdb/tdb_transactions.html

The page states that read transactions do not have any locking or other overhead. For write transactions, it reads as if a before and after image of the data is written to a separate journal log for every update. The page does not say this explicitly; is it true?

A read transaction should only see the state of the database as it existed when the transaction was started. If one or more write transactions are committed during a read transaction, the doc does not state when these updates are propagated to the database. If they are propagated before the read transaction finishes, then a before image of the data must be kept and provided to the read transaction. These before images must be timestamped, because multiple update transactions may have committed serially.

As one or more update transactions commit, but before they are propagated to the database, the transactionally consistent state of the database as perceived by other transactions depends entirely on when those transactions started. In fact, the state can vary among the different transactions based on both their own start time and the time when updates were committed (as opposed to when the updates are propagated to the database). For example, assume update transactions UT1 and UT2 are run serially. First UT1 updates data X and commits. Then UT2 begins and reads X; it should see the update made by UT1, even if UT1's updates have not yet been propagated to the database.

The doc says that "If a single read transaction runs for a long time when there are many updates, the system will consume a lot of temporary resources." I believe this is due to the before/after image journal. Is the above description correct?

In prototyping, Jena "works" without any transactions.
If you have a database where you are running code serially, you can read from and write to the database without using any transactions at all. Obviously, if you have a series of updates and an exception terminates your code before it completes, you run the risk of corrupting the database. You also lose the ability to have concurrent transactions independently doing reads and/or writes with full isolation; trying to do this without transactions would likely lead to a corrupted database. Once you want concurrency, updates, and isolation, transactions become a necessity.

In response to an issue I had yesterday, Andy indicated that the required order of calls to Jena is the following:

    Dataset dataset = TDBFactory.assembleDataset(TDB_ASSEMBLER_FILE);
    try {
        dataset.begin(ReadWrite.WRITE); // or READ
        Model model = dataset.getNamedModel("modelname");
        OntModel omodel = ModelFactory.createOntologyModel(
            OntModelSpec.OWL_DL_MEM_RULE_INF, model);
        omodel.prepare();
        // perform operations with the model/omodel...
        // call dataset.commit() or dataset.abort()
    } finally {
        dataset.end();
    }

Presumably, one can perform a series of transactions, as follows:

    dataset.begin
    dataset.commit
    dataset.begin
    dataset.commit

But if the Model and OntModel must be created inside the begin/commit, then the extensive overhead of inferencing is incurred within each transaction. Obviously, if there are updates going on concurrently, this would be the only way for the second transaction above to pick up any updates performed concurrently. But if one "knows" in their system implementation that a given Model (a named graph in the dataset) is read-only and that updates will not be performed, is it OK to not use transactions at all? The doc does state you can use a TDB-backed database non-transactionally. If READ transactions truly incur no overhead, is there any reason NOT to enclose them in transactions anyway?
And if a given Model is never updated, can one perform a series of READ transactions without incurring the overhead of the OntModel inferencing with each READ transaction? As Andy pointed out in his response to me, the Model and OntModel need to be created in the context of a transaction, which implies they cannot span transactions. But for READ transactions, or with no transactions at all, I have been able to use them across transactions without a problem.

The doc states that "A TDB-backed dataset can be used non-transactionally but once used in a transaction, it should be used transactionally after that." Is this due to how TDB determines the latest state of the values? Does this rule apply only within a single JVM's use of a TDB dataset, or does it apply even across separate JVM interactions with the TDB dataset?

What I am struggling with here is the need for, and benefit of, providing read access to named graphs that are created once and never changed, when these named graphs live in the same TDB database as other named graphs that will be updated and therefore require transactions. Is there an answer to this?
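
To make the UT1/UT2 scenario above concrete, here is a toy sketch of the behaviour I am assuming. It is plain Java and does not use Jena at all; the class name SnapshotSketch and every method in it are hypothetical, and nothing here is meant to describe how TDB is actually implemented. It only shows readers being pinned to the version that was current when they began, while later commits create new versions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the assumed semantics only; NOT TDB's implementation. */
class SnapshotSketch {
    // Every committed write produces a new full copy of the data (hypothetical design).
    private final List<Map<String, String>> versions = new ArrayList<>();

    SnapshotSketch() {
        versions.add(new HashMap<>()); // version 0: empty database
    }

    /** A reader is pinned to the newest version committed when it begins. */
    synchronized int beginRead() {
        return versions.size() - 1;
    }

    /** Read a key as of the given snapshot, ignoring any later commits. */
    synchronized String read(int snapshot, String key) {
        return versions.get(snapshot).get(key);
    }

    /** Commit a single-key update as a new version; older versions stay readable. */
    synchronized void commitWrite(String key, String value) {
        Map<String, String> next = new HashMap<>(versions.get(versions.size() - 1));
        next.put(key, value);
        versions.add(next);
    }

    /** Number of versions currently retained, including the empty version 0. */
    synchronized int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        SnapshotSketch db = new SnapshotSketch();
        db.commitWrite("X", "initial");
        int longReader = db.beginRead();   // long-running read starts here
        db.commitWrite("X", "from UT1");   // UT1 commits
        int ut2 = db.beginRead();          // UT2 begins after UT1's commit
        db.commitWrite("X", "from UT2");   // UT2 commits

        System.out.println(db.read(longReader, "X"));     // prints "initial"
        System.out.println(db.read(ut2, "X"));            // prints "from UT1"
        System.out.println(db.read(db.beginRead(), "X")); // prints "from UT2"
    }
}
```

In this toy model old versions are never discarded. A real system would presumably release a version once no reader is pinned to it, which is why I suspect a single long-running read transaction forces temporary resources to accumulate.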
David Jordan
Senior Software Developer
SAS Institute Inc.
Health & Life Sciences, Research & Development
david.jor...@sas.com