transaction and caching

David Jordan Tue, 30 Apr 2013 08:40:08 -0700

Questions about
http://jena.apache.org/documentation/tdb/tdb_transactions.html
The page states that read transactions have no locking or other overhead. For
write transactions, it reads as if, for every update, a before image and an
after image of the data are written to a separate journal. The page does not
say this explicitly; is this true?
A read transaction should see only the state of the database as it existed when
the transaction started. If one or more write transactions commit while a read
transaction is running, the doc does not say when those updates are propagated
to the database. If they are propagated before the read transaction finishes,
then a before image of the data must be kept and supplied to the read
transaction. These before images must be timestamped, because several update
transactions may have committed, one after another.
Once one or more update transactions have committed, but before their changes
are propagated to the database, the transactionally consistent state of the
database as seen by other transactions depends entirely on when those
transactions started. In fact, the state can differ between transactions,
depending both on each transaction's own start time and on when updates were
committed (as opposed to when they were propagated to the database).
For example, assume update transactions UT1 and UT2 run serially. First UT1
updates data X and commits. Then UT2 begins and reads X; it should see the
update made by UT1, even if UT1's changes have not yet been propagated to the
database.
The doc also says: "If a single read transaction runs for a long time when
there are many updates, the system will consume a lot of temporary resources."
I believe this is due to the before/after image journal. Is the above
description correct?
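To make the question concrete, here is a toy model of the semantics described above, as a minimal Java sketch. It is a hypothetical illustration, not TDB's actual design: the class `VersionedStore` and its methods are invented for this example. Each commit creates a new immutable version, a reader pins the version that was current when it began, and old versions are never discarded, which is an exaggerated form of the resource cost a long-running reader imposes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of snapshot reads; NOT how TDB is implemented.
// Every commit appends a new immutable version; a reader pins the version
// that was latest when it started and keeps seeing it.
class VersionedStore {
    private final List<Map<String, String>> versions = new ArrayList<>();

    VersionedStore() {
        versions.add(new HashMap<>()); // version 0: empty database
    }

    /** Start a read: pin the latest committed version number. */
    int beginRead() {
        return versions.size() - 1;
    }

    /** Read a value as seen by a reader that pinned version `snapshot`. */
    String read(int snapshot, String key) {
        return versions.get(snapshot).get(key);
    }

    /** Serial writer: copy the latest state, apply one update, commit it as a new version. */
    void commitWrite(String key, String value) {
        Map<String, String> next = new HashMap<>(versions.get(versions.size() - 1));
        next.put(key, value);
        versions.add(next);
    }

    /** Versions kept alive; this toy never frees them, so the count only grows. */
    int retainedVersions() {
        return versions.size();
    }
}
```

For example, a reader pinned before UT1 commits `X = 1` still reads no value for X, while UT2, which begins after that commit, reads `1` even though later commits keep adding versions.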
In prototyping, Jena "works" without any transactions. If your code runs
serially against a database, you can read from and write to the database
without using transactions at all. Of course, if a series of updates is
interrupted by an exception that terminates your code before it completes, you
risk corrupting the database. You also lose the ability to run concurrent
transactions that independently read and/or write with full isolation;
attempting that without transactions would likely corrupt the database. But
once you need concurrency, updates, and isolation, transactions become a
necessity.
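A minimal sketch of the failure mode described above, assuming a plain in-memory map stands in for the database (the class `UpdateBatch` is hypothetical, not Jena code): applying updates one at a time leaves a partial batch behind when an exception interrupts the loop, whereas staging them and publishing only after the whole batch succeeds leaves the data untouched.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a Map stands in for the database.
class UpdateBatch {
    /** Non-transactional: each update is written immediately, so a failure midway leaves a partial batch. */
    static void applyDirectly(Map<String, Integer> db, List<String> keys, int failAt) {
        for (int i = 0; i < keys.size(); i++) {
            if (i == failAt) throw new IllegalStateException("simulated crash at update " + i);
            db.put(keys.get(i), i);
        }
    }

    /** Transaction-like: updates go to a private copy; the map changes only if every update succeeds. */
    static void applyAtomically(Map<String, Integer> db, List<String> keys, int failAt) {
        Map<String, Integer> staged = new HashMap<>(db);
        for (int i = 0; i < keys.size(); i++) {
            if (i == failAt) throw new IllegalStateException("simulated crash at update " + i);
            staged.put(keys.get(i), i);
        }
        db.clear();        // "commit": publish the fully applied batch
        db.putAll(staged);
    }
}
```

This toy is single-threaded: it shows atomicity (all or nothing), not isolation between concurrent readers and writers, which is the other property the paragraph above says transactions provide.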
In response to an issue I had yesterday, Andy indicated that the calls to Jena
must be made in the following order:
Dataset dataset = TDBFactory.assembleDataset(TDB_ASSEMBLER_FILE);
try {
    dataset.begin(ReadWrite.WRITE);  // or ReadWrite.READ
    Model model = dataset.getNamedModel("modelname");
    OntModel omodel = ModelFactory.createOntologyModel(
            OntModelSpec.OWL_DL_MEM_RULE_INF, model);
    omodel.prepare();
    // perform operations with model/omodel...
    // then call dataset.commit() or dataset.abort()
} finally {
    dataset.end();
}
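The ordering Andy describes (begin, do the work, commit or abort, and always end) can be captured once in a helper. The sketch below assumes a hypothetical `Transactional` interface with the same four calls; it is not the Jena API, and `Txns.inWrite` and `RecordingDataset` are invented names used only to show and check the call order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical interface with the same shape as the begin/commit/abort/end
// calls above; it is NOT the Jena API.
interface Transactional {
    void begin(boolean write);
    void commit();
    void abort();
    void end();
}

class Txns {
    /** Run `body` in a write transaction: commit on success, abort on a runtime exception, always end. */
    static <T> T inWrite(Transactional ds, Supplier<T> body) {
        ds.begin(true);          // begin before try: end() only runs for a started transaction
        try {
            T result = body.get();
            ds.commit();
            return result;
        } catch (RuntimeException e) {
            ds.abort();
            throw e;
        } finally {
            ds.end();
        }
    }
}

/** Test double that records the calls it receives, so the ordering can be checked. */
class RecordingDataset implements Transactional {
    final List<String> calls = new ArrayList<>();
    public void begin(boolean write) { calls.add(write ? "begin(WRITE)" : "begin(READ)"); }
    public void commit() { calls.add("commit"); }
    public void abort() { calls.add("abort"); }
    public void end() { calls.add("end"); }
}
```

With this shape, a successful body produces begin, commit, end, and a failing one produces begin, abort, end before the exception propagates to the caller.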
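Presumably, one can then perform a series of transactions, as follows:
    dataset.begin(ReadWrite.WRITE);
    // ... first set of updates ...
    dataset.commit();
    dataset.end();
    dataset.begin(ReadWrite.WRITE);
    // ... second set of updates ...
    dataset.commit();
    dataset.end();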
But if the Model and OntModel must be created inside each begin/commit, then
the expensive inferencing has to be redone in every transaction. Of course, if
other updates are happening concurrently, this is the only way for the second
transaction above to pick up those updates.
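One way to avoid re-running inference for a graph that never changes is to cache the derived result and rebuild it only when the data's committed state changes. The sketch below is hypothetical: `DerivedCache` is an invented class, the `version` argument assumes the application itself maintains a counter that it increases on every commit, and the cached value is assumed to be a detached in-memory copy rather than a Model tied to one transaction, given Andy's point that those cannot span transactions.

```java
import java.util.function.Function;

// Hypothetical sketch: cache an expensive derived result (standing in for a
// prepared inference model) and rebuild it only when the committed version changes.
// The version counter is assumed to be maintained by the application.
class DerivedCache<S, D> {
    private final Function<S, D> build; // the expensive step, e.g. running inference
    private long cachedVersion;
    private D cached;
    private int builds = 0;

    DerivedCache(Function<S, D> build) {
        this.build = build;
    }

    /** Return the derived view for `source` at `version`, rebuilding only if nothing is cached or the version moved on. */
    synchronized D get(long version, S source) {
        if (builds == 0 || version != cachedVersion) {
            cached = build.apply(source);
            cachedVersion = version;
            builds++;
        }
        return cached;
    }

    /** How many times the expensive build has run. */
    synchronized int buildCount() {
        return builds;
    }
}
```

For a named graph that is never updated, its version never moves, so the expensive build runs once and later reads reuse the cached result.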
But if one "knows" in one's system implementation that a given Model (a named
graph in the dataset) is read-only, i.e. that it will never be updated, is it
OK not to use transactions for it at all? The doc does state that a TDB-backed
dataset can be used non-transactionally.
If READ transactions truly incur no overhead, is there any reason NOT to wrap
reads in transactions anyway? And if a given Model is never updated, can one
perform a series of READ transactions without repeating the OntModel
inferencing in each one? As Andy pointed out in his response to me, the Model
and OntModel must be created within a transaction, which implies they cannot
span transactions. But with READ transactions, or with no transactions at all,
I have been able to do this without a problem.
The doc states that "A TDB-backed dataset can be used non-transactionally but
once used in a transaction, it should be used transactionally after that." Is
this due to how TDB determines the latest state of the values? Does this rule
apply only within a single JVM's use of a TDB dataset, or also across separate
JVMs that access the same TDB dataset?
What I am struggling with here is the need for, and benefit of, providing read
access to named graphs that are created once and never changed, when those
named graphs live in the same TDB database as other named graphs that will be
updated and therefore require transactions. Is there an answer to this?


David Jordan
Senior Software Developer
SAS Institute Inc.
Health & Life Sciences, Research & Development
Bldg R ▪ Office 4467
600 Research Drive ▪ Cary, NC 27513
Tel: 919 531 1233 ▪ david.jor...@sas.com
www.sas.com
SAS® … THE POWER TO KNOW®
