Hi Andy,

Before I contacted this list I did some background reading to try to figure out why the BulkUpdateHandler had been deleted, but you know how difficult this can be when searching through mailing list archives. And there is just too much traffic to stay up to date on a daily basis. As constructive advice: it would have been good if the deprecation of the BulkUpdateHandler had been properly documented in the source code. Right now it just states "Bulk update operations are going to be removed" without hinting at a proper replacement. My original question was exactly about what will replace it, and it seems the TransactionHandlers are now responsible for it.

I did some tests, but at least for SDB this pattern does not seem to work as it should.

The example operation is to insert 10,000 triples into an SDB store:

        List<Triple> triples = new LinkedList<Triple>();
        for (int i = 0; i < 10000; i++) {
            triples.add(Triple.create(OWL.Thing.asNode(),
                    RDFS.seeAlso.asNode(),
                    NodeFactory.createLiteral("" + i)));
        }

Wrapping the add with a TransactionHandler.begin/commit takes 40 seconds (sdb is a GraphSDB):

        {
            sdb.getTransactionHandler().begin();
            GraphUtil.add(sdb, triples);
            sdb.getTransactionHandler().commit();
            sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(), Node.ANY).toList();
        }

While the SDB-specific trick with the event manager takes 1 second:

        {
            sdb.getEventManager().notifyEvent(sdb, GraphEvents.startRead);
            GraphUtil.add(sdb, triples);
            sdb.getEventManager().notifyEvent(sdb, GraphEvents.finishRead);
            sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(), Node.ANY).toList();
        }

(The find at the end is there to verify that the triples are immediately available after the write, and not delayed by some background thread.)
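For reference, the 40-second and 1-second figures above are simple wall-clock measurements. A minimal, Jena-independent sketch of such a timing helper (the class and method names here are mine, not part of any API):

        // Hypothetical helper: measures wall-clock time of a workload in milliseconds.
        public class TimingUtil {
            public static long timeMillis(Runnable action) {
                long start = System.currentTimeMillis();
                action.run();                       // run the workload to be measured
                return System.currentTimeMillis() - start;
            }

            public static void main(String[] args) {
                // Example: time a trivial loop; real use would wrap the GraphUtil.add call.
                long elapsed = timeMillis(() -> {
                    long sum = 0;
                    for (int i = 0; i < 1_000_000; i++) sum += i;
                });
                System.out.println("elapsed ms >= 0: " + (elapsed >= 0));
            }
        }
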

The latter solution (above) uses the same call sequence as SDB's BulkUpdateHandler, which appears to use a background thread with a queue to do the actual writing:

            store.getLoader().startBulkUpdate();
            ...
            store.getLoader().flushTriples();

while the TransactionHandlerSDB does the following:

            sqlConnection.setAutoCommit(false) ;
            ...
            sqlConnection.commit() ;
            sqlConnection.setAutoCommit(true) ;

So the two approaches are very different, and in my tests the current implementation of TransactionHandlerSDB is much less efficient than the BulkUpdateHandler. Obviously I would prefer to keep calling the BulkUpdateHandler mechanism until this has been resolved (or shown to be my mistake).
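Until this is resolved, one option is to isolate the event-based trick in a single helper so it is easy to swap out later. A sketch, assuming the Jena 2.x Graph/GraphUtil/GraphEvents API as used above (the class and method names `BulkAddUtil`/`bulkAdd` are mine, not a Jena API):

        import java.util.List;

        import com.hp.hpl.jena.graph.Graph;
        import com.hp.hpl.jena.graph.GraphEvents;
        import com.hp.hpl.jena.graph.GraphUtil;
        import com.hp.hpl.jena.graph.Triple;

        // Hypothetical helper: wraps an add in the SDB bulk-load events.
        public class BulkAddUtil {
            public static void bulkAdd(Graph graph, List<Triple> triples) {
                // startRead triggers SDB's bulk loader (see loading_data.html)
                graph.getEventManager().notifyEvent(graph, GraphEvents.startRead);
                try {
                    GraphUtil.add(graph, triples);
                } finally {
                    // finishRead flushes the loader even if the add throws
                    graph.getEventManager().notifyEvent(graph, GraphEvents.finishRead);
                }
            }
        }

Once the TransactionHandler path performs acceptably, only this one method would need to change.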

Further comment below...

On 9/4/2013 23:22, Andy Seaborne wrote:
> Current documentation:
>
> http://jena.apache.org/documentation/sdb/loading_data.html
>
> which includes:
>
>  model.notifyEvent(GraphEvents.startRead)
>  ... do add/remove operations ...
>  model.notifyEvent(GraphEvents.finishRead)

This design pattern looks weak to me, and it seems odd to call startRead to initiate a write. Furthermore, we cannot rely on SDB-specific code, as our code must also work with other back-ends. So I'd much rather call begin/commit using transactions.

> TopQuadrant have already released recently.

Yes, and this is why we are having a breather now to finally catch up with the new Jena version. Such low-level, risky changes should be made at the beginning of a release cycle.

> Currently, as SDB patches come in, I personally try to find time to apply
> them. I don't have an SDB test system and certainly do not have setups of
> each of the databases with SDB adapters.
>
> Such time is my time - my employer doesn't use SDB.

I hear your frustration, and as always your contributions are greatly appreciated. Rest assured that I also have other things to do than tracking down and adjusting our code to changes between Jena versions. So far I have spent one week on this bulk update issue alone. We have 90 usages of the BulkUpdateHandler in our code, and who knows what other side effects the migration to Transactions will have. So what seems like a good simplification of the API from the perspective of the Jena developers also has downsides for users with a very large, six-year-old code base like ours.

Thanks
Holger
