Hi Andy,
Before I contacted this list I did some background reading to try
to figure out why the BulkUpdateHandler had been deleted, but you know
how difficult it is to find this by searching through mailing list
archives, and there is just too much traffic to stay up to date on a
daily basis.
As a constructive suggestion, it would have been good if the deprecation
of the BulkUpdateHandler (BUH) had been properly documented in the source
code. Right now it just states "Bulk update operations are going to be
removed" without hinting at a proper replacement. My original question
was exactly about what will replace it, and it seems that the
TransactionHandlers are now responsible for it.
I ran some tests, but at least for SDB this pattern does not seem to
perform as it should.
The example operation is to insert 10k triples into SDB:
    List<Triple> triples = new LinkedList<Triple>();
    for (int i = 0; i < 10000; i++) {
        triples.add(Triple.create(OWL.Thing.asNode(),
                RDFS.seeAlso.asNode(),
                NodeFactory.createLiteral("" + i)));
    }
Wrapping the add with a TransactionHandler.begin/commit takes 40 seconds
(sdb is a GraphSDB):
    {
        sdb.getTransactionHandler().begin();
        GraphUtil.add(sdb, triples);
        sdb.getTransactionHandler().commit();
        sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(),
                Node.ANY).toList();
    }
While the SDB-specific trick with the event manager takes 1 second:
    {
        sdb.getEventManager().notifyEvent(sdb, GraphEvents.startRead);
        GraphUtil.add(sdb, triples);
        sdb.getEventManager().notifyEvent(sdb, GraphEvents.finishRead);
        sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(),
                Node.ANY).toList();
    }
(The find at the end is there to verify that the triples are immediately
available after write, and not delayed by some thread in the background).
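To make the comparison repeatable, both variants can be timed with the
same small helper. The following is only a sketch: BulkTiming and
timeMillis are hypothetical names, not part of Jena, and the loop in
main is a stand-in workload where the real test would run one of the
two blocks above, including the final find().

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Jena): times one bulk-load variant. */
final class BulkTiming {

    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in workload. In the real test this lambda would contain one of the
        // two blocks above, including the final find(), so that any deferred
        // writes are counted in the measurement as well.
        long ms = timeMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10000; i++) {
                sb.append(i);
            }
        });
        System.out.println("elapsed ms: " + ms);
    }
}
```

Running each variant several times against a freshly emptied store would
help rule out warm-up and caching effects.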
The latter approach uses the same call sequence as SDB's
BulkUpdateHandler, which seems to use a background thread with a
queue to do the actual writing:
    store.getLoader().startBulkUpdate();
    ...
    store.getLoader().flushTriples();
while the TransactionHandlerSDB does the following:
    sqlConnection.setAutoCommit(false) ;
    ...
    sqlConnection.commit() ;
    sqlConnection.setAutoCommit(true) ;
So the two approaches are very different, and in my tests the current
implementation of TransactionHandlerSDB is much less efficient than the
BulkUpdateHandler. Obviously I would prefer to keep calling the
BulkUpdateHandler mechanism until this has been resolved (or shown to be
my mistake).
Further comments below...
On 9/4/2013 23:22, Andy Seaborne wrote:
> Current documentation:
> http://jena.apache.org/documentation/sdb/loading_data.html
> which includes:
>     model.notifyEvent(GraphEvents.startRead)
>     ... do add/remove operations ...
>     model.notifyEvent(GraphEvents.finishRead)
This design pattern looks weak to me, and it looks weird to call
startRead to initiate a write. Furthermore, we cannot rely on
SDB-specific code, as our code must also work with other back-ends. So
I'd much rather call begin/commit using transactions.
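For code that has to work across back-ends, the begin/commit calls can
be kept in one place, so that a failed write aborts the transaction and
graphs without transaction support are still written. A minimal sketch
under stated assumptions: TxHandler is a stand-in interface that mirrors
only the transactionsSupported/begin/commit/abort methods of Jena's
TransactionHandler, and TxWrites.inTransaction is a made-up helper name.
With Jena on the classpath, the graph's own TransactionHandler would
take the place of TxHandler.

```java
/** Stand-in (assumption) for the subset of Jena's TransactionHandler used here. */
interface TxHandler {
    boolean transactionsSupported();
    void begin();
    void commit();
    void abort();
}

/** Hypothetical helper: one place for the begin/commit/abort sequence. */
final class TxWrites {

    /**
     * Runs the writes inside a transaction when the back-end supports one,
     * and directly otherwise. A RuntimeException from the writes aborts the
     * transaction and is rethrown to the caller.
     */
    static void inTransaction(TxHandler th, Runnable writes) {
        if (!th.transactionsSupported()) {
            writes.run();   // no transaction available: plain write
            return;
        }
        th.begin();
        try {
            writes.run();
        } catch (RuntimeException e) {
            th.abort();     // roll back the partial write
            throw e;
        }
        th.commit();
    }
}
```

A call site would then look like
TxWrites.inTransaction(handler, () -> GraphUtil.add(sdb, triples)),
where handler adapts sdb.getTransactionHandler(). This only centralises
the pattern; it does not by itself change how fast a given back-end
commits.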
> TopQuadrant have already released recently.
Yes, and this is why we are taking a breather now to finally catch up
with the new Jena version. Such low-level, risky changes should be done
at the beginning of a life cycle.
> Currently, as SDB patches come in, I personally try to find time to
> apply them. I don't have an SDB test system and certainly do not have
> setups of each of the databases with SDB adapters.
> Such time is my time - my employer doesn't use SDB.
I hear your frustration, and as always your contributions are greatly
appreciated. Rest assured that I also have other things to do than
tracking down changes between Jena versions and adjusting our code to
them. So far I have spent one week on this bulk update issue alone. We
have 90 usages of the BulkUpdateHandler in our code, and who knows what
other side effects the migration to transactions will have. So what seems
like a good simplification of the API from the perspective of the Jena
developers also has downsides for users with a very large, six-year-old
code base like ours.
Thanks
Holger