David

Answers inline:

On 15/04/2014 08:19, "Lebling, David (US SSA)"
<[email protected]> wrote:

>Rob,
>
>Thanks, that is very helpful. I have successfully installed and run
>Fuseki and populated a simple in-memory database via the Fuseki control
>center page. I also got a sketch of the necessary modifications to my
>existing implementations to where it almost compiles.
>
>I have a couple of questions, though:
>
>1. The DatasetAccessor you get from the factory seems to have methods for
>get, put and delete. Is there any reason not to use them instead of your
>recommendation of UpdateExecutionFactory.createRemote()? The way my
>services are used they "never" (well, hardly ever) do anything but
>replace whole named graphs. Would it be more efficient to do things
>piecemeal, which I seem to recall is also possible?

As you note, DatasetAccessor only works at the whole-graph level. If your
graphs are always relatively small then working at the graph level is
likely easiest, particularly if you are already doing complex
modifications on Model/Graph instances in-memory.

Doing things piecemeal may or may not be more efficient depending on the
complexity of the modification.  If you can express things as SPARQL
Updates then they may run faster, depending on how much of the data they
touch.
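
For example, the two approaches might look roughly like this (a sketch
only; the endpoint URLs, graph name and triple are placeholders, not
taken from your setup):

```java
import com.hp.hpl.jena.query.DatasetAccessor;
import com.hp.hpl.jena.query.DatasetAccessorFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.update.UpdateExecutionFactory;
import com.hp.hpl.jena.update.UpdateFactory;
import com.hp.hpl.jena.update.UpdateRequest;

public class GraphVsUpdate {
    public static void main(String[] args) {
        // Placeholder endpoints - adjust to your Fuseki configuration
        String dataService = "http://localhost:3030/ds/data";
        String updateService = "http://localhost:3030/ds/update";
        String graphUri = "http://example.org/graphs/g1";

        // Whole-graph replace: build the model in memory, then PUT it
        Model m = ModelFactory.createDefaultModel();
        m.createResource("http://example.org/s")
         .addProperty(m.createProperty("http://example.org/p"), "o");
        DatasetAccessor accessor =
            DatasetAccessorFactory.createHTTP(dataService);
        accessor.putModel(graphUri, m); // replaces the named graph entirely

        // Piecemeal change: express just the delta as a SPARQL Update
        UpdateRequest req = UpdateFactory.create(
            "DELETE DATA { GRAPH <" + graphUri + "> { " +
            "<http://example.org/s> <http://example.org/p> \"o\" } }");
        UpdateExecutionFactory.createRemote(req, updateService).execute();
    }
}
```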

>
>2. Given a DatasetAccessor is there any way to list the named graphs?
>That's a method available through Dataset.

No, DatasetAccessor was designed to support the SPARQL 1.1 Graph Store
Protocol which unfortunately does not include that functionality.  Of
course you can get that from the store using the following query:

SELECT DISTINCT ?g WHERE { GRAPH ?g { } }
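
You can run that query from Java against the Fuseki query endpoint, e.g.
(the service URL is a placeholder):

```java
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class ListGraphs {
    public static void main(String[] args) {
        String queryService = "http://localhost:3030/ds/query"; // placeholder
        QueryExecution qe = QueryExecutionFactory.sparqlService(
            queryService, "SELECT DISTINCT ?g WHERE { GRAPH ?g { } }");
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution row = rs.next();
                // Print the URI of each named graph in the dataset
                System.out.println(row.getResource("g").getURI());
            }
        } finally {
            qe.close();
        }
    }
}
```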

>
>3. Is the Model begin/commit/abort transaction pattern needed when
>dealing with these classes? It's necessary with the old SDB version. If
>it's needed, what Model is the one that is controlled?
>DatasetAccessor.putModel() seems to take an in-memory model and stuff it
>in the TDB store, so there's nothing available to do the transaction on.
>What's the pattern here?

No, Fuseki automatically manages transactions for you if the underlying
dataset is transactional (as TDB is) so each write operation will run in
an atomic transaction.

TDB uses MR+SW (multiple reader, single writer) concurrency, so read
transactions can continue to proceed while a write is happening; they
don't see the state of the write until it is able to persist, which only
happens once there are no active readers.  However, if you have lots of
writes then they will queue up behind each other, as you can only have
one active write at any time.

See http://jena.apache.org/documentation/tdb/tdb_transactions.html for
more information on transactions in TDB.
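
For completeness: if you ever embed TDB directly in a single JVM rather
than going through Fuseki, you do manage transactions yourself.  A
minimal sketch (the directory path is a placeholder):

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TdbTxnSketch {
    public static void main(String[] args) {
        // Placeholder directory - TDB creates/opens its files here
        Dataset ds = TDBFactory.createDataset("/path/to/tdb");
        ds.begin(ReadWrite.WRITE);
        try {
            Model m = ds.getNamedModel("http://example.org/graphs/g1");
            m.createResource("http://example.org/s")
             .addProperty(m.createProperty("http://example.org/p"), "o");
            ds.commit();
        } finally {
            // Ends the transaction; an uncommitted write is aborted
            ds.end();
        }
    }
}
```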

>
>4. What is the best way to "reset" a TDB database? (In SDB there was a
>truncate() method that did that.)

I'm not sure that there is a simple method call.

You can delete the files on disk (though this may not work on Windows due
to JVM and memory-mapped file interactions), but you'd also need to clear
TDB's in-memory caches, as otherwise you could see strange behaviour.
TDB.reset() should do this for you.

Of course if the TDB instance is being managed by the Fuseki server then
you don't have this option.  One possibility is just to issue the
following update:

DROP ALL

However, due to the design of TDB this doesn't necessarily truncate
everything (the node table will be left as-is and existing node IDs
re-used once you start adding new data).
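
Issuing that update remotely follows the same pattern as any other
update, e.g. (the update endpoint is a placeholder):

```java
import com.hp.hpl.jena.update.UpdateExecutionFactory;
import com.hp.hpl.jena.update.UpdateFactory;

public class ResetDataset {
    public static void main(String[] args) {
        String updateService = "http://localhost:3030/ds/update"; // placeholder
        // DROP ALL removes the default graph and all named graphs
        UpdateExecutionFactory.createRemote(
            UpdateFactory.create("DROP ALL"), updateService).execute();
    }
}
```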

The other possibility is that Andy and Ian have been working on a Fuseki 2
that includes management capabilities for things like adding/removing
databases, so once that is ready you may be able to use those
capabilities to achieve this aim.


>
>5. I haven't really gotten into how Assemblers and such are set up. It
>looks very complicated. I will probably have more questions when I get
>that far.

Assemblers are not something I'm familiar with though others on this list
should be able to help you out with any questions you might have.

Rob

>
>Thanks,
>
>Dave
>
>-----Original Message-----
>From: Rob Vesse [mailto:[email protected]]
>Sent: Thursday, April 10, 2014 6:56 PM
>To: [email protected]
>Subject: Re: SDB to TDB transition
>
>Hi David
>
>So the first thing to point out, which may make things tricky for you, is
>that TDB can only be accessed from a single JVM at a time.  This
>is due to the fact that it uses memory-mapped files, journaling and
>caching, so if you try to use it from multiple JVMs you are almost
>guaranteed to corrupt your data.
>
>However the workaround for this is to introduce Fuseki into your
>architecture as your database server.  Fuseki can serve up TDB datasets
>and clients then access them over HTTP.
>
>If you use the standard APIs for remote query
>(QueryExecutionFactory.sparqlService()), remote update
>(UpdateExecutionFactory.createRemote()) and remote graph access
>(DatasetAccessorFactory.createHTTP()) then your application can actually
>be refactored to be agnostic of the backend and you can always switch out
>Fuseki+TDB for another SPARQL compliant server if necessary.
>
>With Fuseki+TDB being able to query the union graph can either be done at
>the server configuration level or by specifying a magic graph URI in your
>queries - <urn:x-arq:UnionGraph> though continuing to use this feature
>will make it harder to migrate off Fuseki+TDB should you ever need to
>since this feature goes beyond standard SPARQL.
>
>I don't see why you can't keep your interaction roughly the same as you
>have it now; you would merely need to change the underlying implementation.
> Of course if your operations are mostly coarse-grained, i.e. in terms of
>graphs, then the DatasetAccessor API may actually cover much of what you
>need, which would make your implementation fairly trivial.
>
>Hope this is enough to get you started, please feel free to ask further
>questions or for clarifications as you explore this,
>
>Cheers,
>
>Rob
>
>On 10/04/2014 12:09, "Lebling, David (US SSA)"
><[email protected]> wrote:
>
>>I am looking at finally biting the bullet and transitioning from SDB to
>>TDB. The first step is to come up with a level-of-effort estimate to
>>see if this fits in our budget.
>>
>>We are using Jena 2.11.0 and SDB 1.4.0. We have a set of five web
>>services (which can be in separate JVMs) that use SDB as an OWL storage
>>device. The items stored are named graphs. These are read, modified in
>>memory (sometimes through inference, sometimes through pure Java code
>>adding and removing and modifying statements), and written back out. We
>>also use SPARQL queries on the union graph to find graphs of interest.
>>Although currently there are a fairly small number of these named
>>graphs, we want to be able to expand the system to hold a much larger
>>number. One of the stumbling blocks with SDB was a bug in its multi-JVM
>>concurrency code that wasn't fixed due to lack of SDB support.
>>
>>All interactions with the SDB database are through a single class which
>>implements an interface with read, write, delete, find, etc. on named
>>graphs and open and close on the database itself.
>>
>>Any advice on how to go about architecting and implementing a TDB
>>version of the above would be appreciated. More details can be supplied
>>if needed, of course.
>>
>>Thanks,
>>
>>Dave Lebling
>>
>
>
>
>



