Hi Arne, hi Paul,

sorry for the late response and thank you for your suggestions.

@Arne I haven't considered that possibility yet; it sounds intriguing.
The immediate issue I see is that our Fuseki lives in a separate container
and the application only communicates with it over HTTP, so we don't really
have access to its dataset. But there may be ways around that.

@Paul the workflow you describe is how I wish our application would work.
But we use Fuseki as our data persistence, not just as a projection onto
data. So there is no concept of "data sources" or any data transformation
step that I could tweak.

In any case, I think I have an idea of what my options are and will go
from there, thanks!

Best,
Balduin

On Mon, Feb 12, 2024 at 11:59 PM Arne Bernhardt <[email protected]>
wrote:

> Hi Balduin,
>
> I have no experience with the Fuseki persistent storage, and maybe I do not
> fully understand the reasons for your current migration workflow, but I
> think you can make it much faster and safer by using the Apache Jena Java
> API for TDB.
> The API is hopefully the same for TDB2.
> Proposed workflow:
> - make a backup (optional, but: better safe than sorry)
> - shut down your Fuseki server
> - load your data directly from Java via the TDB API
>   (https://jena.apache.org/documentation/tdb/java_api.html#using-a-directory-name)
> - perform your migration directly on the persisted data while using
>   transactions (https://jena.apache.org/documentation/tdb/tdb_transactions.html)
>   -> if anything goes wrong, you can simply abort the transaction and
>   nothing has changed
> - commit and end the program
> - start Fuseki again
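>
> In code, an untested sketch of that workflow (the dataset directory path is
> made up, and I am assuming the TDB1 API from the link above; TDB2 has the
> parallel TDB2Factory.connectDataset()):
>
>   import org.apache.jena.query.Dataset;
>   import org.apache.jena.system.Txn;
>   import org.apache.jena.tdb.TDBFactory;
>
>   public class OfflineMigration {
>       public static void main(String[] args) {
>           // Fuseki must be stopped first so nothing else touches the files.
>           Dataset dataset = TDBFactory.createDataset("/fuseki/databases/mydb");
>           // executeWrite commits on success and aborts if the body throws,
>           // so a failed migration leaves the data untouched.
>           Txn.executeWrite(dataset, () -> {
>               // perform the migration here: SPARQL updates or calls on
>               // dataset.getDefaultModel() / dataset.getNamedModel(...)
>           });
>           dataset.close();
>       }
>   }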
>
> I hope it works out for you.
> Arne
>
>
> On Fri, Feb 9, 2024 at 15:18, <[email protected]>
> wrote:
>
> > Hi Andy,
> >
> > > If I understand correctly, this is a schema change requiring the data to
> > > change.
> >
> > Correct, but we don't enforce any schema at the database level (no SHACL
> > involved); that's only done programmatically in the application.
> >
> > > The transformation of the data to the updated data model could be done
> > > offline, that would reduce downtime. If the data is being continuously
> > > updated, that's harder because the offline copy will get out of step with
> > > the live data.
> > > How often does the data change (not due to application logic changes)?
> >
> > The data might theoretically change constantly, so just doing it offline
> > on a copy isn't really possible.
> > A compromise I've been thinking about, which would still be better than
> > downtime, would be a read-only mode for the duration of the migration. But
> > for now, the application doesn't support something like this yet.
> > (And if we can get the downtime to something reasonable, that would be
> > good enough.)
> >
> > > Do you have a concrete example of such a change?
> >
> > These changes can vary from very simple to very complex:
> > - The simplest case would maybe be that a certain property that used to be
> > optional on a certain type of resource becomes mandatory; for all instances
> > where this is not present, a default value needs to be supplied.
> >   => this we could easily do with a SPARQL update (rough sketch after this
> > list).
> > - The most complex case I encountered so far was roughly this:
> >   Given in graph A (representing the data model for a subset of the data),
> > a particular statement on something of type P (defining some kind of
> > property) is present, and in graph B (the subset of data corresponding to
> > the model in A), a certain statement holds true for all V (which have a
> > reference to P), then P should be modified. If the statement does not hold
> > true for all V, then each V where it does not must be modified to become a
> > more complex object.
> >   (More concretely: V represents a text value. If P says that V may
> > contain markup, then check if any V contains markup. If not, change P to
> > say that it does not contain markup; if any V contains markup, then all Vs
> > that represent text without markup need to be changed to contain text with
> > markup. Text without markup here represents a bit of reification around a
> > string literal; text with markup follows a sophisticated standoff markup
> > model, and even if no markup is present, it needs to contain information on
> > the nature of the markup that is used.)
> >   => this is something I would not know how to do, or would not feel
> > comfortable attempting, in SPARQL, so it needs to happen in code.
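> >
> >   For the simple case, an untested sketch of what I mean, sent over HTTP to
> > the live Fuseki (the endpoint URL, the ex: prefix, ex:SomeType, ex:someProp
> > and the default value are all made up):
> >
> >   import org.apache.jena.rdfconnection.RDFConnection;
> >   import org.apache.jena.rdfconnection.RDFConnectionFactory;
> >
> >   public class SimpleMigration {
> >       public static void main(String[] args) {
> >           String update =
> >               "PREFIX ex: <http://example.org/schema#>\n"
> >               + "INSERT { ?r ex:someProp \"default value\" }\n"
> >               + "WHERE  { ?r a ex:SomeType .\n"
> >               + "         FILTER NOT EXISTS { ?r ex:someProp ?existing } }";
> >           try (RDFConnection conn = RDFConnectionFactory.connect("http://fuseki:3030/dataset")) {
> >               // one HTTP update request; Fuseki applies it in a single transaction
> >               conn.update(update);
> >           }
> >       }
> >   }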
> >
> > Long story short: Some simple changes I could easily do in SPARQL; the
> > more complex ones would require me to be able to do the changes in code,
> > but it might be possible to have it set up in such a way that the code
> > essentially has read access to the data and can generate update queries
> > from that.
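> >
> >   And for the complex ones, the read-then-generate-updates idea, roughly
> > (untested sketch; the endpoint and ex:* names are made up, and needsRewrite /
> > buildUpdateFor stand in for the application logic):
> >
> >   import java.util.ArrayList;
> >   import java.util.List;
> >   import org.apache.jena.rdfconnection.RDFConnection;
> >   import org.apache.jena.rdfconnection.RDFConnectionFactory;
> >
> >   public class ComplexMigration {
> >       public static void main(String[] args) {
> >           List<String> updates = new ArrayList<>();
> >           try (RDFConnection conn = RDFConnectionFactory.connect("http://fuseki:3030/dataset")) {
> >               // read: select the candidate values over HTTP
> >               conn.querySelect(
> >                   "PREFIX ex: <http://example.org/schema#> "
> >                       + "SELECT ?v ?text WHERE { ?v ex:hasText ?text }",
> >                   row -> {
> >                       // decide in Java whether this value needs migrating
> >                       if (needsRewrite(row.getLiteral("text").getString())) {
> >                           updates.add(buildUpdateFor(row.getResource("v").getURI()));
> >                       }
> >                   });
> >               // write: send the generated SPARQL updates back to Fuseki
> >               for (String u : updates) {
> >                   conn.update(u);
> >               }
> >           }
> >       }
> >
> >       // placeholders for the real application logic
> >       static boolean needsRewrite(String text) { return text.contains("<"); }
> >       static String buildUpdateFor(String uri) {
> >           return "DELETE WHERE { <" + uri + "> <http://example.org/schema#placeholder> ?o }";
> >       }
> >   }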
> >
> > Our previous setup worked like this (durations not measured, just from
> > experience):
> > On application start, if a migration needs doing, the application won't
> > start right away, but kicks that process off first:
> > - download an entire dump of Fuseki to a file on disk (ca. 20 min.)
> > - load the dump into an in-memory Jena model (10 min., plus huge memory
> > consumption that keeps growing in proportion to our data)
> > - perform the migration on the in-memory model (1 sec. - 1 min.)
> > - dump the model to a file on disk
> > - drop all graphs from Fuseki (20 min.)
> > - upload the dump into Fuseki (20 min.)
> > Then the application would start... so at least 1h downtime, clearly room
> > for improvement.
> > The good thing about this approach is that if the migration fails, the
> > data would not be corrupted, because the data loaded in Fuseki is not
> > affected.
> >
> > My best bet at this point is to say we take the risk of data corruption
> > (thank god for backups!) and operate on the live Fuseki database. This cuts
> > out the time-consuming downloading, uploading, etc., and solves the memory
> > issue with loading the entire database into an in-memory model. Then the
> > migration is either just SPARQL or a programmatic series of database
> > interactions leading to update queries.
> > Then we would probably be down from 1h of downtime to 1 min, which would be
> > a huge improvement.
> >
> > Does that sound reasonable? Are there better ways? Anything I'm missing?
> >
> > Best & thanks! (and sorry for the wall of text)
> > Balduin
> >
> >
> > -----Original Message-----
> > From: Andy Seaborne <[email protected]>
> > Sent: Friday, February 9, 2024 14:12
> > To: [email protected]
> > Subject: Re: Database Migrations in Fuseki
> >
> > Hi Balduin,
> >
> > On 07/02/2024 11:05, Balduin Landolt wrote:
> > > Hi everyone,
> > >
> > > we're storing data in Fuseki as the persistence layer for our application
> > > backend; the data is structured according to the application logic.
> > > Whenever something changes in our application logic, we have to do a
> > > database migration, so that the data conforms to the updated model.
> > > Our current solution to that is very home-spun, not exactly stable and
> > > comes with a lot of downtime, so we try to avoid it whenever possible.
> >
> > If I understand correctly, this is a schema change requiring the data to
> > change.
> >
> > The transformation of the data to the updated data model could be done
> > offline, that would reduce downtime. If the data is being continuously
> > updated, that's harder because the offline copy will get out of step with
> > the live data.
> >
> > How often does the data change (not due to application logic changes)?
> >
> > > I'm now looking into how this could be improved in the future. My
> > > double question is:
> > > 1) is there any tooling I missed to help with this process? (In the SQL
> > > world, for example, there are out-of-the-box solutions for that.)
> > > 2) and if not, more broadly, does anyone have any hints on how I could
> > > best go about this?
> >
> > Do you have a concrete example of such a change? Maybe change-in-place is
> > possible, but that depends on how updates happen and how the data feeds
> > change with the application logic change.
> >
> >      Andy
> >
> > >
> > > Thanks in advance!
> > > Balduin
> > >
> >
> >
> >
> >
>
