Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?

Andy Seaborne Tue, 30 Aug 2022 03:02:45 -0700

Hi Eugen,

Yes, the backup should be written then atomically moved (i.e. same directory). Cleanup would then be "delete" by pattern in the server startup script.

As to putting a process script around the functionality, it is an external script which needs access to the server file area (to know the state of backups). The file system state is the definitive state - not the jobs (that's a UI feature).

This would make a good independent project or contribution. Or a published example as a starting point, because the requirements will depend on the deployment environment and it seems unlikely to me that there is a one size fits all.


Fuseki should make sure it has the right behaviours (like atomic write).

    Andy

autopostgresqlbackup itself is GPL.

On 29/08/2022 11:20, Eugen Stan wrote:

Hello,
We are using fuseki and we would like to implement a backup policy similar in capabilities to what [autopostgresqlbackup] has to offer.
Are there any existing solutions out there that can do all / part of these?

We would like to take:
* daily backups for a week
* weekly backups - 1 per week, last 4 weeks
* monthly backups - 1/ month, last 6 months
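A minimal sketch of how an external script could decide which backups to retain under the policy above. The function name `backups_to_keep`, its parameters, and the reading of "weekly" as "newest backup in each ISO week" and "monthly" as "newest backup in each calendar month" are assumptions for illustration, not anything fuseki or autopostgresqlbackup provides.

```python
from datetime import date, timedelta


def backups_to_keep(dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates to retain (hypothetical helper).

    dates: iterable of datetime.date, one per existing backup.
    """
    # Newest first, ignoring anything dated in the future.
    ordered = sorted({d for d in dates if d <= today}, reverse=True)
    keep = set()

    # Daily: every backup taken within the last `daily` days.
    keep.update(d for d in ordered if (today - d).days < daily)

    # Weekly: newest backup in each of the most recent `weekly` ISO weeks.
    newest_per_week = {}
    for d in ordered:
        newest_per_week.setdefault(tuple(d.isocalendar()[:2]), d)
    keep.update(sorted(newest_per_week.values(), reverse=True)[:weekly])

    # Monthly: newest backup in each of the most recent `monthly` months.
    newest_per_month = {}
    for d in ordered:
        newest_per_month.setdefault((d.year, d.month), d)
    keep.update(sorted(newest_per_month.values(), reverse=True)[:monthly])

    return keep
```

Everything not in the returned set would be a candidate for deletion; the same date can satisfy more than one tier, so the total kept is usually lower than 7 + 4 + 6.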


I believe this could be scripted via the HTTP API + directory access.
The backup api in [fuseki-server-protocol] can trigger a backup and can also list existing backups.
Unfortunately in the current implementation, backup is not consistent.
In case of a server crash during backup, the file will remain there incomplete. Also, since tasks are stored in memory and cleaned (periodically / on restart) there is no way to know for sure if the backup was successful or not.
I have encountered the above quite often in some workloads.
The inconsistency could be solved by writing the backup to a temporary file name and, on completion, renaming it to the final file name.
Rename is usually an atomic operation on POSIX file systems.
The /backup-list API can list all backups or split backups into complete / incomplete. IMO for now, it can list all of them.
The in-progress backup could be stored alongside the other backups with a file marker like: dataset_date.nq.gz.INCOMPLETE .
Once it's done it can be renamed to dataset_date.nq.gz .
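The write-then-rename idea can be sketched as follows. This is only an illustration of the pattern in Python, not how fuseki writes backups; `write_backup_atomically` and the `chunks` argument are hypothetical names, and the only assumption about naming is the `.INCOMPLETE` marker suggested above.

```python
import gzip
import os

INCOMPLETE_SUFFIX = ".INCOMPLETE"  # marker proposed in this thread


def write_backup_atomically(final_path, chunks):
    """Write gzip data to <final_path>.INCOMPLETE, then rename it into place.

    chunks: iterable of bytes objects making up the uncompressed backup.
    """
    tmp_path = final_path + INCOMPLETE_SUFFIX
    with open(tmp_path, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for chunk in chunks:
                gz.write(chunk)
        # Push the data to disk before the rename makes it visible.
        raw.flush()
        os.fsync(raw.fileno())
    # Same directory, so the rename stays on one file system.
    os.replace(tmp_path, final_path)
    return final_path
```

If the process dies before `os.replace`, only the `.INCOMPLETE` file is left behind, so any file carrying the final name can be treated as complete.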
Cleanup might be handled externally. In case of a crash, the file will remain INCOMPLETE until the system removes it, once a specific amount of time has passed since the backup was started (1-2 days).
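An external cleanup along these lines might look like the sketch below. `remove_stale_incomplete` is a hypothetical helper, the 2-day default is taken from the range mentioned above, and it uses the file's modification time as a stand-in for "time since the backup was started", which is an assumption (mtime reflects the last write, not the start).

```python
import os
import time


def remove_stale_incomplete(backup_dir, max_age_seconds=2 * 24 * 3600, now=None):
    """Delete *.INCOMPLETE files in backup_dir older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(backup_dir):
        if not (entry.is_file() and entry.name.endswith(".INCOMPLETE")):
            continue  # leave completed backups and anything else alone
        if now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from cron or a server startup script, this leaves a backup that is still being written untouched as long as it keeps receiving writes within the age limit.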
WDYT?


[autopostgresqlbackup] https://github.com/k0lter/autopostgresqlbackup
[fuseki-server-protocol] https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html
Thanks,
