Re: Discussion about file format for the future

2020-06-11 Thread Leland Best
Hi All, First let me admit up front to being mostly a "lurker" on this list. Even now with all the new development I've had far more pressing matters to deal with and have not been following along in the detail I should have been. But this issue about file formats really made me (figuratively)

Re: Discussion about file format for the future

2020-06-11 Thread Derek Atkins
"Eric L. Zolf" writes: > Hi, > > to close this discussion, I've created an enhancement request #399 but > don't hold your breath, it's not yet on the priority list. > > KR, Eric > > https://github.com/rdiff-backup/rdiff-backup/issues/399 Thanks. I realize this is a low priority and also

Re: Discussion about file format for the future

2020-06-09 Thread Eric L. Zolf
Hi, to close this discussion, I've created an enhancement request #399 but don't hold your breath, it's not yet on the priority list. KR, Eric https://github.com/rdiff-backup/rdiff-backup/issues/399 On 09/06/2020 22:51, rhkra...@gmail.com wrote: > On Tuesday, June 09, 2020 12:30:24 PM Robert

Re: Discussion about file format for the future

2020-06-09 Thread rhkramer
On Tuesday, June 09, 2020 12:30:24 PM Robert Nichols wrote: > On 6/9/20 9:44 AM, rhkra...@gmail.com wrote: > > In the case of rdiff-backup, it wouldn't surprise me that diffs (deltas) > > are stored as forward deltas, and, in removing old deltas, a new "base" > > must be created before deleting the

Re: Discussion about file format for the future

2020-06-09 Thread Robert Nichols
On 6/9/20 9:44 AM, rhkra...@gmail.com wrote: In the case of rdiff-backup, it wouldn't surprise me that diffs (deltas) are stored as forward deltas, and, in removing old deltas, a new "base" must be created before deleting the deltas. (My words probably aren't exactly correct, I hope they are

Re: Discussion about file format for the future

2020-06-09 Thread rhkramer
On Tuesday, June 09, 2020 10:19:39 AM Derek Atkins wrote: > EricZolf writes: > > 3. to answer Derek's e-mail as well: would it have an impact on speed? > > To be honest, no clue, we would need to analyze this. > > Just as another data point, apparently a year ago my backup server > wasn't

Re: Discussion about file format for the future

2020-06-09 Thread Dominic Raferd
On Tue, 9 Jun 2020 at 15:28, Derek Atkins wrote: > > EricZolf writes: > > > 3. to answer Derek's e-mail as well: would it have an impact on speed? > > To be honest, no clue, we would need to analyze this. > > Just as another data point, apparently a year ago my backup server > wasn't backing

Re: Discussion about file format for the future

2020-06-09 Thread Derek Atkins
EricZolf writes: > 3. to answer Derek's e-mail as well: would it have an impact on speed? > To be honest, no clue, we would need to analyze this. Just as another data point, apparently a year ago my backup server wasn't backing stuff up, so for the past few days my nightly backup hasn't had

Re: Discussion about file format for the future

2020-06-08 Thread Patrik Dufresne
Just throwing out an idea here: If we are going with a database or some sort of keystore for the metadata, we can still generate one "keystore" file for each revision. But the current mirror metadata could contain the full history. Making sure we could always go back to the previous keystore

Re: Discussion about file format for the future

2020-06-06 Thread Robert Nichols
On 6/5/20 4:44 PM, Arrigo Marchiori wrote: If we were going to substitute a lot of files with a single file (that is what a SQLite database is in the end, right?) then we may somehow introduce a "single point of failure" for the whole backup. The rdiff-backup archive structure is already a

Re: Discussion about file format for the future

2020-06-05 Thread EricZolf
Hi, allow me to "top-answer" because there are so many threads in this discussion: 1. SPOF (single point of failure) and complexity are definitely something to consider 2. a middle step could be to offer a parameter to tweak the variable `max_diff_chain` in `metadata.py`, e.g. down to 0 or 1 so

Re: Discussion about file format for the future

2020-06-05 Thread Arrigo Marchiori
Dear Patrik, All, I will try to contribute to this interesting conversation. On Fri, Jun 05, 2020 at 08:16:30AM -0400, Patrik Dufresne wrote: > As mentioned by Robert searching for metadata is complex because you need > to scan multiple files to actually find the right value. instead of having a

Re: Discussion about file format for the future

2020-06-05 Thread Derek Atkins
I do wonder... Removing year-old incrementals from backups with lots of small files seems to take a very long time. I don't know if that time is being spent in executing 'rm' or spent updating metadata. To that end, if it IS spent in metadata, I wonder if something like a SQLite DB would make

Re: Discussion about file format for the future

2020-06-05 Thread Patrik Dufresne
As mentioned by Robert, searching for metadata is complex because you need to scan multiple files to actually find the right value, instead of running a query as we could with a database. Obviously performance-wise it's not great either because we need to scan multiple files. The only thing I hate

Re: Discussion about file format for the future

2020-06-04 Thread Robert Nichols
On 6/4/20 11:43 AM, Patrik Dufresne wrote: But my two cents on the subject: should we really keep this file base? For rdiffweb, scanning the metadata files is a nightmare when I just need a subset of the data to be displayed to the user. I always thought a database could be a better fit for the

Re: Discussion about file format for the future

2020-06-04 Thread Eric L. Zolf
Hi, that's an interesting idea. I was leaning more toward keeping a simple file-based structure, but if there are requirements, we can think about alternatives: it could be a SQL or even a NoSQL DB, or a keystore. Regarding the terabytes of data, I take the point, I never foresaw to convert the

Re: Discussion about file format for the future

2020-06-04 Thread Patrik Dufresne
Hi Eric, My priority in that regard would be to make sure all of this is backward compatible with the previous repository. I mean, I have terabytes of data and I don't foresee a way to convert all these metadata files to a new format. And I'm probably not alone in this situation. We should

Re: Discussion about file format for the future

2020-06-04 Thread rhkramer
Thanks! On Thursday, June 04, 2020 08:37:24 AM EricZolf wrote: > Correct, configuration files and "metadata" files, but also potentially > to exchange information between client and server (e.g. the version > string could be enriched to exchange more information than just > version).

Re: Discussion about file format for the future

2020-06-04 Thread EricZolf
On 04/06/2020 14:20, rhkra...@gmail.com wrote: >* what is the rdiff file format used for? Presumably it is not the format > of > the stored backups, right? Is it the configuration files? Something else? Correct, configuration files and "metadata" files, but also potentially to exchange

Re: Discussion about file format for the future

2020-06-04 Thread rhkramer
Top posting because I'm not really replying to a specific comment in the post. From the peanut gallery (I like sitting here, sometimes ;-), two things: * +1 on opening the discussion early * what is the rdiff file format used for? Presumably it is not the format of the stored backups,

Re: Discussion about file format for the future

2020-06-04 Thread EricZolf
Hi, On 04/06/2020 12:54, Dominic Raferd wrote: > On Thu, 4 Jun 2020 at 11:46, EricZolf wrote: >> >> rdiff-backup has currently its own file formats, which are far from >> being standard, meaning a lot of custom code to handle these formats, >> respectively a lot of different files... > > +1 for

Re: Discussion about file format for the future

2020-06-04 Thread Dominic Raferd
On Thu, 4 Jun 2020 at 11:46, EricZolf wrote: > > rdiff-backup has currently its own file formats, which are far from > being standard, meaning a lot of custom code to handle these formats, > respectively a lot of different files... +1 for YAML, but please retain compatibility (at least for

Discussion about file format for the future

2020-06-04 Thread EricZolf
Hi, rdiff-backup currently has its own file formats, which are far from being standard, meaning a lot of custom code to handle these formats, respectively a lot of different files. In the medium term I'd like to move to a more standard format for code efficiency reasons: if the code is in a library, I