Re: File versioning based on shallow Git repositories?

2018-04-13 Thread Jakub Narebski
Hello Johannes,

Johannes Schindelin writes:
> On Fri, 13 Apr 2018, Jakub Narebski wrote:
>> Hallvard Breien Furuseth  writes:
>> 
>>> Also maybe it'll be worthwhile to generate .git/info/grafts in a local
>>> clone of the repo to get back easily visible history.  No grafts in
>>> the original repo, grafts mess things up.
>> 
>> Just a reminder: modern Git has "git replace", a modern and safe
>> alternative to the grafts file.
>
> Right!
>
> Maybe it is time to start deprecating grafts? They *do* cause problems,
> such as weird "missing objects" problems when trying to fetch into, or
> push from, a repository with grafts. These problems are not shared by the
> `git replace` method.

Also you can propagate "git replace" info with clone / fetch / push.
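
A minimal sketch of what that propagation can look like, assuming a throwaway pair of repositories (the paths, tag names, and the `second` branch are made up for the demo). As far as I know the default clone refspec covers branches and tags only, so the replacement refs under refs/replace/ have to be requested explicitly:

```shell
#!/bin/sh
# Sketch: carrying "git replace" data from one repository to another.
# All paths, tag names and branch names here are hypothetical.
set -eu

work=$(mktemp -d)

# Upstream repository with two unrelated root commits and one replacement.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Replace Demo"
git config user.email "replace@example.invalid"

echo "version 1" > file.txt
git add file.txt
git commit -q -m "backup 1"
git tag backup-1

git checkout -q --orphan second
echo "version 2" > file.txt
git add file.txt
git commit -q -m "backup 2"
git tag backup-2

# Make backup-2 appear to have backup-1 as its parent.
git replace --graft backup-2 backup-1

# A plain clone brings branches and tags, but not refs/replace/*.
git clone -q "$work/origin" "$work/clone"
cd "$work/clone"
before=$(git rev-list --count backup-2)

# Ask for the replacement refs explicitly.
git fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git rev-list --count backup-2)
echo "commits reachable from backup-2: $before before fetch, $after after"

# Optionally let every later fetch pick them up as well.
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
```

In the other direction, a push with the same kind of refspec (`git push origin 'refs/replace/*:refs/replace/*'`) should publish locally created replacements.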

> I just sent out a patch to add a deprecation warning.

Thank you for this.

-- 
Jakub Narębski


Re: File versioning based on shallow Git repositories?

2018-04-13 Thread Johannes Schindelin
Hi Kuba,

On Fri, 13 Apr 2018, Jakub Narebski wrote:

> Hallvard Breien Furuseth  writes:
> 
> > Also maybe it'll be worthwhile to generate .git/info/grafts in a local
> > clone of the repo to get back easily visible history.  No grafts in
> > the original repo, grafts mess things up.
> 
> Just a reminder: modern Git has "git replace", a modern and safe
> alternative to the grafts file.

Right!

Maybe it is time to start deprecating grafts? They *do* cause problems,
such as weird "missing objects" problems when trying to fetch into, or
push from, a repository with grafts. These problems are not shared by the
`git replace` method.

I just sent out a patch to add a deprecation warning.

Ciao,
Dscho


Re: File versioning based on shallow Git repositories?

2018-04-13 Thread Jakub Narebski
Hallvard Breien Furuseth writes:

> Also maybe it'll be worthwhile to generate .git/info/grafts in a local
> clone of the repo to get back easily visible history.  No grafts in
> the original repo, grafts mess things up.

Just a reminder: modern Git has "git replace", a modern and safe
alternative to the grafts file.
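
For illustration, a small sketch of getting visible history back with `git replace --graft` instead of a grafts file. The repository, tag names, and the `second` branch are hypothetical; the two root commits stand in for two unrelated backups:

```shell
#!/bin/sh
# Sketch: stitch two unrelated root commits into one visible history
# with "git replace --graft". Names and paths are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Graft Demo"
git config user.email "graft@example.invalid"

# Two root commits that share no history.
echo "version 1" > file.txt
git add file.txt
git commit -q -m "backup 1"
git tag backup-1

git checkout -q --orphan second
echo "version 2" > file.txt
git add file.txt
git commit -q -m "backup 2"
git tag backup-2

# Number of commits reachable from backup-2 before any replacement.
git rev-list --count backup-2

# Record a replacement for backup-2 whose parent is backup-1.
# This only adds a ref under refs/replace/; the original commit object
# is left untouched.
git replace --graft backup-2 backup-1

# The same query, now seen through the replacement.
git rev-list --count backup-2

# The real history stays available on request ...
git --no-replace-objects rev-list --count backup-2

# ... and the replacement can be dropped again at any time.
git replace -d backup-2 >/dev/null
```

Because the replacement lives in an ordinary ref, it can be listed with `git replace -l` and removed without rewriting anything.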

Best,
-- 
Jakub Narębski


Re: File versioning based on shallow Git repositories?

2018-04-12 Thread Hallvard Breien Furuseth

On 12. april 2018 23:07, Rafael Ascensao wrote:

> Would initiating a repo with an empty root commit, tag it with 'base' then
> use $ git rebase --onto base master@{30 days ago} master;
> be viable?


No... my question was confused from the beginning.  With such large files
I _shouldn't_ have history (or grafts), otherwise Git spends a lot of CPU
time creating diffs when I look at a commit, or worse, when I try git log.
Which I discovered quickly when trying real data instead of test-data:-)

Ævar's suggestion was exactly right in that respect.  Thanks again!

--
Hallvard


Re: File versioning based on shallow Git repositories?

2018-04-12 Thread Rafael Ascensao
Would initiating a repo with an empty root commit, tag it with 'base' then
use $ git rebase --onto base master@{30 days ago} master;
be viable?

The --orphan & tag is perhaps more robust, since it's "harder" to move
tags around.

--
Rafael Ascensão


Re: File versioning based on shallow Git repositories?

2018-04-12 Thread Ævar Arnfjörð Bjarmason

On Thu, Apr 12 2018, Hallvard Breien Furuseth wrote:

> On 12. april 2018 20:47, Ævar Arnfjörð Bjarmason wrote:
>> 1. Create a backup.git repo
>> 2. Each time you make a backup, checkout a new orphan branch, see "git
>> checkout --orphan"
>> 3. You copy the files over, commit them, "git log" at this point shows
>> one commit no matter if you've done this before.
>> 4. You create a tag for this backup, e.g. one named after the current
>> time, delete the branch.
>> 5. You then have a retention period for the tags, e.g. only keep the
>> last 30 tags if you do daily backups for 30 days of backups.
>>
>> Then as soon as you delete the tags the old commit will be unreferenced,
>> and you can make git-gc delete the data.
>
> Nice!
> Why the tags though, instead of branches named after the current time?

Because tags are idiomatic in git for a reference that doesn't change,
but sure, if you'd like branches that'll work too.

> One --orphan branch/tag per day with several commits would work for me.
>
> Also maybe it'll be worthwhile to generate .git/info/grafts in a local
> clone of the repo to get back easily visible history.  No grafts in
> the original repo, grafts mess things up.

Maybe, I have not tried this with grafts.


Re: File versioning based on shallow Git repositories?

2018-04-12 Thread Hallvard Breien Furuseth

On 12. april 2018 20:47, Ævar Arnfjörð Bjarmason wrote:

> 1. Create a backup.git repo
> 2. Each time you make a backup, checkout a new orphan branch, see "git
>    checkout --orphan"
> 3. You copy the files over, commit them, "git log" at this point shows
>    one commit no matter if you've done this before.
> 4. You create a tag for this backup, e.g. one named after the current
>    time, delete the branch.
> 5. You then have a retention period for the tags, e.g. only keep the
>    last 30 tags if you do daily backups for 30 days of backups.
>
> Then as soon as you delete the tags the old commit will be unreferenced,
> and you can make git-gc delete the data.


Nice!
Why the tags though, instead of branches named after the current time?

One --orphan branch/tag per day with several commits would work for me.

Also maybe it'll be worthwhile to generate .git/info/grafts in a local
clone of the repo to get back easily visible history.  No grafts in
the original repo, grafts mess things up.

--
Hallvard


Re: File versioning based on shallow Git repositories?

2018-04-12 Thread Ævar Arnfjörð Bjarmason

On Thu, Apr 12 2018, Hallvard Breien Furuseth wrote:

> Can I use a shallow Git repo for file versioning, and regularly purge
> history older than e.g. 2 weeks?  Purged data MUST NOT be recoverable.
>
> Or is there a backup tool based on shallow Git cloning which does this?
> Push/pull to another shallow repo would be nice but is not required.
> The files are text files up to 1/4 Gb, usually with few changes.
>
>
> If using Git - I see "git fetch --depth" can shorten history now.
> How do I do that without 'fetch', in the origin repo?
> Also Documentation/technical/shallow.txt describes some caveats, I'm
> not sure how relevant they are.
>
> To purge old data -
>   git config core.logallrefupdates false
>   git gc --prune=now --aggressive
> Anything else?
>
> I'm guessing that without --aggressive, some expired info might be
> deduced from studying the packing of the remaining objects.  Don't
> know if we'll be required to be that paranoid.

The shallow feature is not for this use-case, but there's a much easier
solution that I've used for exactly this use-case, e.g. taking backups
of SQL dumps that delta-compress well, and then throwing out old
backups.

You:

1. Create a backup.git repo
2. Each time you make a backup, checkout a new orphan branch, see "git
   checkout --orphan"
3. You copy the files over, commit them, "git log" at this point shows
   one commit no matter if you've done this before.
4. You create a tag for this backup, e.g. one named after the current
   time, delete the branch.
5. You then have a retention period for the tags, e.g. only keep the
   last 30 tags if you do daily backups for 30 days of backups.

Then as soon as you delete the tags the old commit will be unreferenced,
and you can make git-gc delete the data.

You'll still be able to `git diff` between tags, even though they have
unrelated histories, and the files will still delta-compress.
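
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the directory layout, the `backup-tmp` branch, the `backup-day-N` tag names, and the retention count of 3 are all invented for the demo, and a real script would more likely name tags after a sortable timestamp such as `date -u +%Y%m%dT%H%M%SZ`.

```shell
#!/bin/sh
# Sketch of the orphan-branch backup scheme. Paths, tag names and the
# retention count are hypothetical choices for this demo.
set -eu

work=$(mktemp -d)
src="$work/data"      # directory being backed up
repo="$work/backup"   # step 1: the backup repository
keep=3                # number of backups to retain

mkdir -p "$src"
git init -q "$repo"
cd "$repo"
git config user.name "Backup Demo"
git config user.email "backup@example.invalid"
# Without reflogs, deleted tags do not leave old commits reachable.
git config core.logAllRefUpdates false

take_backup() {
    tag=$1
    git checkout -q --orphan backup-tmp          # step 2: new root
    # step 3: replace the work tree with the current data and commit it
    find . -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
    cp -R "$src"/. .
    git add -A
    git commit -q -m "backup $tag"
    git tag "$tag"                               # step 4: tag it ...
    git checkout -q --detach "$tag"
    git branch -q -D backup-tmp                  # ... and drop the branch
}

expire_old() {
    # step 5: tag names sort chronologically, so keep the newest $keep
    git tag -l 'backup-*' | sort -r | tail -n +"$((keep + 1))" |
    while read -r old; do
        git tag -d "$old" >/dev/null
    done
    # Make the now-unreferenced commits and blobs go away for good.
    git reflog expire --expire=now --all
    git gc -q --prune=now --aggressive
}

for day in 1 2 3 4 5; do
    echo "dump contents, day $day" > "$src/dump.sql"
    take_backup "backup-day-$day"
    if [ "$day" -eq 1 ]; then
        first=$(git rev-parse HEAD)   # remembered to show it gets pruned
    fi
done

expire_old

git tag -l                                   # only the newest tags remain
git log --oneline backup-day-5               # a single root commit
git diff --stat backup-day-4 backup-day-5    # diff across unrelated roots
```

Detaching HEAD at the new tag before deleting `backup-tmp` is just one way to get off the branch so it can be removed; keeping a branch per backup instead of a tag would work the same way for the pruning step.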