On Sun, 10 Nov 2019 at 09:32, Alexandre Bergel via Pharo-users wrote:

> Hi Cyril,
>
> I tried something to remove some large blobs from the history. The source
> code of Roassal2 is about 7 MB, but the .git folder is about 150 MB!
> But in the end, the push was rejected because some pull requests
> exist. So I did not suspect that I had had any impact. Sorry about that.
>

It seems strange that it was rejected because some pull requests existed.
Were you doing it from the command line?
Was it something like one of the error messages on this page, particularly in Step 11?
https://github.community/t5/How-to-use-Git-and-GitHub/How-to-deal-with-quot-refusing-to-merge-unrelated-histories-quot/td-p/12619




> Help shrinking Roassal2’s .git folder is welcome.
>

I just cloned Roassal2 and `du -sh .` gave 88M, so it looks like you had
some success reducing it.
Google found me a way to list the largest objects in the history...
```
# List every blob in the history with its size, smallest first,
# so the largest objects end up at the bottom of ../list.txt
git rev-list --objects --all \
        | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
        | sed -n 's/^blob //p' \
        | sort --numeric-sort --key=2 \
        | cut -c 1-12,41- \
        | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest \
        > ../list.txt
```
The largest file entry was...
    e0c5f0885bac  432KiB src/Roassal2/RTRoassalExample.class.st

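That file is no longer in the repo, but its old versions are still in the history...
$ grep RTRoassalExample.class.st ../list.txt | wc -l
==> 158
and, since 432K is the largest version, at most
158 * 432K ==> ~68M (uncompressed)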
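For a more precise figure than 158 * 432K, something like the function below should sum the uncompressed sizes of every distinct blob ever stored at one path. This is my own sketch, not something from the pages I found, and the name path_blob_total is made up. It is only approximate, because `git rev-list --objects` lists each blob once under the first path it reaches it by. Git stores most of those versions as deltas, so the space they take inside a pack is much smaller.

```sh
# path_blob_total PATH (hypothetical helper): print the total uncompressed
# size, in bytes, of the distinct blobs recorded at PATH in any ref's history.
path_blob_total() {
    git rev-list --objects --all \
        | awk -v p="$1" '{
              sp = index($0, " ")   # lines look like "<object-id> <path>"
              if (sp && substr($0, sp + 1) == p) print substr($0, 1, sp - 1)
          }' \
        | git cat-file --batch-check='%(objectsize)' \
        | awk '{ total += $1 } END { print total + 0 }'
}
```

Run it from inside the clone, e.g. `path_blob_total src/Roassal2/RTRoassalExample.class.st`.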

Found the commit with that blob to examine...
$ git log --all --pretty=format:%H -- src/Roassal2/RTRoassalExample.class.st \
    | xargs -I% sh -c "git ls-tree % -- src/Roassal2/RTRoassalExample.class.st | grep -q e0c5f0885bac && echo %"
==> a7753aef2a9f14cf5c84da83b8ebff7e4e35f0e9
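If your Git is recent enough (2.16 or later, I believe), `git log --find-object` might be a shorter route. It lists commits whose diff adds or removes a given object, which is not quite the same as the ls-tree loop (that one reports every commit whose tree contains the blob), so treat it as a cross-check. The function name commits_changing_blob is made up for this sketch.

```sh
# commits_changing_blob OBJECT-ID (hypothetical helper): print the hash of
# every commit, on any ref, whose diff adds or removes that object.
# An abbreviated id such as e0c5f0885bac is expanded first.
commits_changing_blob() {
    oid=$(git rev-parse --verify --quiet "$1") || return 1
    git log --all --format=%H --find-object="$oid"
}
```

For example: `commits_changing_blob e0c5f0885bac`.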

$ git checkout a7753aef2a9f14cf5c84da83b8ebff7e4e35f0e9
$ vi src/Roassal2/RTRoassalExample.class.st

and I see the culprit is icons encoded as Base64 PNG strings directly in the class...
   { #category : #icons }
   RTRoassalExample >> exampleAligningGroupsIcon [
        ^ 'iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAYAAABw4pVUAAAICElEQVR4XuWaWWhUVxjHI27Q
   1oJttfjUusQHQdE+uIFUJYIVBC1VrDv6oKDUd5fGKK3RRgkRFfelaKvR1rpS7YbGFrUWcaHF
   msYsNMaJSWa9SWbmf3v+9/YmM5M7zj5J/P4wzGQyc+855/edbzuTo+vy1ZWUI38J5AORL/lA
   5AORL/lA5AORL/lA5AORL/lA5AORL/lA5Es+EPlA5Es+EPlA5Es+EPlA5Es+EPlA5Es+EPmS
   D0Q+EPmSD0Q+EPmSD0Q+EPmSD0Q+EPmSD0S+5AORD0S+5AORD0S+5AORD6QratMmoLkZCH3v
   ...etc
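To double-check that such strings really are images: every PNG starts with the same 8-byte signature (89 50 4E 47 0D 0A 1A 0A), whose Base64 encoding always begins with iVBORw0KGgo. A small check along these lines decodes the first 8 characters (6 bytes) and compares them with the start of that signature. It is my own sketch, the name is_png_base64 is made up, and it assumes a `base64` that accepts `-d` (GNU coreutils does).

```sh
# is_png_base64 STRING (hypothetical helper): succeed if STRING looks like
# Base64-encoded PNG data. The first 8 Base64 characters decode to 6 bytes,
# which for a PNG must be 89 50 4E 47 0D 0A.
is_png_base64() {
    prefix=$(printf '%s' "$1" | cut -c1-8 | base64 -d 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$prefix" = "89504e470d0a" ]
}
```

For example: `is_png_base64 'iVBORw0KGgoAAAANSUhEUg' && echo "looks like a PNG"`.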

One way to reduce those historical file sizes would be to rewrite the history
with git filter-branch, automatically extracting those icons to separate files
and adding code to load them.
That seems hard.

Another line of investigation led me first to "Git Compression
of Blobs and Packfiles"
https://gist.github.com/matthewmccullough/2695758

So trying...
$ git gc --aggressive
$ du -sh .
==> 18M

Then https://stackoverflow.com/questions/28720151/git-gc-aggressive-vs-git-repack
showed me that the same job can be done by...
$ git repack -a -d -f --depth=10 --window=250
$ du -sh .
==> 18M
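`du -sh .` also counts the checked-out files, not just .git, so to see only what the gc does to the packed history, something like this sketch might be handier. `git count-objects -v` reports the pack size in KiB on its size-pack line (loose objects are reported separately, under size); the function names pack_kib and gc_report are made up.

```sh
# pack_kib (hypothetical helper): print the size of the repository's packs
# in KiB, taken from the size-pack line of `git count-objects -v`.
pack_kib() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# gc_report (hypothetical helper): run an aggressive gc and print the pack
# size before and after it.
gc_report() {
    before=$(pack_kib)
    git gc --aggressive --quiet
    after=$(pack_kib)
    echo "size-pack: ${before} KiB -> ${after} KiB"
}
```

Run `gc_report` from inside the clone. On a repo as big as Roassal2's, an aggressive gc can take a while.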

My understanding is that this is safe and doesn't affect the commit
history.
However, it's just a local result. A few things I read give the impression
that pushing from that repacked repo won't change anything on the server,
since a push only sends the objects the server doesn't already have, and the
server repacks in its own time.
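One way to convince yourself that a repack leaves the history alone: every commit id is a hash over its tree and its parents, so if every ref points at the same commit before and after, nothing was rewritten. A minimal sketch (the name history_fingerprint is made up):

```sh
# history_fingerprint (hypothetical helper): hash the list of all refs
# (including HEAD) together with the object ids they point to. Commit ids
# cover the whole history reachable from them, so an unchanged fingerprint
# means no ref was rewritten.
history_fingerprint() {
    git show-ref --head | git hash-object --stdin
}
```

Usage: save `before=$(history_fingerprint)`, run `git repack -a -d -f --depth=10 --window=250`, then compare with `[ "$before" = "$(history_fingerprint)" ] && echo "history unchanged"`.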

Perhaps the only way is to ask GitHub Support if they can repack it.
https://help.github.com/en/github/working-with-github-support/submitting-a-ticket


cheers -ben
