On Sun, 10 Nov 2019 at 09:32, Alexandre Bergel via Pharo-users < [email protected]> wrote:
> Hi Cyril,
>
> I tried something to remove some large blob from the history. The code
> source of Roassal2 is about 7Mb large, but the .git folder is about 150 Mb!
> But at the end, it was the push was rejected because some pullrequests
> exist. So, I did not suspect that I had an impact. Sorry about that.
>

It seems strange that it was rejected because some pull requests existed.
Were you doing it from the command line?
Was it something like one of these error messages? Particularly Step 11?
https://github.community/t5/How-to-use-Git-and-GitHub/How-to-deal-with-quot-refusing-to-merge-unrelated-histories-quot/td-p/12619

> Help is welcome to shrink Roassal2’s .git folder.
>

I just cloned Roassal2 and `du -sh .` gave 88M, so it looks like you had some success reducing it.

Google found me a way to list large objects...
```
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest \
  > ../list.txt
```

The largest file entry was...
e0c5f0885bac 432KiB src/Roassal2/RTRoassalExample.class.st

That file is no longer in the repo, but...
$ grep RTRoassalExample.class.st ../list.txt | wc -l
==> 158
and 158 * 432K ==> 68M

Found the commit with that blob to examine...
$ git log --all --pretty=format:%H -- src/Roassal2/RTRoassalExample.class.st | xargs -n1 -I% sh -c "git ls-tree % -- src/Roassal2/RTRoassalExample.class.st | grep -q e0c5f0885bac && echo %"
==> a7753aef2a9f14cf5c84da83b8ebff7e4e35f0e9

$ git checkout a7753aef2a9f14cf5c84da83b8ebff7e4e35f0e9
$ vi src/Roassal2/RTRoassalExample.class.st
and I see the culprit is icons being encoded directly in the class...
{ #category : #icons }
RTRoassalExample >> exampleAligningGroupsIcon [
^ 'iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAYAAABw4pVUAAAICElEQVR4XuWaWWhUVxjHI27Q
1oJttfjUusQHQdE+uIFUJYIVBC1VrDv6oKDUd5fGKK3RRgkRFfelaKvR1rpS7YbGFrUWcaHF
msYsNMaJSWa9SWbmf3v+9/YmM5M7zj5J/P4wzGQyc+855/edbzuTo+vy1ZWUI38J5AORL/lA
5AORL/lA5AORL/lA5AORL/lA5AORL/lA5Es+EPlA5Es+EPlA5Es+EPlA5Es+EPlA5Es+EPmS
D0Q+EPmSD0Q+EPmSD0Q+EPmSD0Q+EPmSD0S+5AORD0S+5AORD0S+5AORD6QratMmoLkZCH3v
...etc

One way to reduce those historical file sizes would be running git-filter with an automated way to extract those icons to separate files and add code to load them. Seems hard.

Following another path of investigation led me first to "Git Compression of Blobs and Packfiles"
https://gist.github.com/matthewmccullough/2695758
So trying...
$ git gc --aggressive
$ du -sh .
==> 18M

https://stackoverflow.com/questions/28720151/git-gc-aggressive-vs-git-repack
led me to the same job being done by...
$ git repack -a -d -f --depth=10 --window=250
$ du -sh .
==> 18M

My understanding is that this is safe and doesn't affect the commit history. However, it's just a local result. A few things I read give the feeling that pushing from that repacked repo won't change anything on the server, since a push only sends a diff to the server, which then repacks in its own time. Perhaps the only way is to ask GitHub Support if they can repack it.
https://help.github.com/en/github/working-with-github-support/submitting-a-ticket

cheers -ben
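
As a possible follow-up to the "158 * 432K" estimate above, here is a minimal sketch that does the same arithmetic for every path at once: it totals the uncompressed size of all historical blob versions per file. The function name `total_by_path` is made up for illustration, and it assumes input in the `%(objecttype) %(objectname) %(objectsize) %(rest)` format used by the listing command earlier in this mail. Note these are uncompressed sizes, so they only hint at what the packfile actually stores.

```shell
#!/bin/sh
# total_by_path (hypothetical helper): reads lines of the form
#   <objecttype> <objectname> <objectsize> <path>
# on stdin and prints "<total bytes><TAB><version count><TAB><path>",
# largest total first. Non-blob objects (commits, trees) are ignored.
total_by_path() {
    awk '$1 == "blob" {
             size = $3
             # The path is everything after the third field; it may contain spaces.
             path = $0
             sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)
             bytes[path] += size
             count[path]++
         }
         END {
             for (p in bytes) printf "%.0f\t%d\t%s\n", bytes[p], count[p], p
         }' | sort -t "$(printf '\t')" -k1,1nr
}

# Example use inside a clone (not run here):
#   git rev-list --objects --all \
#     | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
#     | total_by_path | head -n 10
```

A file with many large historical versions, like RTRoassalExample.class.st, should then show up near the top with its version count in the second column.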
