Bug#1056103: dgit fills up my disk with .git/dgit/unpack directories

2023-11-16 Thread gregor herrmann
On Thu, 16 Nov 2023 22:31:44 +, Ian Jackson wrote:

> Hi.  Thanks for the report.  I'm sorry that you're finding dgit has
> done something inconvenient.

Thanks for the quick reply!
 
> gregor herrmann writes ("Bug#1056103: dgit fills up my disk with 
> .git/dgit/unpack directories"):
> > Recently I noticed that my usage of `dgit --gbp push-source' leaves
> > .git/dgit/unpack directories in each touched package directory. Which
> > in my case amounts to 1.5 GB below my pkg-perl directory (for more
> > than 1100 dgit-pushed packages since 2019).
> Ah.  I hadn't really considered this kind of use case.  (I obviously
> touch smaller packages, or fewer different packages, or something.)

Yeah, I touch a lot of packages, and it took me some years to find
this issue :)
 
> I think the wasted space ought to be O(the size of the source
> package), although the constant factor may be 2 or 3.  Is this not the
> case for you?  If there are cases where dgit has wasted much more
> space then that would be straightforwardly buggy I think.

Ack, from what I see, .git/dgit/unpack contains one tarball (the
current one) plus the unpacked upstream source.
It's just that this accumulates across more than 1000 packages
(potentially 4000 in the pkg-perl case) to some GB.
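
As a rough sanity check of those numbers, here is the arithmetic as a
small Python sketch. It assumes that 1.5 GB means 1.5e9 bytes and that
the cost per package stays about the same as more packages are touched;
the variable names are illustrative only.

```python
# Figures taken from the bug report; everything else is an assumption.
total_bytes = 1.5e9          # "1.5 GB below my pkg-perl directory"
pushed_packages = 1100       # "more than 1100 dgit-pushed packages"
all_packages = 4000          # "potentially 4000 in the pkg-perl case"

# Average leftover per package, assuming the total is spread evenly.
per_package = total_bytes / pushed_packages

# Linear projection if every package were dgit-pushed at least once.
projected = per_package * all_packages

print(f"average per package: {per_package / 1e6:.2f} MB")
print(f"projected for {all_packages} packages: {projected / 1e9:.2f} GB")
```

That is about 1.4 MB per package on average, and roughly 5.5 GB if all
4000 packages were touched.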
 
> You're right that these directories are not really needed after dgit
> has completed.  

Great, thanks for the confirmation.

> However, they can be useful for debugging failures.

Hm, yeah, so maybe this "keep temporary artifacts" should be opt-in?

> Another future run of dgit would remove it of course, but would just
> leave another one.  And there's no central tracking and they're hidden
> in .git so you wouldn't normally see them and know to delete them.

Indeed, it took me some years to detect them :)
 
> So, yes, I can see the problem and I agree that something better
> should be done.

Cool, thanks.
 
> I think there are some tradeoffs involved, so it may not be entirely
> straightforward.  Some thought will be needed.  (Some of the things in
> .git/dgit are hardlinked from elsewhere, so it's not as simple as
> using TMPDIR instead.)

Oh, sure -- from my wild guessing I saw files in .git/dgit which
looked relevant and are also small; but one level below,
.git/dgit/unpack is the thing which needs space …
 
> Perhaps dgit should, by default, clean up this stuff just before it
> exits successfully, but leave it behind for debugging failures.

That sounds very reasonable to me.


Thanks again,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   



Bug#1056103: dgit fills up my disk with .git/dgit/unpack directories

2023-11-16 Thread Ian Jackson
Control: tags -1 confirmed

Hi.  Thanks for the report.  I'm sorry that you're finding dgit has
done something inconvenient.

gregor herrmann writes ("Bug#1056103: dgit fills up my disk with 
.git/dgit/unpack directories"):
> Recently I noticed that my usage of `dgit --gbp push-source' leaves
> .git/dgit/unpack directories in each touched package directory. Which
> in my case amounts to 1.5 GB below my pkg-perl directory (for more
> than 1100 dgit-pushed packages since 2019).

Ah.  I hadn't really considered this kind of use case.  (I obviously
touch smaller packages, or fewer different packages, or something.)

I think the wasted space ought to be O(the size of the source
package), although the constant factor may be 2 or 3.  Is this not the
case for you?  If there are cases where dgit has wasted much more
space then that would be straightforwardly buggy I think.

You're right that these directories are not really needed after dgit
has completed.  However, they can be useful for debugging failures.
Another future run of dgit would remove it of course, but would just
leave another one.  And there's no central tracking and they're hidden
in .git so you wouldn't normally see them and know to delete them.

So, yes, I can see the problem and I agree that something better
should be done.

I think there are some tradeoffs involved, so it may not be entirely
straightforward.  Some thought will be needed.  (Some of the things in
.git/dgit are hardlinked from elsewhere, so it's not as simple as
using TMPDIR instead.)

Perhaps dgit should, by default, clean up this stuff just before it
exits successfully, but leave it behind for debugging failures.
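
The behaviour proposed here (remove the scratch directory on success,
keep it when something fails) can be sketched as follows. This is only
an illustration in Python, not dgit's actual code, which is Perl; the
`playground` helper and its parameters are hypothetical.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def playground(parent: Path, name: str = "unpack"):
    """Yield a fresh scratch directory below parent.

    The directory is removed if the with-body completes normally and is
    left in place for debugging if the body raises.
    """
    path = parent / name
    # A new run discards whatever an earlier (failed) run left behind.
    shutil.rmtree(path, ignore_errors=True)
    path.mkdir(parents=True)
    yield path
    # Only reached when the with-body did not raise: clean up on success.
    shutil.rmtree(path)
```

With `with playground(Path(".git/dgit")) as work: ...`, an exception
inside the block propagates and leaves the directory behind, while a
normal exit deletes it. An opt-in "always keep" switch for debugging
could be added on top of this.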

Thanks,
Ian.

-- 
Ian Jackson   These opinions are my own.

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.



Bug#1056103: dgit fills up my disk with .git/dgit/unpack directories

2023-11-16 Thread gregor herrmann
Package: dgit
Version: 11.5
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Recently I noticed that my usage of `dgit --gbp push-source' leaves
.git/dgit/unpack directories in each touched package directory. Which
in my case amounts to 1.5 GB below my pkg-perl directory (for more
than 1100 dgit-pushed packages since 2019).
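
For anyone who wants to check their own trees, a minimal Python sketch
for totalling these directories might look like this. It is not part of
dgit; the function names are made up for illustration, and it assumes
every package checkout keeps its leftovers in `<repo>/.git/dgit/unpack`.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Sum the apparent sizes of all files below path (symlinks not followed).

    Hardlinked files are counted at full size, so the space actually
    freed by deleting the directory may be smaller.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def unpack_usage(top: Path) -> dict[Path, int]:
    """Map each <repo>/.git/dgit/unpack directory below top to its size in bytes."""
    usage = {}
    for root, dirs, _files in os.walk(top):
        candidate = Path(root) / ".git" / "dgit" / "unpack"
        if candidate.is_dir():
            usage[candidate] = dir_size(candidate)
        if ".git" in dirs:
            dirs.remove(".git")  # do not walk into repository internals
    return usage
```

Calling `unpack_usage(Path("pkg-perl"))` (the path is an example) and
summing the values gives the overall figure; sorting the items by size
shows which packages contribute most.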

My assumption is that `.git/dgit/unpack' is a temporary workspace
with no further relevance. To test this hypothesis, I now went to a
package's source dir, rm'd .git/dgit/unpack, imported the new
upstream release, and ran `dgit --gbp push-source' at the end. And I
didn't notice any unusual output. So I guess the assumption that
`.git/dgit/unpack' can and should be cleaned is not completely crazy.
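
The manual test above could be generalised into a small cleanup helper
along the following lines. This is a sketch under the same assumption
(that `.git/dgit/unpack` is not needed once dgit has finished), not a
dgit feature; the function names are hypothetical, and it should not be
run while dgit is working in one of the affected trees.

```python
import os
import shutil
from pathlib import Path


def find_unpack_dirs(top):
    """Return every <repo>/.git/dgit/unpack directory found below top."""
    found = []
    for root, dirs, _files in os.walk(top):
        candidate = Path(root, ".git", "dgit", "unpack")
        if candidate.is_dir() and not candidate.is_symlink():
            found.append(candidate)
        if ".git" in dirs:
            dirs.remove(".git")  # never walk into repository internals
    return found


def remove_unpack_dirs(top, dry_run=True):
    """List the unpack directories below top and delete them unless dry_run.

    Only the unpack directory itself is removed; other files in
    .git/dgit are left alone.
    """
    targets = find_unpack_dirs(top)
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return targets
```

The default is a dry run, so `remove_unpack_dirs(top)` only reports
what would go; passing `dry_run=False` actually deletes.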

A quick look at /usr/bin/dgit (quick as I didn't want to read 8000+
lines of code) shows e.g.:

  1877  $playground = fresh_playground 'dgit/unpack';
  2600  changedir $playground;
  2885  rmtree $playground;
  4768  changedir $playground;
  4834  changedir $playground;
  6199  changedir $playground;
  6250  changedir "$playground/work";
  6258  @git, qw(pull --ff-only -q), "$playground/work", qw(master);
  7450  changedir $playground;
  7634  changedir $playground;

i.e. lots of `changedir $playground' but only one `rmtree
$playground'. My _guess_ is that at least one more strategically
placed removal of this probably temporary directory would be called
for.
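
A listing like the one above can be reproduced with a few lines of
Python. This is just a sketch of the kind of search I mean; the helper
names are made up, and tallying by the first word of each line is a
crude heuristic rather than real parsing of the Perl source.

```python
import re
from collections import Counter


def find_mentions(source, variable="playground"):
    """Return (line number, stripped line) for each line using $variable."""
    # \b keeps "$playground" from matching longer names like "$playground_dir".
    pattern = re.compile(r"\$" + re.escape(variable) + r"\b")
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def count_by_first_word(mentions):
    """Tally the mentions by the first word of each line (e.g. changedir, rmtree)."""
    return Counter(text.split()[0] for _number, text in mentions)
```

Feeding it the text of /usr/bin/dgit and comparing the `changedir` and
`rmtree` counts gives the imbalance described above.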

For the sake of disk space (and backup time -- where I first noticed
the issue), please take a look into this issue.


Cheers,
gregor


-----BEGIN PGP SIGNATURE-----

iQKTBAEBCgB9FiEE0eExbpOnYKgQTYX6uzpoAYZJqgYFAmVWj65fFIAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEQx
RTEzMTZFOTNBNzYwQTgxMDREODVGQUJCM0E2ODAxODY0OUFBMDYACgkQuzpoAYZJ
qgZ39A/+PCxSeDpMjt9xRtZiY26VGXv10Ss4SMtYzXn4imGdGzEvyUXFQeYrYC7f
nk4dBrB0+5PzE29RvJG6IV99VB+UeTtmiFQyBCTZX9TBvHo16l/5WssVkM3Gt7qa
OW1NoIxcuKAJ+HX/UCVaLSj7P4SUmewSIqY78EAdpOrfoPOvfxsjbG6/cIPdi7Vq
CjzpH890jsq5biKCtinaaDqzpZV/BnbQiaiYEq1vL6aLc5I6y/2kv/ZHfVx9ZDtk
7Kqv7/I34WLveJjnMd3lv/WSnChO8lT1Tx2y8fOoZ461YQ8q1h9g4N083QVx9LSF
itLgiVE6l7AiJ1qQJA4ji6+/DU7q+YlKQIWg78liIzOHcpUtIoYNTQN2vusIuu9E
VS3/clDWycZnnGYO1iN1lloRzI2OjJWMXrXRMtSsHgBJBmoBrizqgek1xXtiM4OI
rzr9rm/aHV5N4Y0uWgR+YLTjBrG+Ly9OXYiATsA49twaM+EtjR19MwVsoE8r4e6g
Jus8yXjlFdXwwe3Bp+W8uhT3QulX2tPu1C+ZMDm+9ZpYHKPbzEvSXMHff/BRWjTc
2rwysLkMmUUL8tE0BOUrUuytm9whsrTu+AVWLpbw4LO2ZLmjDMrN7XvM0GlA3BoJ
TOrUIhxEG0WMc7PpBOFskq4KWC42Dq8iEVPKqLCPaAoeVmd+CTo=
=Dzmz
-----END PGP SIGNATURE-----