Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, 3 Jul 2018, Matt Turner wrote:
> On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman wrote:
> > 4. by default git tends to accumulate history, which can eat up disk
> > space. I imagine this could be automatically trimmed if users wanted,
> > though during syncing it would at least need to store all the commits
> > between the last fetched and next-fetched, and that means fetching
> > things that might have been subsequently removed/changed
>
> This is why I have not switched to git. I have /usr/portage on a
> separate 1GB partition (with distfiles and packages stored elsewhere).
> The ebuild tree is 600MB with rsync and cannot fit on the partition
> with git.
>
> I'd be happy to switch if the space requirements were similar.

Same here. One cannot avoid 3 things: death, taxes and insufficient
hard-disk space.

Andrey
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 12:36 PM Rich Freeman wrote:
>
> On Tue, Jul 3, 2018 at 12:22 PM Matt Turner wrote:
> >
> > On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman wrote:
> > > 4. by default git tends to accumulate history, which can eat up disk
> > > space. I imagine this could be automatically trimmed if users wanted,
> > > though during syncing it would at least need to store all the commits
> > > between the last fetched and next-fetched, and that means fetching
> > > things that might have been subsequently removed/changed
> >
> > This is why I have not switched to git. I have /usr/portage on a
> > separate 1GB partition (with distfiles and packages stored elsewhere).
> > The ebuild tree is 600MB with rsync and cannot fit on the partition
> > with git.
> >
>
> git clone https://github.com/gentoo-mirror/gentoo.git . --depth 1
> ...
> du -sh .
> 662M.
>
> So, with a shallow clone it seems comparable.
>
> The issue is getting git to constantly trim, probably along the lines of:
> https://stackoverflow.com/a/34829535

Exactly. I'm not sure git can automatically trim out history on git
pull and I'm even less sure it would be able to do it without
temporarily exceeding 1GB of storage.
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 12:33 PM Matthias Maier wrote:
>
>
> On Tue, Jul 3, 2018, at 11:22 CDT, Matt Turner wrote:
>
> > I'd be happy to switch if the space requirements were similar.
>
>   $ git clone --depth=1 https://github.com/gentoo-mirror/gentoo
>
> occupies 662M on my machine (just tested). With full history
> (i.e. without --depth=1) I am at 1.1GB.

Wait a week and emerge --sync again; it won't fit.
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 12:41 PM Kristian Fiskerstrand wrote:
>
> I would expect as much. But my primary argument would be key management
> related, it is simply impossible to present a raw copy of our repo to
> end-users and have them verify each commit
>

While related, I think that the question of distribution is still a
fair one. We can still check an infra key on the head commit with git
distribution.

Granted, if we want to go further than that then the implementation
will vary between git vs rsync distribution because the signed git
metadata is only available easily in git.

--
Rich
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 12:34 PM William Hubbs wrote:
>
> On Tue, Jul 03, 2018 at 11:40:53AM -0400, Rich Freeman wrote:
> > On Tue, Jul 3, 2018 at 11:32 AM Brian Dolbec wrote:
> > > 2) we have a large infrastructure of rsync mirrors, which we do not for
> > > git.
> > >
> >
> > Do we need them? I've yet to see somebody complain about poor syncing
> > performance from github. I imagine we could just use that and a few
> > other free mirroring services to distribute the tree.
>
> I don't feel comfortable relying on github as a primary means of
> distributing the tree due to our social contract. It is a value-added
> kind of service, but we should not rely on it.
>

Do you know that all our existing mirrors are 100% FOSS?

It is a mirror. You upload something. Somebody else downloads the
same thing. If we were distributing tarballs via http would we really
care if the mirror is running apache vs IIS? Do we even check our
existing mirrors for such things? Do we care that they're running on
coreboot too, without an IME?

Hey, I'm all for having all the mirrors we can, and it isn't like
mirroring git is particularly difficult. I just think that there is a
double standard being applied when it comes to git. I completely get
the argument when it comes to things like issues/PRs/etc since those
aren't distributed, but for git itself you really just need something
that supports the protocol and it is trivial to replace.

Certainly for anything we host we should use FOSS because it is the
cleanest solution anyway.

--
Rich
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
I would expect as much. But my primary argument would be key management
related, it is simply impossible to present a raw copy of our repo to
end-users and have them verify each commit

Original message
From: William Hubbs
Date: 7/3/18 17:39 (GMT+01:00)
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

On Tue, Jul 03, 2018 at 08:32:55AM -0700, Brian Dolbec wrote:
> On Tue, 3 Jul 2018 10:22:35 -0500
> William Hubbs wrote:
>
> > All,
> >
> > Mostly because of the recent "trustless infrastructure" thread, I am
> > wondering why we are still distributing the portage tree primarily
> > via rsync instead of git?
> >
> > Can someone educate me on that, and is it worth considering moving
> > away from rsync distribution?
> >
> > Thanks,
> >
> > William
> >
>
> because:
>
> 1) it is still the most bandwidth economical means of distributing the
> tree

Even more so than http or https?

Thanks,

William
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 12:22 PM Matt Turner wrote:
>
> On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman wrote:
> > 4. by default git tends to accumulate history, which can eat up disk
> > space. I imagine this could be automatically trimmed if users wanted,
> > though during syncing it would at least need to store all the commits
> > between the last fetched and next-fetched, and that means fetching
> > things that might have been subsequently removed/changed
>
> This is why I have not switched to git. I have /usr/portage on a
> separate 1GB partition (with distfiles and packages stored elsewhere).
> The ebuild tree is 600MB with rsync and cannot fit on the partition
> with git.
>

git clone https://github.com/gentoo-mirror/gentoo.git . --depth 1
...
du -sh .
662M.

So, with a shallow clone it seems comparable.

The issue is getting git to constantly trim, probably along the lines of:
https://stackoverflow.com/a/34829535

--
Rich
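A rough sketch of what "constantly trim" could look like, for illustration only. It is not taken from the linked answer and is not something portage does. The function name trim_sync and the throwaway demo repository are made up for this example; only the git subcommands and options are real. It re-fetches at depth 1, moves the branch to the fetched tip, then expires the reflog and prunes so the old tip can actually be deleted. Note that it does not settle the peak-usage question: the old and new objects coexist on disk until the gc step finishes.

```shell
#!/bin/sh
# Sketch only: keep a shallow git clone at depth 1 across updates.
# trim_sync and the demo paths below are hypothetical names, not portage code.
set -eu

# Move the clone in $1 to the current tip of remote branch $2, then drop
# everything that is no longer reachable from that single commit.
trim_sync() {
    repo=$1
    branch=$2
    # Fetch only the newest commit; this also moves the shallow boundary.
    git -C "$repo" fetch --quiet --depth=1 origin "$branch"
    # Point the local branch and working tree at what was just fetched.
    git -C "$repo" reset --quiet --hard FETCH_HEAD
    # Old tips stay referenced from the reflog until it is expired.
    git -C "$repo" reflog expire --expire=now --all
    # Repack and delete unreachable objects immediately.
    git -C "$repo" gc --quiet --prune=now
}

# Demo against a throwaway local repository, so no network is needed.
demo=$(mktemp -d)
commit() {
    git -C "$demo/upstream" -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit --quiet -m "$1"
}
git init --quiet "$demo/upstream"
echo one > "$demo/upstream/a.txt"
git -C "$demo/upstream" add a.txt
commit "first"
branch=$(git -C "$demo/upstream" symbolic-ref --short HEAD)

# file:// makes git use the normal transport, which honours --depth.
git clone --quiet --depth=1 "file://$demo/upstream" "$demo/clone"

echo two > "$demo/upstream/b.txt"
git -C "$demo/upstream" add b.txt
commit "second"

trim_sync "$demo/clone" "$branch"
git -C "$demo/clone" rev-list --count HEAD   # commits reachable in the clone
du -sh "$demo/clone/.git"
```

Using reset --hard rather than merge is a deliberate choice for a read-only tree: any local modifications are discarded, which is only acceptable if nobody edits the synced copy by hand.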
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 03, 2018 at 11:40:53AM -0400, Rich Freeman wrote:
> On Tue, Jul 3, 2018 at 11:32 AM Brian Dolbec wrote:
> > 2) we have a large infrastructure of rsync mirrors, which we do not for
> > git.
> >
>
> Do we need them? I've yet to see somebody complain about poor syncing
> performance from github. I imagine we could just use that and a few
> other free mirroring services to distribute the tree.

I don't feel comfortable relying on github as a primary means of
distributing the tree due to our social contract. It is a value-added
kind of service, but we should not rely on it.

William
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018, at 11:22 CDT, Matt Turner wrote:

> I'd be happy to switch if the space requirements were similar.

  $ git clone --depth=1 https://github.com/gentoo-mirror/gentoo

occupies 662M on my machine (just tested). With full history
(i.e. without --depth=1) I am at 1.1GB.

Best,
Matthias
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman wrote:
> 4. by default git tends to accumulate history, which can eat up disk
> space. I imagine this could be automatically trimmed if users wanted,
> though during syncing it would at least need to store all the commits
> between the last fetched and next-fetched, and that means fetching
> things that might have been subsequently removed/changed

This is why I have not switched to git. I have /usr/portage on a
separate 1GB partition (with distfiles and packages stored elsewhere).
The ebuild tree is 600MB with rsync and cannot fit on the partition
with git.

I'd be happy to switch if the space requirements were similar.
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 11:32 AM Brian Dolbec wrote:
>
> 1) it is still the most bandwidth economical means of distributing the
> tree

Is this true? If I do two syncs 10 minutes apart, I have to imagine
that less data will get transferred for git. Certainly there will be
less disk IO.

I think the main issue is where the crossover happens, because if I
sync a year apart git is going to send every file that was ever added
and then removed from the tree in that time.

Also, do we care about bandwidth when there are mirrors that offer it
for free?

> 2) we have a large infrastructure of rsync mirrors, which we do not for
> git.
>

Do we need them? I've yet to see somebody complain about poor syncing
performance from github. I imagine we could just use that and a few
other free mirroring services to distribute the tree.

While I appreciate all the donors giving us mirrors/etc, it seems like
we would be much more resilient if we didn't require them in the first
place.

--
Rich
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 03, 2018 at 08:32:55AM -0700, Brian Dolbec wrote:
> On Tue, 3 Jul 2018 10:22:35 -0500
> William Hubbs wrote:
>
> > All,
> >
> > Mostly because of the recent "trustless infrastructure" thread, I am
> > wondering why we are still distributing the portage tree primarily
> > via rsync instead of git?
> >
> > Can someone educate me on that, and is it worth considering moving
> > away from rsync distribution?
> >
> > Thanks,
> >
> > William
> >
>
> because:
>
> 1) it is still the most bandwidth economical means of distributing the
> tree

Even more so than http or https?

Thanks,

William
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, Jul 3, 2018 at 11:22 AM William Hubbs wrote:
>
> Mostly because of the recent "trustless infrastructure" thread, I am
> wondering why we are still distributing the portage tree primarily
> via rsync instead of git?
>
> Can someone educate me on that, and is it worth considering moving away
> from rsync distribution?
>

Here are the pros/cons that I've seen come up in the past:

1. emerge-webrsync is probably more secure at the moment, because
emerge --sync with git leaves the tree corrupt if it doesn't verify.
That seems like something that could be fixed, and which should be
fixed regardless (presumably somebody just has to do the work - I
can't imagine the portage team would turn away patches).

2. git seems to be more efficient for frequent syncing, while rsync
seems to be more efficient for infrequent syncing. I'd guess the
crossover is somewhere around a week or a few, but I don't have data
to support that.

3. we have more rsync mirrors, though with the possibility of using
mirrors like github I don't see why this matters (as long as we
actually secure distribution).

4. by default git tends to accumulate history, which can eat up disk
space. I imagine this could be automatically trimmed if users wanted,
though during syncing it would at least need to store all the commits
between the last fetched and next-fetched, and that means fetching
things that might have been subsequently removed/changed

Personally I stick with git. I want the history anyway, and since I
sync frequently it involves WAY less disk IO and seems to be very
network-efficient as well.

--
Rich
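On point 1, one way the "tree left corrupt" problem could be avoided is to fetch first, verify the fetched tip, and only then touch the working tree. The following is a minimal sketch under that assumption, not a description of how portage does or should implement it. verified_update and the demo repository are hypothetical names. git verify-commit checks against whatever keys the user's GnuPG setup trusts, so pinning it to a specific infra key would need a dedicated keyring, which is not shown here.

```shell
#!/bin/sh
# Sketch only: update a git checkout only if the fetched tip commit verifies.
# verified_update and the demo paths are hypothetical names, not portage code.
set -eu

# Fetch branch $2 into the clone at $1, but fast-forward the working tree
# only when the fetched tip carries a signature that git can verify.
verified_update() {
    repo=$1
    branch=$2
    git -C "$repo" fetch --quiet origin "$branch" || return 1
    if git -C "$repo" verify-commit FETCH_HEAD 2>/dev/null; then
        git -C "$repo" merge --quiet --ff-only FETCH_HEAD
    else
        echo "refusing update: tip commit does not verify" >&2
        return 1
    fi
}

# Demo with unsigned commits in a throwaway repository: the update must be
# refused and the checkout must stay at the previously accepted commit.
demo=$(mktemp -d)
commit() {
    git -C "$demo/upstream" -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit --quiet -m "$1"
}
git init --quiet "$demo/upstream"
echo one > "$demo/upstream/a.txt"
git -C "$demo/upstream" add a.txt
commit "first"
branch=$(git -C "$demo/upstream" symbolic-ref --short HEAD)
git clone --quiet "file://$demo/upstream" "$demo/clone"

echo two > "$demo/upstream/b.txt"
git -C "$demo/upstream" add b.txt
commit "second (unsigned)"

if verified_update "$demo/clone" "$branch"; then
    echo "updated"
else
    echo "tree left at previous commit"
fi
```

The point of the ordering is that a failed verification costs only the fetched objects in .git; the files emerge reads from are never replaced by unverified content.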
Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
On Tue, 3 Jul 2018 10:22:35 -0500
William Hubbs wrote:

> All,
>
> Mostly because of the recent "trustless infrastructure" thread, I am
> wondering why we are still distributing the portage tree primarily
> via rsync instead of git?
>
> Can someone educate me on that, and is it worth considering moving
> away from rsync distribution?
>
> Thanks,
>
> William
>

because:

1) it is still the most bandwidth economical means of distributing the
   tree
2) we have a large infrastructure of rsync mirrors, which we do not for
   git.
3) see #1

--
Brian Dolbec
[gentoo-dev] rfc: why are we still distributing the portage tree via rsync?
All,

Mostly because of the recent "trustless infrastructure" thread, I am
wondering why we are still distributing the portage tree primarily
via rsync instead of git?

Can someone educate me on that, and is it worth considering moving away
from rsync distribution?

Thanks,

William