Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread grozin

On Tue, 3 Jul 2018, Matt Turner wrote:

> On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman  wrote:
>
> > 4.  by default git tends to accumulate history, which can eat up disk
> > space.  I imagine this could be automatically trimmed if users wanted,
> > though during syncing it would at least need to store all the commits
> > between the last fetched and next-fetched, and that means fetching
> > things that might have been subsequently removed/changed
>
> This is why I have not switched to git. I have /usr/portage on a
> separate 1GB partition (with distfiles and packages stored elsewhere).
> The ebuild tree is 600MB with rsync and cannot fit on the partition
> with git.
>
> I'd be happy to switch if the space requirements were similar.

Same here. One cannot avoid 3 things: death, taxes and insufficient
hard-disk space.


Andrey



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Matt Turner
On Tue, Jul 3, 2018 at 12:36 PM Rich Freeman  wrote:
>
> On Tue, Jul 3, 2018 at 12:22 PM Matt Turner  wrote:
> >
> > On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman  wrote:
> > > 4.  by default git tends to accumulate history, which can eat up disk
> > > space.  I imagine this could be automatically trimmed if users wanted,
> > > though during syncing it would at least need to store all the commits
> > > between the last fetched and next-fetched, and that means fetching
> > > things that might have been subsequently removed/changed
> >
> > This is why I have not switched to git. I have /usr/portage on a
> > separate 1GB partition (with distfiles and packages stored elsewhere).
> > The ebuild tree is 600MB with rsync and cannot fit on the partition
> > with git.
> >
>
> git clone https://github.com/gentoo-mirror/gentoo.git . --depth 1
> ...
> du -sh .
> 662M    .
>
> So, with a shallow clone it seems comparable.
>
> The issue is getting git to constantly trim, probably along the lines of:
> https://stackoverflow.com/a/34829535

Exactly. I'm not sure git can automatically trim out history on git
pull and I'm even less sure it would be able to do it without
temporarily exceeding 1GB of storage.



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Matt Turner
On Tue, Jul 3, 2018 at 12:33 PM Matthias Maier  wrote:
>
>
> On Tue, Jul  3, 2018, at 11:22 CDT, Matt Turner  wrote:
>
> > I'd be happy to switch if the space requirements were similar.
>
>  $ git clone --depth=1 https://github.com/gentoo-mirror/gentoo
>
> occupies 662M on my machine (just tested). With full history
> (i.e. without --depth=1) I am at 1.1GB.

Wait a week and emerge --sync again; it won't fit.



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Rich Freeman
On Tue, Jul 3, 2018 at 12:41 PM Kristian Fiskerstrand  wrote:
>
> I would expect as much. But my primary argument would be key management 
> related, it is simply impossible to present a raw copy of our repo to 
> end-users and have them verify each commit
>

While related, I think that the question of distribution is still a
fair one.  We can still check an infra key on the head commit with git
distribution.  Granted, if we want to go further than that then the
implementation will vary between git vs rsync distribution because the
signed git metadata is only available easily in git.
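
As a rough sketch of the head-commit check mentioned above: git ships a
verify-commit subcommand that checks the OpenPGP signature of a single
commit. The function name verify_head is hypothetical, and the sketch assumes
the signing key has already been imported into the local GnuPG keyring. It
says nothing about how that key is distributed or trusted.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed only if the newest
# commit of the checkout at "$1" carries a signature that git can verify.
verify_head() {
    repo=$1
    # git verify-commit exits non-zero when the commit is unsigned or when
    # the signature cannot be validated with the keys in the local keyring.
    git -C "$repo" verify-commit HEAD
}
```

A caller could run `verify_head /usr/portage || echo "head commit not verified"`
after each sync and refuse to use the tree on failure.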

-- 
Rich



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Rich Freeman
On Tue, Jul 3, 2018 at 12:34 PM William Hubbs  wrote:
>
> On Tue, Jul 03, 2018 at 11:40:53AM -0400, Rich Freeman wrote:
> > On Tue, Jul 3, 2018 at 11:32 AM Brian Dolbec  wrote:
> > > 2) we have a large infrastructure of rsync mirrors, which we do not for
> > > git.
> > >
> >
> > > Do we need them?  I've yet to see somebody complain about poor syncing
> > performance from github.  I imagine we could just use that and a few
> > other free mirroring services to distribute the tree.
>
> I don't feel comfortable relying on github as a primary means of
> distributing the tree due to our social contract. It is a value-added
> kind of service, but we should not rely on it.
>

Do you know that all our existing mirrors are 100% FOSS?

It is a mirror.  You upload something.  Somebody else downloads the same thing.

If we were distributing tarballs via http would we really care if the
mirror is running apache vs IIS?  Do we even check our existing
mirrors for such things?  Do we care that they're running on coreboot
too, without an IME?

Hey, I'm all for having all the mirrors we can, and it isn't like
mirroring git is particularly difficult.  I just think that there is a
double-standard being applied when it comes to github.  I completely get
the argument when it comes to things like issues/PRs/etc since those
aren't distributed, but for git itself you really just need something
that supports the protocol and it is trivial to replace.  Certainly
for anything we host we should use FOSS because it is the cleanest
solution anyway.

-- 
Rich



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Kristian Fiskerstrand
I would expect as much. But my primary argument would be key management
related, it is simply impossible to present a raw copy of our repo to
end-users and have them verify each commit

-------- Original message --------
From: William Hubbs
Date: 7/3/18  17:39  (GMT+01:00)
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

On Tue, Jul 03, 2018 at 08:32:55AM -0700, Brian Dolbec wrote:
> On Tue, 3 Jul 2018 10:22:35 -0500
> William Hubbs  wrote:
> 
> > All,
> > 
> > Mostly because of the recent "trustless infrastructure" thread, I am
> > wondering why we are still distributing the portage tree primarily
> > via rsync instead of git?
> > 
> > Can someone educate me on that, and is it worth considering moving
> > away from rsync distribution?
> > 
> > Thanks,
> > 
> > William
> > 
> 
> because:
> 
> 1) it is still the most bandwidth economical means of distributing the
> tree
 
 Even more so than http or https?

 Thanks,

 William



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Rich Freeman
On Tue, Jul 3, 2018 at 12:22 PM Matt Turner  wrote:
>
> On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman  wrote:
> > 4.  by default git tends to accumulate history, which can eat up disk
> > space.  I imagine this could be automatically trimmed if users wanted,
> > though during syncing it would at least need to store all the commits
> > between the last fetched and next-fetched, and that means fetching
> > things that might have been subsequently removed/changed
>
> This is why I have not switched to git. I have /usr/portage on a
> separate 1GB partition (with distfiles and packages stored elsewhere).
> The ebuild tree is 600MB with rsync and cannot fit on the partition
> with git.
>

git clone https://github.com/gentoo-mirror/gentoo.git . --depth 1
...
du -sh .
662M    .

So, with a shallow clone it seems comparable.

The issue is getting git to constantly trim, probably along the lines of:
https://stackoverflow.com/a/34829535
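
One way the trimming could look, as a sketch only: re-fetch at depth 1,
hard-reset to the fetched commit, then expire the reflogs and garbage-collect.
The function name shallow_sync and the default branch name master are
assumptions for illustration, and this is not necessarily what the linked
answer does. The fetch downloads the new objects before the old ones are
pruned, so peak disk usage during a sync is higher than the final size.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): update a shallow clone to the
# newest upstream commit and discard the history it had before.
# Assumes the checkout at "$1" has a remote named "origin" and holds no
# local changes worth keeping. "$2" is the branch name (default: master).
shallow_sync() {
    repo=$1
    branch=${2:-master}
    # Fetch only the newest commit of the branch.
    git -C "$repo" fetch --depth 1 origin "$branch" || return 1
    # Move the checkout to the fetched commit, dropping local modifications.
    git -C "$repo" reset --hard FETCH_HEAD || return 1
    # Expire reflog entries that would otherwise keep old commits alive.
    git -C "$repo" reflog expire --expire=now --all || return 1
    # Repack and delete objects that are no longer reachable.
    git -C "$repo" gc --prune=now --quiet
}
```

Usage would be something like `shallow_sync /usr/portage master`. Whether the
transient overhead fits on a tight partition would have to be measured.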

-- 
Rich



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread William Hubbs
On Tue, Jul 03, 2018 at 11:40:53AM -0400, Rich Freeman wrote:
> On Tue, Jul 3, 2018 at 11:32 AM Brian Dolbec  wrote:
> > 2) we have a large infrastructure of rsync mirrors, which we do not for
> > git.
> >
> 
> Do we need them?  I've yet to see somebody complain about poor syncing
> performance from github.  I imagine we could just use that and a few
> other free mirroring services to distribute the tree.

I don't feel comfortable relying on github as a primary means of
distributing the tree due to our social contract. It is a value-added
kind of service, but we should not rely on it.

William



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Matthias Maier

On Tue, Jul  3, 2018, at 11:22 CDT, Matt Turner  wrote:

> I'd be happy to switch if the space requirements were similar.

 $ git clone --depth=1 https://github.com/gentoo-mirror/gentoo

occupies 662M on my machine (just tested). With full history
(i.e. without --depth=1) I am at 1.1GB.
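
To see how a figure like this splits between the checked-out files and the
git metadata, a small sketch along these lines may help. The function name
repo_sizes is hypothetical. It only reports disk usage in KiB as seen by du
and assumes the usual layout with a .git directory inside the checkout.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print how much of a checkout
# is git metadata (.git) and how much is the checked-out tree, in KiB.
repo_sizes() {
    repo=$1
    # du -sk prints "<KiB><TAB><path>"; keep only the number.
    total_kib=$(du -sk "$repo" | cut -f1)
    git_kib=$(du -sk "$repo/.git" | cut -f1)
    echo "worktree_kib=$((total_kib - git_kib)) git_kib=$git_kib total_kib=$total_kib"
}
```

Running `repo_sizes .` after a sync, and again a week later, would show
whether the growth is in .git or in the tree itself.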

Best,
Matthias



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Matt Turner
On Tue, Jul 3, 2018 at 11:38 AM Rich Freeman  wrote:
> 4.  by default git tends to accumulate history, which can eat up disk
> space.  I imagine this could be automatically trimmed if users wanted,
> though during syncing it would at least need to store all the commits
> between the last fetched and next-fetched, and that means fetching
> things that might have been subsequently removed/changed

This is why I have not switched to git. I have /usr/portage on a
separate 1GB partition (with distfiles and packages stored elsewhere).
The ebuild tree is 600MB with rsync and cannot fit on the partition
with git.

I'd be happy to switch if the space requirements were similar.



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Rich Freeman
On Tue, Jul 3, 2018 at 11:32 AM Brian Dolbec  wrote:
>
> 1) it is still the most bandwidth economical means of distributing the
> tree

Is this true?  If I do two syncs 10min apart, I have to imagine that
less data will get transferred for git.  Certainly there will be less
disk IO.  I think the main issue is when does the crossover happen
because if I sync a year apart git is going to send every file that
was ever added and then removed from the tree in that time.

Also, do we care about bandwidth when there are mirrors that offer it for free?

> 2) we have a large infrastructure of rsync mirrors, which we do not for
> git.
>

Do we need them?  I've yet to see somebody complain about poor syncing
performance from github.  I imagine we could just use that and a few
other free mirroring services to distribute the tree.

While I appreciate all the donors giving us mirrors/etc, it seems like
we would be much more resilient if we didn't require them in the first
place.

-- 
Rich



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread William Hubbs
On Tue, Jul 03, 2018 at 08:32:55AM -0700, Brian Dolbec wrote:
> On Tue, 3 Jul 2018 10:22:35 -0500
> William Hubbs  wrote:
> 
> > All,
> > 
> > Mostly because of the recent "trustless infrastructure" thread, I am
> > wondering why we are still distributing the portage tree primarily
> > via rsync instead of git?
> > 
> > Can someone educate me on that, and is it worth considering moving
> > away from rsync distribution?
> > 
> > Thanks,
> > 
> > William
> > 
> 
> because:
> 
> 1) it is still the most bandwidth economical means of distributing the
> tree
 
 Even more so than http or https?

 Thanks,

 William



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Rich Freeman
On Tue, Jul 3, 2018 at 11:22 AM William Hubbs  wrote:
>
> Mostly because of the recent "trustless infrastructure" thread, I am
> wondering why we are still distributing the portage tree primarily
> via rsync instead of git?
>
> Can someone educate me on that, and is it worth considering moving away
> from rsync distribution?
>

Here are the pros/cons that I've seen come up in the past:

1.  emerge-webrsync is probably more secure at the moment, because
emerge --sync with git leaves the tree corrupt if it doesn't verify.
That seems like something that could be fixed, and which should be
fixed regardless (presumably somebody just has to do the work - I
can't imagine the portage team would turn away patches).

2.  git seems to be more efficient for frequent syncing, while rsync
seems to be more efficient for infrequent syncing.  I'd guess the
crossover is somewhere around a week or few, but I don't have data to
support that.

3.  we have more rsync mirrors, though with the possibility of using
mirrors like github I don't see why this matters (as long as we
actually secure distribution).

4.  by default git tends to accumulate history, which can eat up disk
space.  I imagine this could be automatically trimmed if users wanted,
though during syncing it would at least need to store all the commits
between the last fetched and next-fetched, and that means fetching
things that might have been subsequently removed/changed

Personally I stick with git.  I want the history anyway, and since I
sync frequently it involves WAY less disk IO and seems to be very
network-efficient as well.
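
For readers who want to try git syncing, here is a sketch of a repos.conf
section that points Portage at the git mirror named elsewhere in this thread.
The function name print_git_repo_conf is hypothetical, and the option names
(sync-type, sync-uri, sync-depth) are given as I understand them from
portage(5); please check them against the installed Portage version before
relying on this.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a repos.conf section that
# syncs the tree with git. "$1" is the tree location (default: /usr/portage).
# Option names are assumed from portage(5) and should be verified locally.
print_git_repo_conf() {
    location=${1:-/usr/portage}
    cat <<EOF
[gentoo]
location = $location
sync-type = git
sync-uri = https://github.com/gentoo-mirror/gentoo.git
sync-depth = 1
EOF
}
```

The output could be redirected to a file such as
/etc/portage/repos.conf/gentoo.conf. An existing rsync checkout at that
location would likely need to be replaced by a fresh clone first.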

-- 
Rich



Re: [gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread Brian Dolbec
On Tue, 3 Jul 2018 10:22:35 -0500
William Hubbs  wrote:

> All,
> 
> Mostly because of the recent "trustless infrastructure" thread, I am
> wondering why we are still distributing the portage tree primarily
> via rsync instead of git?
> 
> Can someone educate me on that, and is it worth considering moving
> away from rsync distribution?
> 
> Thanks,
> 
> William
> 

because:

1) it is still the most bandwidth economical means of distributing the
tree

2) we have a large infrastructure of rsync mirrors, which we do not for
git.

3) see #1
-- 
Brian Dolbec 



[gentoo-dev] rfc: why are we still distributing the portage tree via rsync?

2018-07-03 Thread William Hubbs

All,

Mostly because of the recent "trustless infrastructure" thread, I am
wondering why we are still distributing the portage tree primarily
via rsync instead of git?

Can someone educate me on that, and is it worth considering moving away
from rsync distribution?

Thanks,

William


