Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-27 Thread Patrick Lauer
On 02/27/2016 11:50 PM, Robin H. Johnson wrote:
> On Sat, Feb 27, 2016 at 02:14:12PM +0100, Luca Barbato wrote:
>> On 24/02/16 01:33, Duncan wrote:
>>> That option is there, and indeed, a patch providing it was specifically 
>>> added to portage for infra to use, because appending entries to existing 
>>> files is vastly easier and more performant than trying to prepend entries 
>>> and having to rewrite the entire file as a result.
>> This sounds wrong in many different ways. The changelog files are tiny
>> and it makes next to no difference whether you truncate+write or append.
> Prior to separating ChangeLog files into years, this was way worse:
> a kernel bump present in any of gentoo-sources, hardened-sources,
> vanilla-sources meant another 100k of data to send. It's not a lot
> overall, but here's some quick stats from one of our rsync servers, on
> bytes sent.
[snip]
>
> So, now the question:
> If we use appending changelogs, the large changelogs only differ by a
> few hundred bytes. If we instead have to rewrite them, it's 50k+ per
> changelog.
from /usr/share/portage/config/make.globals:

PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times
--omit-dir-times --compress --force --whole-file --delete --stats
--human-readable --timeout=180 --exclude=/distfiles --exclude=/local
--exclude=/packages --exclude=/.git"

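Notice the --whole-file part there: it turns off rsync's delta-transfer
algorithm, so a changed file is sent in full whether it was appended to
or rewritten.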

>
> For each 50k changelog, the median transfer would get 0.25% larger.
>
Well, we could just have fewer changes ;)

160GB/day per server is about 2MB/s, ~16Mbit/s, or about 5TB/month. That's
still included in the 'free' bandwidth that el cheapo hosters like
Hetzner provide with their smallest servers ...
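
As a sanity check on those figures, here is a small sketch of the
conversion. It assumes decimal units (1 GB = 10**9 bytes) and a 30-day
month; the function name is made up for illustration.

```python
# Back-of-the-envelope conversion of a daily transfer volume into rates.
# Assumptions: decimal units (1 GB = 10**9 bytes) and a 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_to_rates(gb_per_day):
    """Convert GB/day into MB/s, Mbit/s and TB/month."""
    bytes_per_second = gb_per_day * 10**9 / SECONDS_PER_DAY
    return {
        "MB/s": bytes_per_second / 10**6,
        "Mbit/s": bytes_per_second * 8 / 10**6,
        "TB/month": gb_per_day * 30 / 1000,
    }


for unit, value in daily_volume_to_rates(160).items():
    print(f"{value:.1f} {unit}")
```

For 160 GB/day this prints 1.9 MB/s, 14.8 Mbit/s and 4.8 TB/month, which
matches the rounded numbers above.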



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-27 Thread Robin H. Johnson
On Sat, Feb 27, 2016 at 02:14:12PM +0100, Luca Barbato wrote:
> On 24/02/16 01:33, Duncan wrote:
> > That option is there, and indeed, a patch providing it was specifically 
> > added to portage for infra to use, because appending entries to existing 
> > files is vastly easier and more performant than trying to prepend entries 
> > and having to rewrite the entire file as a result.
> This sounds wrong in many different ways. The changelog files are tiny
> and it makes next to no difference whether you truncate+write or append.
Prior to separating ChangeLog files into years, this was way worse:
a kernel bump present in any of gentoo-sources, hardened-sources,
vanilla-sources meant another 100k of data to send. It's not a lot
overall, but here's some quick stats from one of our rsync servers, on
bytes sent.

Stats for Feb 25, from one of the 3 primary rsync.g.o servers, on the
'bytes sent' output from rsyncd.

rsyncd example output:
Feb 25 00:03:17 quetzal rsyncd[27280]: sent 4930260 bytes  received 32215 bytes  total size 408174052

3909 entries.

Min RAW size: 4833709 bytes [1]
Median RAW size: 22436094 bytes.
Mean RAW size: 45652781 bytes.
Sum of RAW size: 178456721459 bytes = ~166GiB (per day!)

The min possible transfer size is forcing an rsync with no changes; it
just sends the metadata about the files (path, mtime, size, etc).

Let's subtract that from all the rest of the entries, to get stats about
the data transfer.

Median data size: 17602385 bytes
Mean data size: 40819072 bytes
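
For anyone who wants to reproduce this kind of summary from their own
rsyncd logs, a minimal sketch follows. It is not the script used for the
numbers above; the function names are made up, and it assumes log lines
shaped like the example ("rsyncd[PID]: sent N bytes ...").

```python
import re
import statistics

# Matches the "sent N bytes" figure in an rsyncd summary line.
SENT_RE = re.compile(r"rsyncd\[\d+\]: sent (\d+) bytes")


def sent_bytes(log_lines):
    """Extract the 'sent N bytes' value from every matching log line."""
    sizes = []
    for line in log_lines:
        match = SENT_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


def transfer_stats(sizes):
    """Raw stats, plus 'data' stats with the smallest transfer subtracted.

    The smallest transfer is treated as the metadata-only baseline,
    i.e. a sync in which no file content changed.
    """
    baseline = min(sizes)
    data = [size - baseline for size in sizes]
    return {
        "entries": len(sizes),
        "min_raw": baseline,
        "median_raw": statistics.median(sizes),
        "mean_raw": statistics.mean(sizes),
        "sum_raw": sum(sizes),
        "median_data": statistics.median(data),
        "mean_data": statistics.mean(data),
    }
```

Feeding it one day of log lines, e.g.
transfer_stats(sent_bytes(open("rsyncd.log"))), gives the same kind of
min/median/mean/sum breakdown; the log file name here is only a
placeholder.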

So, now the question:
If we use appending changelogs, the large changelogs only differ by a
few hundred bytes. If we instead have to rewrite them, it's 50k+ per
changelog.

For each 50k changelog, the median transfer would get 0.25% larger.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-27 Thread Raymond Jennings
Especially if the changelog files are broken up by year or so.

On Sat, Feb 27, 2016 at 5:14 AM, Luca Barbato  wrote:

> On 24/02/16 01:33, Duncan wrote:
> > That option is there, and indeed, a patch providing it was specifically
> > added to portage for infra to use, because appending entries to existing
> > files is vastly easier and more performant than trying to prepend entries
> > and having to rewrite the entire file as a result.
>
> This sounds wrong in many different ways. The changelog files are tiny
> and it makes next to no difference whether you truncate+write or append.
>
> lu
>
>


Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-26 Thread Rich Freeman
On Fri, Feb 26, 2016 at 7:59 AM, Martin Vaeth  wrote:
> Rich Freeman  wrote:
>>>
>>> And currently the git history is still almost empty...
>>>
>>
>> If you want pre-migration history you need to fetch that separately.
>
> How? Neither on gitweb.gentoo.org nor on github I found an obvious
> repository with this data.

https://wiki.gentoo.org/wiki/Gentoo_git_workflow#Grafting_Gentoo_History_Onto_the_Active_Repo

If you're interested in history it is easy to do, and the repo on
github works fine for web access or the various github stats/etc.
Well, sort-of - I get the impression that github doesn't host a lot of
repos with that much history, and when you push that repo to github for
the first time it will time out and die and the repo will appear on the
site 30-60min later (I imagine subsequent pushes would be fine).  I
think we actually have one of the largest git repos out there in terms
of number of objects.  At least, when I was keeping tabs on other
migration efforts there weren't many that came close (including some
projects that you'd think of as having a lot of history).  The fact
that every package revision+patch+etc is a file in Gentoo is a big
part of that.

>
>> It is about 1.7G.
>> Considering that this represents a LOT more than 2-3 years of history
>
> If the 1.7G are fully compressed history, this would confirm
> my estimate rather precisely, if it represents (1700/120 - 1) ~ 13 years.

Perhaps I misread your post then.  I saw lots of numbers but not many
units, and I probably didn't follow what you intended to say.

>
> Note that I compared squashfs with a git user who does not even
> care about git-internal recompression. Of course, you can decrease
> the factor somewhat if e.g. your checked-out tree is still stored
> on squashfs. This does not change the fact that the factor will
> increase every year by about 1 (or probably more, because git
> uses only the ineffective gzip compression).
>

A checkout of gentoo-x86 is about 590M.  If you use the repo that
includes cache/etc it expands to 1.2G.  13 years of history is 1.7G.
Clearly it doesn't increase by a factor of 1 every year, unless again
I'm misunderstanding what you're intending to communicate.

A git checkout consists of two parts: the .git directory, which holds
all the history data, and the working tree.  In the case of gentoo-x86
the working tree is about 440MB and the history is about 150MB.
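
If you want to check that split on your own clone, something like the
sketch below would do it. The function name is made up, and it sums
apparent file sizes rather than disk blocks, so with lots of tiny files
the working-tree number will come out lower than what du reports.

```python
import os


def checkout_sizes(repo_root):
    """Return (working_tree_bytes, git_dir_bytes) for a git checkout.

    Sums apparent file sizes; symlinks are skipped rather than followed.
    """
    git_dir = os.path.join(repo_root, ".git")
    tree_bytes = 0
    history_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        inside_git = dirpath == git_dir or dirpath.startswith(git_dir + os.sep)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            if inside_git:
                history_bytes += size
            else:
                tree_bytes += size
    return tree_bytes, history_bytes
```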

The working tree doesn't really change in size much - it just reflects
the size of the current revision of the tree. It is also not
compressed (unless you stick the whole thing in a squashfs, which you
could certainly do).  It is the history which continuously grows.
However, the history IS compressed and the reality is that most new
ebuilds are similar to ebuilds that are already in the history, so it
compresses very well.  Of course it would be nice if you could use
something other than gzip to compress it.

There is no reason that somebody couldn't distribute squashfs versions
of a git /usr/portage, but if you want the full history it would still
be around 1.7G.  It would still be smaller than a checked-out tree
(the 1.7G figure is just history - it doesn't include the extra 440MB
or so for the checkout).

My point wasn't that there are no size benefits to using squashfs and
dropping the history.  I'm just saying that git is pretty efficient for what
it does do.

-- 
Rich



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-26 Thread Rich Freeman
On Fri, Feb 26, 2016 at 6:00 AM, Martin Vaeth  wrote:
>
> And currently the git history is still almost empty...
>

If you want pre-migration history you need to fetch that separately.
It is about 1.7G.

Considering that this represents a LOT more than 2-3 years of history
(including periods where the commit rate was higher than it is today)
I think your estimates of where the migrated repo will be in 2-3 years
are too high.  It will of course be larger than the space required for
an rsync squashfs.

-- 
Rich



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-25 Thread Gordon Pettey
On Thu, Feb 25, 2016 at 1:12 AM, Martin Vaeth  wrote:

> Luis Ressel  wrote:
> >
> > That would require a local git clone. And that's exactly what those who
> > still want Changelogs are trying to avoid.
>
> You need even a deep git clone with full history.
>
> Already now this means that you need 2 (or already 3?) times the
> disk space as for an rsync mirror; multiply all numbers by 4
> if you used squashfs to store the tree.
>
> In the course of the years the factor will continue to increase;
> I guess at least by 1 for every year (there is possibility of some
> compression of history, but OTOH, many packages are added and
> removed, eclasses keep changing, etc.)
>
> So in 2-3 years, it can be for some users 20 times the disk storage
> that it needs now.
>

Or, in 2-3 years, maybe people will stop with the hyperbole. Hopefully
sooner. The tree is a bunch of text files, of which a whole lot of text is
repeated (esomewrapper, eclass-based builds which are identical but for a
single line, version updates to packages that make no changes at all to the
ebuild, etc.) which is great for compression, which git does.


Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-25 Thread M. J. Everitt
On 25/02/16 08:59, Kent Fredric wrote:
> On 25 February 2016 at 21:02, Consus  wrote:
>> Well, we do have one
>> 
>> https://gitweb.gentoo.org/repo/gentoo.git/log/dev-lang/perl
>> 
>> I bet folks want to check out what's new in their local copy of 
>> Portage tree.
> 
> 
> With a custom, portage oriented, on-demand log generator you could
>  produce a lot more detail ( and in a text format that doesn't 
> require a web browser to view ) , and potentially use
> understanding of portage conventions to generate change data
> outside those explicitly stated.
> 
> Though that would be a "later feature" you could potentially bolt 
> on after the main logic was sorted out.
> 
> The idea being you could request a changelog for a package with a 
> list of "interest aspects" and have the log reduced to changes
> that affect those interests.
> 
> For instance, you could do :
> 
> curl http://thing.gentoo.org/changes/dev-lang/perl?arch=~x86
> 
> And with a bit of effort, you could generate a changelog that is 
> only relevant for somebody who is on ~x86, eliding changes that
> x86 didn't get yet.
> 
> For instance, an ~x86 filter would elide stabilizations for ~x86, 
> because you don't care about stabilizations if you're assuming 
> ~arch. ( And it would elide changes that were only visible for 
> other arches )
> 
> And this filter wouldn't necessarily be implemented in "grep for 
> keywords in the commit message", but *analyse the change in the 
> directory* based, which would give the ability to do things that 
> would otherwise only be possible with a git clone.
> 
> 
> 
This idea is quite neat - you could do a basic User-Agent check and
either render a web page for viewing changes online, or even accept a
specifier that selects some other output format, e.g. ChangeLog
(rev. chron), basic web, XML or JSON, which you could then
post-process if you desired.

I know this is kind of bloating the idea, but the flexibility and such
would make it Really Useful .. I think, anyhow ...

MJE



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-25 Thread Kent Fredric
On 25 February 2016 at 21:02, Consus  wrote:
> Well, we do have one
>
> https://gitweb.gentoo.org/repo/gentoo.git/log/dev-lang/perl
>
> I bet folks want to check out what's new in their local copy of Portage
> tree.


With a custom, portage oriented, on-demand log generator you could
produce a lot more detail ( and in a text format that doesn't require
a web browser to view ) , and potentially use understanding of portage
conventions to generate change data outside those explicitly stated.

Though that would be a "later feature" you could potentially bolt on
after the main logic was sorted out.

The idea being you could request a changelog for a package with a list
of "interest aspects" and have the log reduced to changes that affect
those interests.

For instance, you could do :

   curl http://thing.gentoo.org/changes/dev-lang/perl?arch=~x86

And with a bit of effort, you could generate a changelog that is only
relevant for somebody who is on ~x86, eliding changes that x86 didn't
get yet.

For instance, an ~x86 filter would elide stabilizations for ~x86,
because you don't care about stabilizations if you're assuming ~arch.
( And it would elide changes that were only visible for other arches )

And this filter wouldn't necessarily be implemented in "grep for
keywords in the commit message", but *analyse the change in the
directory* based, which would give the ability to do things that would
otherwise only be possible with a git clone.
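
To make the "interest" idea a bit more concrete, here is a rough sketch
of the decision for KEYWORDS changes only. Everything in it is
hypothetical (the function names, and the choice to express an interest
as "x86" or "~x86"); a real filter would also have to look at the rest
of the ebuild.

```python
def visible(keywords, arch, accept_testing):
    """Would a user on this arch see an ebuild with these KEYWORDS?"""
    words = set(keywords.split())
    if arch in words:
        return True
    return accept_testing and ("~" + arch) in words


def keyword_change_is_relevant(old_keywords, new_keywords, interest):
    """Decide whether a KEYWORDS change matters for an interest.

    interest is an arch such as "x86" (stable only) or "~x86" (testing
    accepted). A change is relevant only if it flips visibility.
    """
    accept_testing = interest.startswith("~")
    arch = interest.lstrip("~")
    before = visible(old_keywords, arch, accept_testing)
    after = visible(new_keywords, arch, accept_testing)
    return before != after
```

With this rule a "~x86" -> "x86" stabilization is elided for a ~x86
user but kept for a stable x86 user, and a change that only touches
amd64 is elided for both.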



-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-25 Thread Consus
On 18:46 Thu 25 Feb, Kent Fredric wrote:
> I'm considering bolting together some Perl that would allow you to run
> a small HTTP service rooted in a git repo dir, and would then generate
> given changes files on demand and then cache their results somehow.
> 
> Then you could have a "Live changes as a service" where interested
> parties could simply do:
> 
>  curl http://thing.gentoo.org/changes/dev-lang/perl
> 
> and get a changelog spewed out instead of burdening the rsync server
> with generating them for every sync.

Well, we do have one

https://gitweb.gentoo.org/repo/gentoo.git/log/dev-lang/perl

I bet folks want to check out what's new in their local copy of Portage
tree.



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-24 Thread Kent Fredric
On 25 February 2016 at 18:03, Duncan <1i5t5.dun...@cox.net> wrote:
> Which I am (running from the git repo), and that ability to (as a user,
> easily) actually track all that extra data was one of my own biggest
> reasons for so looking forward to the git switch for so long, and is now
> one of the biggest reasons I'm a /huge/ supporter of the new git repo,
> in spite of the time it took and the imperfections it still has.


I'm considering bolting together some Perl that would allow you to run
a small HTTP service rooted in a git repo dir, and would then generate
given changes files on demand and then cache their results somehow.


Then you could have a "Live changes as a service" where interested
parties could simply do:

 curl http://thing.gentoo.org/changes/dev-lang/perl

and get a changelog spewed out instead of burdening the rsync server
with generating them for every sync.

That way the aggregate CPU Load would be grossly reduced because the
sync server wouldn't have to spend time generating changes for every
update/update window, and it wouldn't have to be full-tree aware.
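
Roughly what I have in mind, sketched in Python rather than Perl just to
keep it short. Treat all of it as assumptions: the repo path, the port,
the function names and the output format are placeholders, and the
in-memory cache is never invalidated, so it would need clearing whenever
the clone is updated.

```python
import subprocess
from functools import lru_cache
from http.server import BaseHTTPRequestHandler, HTTPServer

REPO_DIR = "/var/db/repos/gentoo"  # assumption: location of a local git clone
PREFIX = "/changes/"


def package_from_path(url_path):
    """Map '/changes/dev-lang/perl' to 'dev-lang/perl', or None if malformed."""
    if not url_path.startswith(PREFIX):
        return None
    parts = url_path[len(PREFIX):].split("/")
    if len(parts) != 2:
        return None
    for part in parts:
        # Reject empty parts, path traversal, and anything git could
        # mistake for an option.
        if part in ("", ".", "..") or part.startswith("-"):
            return None
    return "/".join(parts)


@lru_cache(maxsize=1024)
def changelog_for(package):
    """Render the package's git history once; repeat requests hit the cache."""
    result = subprocess.run(
        ["git", "-C", REPO_DIR, "log", "--date=short",
         "--format=%ad %an%n  %s%n", "--", package],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


class ChangesHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        package = package_from_path(self.path.split("?", 1)[0])
        if package is None:
            self.send_error(404, "expected /changes/<category>/<package>")
            return
        body = changelog_for(package).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Run the service until interrupted."""
    HTTPServer((host, port), ChangesHandler).serve_forever()
```

Calling serve() and then fetching /changes/dev-lang/perl with curl would
return the generated log as plain text.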

But thinking about it makes me go "eeeh, that's a lot of effort really"

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-24 Thread Brian Dolbec
On Wed, 24 Feb 2016 21:16:13 +0100
Luis Ressel  wrote:

> On Wed, 24 Feb 2016 11:18:55 -0800
> Raymond Jennings  wrote:
> 
> > As far as changelog generation, what about causing the changelogs to
> > be autogenerated by the end user's computer?  Divide and conquer.  
> 
> That would require a local git clone. And that's exactly what those
> who still want Changelogs are trying to avoid.
> 

Not only that, but their generation, along with that of the thick
manifests, is already quite resource intensive and time consuming for a
relatively high powered server (a big reason behind this thread).

Now make some older user's system or a low-powered ARM system do that
with much lower resources and you are talking about a long time for
completion.

-- 
Brian Dolbec 





Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-24 Thread Daniel Campbell
On 02/24/2016 12:16 PM, Luis Ressel wrote:
> On Wed, 24 Feb 2016 11:18:55 -0800 Raymond Jennings
>  wrote:
> 
>> As far as changelog generation, what about causing the changelogs
>> to be autogenerated by the end user's computer?  Divide and
>> conquer.
> 
> That would require a local git clone. And that's exactly what those
> who still want Changelogs are trying to avoid.
> 
What are some arguments/reasonings for that? Whether it's a dependency
on rsync or a dependency on git, a new Gentoo machine will need one of
them in order to sync.

I can understand mirrors may not want to run git cloning on their
infra, that's a fair point as it requires additional setup (afaict).
And syncing is technically a separate concern from version control,
but if the entire point is to retain a changelog, and we're generating
changelogs from git commits, then it seems to me that version control
is the correct tool for the job.

I'm not advocating for rsync to be done away with, as it has its
benefits, but changelogs are logically related to version control.

-- 
Daniel Campbell - Gentoo Developer
OpenPGP Key: 0x1EA055D6 @ hkp://keys.gnupg.net
fpr: AE03 9064 AE00 053C 270C  1DE4 6F7A 9091 1EA0 55D6



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-24 Thread Luis Ressel
On Wed, 24 Feb 2016 11:18:55 -0800
Raymond Jennings  wrote:

> As far as changelog generation, what about causing the changelogs to
> be autogenerated by the end user's computer?  Divide and conquer.

That would require a local git clone. And that's exactly what those who
still want Changelogs are trying to avoid.

-- 
Regards,
Luis Ressel




Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-24 Thread Raymond Jennings
Seems like there's a trade off in resource usage re: git vs rsync

Rsync seems to be relatively cheap, but has a fixed part of its overhead.
Probably one of the reasons that you get temp-banned from the mirrors if
you sync too often.

Git overhead appears to be higher on the variable parts but lower on the
fixed parts, and from what I gather, the more often you sync, the lower the
overhead.

As far as changelog generation, what about causing the changelogs to be
autogenerated by the end user's computer?  Divide and conquer.


Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-24 Thread Kent Fredric
On 24 February 2016 at 20:29, Duncan <1i5t5.dun...@cox.net> wrote:
> I guess another way of putting it in the context of changelogs, would be
> that if gentoo were using git merges correctly, a changelog summary
> generator could simply take the high-level merge summary comments and
> turn that into its changelog summary.  Instead of ten dozen individual
> "cat-egory/pkg-x.y.z arm-stable" entries, there'd be one or two "arm-
> stable various packages in these categories: xxx, yyy, zzz, aaa" entries,
> and people who don't care about arm could skip the further detail while
> still getting an overall idea of arm activity, while those who do care
> about arm and want further information could drill down further as
> necessary, but would be able to skip the corresponding merge entries for
> x86 and amd64.
>
> With proper git usage, the information would already be there in the git
> log merge commit comments for people like me who like to read those, but
> it would also be not only far simpler, but actually /possible/ to
> automate a summarizer that generates summaries from only those merge
> entries, that then could be stored in the rsync tree or published to
> packages.gentoo.org or the gentoo front page, or wherever.


Indeed, we could probably establish better conventions for identifying
certain kinds of commits such that a static log analyser would be able
to give a good result.

And you can probably trivially filter out  ( or in your case, filter
exclusively for ) changes that relate to a specific arch simply by
examining the "DIFF" data.

And we could probably do much better at formatting merge commits as
well ( I've been encouraging such a thing already because "merged
branch x/y/z" is not informative enough )

"Good" changelog automation pretty much relies on the quality of the
underlying data and the ability to identify commits that are to be
included/excluded smartly, and pick the data out of those commits that
are relevant.

Though personally I feel for the goal of stabilization tracking, you
ought to be analysing the git repo. Not only can you then see when a
given package was stabilised, but you can see the other packages that
were stabilized in its proximity, which is way too hard to do with the
Changelogs.


-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-23 Thread Kent Fredric
On 24 February 2016 at 17:24, Duncan <1i5t5.dun...@cox.net> wrote:
> Particularly when the basic changelog information is there, it's simply
> quibbling about chronological or reverse-chronological order we're doing
> now, and people who /really/ care about it by rights should be going
> straight to the git logs in the first place.


Gentoo actually makes this problem worse than it should be.

Most of the suffering with having Changelogs in tree is due to the
whole "Every commit must have a changelog entry" madness, and the
natural consequence of having a lot of those is lots of merge
collisions.

By comparison, in other places ( for example, CPAN ), having a dumb
git -> changelog mapping tends to be right up there on the list of
dumb ideas, and the natural response I have ( and most CPAN hackers
have ) upon seeing a git output based changelog is typically "close
page, assume there were no changes".

Like from a changelog perspective, I don't think anyone cares about
stabilization changes.

It's either stabilized, or it isn't; change logs indicating you tweaked
a flag tend not to be the sort of thing people go looking at
Changelogs for.

If you want granular, commit-by-commit details about what changed,
yes, Git _is_ the right option for that.

But having that level of detail in the changelog is itself the madness
we should avoid.

Changelogs are really supposed to be _for humans_ giving changes that
_humans_ will care about.

Like on Published Open Source software, the things you tend to look at
the changelog for are:
- What are the new features in this new version
- What bugs were fixed in this new version
- What security concerns were resolved by this new version.

The point being "If I just look at the diff directly I might not
understand what is happening"

And that's why there's the convention of being recent-first.

Because you open the changelog at the top, and you read down consuming
the aggregate changes of relevance to you mentally, and then you stop
when you reach a version you've already seen.

A big log of "Stabilized X" is just ... a waste of time IME.

But I'm sure at least one person out there has probably gone looking
for a changelog to see when something got stabilized/keyworded.

If we released ourselves from this inanity of annotating every change
at a level beyond which normal people could care about, we could
probably get away with manually maintained Changelogs again.

Because not *every* change warrants telling an end user "Hey, we changed this".

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-23 Thread Vadim A. Misbakh-Soloviov
> Is this actually true?  For the typical use case of daily or close to
> daily updates I'd think that git would be much more efficient.
As has been noted multiple times on the list already, this should
never happen, at least not until git supports resumable
fetches/clones/whatever. Otherwise you'll frustrate a lot of people
with poor-quality internet access.



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-23 Thread Rich Freeman
On Tue, Feb 23, 2016 at 7:50 PM, Kristian Fiskerstrand  wrote:
>
> On 02/24/2016 01:33 AM, Duncan wrote:
>>
>> IMO, what's actually happening here is the slow deprecation of
>> rsync mirrors in favor of git.  I doubt they'd be created at all
>> if gentoo were
>
> I don't agree with this at all. For one thing git is very resource
> intensive compared to rsync mirroring,

Is this actually true?  For the typical use case of daily or close to
daily updates I'd think that git would be much more efficient.

rsync has to traverse an entire directory tree (both client and
server-side, though of course either could have it cached) and
synchronize across the network the metadata for every file to
determine what has changed, and then figure out what changed in each
file and transfer it.  With a large git repository with only a few
hundred new commits the client just tells the server what its last
commit is, the server walks back in history to find it, and then the
server can quickly identify all the new commits/trees/blobs and send
just those.  With the COW design of git this is very efficient, not
requiring traversing any subdirectory in which no files have changed.

In the degenerate case where nothing has changed, an rsync still needs
to walk the full tree and send a file list, while git just sends a
commit ID and terminates.

Now, for an infrequent sync (think months) where most of the tree has
changed I could certainly buy that a webrsync would be far more
efficient for everybody.

And just like rsync git is easy to mirror, with github being an
example of a service that will mirror anybody's repo for free and they
seem to have no end to their bandwidth (though I've found that pushing
a full historical gentoo git tree to them does make them choke on it
for about 30min before it shows up).

So, while I'll agree with the validity of your other points, I'd be
interested in actual data to back up the resource claim.  I could see
that going either way, and that is likely to be based on how
well-optimized everything is.  Linus did a pretty good job with git.

> For one thing we can't expect users to keep an up
> to date copy of all gentoo developers' OpenPGP keys to verify each git
> commit, additionally this will cause issues with retirement and
> similar situations (certificate revocation, subkey rotations, expiries).

Well, we could do something (eventually) to make tracking keys easier,
but I'll still buy that the thick manifests are more secure.  Git
commit signatures are only bound to their contents with sha1.  I get
that nobody has demonstrated a practical attack on that, but I think
most crypto experts wouldn't heartily endorse the design.

Keep in mind that we do have git mirrors that include metadata/etc
hosted on Github.  I know people have concerns with their software
being proprietary but as far as syncing goes it is just a mirror.  I
doubt most of us audit all the distfiles mirrors we use to make sure
they're only using FOSS ftp/http servers and so on.  There really
isn't any reason that it couldn't be hosted on infra either, assuming
they wanted the extra load (and I don't see the point in it, since it
is just a mirror, and if it ever goes away it is trivial to just point
the scripts that generate it to push to some other mirror instead -
git itself is completely FOSS).

Again, I have nothing against devs maintaining rsync and changelogs,
and users making use of them.  I just don't see it as the end of the
world if devs decide to stop taking care of them.

-- 
Rich



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-23 Thread Rich Freeman
On Tue, Feb 23, 2016 at 7:33 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>
> Which means it's the tools that expect reverse-chronological order that
> must change.  Either that, or people /that/ concerned about the
> changelogs can simply switch to the git repos and use the existing git
> tools to read their changelogs, as many (including me, as I regularly
> check changelog entries, and now that I can, sometimes the actual diff,
> on one or more packages at nearly every update) already are.

Setting aside the whole git-vs-rsync debate, I'd generally recommend
that anybody interested in programmatic analysis of changes in the
tree use git anyway, because there are far better ways to walk git
commits/etc programmatically than parsing changelogs.  In python you
can trivially iterate over commits, access the content of files, all
the metadata, and so on.
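
A minimal sketch of what that can look like, using nothing but the git
command line via subprocess (a git library binding would be the more
natural choice, but this keeps the example dependency-free). The names
Commit, parse_log and commits_touching are made up for illustration.

```python
import subprocess
from typing import Iterator, NamedTuple


class Commit(NamedTuple):
    sha: str
    author: str
    date: str      # author date, strict ISO 8601
    subject: str


# Tab-separated fields, one commit per line.
LOG_FORMAT = "%H%x09%an%x09%aI%x09%s"


def parse_log(text: str) -> Iterator[Commit]:
    """Turn 'git log --format=LOG_FORMAT' output into Commit records."""
    for line in text.splitlines():
        if line:
            sha, author, date, subject = line.split("\t", 3)
            yield Commit(sha, author, date, subject)


def commits_touching(repo_dir: str, path: str) -> Iterator[Commit]:
    """Iterate over commits that touched path, newest first."""
    output = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--format={LOG_FORMAT}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(output)
```

For example, looping over commits_touching("/path/to/gentoo",
"dev-lang/perl") and printing commit.date and commit.subject gives a
per-package history without parsing any ChangeLog file; the repo path is
a placeholder.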

I'm not against devs doing the work to provide changelogs for those
who prefer them, but I'd just go right to git if I were writing tools.

-- 
Rich



Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?

2016-02-23 Thread Kristian Fiskerstrand
On 02/24/2016 01:33 AM, Duncan wrote:
> 
> IMO, what's actually happening here is the slow deprecation of 
> rsync mirrors in favor of git.  I doubt they'd be created at all
> if gentoo were

I don't agree with this at all. For one thing git is very resource
intensive compared to rsync mirroring, and in any case there are things
that need to be properly prepared in a staging area before being
presented to a user. For one thing we can't expect users to keep an up
to date copy of all gentoo developers' OpenPGP keys to verify each git
commit, additionally this will cause issues with retirement and
similar situations (certificate revocation, subkey rotations, expiries).

Git is a good tool for revision control (if used properly), but it is
not a panacea

-- 
Kristian Fiskerstrand
Public PGP key 0xE3EDFAE3 at hkp://pool.sks-keyservers.net
fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3