Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
On 03/02/2016 08:48 PM, malc wrote:
> I still fail to understand the bikeshedding here - you really don't
> need a git checkout to get something akin to a changelog. Use the
> github API directly...
>
> The following 1-liner could be trivially productised (maybe even parse
> $PWD to set the path argument...)
>
> curl https://api.github.com/repos/gentoo/gentoo/commits?path=app-admin/eselect
> | perl -MJSON -e 'foreach $i (@{decode_json(join("",@lines=<>))})
> { print "$i->{commit}->{author}->{name} -
> $i->{commit}->{author}->{date}\n\n $i->{commit}->{message}\n"; }'

Requires you to be online, can't grep over multiple packages. This version relies on an unreliable third-party service and is thus more of an intellectual curiosity.

> Yeah - it's not quite as pretty as our current Changelog, but date,
> author/committer, commit-msg etc. are all there and you can filter by
> path just the same as you would with native git log...
> You could parse the local $PORTDIR/metadata/timestamp* and add an
> 'until' param to the URL to filter commits beyond where a user has
> rsync'd up to...
>

It is almost, but not completely unlike it. A simple ChangeLog is a lot easier ...

(Why are people now trying to add middleware layers to indirect the problem to become invisible in a huge machinery? This is wonderfully insane ...)
Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
On Wed, Mar 2, 2016 at 2:48 PM, malc wrote:
> I still fail to understand the bikeshedding here - you really don't
> need a git checkout to get something akin to a changelog. Use the
> github API directly...
>

The main downside to using github would be that you don't get a combined history pre/post-migration, but it certainly works. Github doesn't work with git replace. I'm not sure if anongit does or if it has a useful API like this.

I think you can push git replace references, but whether the web viewer ignores them or not is another matter. They aren't cloned by default I believe (which makes sense since they're references - an explicit fetch does work).

Somebody could create one big combined repo without using git replace, but the hashes won't match and that sounds like a recipe for mass confusion. You couldn't directly sync it via pull/push either, since the hashes will never match.

--
Rich
Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
On Wed, Mar 2, 2016 at 1:14 PM, Ulrich Mueller wrote:
>
> For example, the message of the initial commit 56bd759 appears in some
> 18000 files, which accounts for 25 MiB.

Not discounting the general issue, I wouldn't count the initial commit. All that space will get taken up the first time something gets committed to all the packages.

--
Rich
Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
I still fail to understand the bikeshedding here - you really don't need a git checkout to get something akin to a changelog. Use the github API directly...

The following 1-liner could be trivially productised (maybe even parse $PWD to set the path argument...)

curl https://api.github.com/repos/gentoo/gentoo/commits?path=app-admin/eselect |
perl -MJSON -e 'foreach $i (@{decode_json(join("",@lines=<>))})
{ print "$i->{commit}->{author}->{name} - $i->{commit}->{author}->{date}\n\n $i->{commit}->{message}\n"; }'

Yeah - it's not quite as pretty as our current Changelog, but date, author/committer, commit-msg etc. are all there and you can filter by path just the same as you would with native git log...

You could parse the local $PORTDIR/metadata/timestamp* and add an 'until' param to the URL to filter commits beyond where a user has rsync'd up to...

Cheers,
malc.

On Wed, Mar 2, 2016 at 6:14 PM, Ulrich Mueller wrote:
>> On Wed, 2 Mar 2016, Ian Stakenvicius wrote:
>
>> On 02/03/16 03:50 AM, Ulrich Mueller wrote:
>>> How is it possible that we have 52 MiB of ChangeLog entries
>>> generated in the 0.5 years since the git conversion, whereas we had
>>> only a total of 103 MiB in the 13.5 years since ChangeLogs were
>>> introduced in 2002? Certainly our commit rate hasn't increased by
>>> more than an order of magnitude in the last half year?
>
>> The content of a changelog entry from git is a lot bigger than it
>> was just from echangelog, isn't it?
>
> Not by a factor of ten.
>
> I've investigated a bit, and the main problem seems to be that for git
> commits that extend over several directories, the commit message is
> duplicated into many ChangeLog entries.
>
> For example, the message of the initial commit 56bd759 appears in some
> 18000 files, which accounts for 25 MiB. Then there is commit eaaface
> and its revert 1bfb585, again appearing in almost all ChangeLog files
> in the tree. These account for another 9 MiB. Last example, commit
> 8849b09, another 2 MiB.
>
> So about 70% of the size is caused by these 4 tree-wide commits alone.
> However, there are many more examples of duplication on a smaller
> scale.
>
> Ulrich
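For readers who would rather not pipe JSON through a Perl one-liner, the same idea can be sketched in Python. This is only an illustration of malc's suggestion, not anything proposed in the thread: the function names (`commits_url`, `format_commits`, `fetch_commits`) are made up here, and the sketch assumes the GitHub "list commits" endpoint with its `path` and `until` query parameters, unauthenticated and therefore rate-limited.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the one-liner in the message above.
API = "https://api.github.com/repos/gentoo/gentoo/commits"


def commits_url(path, until=None):
    """Build the request URL; `until` is an optional ISO 8601 timestamp."""
    params = {"path": path}
    if until:
        params["until"] = until
    return API + "?" + urllib.parse.urlencode(params)


def format_commits(commits):
    """Render decoded API output as 'author - date', blank line, message."""
    entries = []
    for item in commits:
        info = item["commit"]
        author = info["author"]
        entries.append(
            f"{author['name']} - {author['date']}\n\n  {info['message']}\n"
        )
    return "\n".join(entries)


def fetch_commits(path, until=None):
    """Fetch one page of commits touching `path` (needs network access)."""
    with urllib.request.urlopen(commits_url(path, until)) as resp:
        return json.load(resp)
```

Something like `print(format_commits(fetch_commits("app-admin/eselect")))` would then print a changelog-like listing for one package. Passing `until` with the timestamp from `$PORTDIR/metadata/timestamp*` corresponds to the filtering idea mentioned above. As with the one-liner, this only works while online and only sees the first page of results.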
Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
> On Wed, 2 Mar 2016, Ian Stakenvicius wrote:

> On 02/03/16 03:50 AM, Ulrich Mueller wrote:
>> How is it possible that we have 52 MiB of ChangeLog entries
>> generated in the 0.5 years since the git conversion, whereas we had
>> only a total of 103 MiB in the 13.5 years since ChangeLogs were
>> introduced in 2002? Certainly our commit rate hasn't increased by
>> more than an order of magnitude in the last half year?

> The content of a changelog entry from git is a lot bigger than it
> was just from echangelog, isn't it?

Not by a factor of ten.

I've investigated a bit, and the main problem seems to be that for git commits that extend over several directories, the commit message is duplicated into many ChangeLog entries.

For example, the message of the initial commit 56bd759 appears in some 18000 files, which accounts for 25 MiB. Then there is commit eaaface and its revert 1bfb585, again appearing in almost all ChangeLog files in the tree. These account for another 9 MiB. Last example, commit 8849b09, another 2 MiB.

So about 70% of the size is caused by these 4 tree-wide commits alone. However, there are many more examples of duplication on a smaller scale.

Ulrich
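The arithmetic behind the "about 70%" figure can be checked with a few lines of Python. This is just a worked check using the rounded numbers quoted in the message; the helper names are invented for the illustration, and the per-copy size is an average derived from those rounded figures, not a measured value.

```python
MIB = 1024 ** 2

# Rounded figures from the message above, in MiB.
total_git_changelogs = 52.0
tree_wide = {
    "56bd759 (initial commit)": 25.0,
    "eaaface and its revert 1bfb585": 9.0,
    "8849b09": 2.0,
}


def share(parts, total):
    """Fraction of `total` accounted for by the sum of `parts`."""
    return sum(parts) / total


def bytes_per_copy(size_mib, n_files):
    """Average bytes one duplicated entry adds to each ChangeLog file."""
    return size_mib * MIB / n_files


print(f"{share(tree_wide.values(), total_git_changelogs):.0%}")  # 69%
print(round(bytes_per_copy(25.0, 18000)))  # 1456
```

So the four tree-wide commits account for 36 of the 52 MiB, and the initial commit alone costs roughly 1.4 KiB in each of the 18000 files it touches.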
Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
On 02/03/16 03:50 AM, Ulrich Mueller wrote:
> How is it possible that we have 52 MiB of ChangeLog entries
> generated in the 0.5 years since the git conversion, whereas we
> had only a total of 103 MiB in the 13.5 years since ChangeLogs
> were introduced in 2002? Certainly our commit rate hasn't
> increased by more than an order of magnitude in the last half
> year?
>

The content of a changelog entry from git is a lot bigger than it was just from echangelog, isn't it?
[gentoo-dev] Re: [PATCH dtd] Remove outdated definition of global-scope
> On Wed, 2 Mar 2016, Michał Górny wrote:

> Remove the long form of <changelog/> element that was likely used (or
> supposed to be used) in the global metadata scope. It is currently
> referenced in <upstream/> element only, and judging from the comments,
> it is supposed to always be a URL there.

Apparently the original plan for metadata.xml was to convert ChangeLogs to XML and include them as part of the file:
http://thread.gmane.org/gmane.linux.gentoo.devel/9663

So yes, kill it with fire. Patch LGTM.

Ulrich
[gentoo-dev] Re: [PATCH dtd] Remove outdated definition of global-scope
On Wed, Mar 2, 2016 at 12:38 PM, Michał Górny wrote:
> Remove the long form of <changelog/> element that was likely used (or
> supposed to be used) in the global metadata scope. It is currently
> referenced in <upstream/> element only, and judging from the comments,
> it is supposed to always be a URL there.

LGTM!
[gentoo-dev] [PATCH dtd] Remove outdated definition of global-scope
Remove the long form of <changelog/> element that was likely used (or supposed to be used) in the global metadata scope. It is currently referenced in <upstream/> element only, and judging from the comments, it is supposed to always be a URL there.
---
 metadata.dtd | 25 ++--
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/metadata.dtd b/metadata.dtd
index 101478a..5bc8c39 100644
--- a/metadata.dtd
+++ b/metadata.dtd
@@ -30,29 +30,6 @@
- - - - - - - - - - - - - - - - -
@@ -66,6 +43,8 @@
 the usage of the status attribute is nevertheless _only_ allowed in the upstream maintainer element. -->
+
+
--
2.7.2
[gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation
> On Wed, 2 Mar 2016, Robin H Johnson wrote:

> I just hadn't finished putting the results into a long-term format
> quite yet, but did so this afternoon:
> http://dev.gentoo.org/~robbat2/201602-portage-survey/

Thank you.

> Some remarks about question #2 and #3:
> Q2: Reduce local disk usage by excluding ChangeLogs?
>
> It was unfortunately pointed out to me very late that my question #2
> had some confusing text:
> - "No, but only if were optional (I do NOT want it, but others might)"
> - "Yes, but only if it were optional (I want it, but others might NOT)"
> The bracket portion of each answer was interpreted as meaning the
> opposite as the start of each answer :-(.

> Either way, ~60% are in favour of getting rid of changelogs.

Not sure if it can be interpreted this way. This would contradict the results of both Q1 and Q3.

For Q1, 45 responses read ChangeLogs in some way (A1.2 to A1.5 or a combination of them), whereas only 17 responses don't read ChangeLogs at all (A1.1 or some combination including it), disregarding the two responses that at the same time read them and don't read them at all.

> IMO this is a BETTER goal than continuing to generate them for
> rsync, and bike-shedding about what the order should be; and it
> provides a huge benefit by reducing the size of rsync by 155MiB.

Hm, that's almost 40% of the total size of the tree.

   $ find /usr/portage/ -type f -name 'ChangeLog-20*' -printf '%s\n' | awk '{ s+=$1 } END { print s/1024^2 }'
   102.961

That's the old ones from CVS.

   $ find /usr/portage/ -type f -name ChangeLog -printf '%s\n' | awk '{ s+=$1 } END { print s/1024^2 }'
   52.0908

That's the new ones autogenerated from git.

How is it possible that we have 52 MiB of ChangeLog entries generated in the 0.5 years since the git conversion, whereas we had only a total of 103 MiB in the 13.5 years since ChangeLogs were introduced in 2002? Certainly our commit rate hasn't increased by more than an order of magnitude in the last half year?

> Q3: What order should ChangeLog entries be in?
> --
> - 85.3% of responses either preferred newest first OR didn't care
> (incl so as long as the tools work).
> - 2.9% wanted oldest first.
> - NOBODY selected "I'd prefer oldest entries first, but do what is
> best for distribution"
> - 11.8% said get rid of changelogs.

Ulrich
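The two find/awk measurements in this message can be reproduced without GNU find's `-printf`, for instance with a short Python sketch. The function name `changelog_mib` is invented for this illustration. It assumes the same semantics as `find -type f -name PATTERN`: regular files only, symlinks skipped, shell-style pattern matched against the file name.

```python
import fnmatch
import os


def changelog_mib(root, pattern):
    """Total size in MiB of regular files under `root` named like `pattern`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not fnmatch.fnmatchcase(name, pattern):
                continue
            full = os.path.join(dirpath, name)
            # Count regular files only, as `find -type f` does.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1024 ** 2
```

On a tree synced at the time of the message, `changelog_mib("/usr/portage", "ChangeLog-20*")` and `changelog_mib("/usr/portage", "ChangeLog")` should correspond to the 102.961 and 52.0908 figures above; on any other tree the numbers will of course differ.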