Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread Patrick Lauer
On 03/02/2016 08:48 PM, malc wrote:
> I still fail to understand the bikeshedding here - you really don't
> need a git checkout to get something akin to a changelog. Use the
> github API directly...
>
> The following 1-liner could be trivially productised (maybe even parse
> $PWD to set the path argument...)
>
> curl https://api.github.com/repos/gentoo/gentoo/commits?path=app-admin/eselect
> | perl -MJSON -e 'foreach $i (@{decode_json(join("",@lines=<>))})
> { print "$i->{commit}->{author}->{name} -
> $i->{commit}->{author}->{date}\n\n $i->{commit}->{message}\n"; }'
Requires you to be online, can't grep over multiple packages.

This version relies on an unreliable third-party service and is thus more
of an intellectual curiosity.
> Yeah - it's not quite as pretty as our current Changelog, but date,
> author/committer, commit-msg etc. are all there and you can filter by
> path just the same as you would with native git log...
> You could parse the local $PORTDIR/metadata/timestamp* and add an
> 'until' param to the URL to filter commits beyond where a user has
> rsync'd up to...
>
It is almost, but not completely unlike it. A simple ChangeLog is a lot
easier ...


(Why are people now trying to add middleware layers to indirect the
problem to become invisible in a huge machinery? This is wonderfully
insane ...)



Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread Rich Freeman
On Wed, Mar 2, 2016 at 2:48 PM, malc wrote:
> I still fail to understand the bikeshedding here - you really don't
> need a git checkout to get something akin to a changelog. Use the
> github API directly...
>

The main downside to using github would be that you don't get a
combined history pre/post-migration, but it certainly works.  Github
doesn't work with git replace.  I'm not sure if anongit does or if it
has a useful API like this.  I think you can push git replace
references, but whether the web viewer ignores them or not is another
matter.  They aren't cloned by default I believe (which makes sense
since they're references - an explicit fetch does work).

Somebody could create one big combined repo without using git replace,
but the hashes won't match and that sounds like a recipe for mass
confusion.  You couldn't directly sync it via pull/push either, since
the hashes will never match.

-- 
Rich



Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread Rich Freeman
On Wed, Mar 2, 2016 at 1:14 PM, Ulrich Mueller wrote:
>
> For example, the message of the initial commit 56bd759 appears in some
> 18000 files, which accounts for 25 MiB.

Not discounting the general issue, I wouldn't count the initial
commit.  All that space will get taken up the first time something
gets committed to all the packages.

-- 
Rich



Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread malc
I still fail to understand the bikeshedding here - you really don't
need a git checkout to get something akin to a changelog. Use the
github API directly...

The following 1-liner could be trivially productised (maybe even parse
$PWD to set the path argument...)

curl 'https://api.github.com/repos/gentoo/gentoo/commits?path=app-admin/eselect' \
| perl -MJSON -e 'foreach $i (@{decode_json(join("",@lines=<>))})
{ print "$i->{commit}->{author}->{name} - $i->{commit}->{author}->{date}\n\n $i->{commit}->{message}\n"; }'

Yeah - it's not quite as pretty as our current Changelog, but date,
author/committer, commit-msg etc. are all there and you can filter by
path just the same as you would with native git log...
You could parse the local $PORTDIR/metadata/timestamp* and add an
'until' param to the URL to filter commits beyond where a user has
rsync'd up to...
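
A minimal sketch of the "parse $PWD" and 'until' ideas, building only the
URL. The function names pkg_from_dir and changelog_url are made up for
illustration, and the category/package directory layout is an assumption.
Turning the contents of $PORTDIR/metadata/timestamp* into an ISO 8601
timestamp is left out, since that depends on the file's format.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of any existing tool.

# Derive "category/package" from a directory such as
# /usr/portage/app-admin/eselect (assumes that two-level layout).
pkg_from_dir() {
    dir=${1%/}              # drop a trailing slash, if any
    pkg=${dir##*/}          # last path component: the package
    parent=${dir%/*}
    cat=${parent##*/}       # second-to-last component: the category
    printf '%s/%s\n' "$cat" "$pkg"
}

# Build the commits-API URL for a package path. The optional second
# argument is an ISO 8601 timestamp passed as the 'until' parameter.
changelog_url() {
    url="https://api.github.com/repos/gentoo/gentoo/commits?path=$1"
    if [ -n "${2:-}" ]; then
        url="$url&until=$2"
    fi
    printf '%s\n' "$url"
}
```

From inside a package directory, the result could then be fed to curl,
e.g. curl "$(changelog_url "$(pkg_from_dir "$PWD")")".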

Cheers,
malc.


On Wed, Mar 2, 2016 at 6:14 PM, Ulrich Mueller wrote:
>> On Wed, 2 Mar 2016, Ian Stakenvicius wrote:
>
>> On 02/03/16 03:50 AM, Ulrich Mueller wrote:
>>> How is it possible that we have 52 MiB of ChangeLog entries
>>> generated in the 0.5 years since the git conversion, whereas we had
>>> only a total of 103 MiB in the 13.5 years since ChangeLogs were
>>> introduced in 2002? Certainly our commit rate hasn't increased by
>>> more than an order of magnitude in the last half year?
>
>> The content of a changelog entry from git is a lot bigger than it
>> was just from echangelog, isn't it?
>
> Not by a factor of ten.
>
> I've investigated a bit, and the main problem seems to be that for git
> commits that extend over several directories, the commit message is
> duplicated into many ChangeLog entries.
>
> For example, the message of the initial commit 56bd759 appears in some
> 18000 files, which accounts for 25 MiB. Then there is commit eaaface
> and its revert 1bfb585, again appearing in almost all ChangeLog files
> in the tree. These account for another 9 MiB. Last example, commit
> 8849b09, another 2 MiB.
>
> So about 70% of the size is caused by these 4 tree-wide commits alone.
> However, there are many more examples of duplication on a smaller
> scale.
>
> Ulrich



Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread Ulrich Mueller
> On Wed, 2 Mar 2016, Ian Stakenvicius wrote:

> On 02/03/16 03:50 AM, Ulrich Mueller wrote:
>> How is it possible that we have 52 MiB of ChangeLog entries
>> generated in the 0.5 years since the git conversion, whereas we had
>> only a total of 103 MiB in the 13.5 years since ChangeLogs were
>> introduced in 2002? Certainly our commit rate hasn't increased by
>> more than an order of magnitude in the last half year?

> The content of a changelog entry from git is a lot bigger than it
> was just from echangelog, isn't it?

Not by a factor of ten.

I've investigated a bit, and the main problem seems to be that for git
commits that extend over several directories, the commit message is
duplicated into many ChangeLog entries.

For example, the message of the initial commit 56bd759 appears in some
18000 files, which accounts for 25 MiB. Then there is commit eaaface
and its revert 1bfb585, again appearing in almost all ChangeLog files
in the tree. These account for another 9 MiB. Last example, commit
8849b09, another 2 MiB.

So about 70% of the size is caused by these 4 tree-wide commits alone.
However, there are many more examples of duplication on a smaller
scale.
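
As a rough check of that percentage, the sizes quoted above can simply
be added up. A minimal sketch in shell and awk; the function name
treewide_share is made up for illustration, and the inputs are the
rounded MiB figures from this mail, so the result is approximate.

```shell
#!/bin/sh
# Share of the total ChangeLog size taken by the tree-wide commits.
# Arguments: three partial sizes and the total, all in MiB.
treewide_share() {
    awk -v a="$1" -v b="$2" -v c="$3" -v total="$4" \
        'BEGIN { printf "%.0f\n", 100 * (a + b + c) / total }'
}

# 25 MiB (56bd759) + 9 MiB (eaaface and its revert) + 2 MiB (8849b09),
# out of 52 MiB of git-generated ChangeLogs; prints a percentage.
treewide_share 25 9 2 52
```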

Ulrich




Re: [gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread Ian Stakenvicius
On 02/03/16 03:50 AM, Ulrich Mueller wrote:
> How is it possible that we have 52 MiB of ChangeLog entries
> generated in the 0.5 years since the git conversion, whereas we
> had only a total of 103 MiB in the 13.5 years since ChangeLogs
> were introduced in 2002? Certainly our commit rate hasn't
> increased by more than an order of magnitude in the last half
> year?
> 

The content of a changelog entry from git is a lot bigger than it
was just from echangelog, isn't it?




[gentoo-dev] Re: [PATCH dtd] Remove outdated definition of global-scope <changelog/>

2016-03-02 Thread Ulrich Mueller
> On Wed, 2 Mar 2016, Michał Górny wrote:

> Remove the long form of <changelog/> element that was likely used (or
> supposed to be used) in the global metadata scope. It is currently
> referenced in <upstream/> element only, and judging from the comments,
> it is supposed to always be a URL there.

Apparently the original plan for metadata.xml was to convert
ChangeLogs to XML and include them as part of the file:
http://thread.gmane.org/gmane.linux.gentoo.devel/9663

So yes, kill it with fire. Patch LGTM.

Ulrich




[gentoo-dev] Re: [PATCH dtd] Remove outdated definition of global-scope <changelog/>

2016-03-02 Thread Dirkjan Ochtman
On Wed, Mar 2, 2016 at 12:38 PM, Michał Górny wrote:
> Remove the long form of <changelog/> element that was likely used (or
> supposed to be used) in the global metadata scope. It is currently
> referenced in <upstream/> element only, and judging from the comments,
> it is supposed to always be a URL there.

LGTM!



[gentoo-dev] [PATCH dtd] Remove outdated definition of global-scope <changelog/>

2016-03-02 Thread Michał Górny
Remove the long form of <changelog/> element that was likely used (or
supposed to be used) in the global metadata scope. It is currently
referenced in <upstream/> element only, and judging from the comments,
it is supposed to always be a URL there.
---
 metadata.dtd | 25 ++---
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/metadata.dtd b/metadata.dtd
index 101478a..5bc8c39 100644
--- a/metadata.dtd
+++ b/metadata.dtd
@@ -30,29 +30,6 @@
 
 
 
-  
-  
-  
-
-
-  
-
-  
-
-  
-
-  
-  
-
-  
-
-
   
   
 
@@ -66,6 +43,8 @@
   the usage of the status attribute is nevertheless _only_ allowed
   in the upstream maintainer element. -->
 
+
+
 
 
   
-- 
2.7.2




[gentoo-dev] Re: [gentoo-project] Portage repo usage survey and change evaluation

2016-03-02 Thread Ulrich Mueller
> On Wed, 2 Mar 2016, Robin H Johnson wrote:

> I just hadn't finished putting the results into a long-term format
> quite yet, but did so this afternoon:
> http://dev.gentoo.org/~robbat2/201602-portage-survey/

Thank you.

> Some remarks about question #2 and #3:

> Q2: Reduce local disk usage by excluding ChangeLogs?
> 
> It was unfortunately pointed out to me very late that my question #2
> had some confusing text:
> - "No, but only if it were optional (I do NOT want it, but others might)"
> - "Yes, but only if it were optional (I want it, but others might NOT)"

> The bracketed portion of each answer was interpreted as meaning the
> opposite of the start of each answer :-(.

> Either way, ~60% are in favour of getting rid of changelogs.

Not sure if it can be interpreted this way. This would contradict the
results of both Q1 and Q3.

For Q1, 45 responses read ChangeLogs in some way (A1.2 to A1.5 or a
combination of them), whereas only 17 responses don't read ChangeLogs
at all (A1.1 or some combination including it). This disregards the two
responses that claim both to read them and not to read them at all.

> IMO this is a BETTER goal than continuing to generate them for
> rsync, and bike-shedding about what the order should be; and it
> provides a huge benefit by reducing the size of rsync by 155MiB.

Hm, that's almost 40% of the total size of the tree.

   $ find /usr/portage/ -type f -name 'ChangeLog-20*' -printf '%s\n' | awk '{ s+=$1 } END { print s/1024^2 }'
   102.961

That's the old ones from CVS.

   $ find /usr/portage/ -type f -name ChangeLog -printf '%s\n' | awk '{ s+=$1 } END { print s/1024^2 }'
   52.0908

That's the new ones autogenerated from git.

How is it possible that we have 52 MiB of ChangeLog entries generated
in the 0.5 years since the git conversion, whereas we had only a total
of 103 MiB in the 13.5 years since ChangeLogs were introduced in 2002?
Certainly our commit rate hasn't increased by more than an order of
magnitude in the last half year?
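
To put rough numbers on that comparison, here is a small sketch in
shell and awk. The function names growth_rate and rate_ratio are made
up for illustration, and the inputs are the rounded sizes and durations
given above.

```shell
#!/bin/sh
# Average ChangeLog growth in MiB per year.
# Arguments: size in MiB, duration in years.
growth_rate() {
    awk -v mib="$1" -v years="$2" 'BEGIN { printf "%.1f\n", mib / years }'
}

# Ratio of a new growth rate to an old one.
# Arguments: new MiB, new years, old MiB, old years.
rate_ratio() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
        'BEGIN { printf "%.1f\n", (a / b) / (c / d) }'
}

growth_rate 103 13.5        # CVS era, 2002 onwards
growth_rate 52 0.5          # since the git conversion
rate_ratio 52 0.5 103 13.5  # how many times faster the new rate is
```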

> Q3: What order should ChangeLog entries be in?
> --
> - 85.3% of responses either preferred newest first OR didn't care
>   (incl so as long as the tools work).
> - 2.9% wanted oldest first.
> - NOBODY selected "I'd prefer oldest entries first, but do what is
>   best for distribution"
> - 11.8% said get rid of changelogs.

Ulrich

