Re: CPAN-river: can graph calculation be modified? Neil Bowers <neil.bow...@cogendo.com>

2018-02-02 Thread James E Keenan

On 02/02/2018 11:08 AM, H.Merijn Brand wrote:
> On Fri, 2 Feb 2018 15:51:32 +, Neil Bowers wrote:
>
> > > For the 5.29.* development cycle starting in May of this year, I would like
> > > to be able to use a ranking of CPAN distros which goes beyond asking:
> > >
> > > * "How many other distributions depend on this one?"
> > >
> > > ... to asking:
> > >
> > > * "How many distributions by other authors/maintainers depend on this one?"
> > >
> > > Would that be feasible?  Has anyone attempted this already?
> >
> > When we were discussing the River model at QAH, and in discussions
> > afterwards, this came up. In the end we decided to keep things simple and go
> > with the current common definition. There are some tools in the CPAN
> > ecosystem that only count dependencies written by others.
> >
> > We’d need to agree which dists get ignored in this alternate scheme. Consider
> > this example:
> >
> > Here MARY has released a bunch of dists, but Foo-Bar is also relied
> > on by other dists written by MUNGO and MIDGE.
> >
> > The river count for Foo-Bar would be 2 here (ignoring the whole
> > branch that contains only dists from MARY), but the Foo river count
> > should be 3, I think. Foo-Bar “counts”, because it in turn is
> > depended on by dists from other authors. Otherwise the river count
> > would be 2 for both Foo and Foo-Bar. Basically we’re starting at the
> > “bottom” of the dependency graph, and trimming sub-graphs all from
> > one author.
> >
> > Also consider this example:
> >
> > What’s the river count of Plant: 0, 1, or 3? I think it should be 1,
> > in this alternate measure.
>
> 1 or 3: 1 if module chains from the same author are "compressed" to 1,
> 3 if not
>
> More interesting would be
>
>   Thing - Plant - Fruit - Banana - Silver Banana - Distasteful stuff
>   JOHN    PAUL    RINGO   RINGO    RINGO           GEORGE
>
> Would Plant now be 1, 2, or 4?
>
> > I.e. for sub-graphs by the same author, you only include the dist at
> > the head of the sub-graph.
>
> I'd suggest having an option to squeeze any unbranched chain of
> modules from the same author to 1.

I *think* that's what I'm aiming for.  Let's say I have a CPAN distro
called Gamma on which nothing else depends.  I refactor code out of
Gamma into Beta, such that Gamma now depends on Beta.  By the standard
definition, Beta moves up-river, Gamma down-river.

Next I refactor code out of Beta into Alpha.  Alpha is now farther
up-river than both Beta and Gamma.

Suppose that Alpha now falls into the "top 1000" of the CPAN river,
and that I then switch Perl community roles and start to play the role
of "rapid BBC evaluator."  A certain portion of my BBC program is now
taken up with testing Alpha.  But, assuming I confine my focus to the
top 1000, that means some *other* CPAN distribution -- perhaps one whose
revdeps are from different authors -- has been pushed out of the top
1000.  That means the data I generate for P5P has been skewed toward
myself.  That's what I'd like to avert.

> > It would be useful to have both measures available: raw-river and
> > author-river.
> >
> > When looking at a dist there are (at least) three figures that might
> > be of interest: the full river count (total number of direct and
> > indirect dependents), the author-filtered river count (as above),
> > and the number of direct dependents (which could be split in 2 as
> > well).
> >
> > Neil




Thank you very much.
Jim Keenan


Re: CPAN-river: can graph calculation be modified?

2018-02-02 Thread H.Merijn Brand
On Fri, 2 Feb 2018 15:51:32 +, Neil Bowers wrote:

> > For the 5.29.* development cycle starting in May of this year, I would like 
> > to be able to use a ranking of CPAN distros which goes beyond asking:
> > 
> > * "How many other distributions depend on this one?"
> > 
> > ... to asking:
> > 
> > * "How many distributions by other authors/maintainers depend on this one?"
> > 
> > Would that be feasible?  Has anyone attempted this already?  
> 
> When we were discussing the River model at QAH, and in discussions 
> afterwards, this came up. In the end we decided to keep things simple and go 
> with the current common definition. There are some tools in the CPAN 
> ecosystem that only count dependencies written by others.
> 
> We’d need to agree which dists get ignored in this alternate scheme. Consider 
> this example:
> 
> 
> 
> Here MARY has released a bunch of dists, but Foo-Bar is also relied
> on by other dists written by MUNGO and MIDGE.
> 
> The river count for Foo-Bar would be 2 here (ignoring the whole
> branch that contains only dists from MARY), but the Foo river count
> should be 3, I think. Foo-Bar “counts”, because it in turn is
> depended on by dists from other authors. Otherwise the river count
> would be 2 for both Foo and Foo-Bar. Basically we’re starting at the
> “bottom” of the dependency graph, and trimming sub-graphs all from
> one author.
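
The trimming rule quoted above could be sketched in Python roughly as
follows. This is only one reading of it, not code from any River tool:
the `dependents` and `author` maps and every dist name other than Foo
and Foo-Bar are hypothetical, and the sketch assumes a same-author dist
is kept only when some dist by another author depends on it, directly
or indirectly.

```python
from collections import deque

def downstream(dist, dependents):
    """Return every dist that depends on `dist`, directly or indirectly."""
    seen = set()
    queue = deque(dependents.get(dist, ()))
    while queue:
        d = queue.popleft()
        if d not in seen:
            seen.add(d)
            queue.extend(dependents.get(d, ()))
    return seen

def raw_river(dist, dependents):
    """Standard count: all downstream dists, whoever wrote them."""
    return len(downstream(dist, dependents))

def author_river(dist, dependents, author):
    """Assumed author-filtered count: a downstream dist by the same author
    is counted only if a dist by some other author depends on it."""
    me = author[dist]
    kept = 0
    for d in downstream(dist, dependents):
        if author[d] != me:
            kept += 1
        elif any(author[x] != me for x in downstream(d, dependents)):
            kept += 1
    return kept

# Hypothetical graph: each key maps to the dists that directly depend on it.
dependents = {
    "Foo": ["Foo-Bar"],
    "Foo-Bar": ["Mungo-Thing", "Midge-Thing", "Foo-Bar-Extra"],
    "Foo-Bar-Extra": ["Foo-Bar-More"],
}
author = {
    "Foo": "MARY", "Foo-Bar": "MARY",
    "Foo-Bar-Extra": "MARY", "Foo-Bar-More": "MARY",
    "Mungo-Thing": "MUNGO", "Midge-Thing": "MIDGE",
}

print(raw_river("Foo", dependents), author_river("Foo", dependents, author))          # 5 3
print(raw_river("Foo-Bar", dependents), author_river("Foo-Bar", dependents, author))  # 4 2
```

With this made-up graph the filtered counts come out as 3 for Foo and 2
for Foo-Bar, matching the figures in the quoted paragraph.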


> Also consider this example:
>
> What’s the river count of Plant — 0, 1, or 3? I think it should be 1,
> in this alternate measure.

1 or 3: 1 if module chains from the same author are "compressed" to 1,
3 if not

More interesting would be

 Thing - Plant - Fruit - Banana - Silver Banana - Distasteful stuff
 JOHN    PAUL    RINGO   RINGO    RINGO           GEORGE

Would Plant now be 1, 2, or 4?

> I.e. for sub-graphs by the same author, you only include the dist at
> the head of the sub-graph.

I'd suggest having an option to squeeze any unbranched chain of
modules from the same author to 1.
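
A possible sketch of that squeezing option in Python, applied to the
chain example above. It assumes the author line assigns JOHN, PAUL,
RINGO, RINGO, RINGO, GEORGE to the six dists in order, and it takes
"unbranched" to mean that the upstream dist has exactly one direct
dependent and the downstream dist has exactly one prerequisite. Both
the function names and that definition are assumptions for
illustration.

```python
from collections import deque

def downstream(dist, dependents):
    """Return every dist that depends on `dist`, directly or indirectly."""
    seen = set()
    queue = deque(dependents.get(dist, ()))
    while queue:
        d = queue.popleft()
        if d not in seen:
            seen.add(d)
            queue.extend(dependents.get(d, ()))
    return seen

def squeezed_river(dist, dependents, author):
    """Count downstream dists, skipping any dist that merely continues an
    unbranched chain written by the same author as its predecessor."""
    # Invert the edge map so we can ask what each dist depends on.
    prereqs = {}
    for up, downs in dependents.items():
        for d in downs:
            prereqs.setdefault(d, []).append(up)

    count = 0
    for d in downstream(dist, dependents):
        ups = prereqs.get(d, [])
        continues_chain = (
            len(ups) == 1
            and len(dependents.get(ups[0], ())) == 1
            and author[ups[0]] == author[d]
        )
        if not continues_chain:
            count += 1
    return count

# The chain from the message; each key maps to its direct dependents.
dependents = {
    "Thing": ["Plant"],
    "Plant": ["Fruit"],
    "Fruit": ["Banana"],
    "Banana": ["Silver Banana"],
    "Silver Banana": ["Distasteful stuff"],
}
author = {
    "Thing": "JOHN", "Plant": "PAUL", "Fruit": "RINGO",
    "Banana": "RINGO", "Silver Banana": "RINGO",
    "Distasteful stuff": "GEORGE",
}

print(len(downstream("Plant", dependents)))         # 4 (nothing squeezed)
print(squeezed_river("Plant", dependents, author))  # 2 (RINGO's chain counts once)
```

Under these assumptions Plant scores 4 unsqueezed and 2 squeezed. A
chain that starts at the dist being scored is squeezed as well, so a
dist whose only dependents are an unbranched run by its own author
scores 0.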

> It would be useful to have both measures available: raw-river and
> author-river.
> 
> When looking at a dist there are (at least) three figures that might
> be of interest: the full river count (total number of direct and
> indirect dependents), the author-filtered river count (as above),
> and the number of direct dependents (which could be split in 2 as
> well).
> 
> Neil
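
The "split in 2" for the direct figure might look like this in Python:
the dists that directly depend on a given dist, separated into
same-author and other-author groups. The function name, the
`dependents` and `author` maps, and all dist names except Foo-Bar are
hypothetical.

```python
def direct_dependents(dist, dependents, author):
    """Split the dists that directly depend on `dist` by authorship."""
    same, other = [], []
    for d in dependents.get(dist, ()):
        (same if author[d] == author[dist] else other).append(d)
    return same, other

# Hypothetical data: Foo-Bar has one dependent by MARY and two by others.
dependents = {"Foo-Bar": ["Mungo-Thing", "Midge-Thing", "Foo-Bar-Extra"]}
author = {
    "Foo-Bar": "MARY", "Foo-Bar-Extra": "MARY",
    "Mungo-Thing": "MUNGO", "Midge-Thing": "MIDGE",
}

same, other = direct_dependents("Foo-Bar", dependents, author)
print(len(same), len(other))  # 1 2
```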

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

