Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-09 Thread Kent Fredric
On Thu, 07 May 2020 09:29:36 +0200
Michał Górny  wrote:

> For example, if OCaml bindings on some package are broken and require
> a lot of work, I would find it useful to know how likely it is that anyone
> is using it.  Or if a lot of people are enabling the 'frobnicate' flag,
> I could consider employing USE defaults.

For normal reporting, I'd suggest "counts of users" have some default
presentation that encourages people to think of the data as incomplete.

For example, instead of "0", it might print "<10", or, say, "10: +/- 10".

Or rank results in terms of relative numbers, "low", "high", etc.

Or maybe incorporate time bounds with the information:

   "0 this month"

That's because even the people who do participate may not report often
enough for all the niche things to turn up in every sample.

Just working out a good way to calculate what the "error bars" should
be is the hard part.
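One way to get those error bars, offered as a sketch and not as anything gentoostats does: treat the share of reporting systems that have a given package or flag as a binomial proportion and attach a Wilson score interval, while raw counts below a cut-off are shown as "<10" as suggested above. The function names and the cut-off of 10 are hypothetical, and the interval assumes the reports are an unbiased sample, which an opt-in survey is not.

```python
import math


def wilson_interval(hits: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the share hits/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


def describe_count(hits: int, total: int, floor: int = 10) -> str:
    """Render a user count so that it reads as incomplete data.

    `floor` is a hypothetical cut-off: counts below it are shown as "<floor".
    """
    low, high = wilson_interval(hits, total)
    shown = f"<{floor}" if hits < floor else str(hits)
    return f"{shown} of {total} reports ({low:.1%} to {high:.1%})"
```

For example, describe_count(0, 100) gives "<10 of 100 reports (0.0% to 3.7%)": zero observed users still leaves room for a few percent of the reporting population, which is the "incomplete data" message the presentation should carry.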




Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-09 Thread Kent Fredric
On Sat, 09 May 2020 15:22:52 +0200
Gerion Entrup  wrote:

> I'm not sure if Portage is capable of this, but a distinction between USE
> flags needed to fulfil some dependency of another package and USE flags
> actively activated by the user could be useful.

This is presently impossible, because the way Portage implements the
former is by churning that information through either "please, sir, set
this USE flag in your config", or by auto-tweaking the config to assert
"I wanted this, so you now want it".

After that happens, the information as to /who/ specified that want is
lost. 

At best you can make some inferences based on the comments that get
injected, but that's nowhere near 100% reliable, especially with the
turn-key approach. Alternatively, you could assert that if a flag is
specified in the config *and* something depends on that flag being set,
then it's the dependent that wants it, not the user, but that really
isn't true on a regular basis.

For instance, uh, USE="X" (global) -> install Foo (w/ USE="X")

Foo depends on Bar[X?]

So is "Bar:X" required by the user, or by "Foo", or both?

And does the answer to that question depend in any way on whether Bar (or
Foo) declares IUSE="+X" or IUSE="X"?
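To make the ambiguity concrete, here is a small sketch under a simplified data model that is not Portage's: each package has a set of enabled flags, the user's USE settings are known, and dependencies carry PMS-style conditional USE deps such as Bar[X?] (Bar must have X enabled if the depending package has X enabled). Dep and explain_flag are hypothetical names. Rather than picking one origin, it lists every source that could explain the flag.

```python
from dataclasses import dataclass, field


@dataclass
class Dep:
    """A dependency on `target` with conditional USE deps such as [X?]."""
    target: str
    conditional_flags: set[str] = field(default_factory=set)


def explain_flag(pkg: str, flag: str, enabled: dict[str, set[str]],
                 user_use: set[str], deps: dict[str, list[Dep]]) -> list[str]:
    """Return every plausible reason `flag` is enabled on `pkg`."""
    if flag not in enabled.get(pkg, set()):
        return []
    reasons = []
    if flag in user_use:
        reasons.append("user")
    for parent, parent_deps in deps.items():
        for dep in parent_deps:
            # [X?]: the target needs X whenever the parent itself has X enabled.
            if (dep.target == pkg and flag in dep.conditional_flags
                    and flag in enabled.get(parent, set())):
                reasons.append(parent)
    return reasons
```

With enabled = {"Foo": {"X"}, "Bar": {"X"}}, user_use = {"X"} and deps = {"Foo": [Dep("Bar", {"X"})]}, explain_flag("Bar", "X", ...) returns ["user", "Foo"]: both answers come back, and nothing in the data says which one was intended. The sketch also ignores IUSE defaults (+X), which would add a third candidate source.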






Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-09 Thread Kent Fredric
On Fri, 8 May 2020 09:49:18 +0200
Jaco Kroon  wrote:

> So we do need the full list of packages installed, filtered to ::gentoo,
> but there needs to be an indication of whether it's installed because it's
> in @world, as a dep of something in @world (which is possibly not in
> ::gentoo), or is some form of no-longer needed dep.

A dedicated report of orphans that are installed, from ::gentoo, would
probably help here.

Because you can't directly assume orphans are "user wanted", but they
*can* be.

That's why it's important during --depclean to read the output and
re-add any to @world you need kept.

If you never depclean, then you get to skip that step.

(And I have *many* times added -1, i.e. --oneshot, which keeps the
package out of @world, when installing something I wanted, out of habit,
because I do so much manual hacking via emerge -1 that adding it is a
reflex!)




Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-09 Thread Andreas K . Hüttel
> 
> I think we shouldn't collect any data unless we have a good plan on how
> we'd be able to use it.  In this thread, I'd like to collect ideas
> on what data to collect and how it could realistically be used.
> 

5) CFLAGS and possibly related variables
6) "active" version of slotted system packages (gcc, binutils, python, ???)

I see this as interesting for toolchain maintenance, but also in general,
since we are a source-based distro.

* How many users are running LTO? Doing profile-guided optimization?
Building generic (-march=x86-64) packages? Using -Os or -O3, or
-funroll-loops (just kidding)?
* How quick is gcc / binutils / ... adoption?
* clang usage?
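As a rough sketch of how those questions could be answered from reported CFLAGS, assuming reports carry the raw string from make.conf: classify_cflags and its category names are made up for illustration, and the matching is deliberately naive (it only recognises the flag spellings listed in the code). Clang usage would need CC/CXX reported as well, since it does not show up in CFLAGS.

```python
import shlex


def classify_cflags(cflags: str) -> dict[str, object]:
    """Summarise a CFLAGS string into the features asked about above."""
    summary: dict[str, object] = {
        "opt_level": None,
        "march": None,
        "lto": False,
        "pgo": False,
        "unroll_loops": False,
    }
    for flag in shlex.split(cflags):
        if flag.startswith("-O"):
            summary["opt_level"] = flag  # the last -O option wins, as in GCC
        elif flag.startswith("-march="):
            summary["march"] = flag.split("=", 1)[1]
        elif flag.startswith("-flto"):  # also matches -flto=auto, -flto=N
            summary["lto"] = True
        elif flag.startswith(("-fprofile-use", "-fprofile-generate")):
            summary["pgo"] = True
        elif flag == "-funroll-loops":
            summary["unroll_loops"] = True
    return summary
```

For example, "-O2 -pipe -march=x86-64 -flto=auto" comes out as opt_level "-O2", march "x86-64" and lto True.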


-- 
Andreas K. Hüttel
dilfri...@gentoo.org
Gentoo Linux developer 
(council, qa, toolchain, base-system, perl, libreoffice)



Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-09 Thread Gerion Entrup
On Thursday, 7 May 2020, 09:29:36 CEST, Michał Górny wrote:
> I'm going to start with the data and uses I can think of.  Please reply
> with other things you can think of.
> 
> 
> 1) list of selected packages (@world)
> 
> We would use this to determine the popularity of individual packages,
> plus by scanning their dependencies we would be able to make combined
> statistics for direct usage + dependencies of other selected packages. 
> This would allow us to judge which packages need more of our attention.
> 
> For example, as we port Python packages to Python 3.8 the packages with
> more declared users would be ported first.

You may want to collect packages installed per set, too.
I mainly do what Hans mentioned in his mail to this thread but with sets.
For example I have a KDE-PIM set that installs only my needed subset of
the KDE PIM suite. (I also use this as a common workaround for still-missing
runtime dependencies and for suggestions made by pkg_postinst.)

Retrieving these packages would be straightforward: look at world_sets
and collect all packages that are installed by the set.
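A minimal sketch of that retrieval, assuming Portage's default locations: /var/lib/portage/world_sets lists selected sets as "@name" lines, and user-defined sets are plain atom lists in /etc/portage/sets/<name>. To keep it testable, the hypothetical packages_by_set takes file contents instead of reading files, treats "#" as a comment marker (an assumption), skips nested "@set" references, and does not resolve atoms against the installed package database.

```python
def packages_by_set(world_sets: str, set_files: dict[str, str]) -> dict[str, list[str]]:
    """Map each selected set name to the atoms it pulls in.

    world_sets: contents of the world_sets file, one "@name" per line.
    set_files:  set name -> contents of /etc/portage/sets/<name>.
    """
    result: dict[str, list[str]] = {}
    for line in world_sets.splitlines():
        name = line.strip()
        if not name.startswith("@"):
            continue
        name = name[1:]
        atoms = []
        for entry in set_files.get(name, "").splitlines():
            entry = entry.split("#", 1)[0].strip()  # assumes '#' starts a comment
            if entry and not entry.startswith("@"):  # nested sets are skipped
                atoms.append(entry)
        result[name] = atoms
    return result
```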

 
> 2) USE flags on installed packages (disabled/default/enabled)
> 
> This would allow us to determine which flags users are most likely to
> actually rely on.  This could determine tested flag combinations,
> defaults, and required level of support for individual flags.
> 
> For example, if OCaml bindings on some package are broken and require
> a lot of work, I would find it useful to know how likely it is that anyone
> is using it.  Or if a lot of people are enabling the 'frobnicate' flag,
> I could consider employing USE defaults.

I'm not sure if Portage is capable of this, but a distinction between USE
flags needed to fulfil some dependency of another package and USE flags
actively activated by the user could be useful.

Dependency USE flags should be treated with a higher priority in my
opinion, since they enable the installation of another package (tree),
while USE flags that enable a certain feature that is not used elsewhere
are more "nice to have".


Best
Gerion




Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-08 Thread Jaco Kroon
Hi,

On 2020/05/08 08:17, Hans de Graaff wrote:
> On Thu, 2020-05-07 at 09:29 +0200, Michał Górny wrote:
>>
>> 1) list of selected packages (@world)
>>
>> We would use this to determine the popularity of individual packages,
>> plus by scanning their dependencies we would be able to make combined
>> statistics for direct usage + dependencies of other selected
>> packages. 
>> This would allow us to judge which packages need more of our
>> attention.
> At work we install a lot of dependencies through a few company-specific 
> virtual packages, e.g. company/developer for all stuff useful for our
> developers. These packages would then be missed in the statistics. I'm
> not sure how prevalent this is and to what extent it will skew the
> statistics.

You raise a valid point.

The company/developer package itself I don't think is relevant.

The fact that some/package::gentoo is installed as a dependency of
company/developer may carry some relevance.

So we do need the full list of packages installed, filtered to ::gentoo,
but there needs to be an indication of whether it's installed because it's
in @world, as a dep of something in @world (which is possibly not in
::gentoo), or is some form of no-longer needed dep.

Otherwise I agree with Michał on the four items to be collected.

I do still think that the ability to define additional information sets
would be useful for building more invasive functionality, not
necessarily supported by Gentoo.  For example, if an organization could
define a set that grabs certain hardware details, that could help with
inventory management.

Kind Regards,
Jaco




Re: [gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-08 Thread Hans de Graaff
On Thu, 2020-05-07 at 09:29 +0200, Michał Górny wrote:
> 
> 
> 1) list of selected packages (@world)
> 
> We would use this to determine the popularity of individual packages,
> plus by scanning their dependencies we would be able to make combined
> statistics for direct usage + dependencies of other selected
> packages. 
> This would allow us to judge which packages need more of our
> attention.

At work we install a lot of dependencies through a few company-specific 
virtual packages, e.g. company/developer for all stuff useful for our
developers. These packages would then be missed in the statistics. I'm
not sure how prevalent this is and to what extent it will skew the
statistics.

Hans




[gentoo-dev] [gentoostats continued] Collected data and justification for it

2020-05-07 Thread Michał Górny
Hi,

The previous thread covered a few topics; in this one I'd like to focus
on the data collected.  So far people have indicated a few different
kinds of data they'd find useful.  However, I don't think enough
attention has been put on explaining why they need the data and how
they'd use it.

I think we shouldn't collect any data unless we have a good plan on how
we'd be able to use it.  In this thread, I'd like to collect ideas
on what data to collect and how it could realistically be used.

I'm going to start with the data and uses I can think of.  Please reply
with other things you can think of.


1) list of selected packages (@world)

We would use this to determine the popularity of individual packages,
plus by scanning their dependencies we would be able to make combined
statistics for direct usage + dependencies of other selected packages. 
This would allow us to judge which packages need more of our attention.

For example, as we port Python packages to Python 3.8 the packages with
more declared users would be ported first.
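One way to produce the "direct usage + dependencies" numbers from many reports, assuming each report is just its @world list and the server has a dependency mapping derived from the tree. That mapping is a simplification, since real dependencies vary with USE flags and versions; closure, popularity and the package names in the example are hypothetical.

```python
from collections import Counter


def closure(roots: set[str], deps: dict[str, set[str]]) -> set[str]:
    """All packages reachable from roots, including the roots themselves."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(deps.get(pkg, set()))
    return seen


def popularity(reports: list[set[str]],
               deps: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Per package, count systems selecting it directly and systems needing it at all."""
    direct: Counter = Counter()
    combined: Counter = Counter()
    for world in reports:
        direct.update(world)
        combined.update(closure(world, deps))
    return direct, combined
```

If two systems select dev-python/toolbar and dev-python/toolbaz, both of which depend on dev-python/libfoo, and a third selects libfoo directly, libfoo gets one direct user but three combined. That difference is what would move it up the porting queue.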


2) USE flags on installed packages (disabled/default/enabled)

This would allow us to determine which flags users are most likely to
actually rely on.  This could determine tested flag combinations,
defaults, and required level of support for individual flags.

For example, if OCaml bindings on some package are broken and require
a lot of work, I would find it useful to know how likely it is that anyone
is using it.  Or if a lot of people are enabling the 'frobnicate' flag,
I could consider employing USE defaults.
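A sketch of how per-package reports could be tallied into those three states. It assumes each report lists the flags enabled on the installed package, and that the server knows the IUSE defaults (+flag) from the tree. Here "enabled" and "disabled" mean the state differs from the IUSE default, an interpretation on my part. Profile-level defaults are ignored, so a flag switched on by the profile would count as "enabled". flag_states and tally are hypothetical names.

```python
from collections import Counter


def flag_states(iuse: list[str], enabled: set[str]) -> dict[str, str]:
    """Classify each IUSE flag as default, enabled or disabled (vs. its IUSE default)."""
    states = {}
    for entry in iuse:
        default_on = entry.startswith("+")
        flag = entry.lstrip("+-")
        on = flag in enabled
        if on == default_on:
            states[flag] = "default"
        else:
            states[flag] = "enabled" if on else "disabled"
    return states


def tally(iuse: list[str], reports: list[set[str]]) -> dict[str, Counter]:
    """Per flag, count how many reports have it in each state."""
    totals: dict[str, Counter] = {}
    for enabled in reports:
        for flag, state in flag_states(iuse, enabled).items():
            totals.setdefault(flag, Counter())[state] += 1
    return totals
```

With IUSE="+ssl ocaml frobnicate", a high "enabled" count for frobnicate would be the signal for a USE default, and a "default"-only count for ocaml would suggest few people rely on the bindings.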


3) System profile

This would primarily allow us to establish how the transition to new
profiles proceeds and could influence the decision on prolonging
the support for old ones.  As a side effect, we'd have stats on how
popular different architectures are.

For example, it would help us see whether people are moving away from
amd64 17.0 to 17.1.
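Measuring the move from 17.0 to 17.1 could be as simple as grouping reported profile paths by architecture and version. This sketch assumes reports contain the profile path in the default/linux/<arch>/<version>[/...] form; profile_versions is a hypothetical name, and anything that does not match that layout is lumped under "other".

```python
from collections import Counter, defaultdict


def profile_versions(profiles: list[str]) -> dict[str, Counter]:
    """Count reported profile versions per architecture."""
    by_arch: dict[str, Counter] = defaultdict(Counter)
    for path in profiles:
        parts = path.strip("/").split("/")
        if len(parts) >= 4 and parts[:2] == ["default", "linux"]:
            by_arch[parts[2]][parts[3]] += 1
        else:
            by_arch["other"][path] += 1
    return dict(by_arch)
```

Summing each architecture's counter gives the architecture popularity numbers as a side effect.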


4) Arch - installed package correlation

This one could be considered a bit invasive but it would help us
determine how important it is to keep particular arch keywords
on a package.

For example, package A breaks on SPARC.  Fixing it would require
significant effort.  If we know it has users on SPARC we're more likely
to put in that effort; otherwise, we may just drop SPARC keywords and move
on.
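The SPARC decision could be fed by a per-arch count of reporting systems that have the package installed. This is a minimal sketch, assuming each report is an (arch, installed packages) pair. users_per_arch and keywords_worth_keeping are hypothetical names, and the min_users threshold is an invented policy knob, not something proposed here. A zero from an opt-in sample is weak evidence on its own.

```python
from collections import Counter


def users_per_arch(reports: list[tuple[str, set[str]]], package: str) -> Counter:
    """Count reporting systems per arch that have `package` installed."""
    return Counter(arch for arch, installed in reports if package in installed)


def keywords_worth_keeping(reports: list[tuple[str, set[str]]], package: str,
                           keywords: list[str], min_users: int = 1) -> dict[str, bool]:
    """For each keyworded arch, say whether reported usage reaches min_users."""
    counts = users_per_arch(reports, package)
    return {arch: counts[arch] >= min_users for arch in keywords}
```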


That's all really useful stuff I can think of right now.  What's your
angle?

-- 
Best regards,
Michał Górny


