Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-30 Thread michael . lienhardt


- Alec Warner wrote:

> Sorry I'm being overly academic. My concern earlier is that you mentioned a
> goal of "never breaking installed packages' which I found to be a fairly
> audacious goal. The idea is that we should build tools that achieve this
> practically (e.g. via heuristics such as := slot operators) while
> understanding that the complexities of application deploys are legion and
> the tool will never handle them all. So the goal of never breaking them is
> more an idealistic goal rather than a practical reality.

I agree.
I started this discussion because I thought that the content of the 
/var/db/pkg/* folder was not enough to keep the dependencies.
Then Zack and you showed me that I only saw the tip of the iceberg (in my 
defense, I first wanted to only keep the package's abstract dependencies, not 
the ones of the actual code. And then the discussion was really interesting).
My experience in dependency management is limited to an extended model of 
Debian packages, so my questions were (and will be) naive.

I understand that with the current status of Portage:
 - we can consider that the dependencies specified in a package ensure that it 
can be installed and run
 - however, package updates and package removals are not guaranteed to work. Slot 
operators are an integrated way to capture some update behaviors, but not all. 
In general, a dedicated method (like the one for perl) can be needed.

I do believe that never breaking dependencies (at the code level) is a nice 
idealistic goal.
It might not always be possible to achieve it, but you did mention that much work 
has been done to achieve it where possible.
And, to come back to my previous question, I imagine that the slot operator is 
an ad-hoc but very useful way to avoid dependency breakage when possible.
However, this operator looks strange to me: my (naive) intuition to express a 
trigger for package recompilation would be the other way around (i.e., it is 
the package that is updated that says what changes, and so, which other 
packages must be recompiled); like you illustrated with perl, an external tool 
(possibly different for each package) that gives which packages must be 
recompiled due to a specific package update.
This is why I asked why the slot operator is better as a recompilation trigger 
than other approaches. Is it because it only requires the developer 
to add a "=" sign (simplicity is very important)? Or for another reason?
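
A minimal sketch of the pull-style trigger being asked about, assuming each
installed package records the slot/sub-slot of every := dependency at build
time and is rebuilt when the installed dependency no longer matches. The
class, the function, the dev-perl/Foo package and the sub-slot values are
illustrative assumptions, not Portage's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Installed:
    """An installed package, reduced to what a := dependency needs."""
    name: str
    slot: str  # "SLOT/SUBSLOT", e.g. "0/5.30" (values here are illustrative)
    # For each := dependency: the "SLOT/SUBSLOT" it had when this package was built.
    built_against: dict = field(default_factory=dict)


def changed_slot_deps(pkg, installed):
    """Return the := dependencies of pkg whose slot/sub-slot differs from
    the one recorded at build time; a non-empty result means "rebuild pkg"."""
    return sorted(
        dep
        for dep, recorded in pkg.built_against.items()
        if dep in installed and installed[dep].slot != recorded
    )


# dev-perl/Foo is a hypothetical module built against perl sub-slot 5.30.
module = Installed("dev-perl/Foo", "0", {"dev-lang/perl": "0/5.30"})
system = {
    "dev-lang/perl": Installed("dev-lang/perl", "0/5.32"),  # perl was upgraded
    "dev-perl/Foo": module,
}
print(changed_slot_deps(module, system))  # ['dev-lang/perl']
```

In this model the updated package only has to publish its new sub-slot; each
dependent decides for itself whether it is affected, which is the opposite of
the push-style trigger described above.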

Many thanks!
Michael




Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-27 Thread Alec Warner
On Fri, Mar 27, 2020 at 7:00 AM  wrote:

>
> - Alec Warner wrote:
> > On Tue, Mar 24, 2020 at 11:31 AM  wrote:
> > > However, I still doubt that only storing the soname dependencies is
> enough.
> > > Consider package A (that cannot be recompiled) that depends on package
> B
> > > which provides lib L.so.
> > > B is recompiled with different use flags, which put different
> > > functionalities in L.so.
> > > The dependencies of A are still satisfied (B is installed, L.so is
> > > available), but since the content of L.so changed, A cannot execute
> anymore.
> > > Hypothetically, can this scenario occur?
> > > Can this scenario occur in practice?
> > > Is there a way in emerge/portage to avoid it?
>
>
> > > You have far more experience than me on this, and it would be nice for
> me
> > > to know what I'm up against.
> >
> > A lot of this has to do with the specifics of how package managers manage
> > system state, as well as various quirks of subsets of the tree. For
> > example, a perl upgrade (X->Y) will often break perl modules who expect
> > perl-X, but get perl-Y. So one fix is to try to keep perl-X installed (so
> > we SLOT perl and have N perls installed.) Then you need to decide which
> > version of perl to build things against (X or Y, or both?) We took this
> > tactic in the python ecosystem; but perl is not slotted in Gentoo, and so
> > upgrading perl breaks all perl modules. There is a tool
> > (gentoo-perl-cleaner) that will walk the deptree and fix all of these
> > broken packages that you run after an upgrade.
> >
> > I'm not sure it's strictly avoidable. You could build perl-Y, then
> rebuild
> > all perl-modules against perl-Y, then merge the entire result to the
> > livefs. This will reduce the breakage time but likely not eliminate it;
> > plus it seems hard to implement in practice without modern filesystem
> tools
> > (overlayfs, btrfs, zfs or similar tech to make it atomic.) It also
> doesn't
> > account for executing code. What happens to perl-X code that is executing
> > when you unmerge perl-X? The short answer is that code might break and
> > 'proper' management means you should restart services after an upgrade
> > (something Gentoo doesn't typically do; but is common in Debian for
> > example.)
>
> Many thanks for this answer.
> To sum up what I understood, the problem is not really the dependencies,
> but which recompilation (and service restart) are triggered with an update.
>

Yes and no. Assume a spherical package manager that could detect
everything, we basically need to do the following for a perl X -> Y upgrade.

1 User triggers X -> Y upgrade.
2 build perl-Y, but don't merge it to the livefs.
3 enumerate all deps of perl-Y, build those (but don't merge them to the
livefs, but they need to build against perl-Y, not perl-X)
4 with atomic_transaction:
  4a Merge perl-Y to the livefs
  4b Merge perl-Y's dependencies to the live fs
5 restart anything that is running perl-X, so that it runs with perl-Y
6 unmerge perl-X

In practice we cannot always do 3 or 4 or 5. We will miss dependencies (due
to missing depgraph information) both in the package depgraph as well as
the service depgraph.
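
A rough sketch of steps 2 to 4 of the list above, reading "deps of perl-Y"
as the packages that depend on perl. The reverse-dependency map, the package
names and the version labels are made up for illustration, and the final
dictionary merge only stands in for an atomic transaction that a real tool
would need filesystem support for.

```python
from collections import deque


def reverse_closure(rdeps, root):
    """Every package that directly or transitively depends on root (step 3)."""
    seen, queue, order = {root}, deque([root]), []
    while queue:
        current = queue.popleft()
        for dependant in sorted(rdeps.get(current, ())):
            if dependant not in seen:
                seen.add(dependant)
                order.append(dependant)
                queue.append(dependant)
    return order


def staged_upgrade(livefs, rdeps, root, new_version):
    """Steps 2-4: build everything off to the side, then publish in one go."""
    staging = {root: new_version}  # step 2: built, not merged
    for pkg in reverse_closure(rdeps, root):  # step 3: rebuilt, not merged
        staging[pkg] = f"built-against-{new_version}"
    # Step 4: stand-in for the atomic merge; the live state is replaced at once.
    return {**livefs, **staging}


# Hypothetical reverse dependencies: who depends on whom.
rdeps = {"perl": ["mod-a", "mod-b"], "mod-a": ["app"]}
livefs = {"perl": "X", "mod-a": "built-against-X", "mod-b": "built-against-X",
          "app": "built-against-X", "other": "1"}
print(staged_upgrade(livefs, rdeps, "perl", "Y"))
```

The sketch does nothing about step 5 (restarting running processes), and it
is only as good as the reverse-dependency map it is given, which is exactly
where the missing depgraph information bites.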

So in practice we do:
1 user triggers X -> Y upgrade.
2 build perl-Y, merge it to the livefs, unmerge perl-X
3 run gentoo-perl-cleaner to upgrade the dependencies broken by the X -> Y
upgrade (via slot deps, or whatever mechanism, it can vary.)
4 restart anything that is running perl-X

And just accept that from 2-4... well, stuff will be broken. We can minimize
the time by building binpkgs ahead of time for example, and merging the
binpkgs in parallel. Note that step 5 of the first list and step 4 of the
second are the same problem. Note that per the guide linked below, sometimes 2-3 can be done 'at
once', although again practically speaking "During Perl upgrade, packages
that depend on Perl may become unavailable." even if they are all handled
by emerge, because there are many race conditions in the order that
packages get merged to the livefs.
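
The breakage window of the practical sequence can be made visible with a toy
simulation. This is only a sketch: the module names are hypothetical, and a
module counts as "broken" simply when the perl it was built against differs
from the installed one.

```python
def broken_modules(installed_perl, built_against):
    """Modules whose build-time perl differs from the installed perl."""
    return sorted(m for m, v in built_against.items() if v != installed_perl)


def practical_upgrade(built_against, new_perl):
    """Yield (step, broken modules) for: merge perl first, then rebuild
    the modules one by one, each rebuild being merged as soon as it is done."""
    state = dict(built_against)
    yield f"merge perl-{new_perl}", broken_modules(new_perl, state)
    for module in sorted(state):
        state[module] = new_perl
        yield f"rebuild {module}", broken_modules(new_perl, state)


for step, broken in practical_upgrade({"mod-a": "X", "mod-b": "X"}, "Y"):
    print(step, broken)
```

Every step with a non-empty list is part of the window in which something on
the live system is broken; prebuilt binpkgs shorten each step but do not
remove the window.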


> In gentoo, there is the ":=" slot operator (and others similar) in
> dependencies that trigger the recompilation when the dependency's slot
> change, but this is the only existing mechanism.
> And this is why every time perl changes, the compilation of its modules is
> not triggered and they are most probably broken.
> Correct?
>

They are supposed to be triggered:
https://wiki.gentoo.org/wiki/Perl#Upgrading_.28major_version.29 for example
says this. But this is up to how callers run the tool. For example gentoo
infra executes emerge via puppet, and we never execute emerge -uDNav
--with-bdeps=y --backtrack=100 --autounmask-keep-masks=y @world, we have
our own piece of code that handles perl upgrades.


> And then, in this context, keeping the installed packages' dependencies
> consistent is up to debate: packages will get broken in any case...
> It is clearly impossible to have a tool that automatically detect all
> implementation dependency breakage.
>
> Again, there's so

Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-27 Thread michael . lienhardt


- Alec Warner wrote:
> On Tue, Mar 24, 2020 at 11:31 AM  wrote:
> > However, I still doubt that only storing the soname dependencies is enough.
> > Consider package A (that cannot be recompiled) that depends on package B
> > which provides lib L.so.
> > B is recompiled with different use flags, which put different
> > functionalities in L.so.
> > The dependencies of A are still satisfied (B is installed, L.so is
> > available), but since the content of L.so changed, A cannot execute anymore.
> > Hypothetically, can this scenario occur?
> > Can this scenario occur in practice?
> > Is there a way in emerge/portage to avoid it?


> > You have far more experience than me on this, and it would be nice for me
> > to know what I'm up against.
> 
> A lot of this has to do with the specifics of how package managers manage
> system state, as well as various quirks of subsets of the tree. For
> example, a perl upgrade (X->Y) will often break perl modules who expect
> perl-X, but get perl-Y. So one fix is to try to keep perl-X installed (so
> we SLOT perl and have N perls installed.) Then you need to decide which
> version of perl to build things against (X or Y, or both?) We took this
> tactic in the python ecosystem; but perl is not slotted in Gentoo, and so
> upgrading perl breaks all perl modules. There is a tool
> (gentoo-perl-cleaner) that will walk the deptree and fix all of these
> broken packages that you run after an upgrade.
> 
> I'm not sure it's strictly avoidable. You could build perl-Y, then rebuild
> all perl-modules against perl-Y, then merge the entire result to the
> livefs. This will reduce the breakage time but likely not eliminate it;
> plus it seems hard to implement in practice without modern filesystem tools
> (overlayfs, btrfs, zfs or similar tech to make it atomic.) It also doesn't
> account for executing code. What happens to perl-X code that is executing
> when you unmerge perl-X? The short answer is that code might break and
> 'proper' management means you should restart services after an upgrade
> (something Gentoo doesn't typically do; but is common in Debian for
> example.)

Many thanks for this answer.
To sum up what I understood, the problem is not really the dependencies, but 
which recompilations (and service restarts) are triggered by an update.
In Gentoo, there is the ":=" slot operator (and other similar ones) in dependencies 
that triggers recompilation when the dependency's slot changes, but this is 
the only existing mechanism.
And this is why every time perl changes, the compilation of its modules is not 
triggered and they are most probably broken.
Correct?
And then, in this context, keeping the installed packages' dependencies 
consistent is up to debate: packages will get broken in any case...
It is clearly impossible to have a tool that automatically detects all 
implementation dependency breakage.

Again, there's something I probably don't see: why were slot operators chosen 
(among other possibilities) as a mechanism to trigger recompilation?
I'm very grateful to you all for the time you take to read and answer my 
questions.

Best,
Michael



Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-26 Thread Alec Warner
On Tue, Mar 24, 2020 at 11:31 AM  wrote:

>
> - Zac Medico wrote:
>
> > > The goal of my tool is to have correct manipulation of package
> dependencies, and in particular here, I focus on the packages that are
> installed but not in the portage tree/a local overlay anymore (the problem
> does not occur for other packages).
> > > It seems that installed packages do not store which are the actual cpv
> they depend on. Correct?
> >
> > Right, because unfortunately that's something that changes over time.
> >
> > Also, we may not be able to pin it down at any given moment if we have
> > inconsistent || preferences as described here:
> >
> >
> https://archives.gentoo.org/gentoo-dev/message/550d3859dea6d0fb0b39064628992634
>
> Hmm, I think I see what you mean.
> Storing the cpvs that was used during solving the package's dependencies
> would be too restrictive, since two different packages could provide the
> exact same functionalities/libraries.
> And so, during a system update, only looking at the cpv dependencies would
> trigger useless recompilation of the packages that depend on the updated
> packages.
> Correct?
>
> Btw, my tool's solver does not have the problem discussed in the thread
> you're mentioning: atom order in lists has no influence in my solver.
> Would fixing the inconsistent || preferences make storing cpvs for
> installed packages more realistic?
>
>
> > > Also, I wanted to use the ebuild tool to install/uninstall package,
> which is not possible with this solution apparently.
> >
> > Why not? Would the preserve-libs feature solve your problem?
>
> ... I'm sorry, I wasn't aware of this feature.
> It would definitively solve the issue (except, as described in the bug
> 459038, when external tools remove libs).
>
> This discussion is very interesting!
> If I take this double layer of dependencies, I have to check how this
> influences the theory underlying my tool.
>
> However, I still doubt that only storing the soname dependencies is enough.
> Consider package A (that cannot be recompiled) that depends on package B
> which provides lib L.so.
> B is recompiled with different use flags, which put different
> functionalities in L.so.
> The dependencies of A are still satisfied (B is installed, L.so is
> available), but since the content of L.so changed, A cannot execute anymore.
> Hypothetically, can this scenario occur?
> Can this scenario occur in practice?
> Is there a way in emerge/portage to avoid it?
>

>
> > Well, there are a lot of upgrades that can't be performed without
> > temporarily breaking something, so the "forbid broken dependencies" idea
> > doesn't sound feasible to me.
>
> Could you tell me about several instances of such needed dependency
> breakage?
> You have far more experience than me on this, and it would be nice for me
> to know what I'm up against.
>

A lot of this has to do with the specifics of how package managers manage
system state, as well as various quirks of subsets of the tree. For
example, a perl upgrade (X->Y) will often break perl modules that expect
perl-X, but get perl-Y. So one fix is to try to keep perl-X installed (so
we SLOT perl and have N perls installed.) Then you need to decide which
version of perl to build things against (X or Y, or both?) We took this
tactic in the python ecosystem; but perl is not slotted in Gentoo, and so
upgrading perl breaks all perl modules. There is a tool
(gentoo-perl-cleaner) that will walk the deptree and fix all of these
broken packages that you run after an upgrade.

I'm not sure it's strictly avoidable. You could build perl-Y, then rebuild
all perl-modules against perl-Y, then merge the entire result to the
livefs. This will reduce the breakage time but likely not eliminate it;
plus it seems hard to implement in practice without modern filesystem tools
(overlayfs, btrfs, zfs or similar tech to make it atomic.) It also doesn't
account for executing code. What happens to perl-X code that is executing
when you unmerge perl-X? The short answer is that code might break and
'proper' management means you should restart services after an upgrade
(something Gentoo doesn't typically do; but is common in Debian for
example.)

-A


> Many thanks!
> Michael
>
>


Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-24 Thread michael . lienhardt


- Zac Medico wrote:

> > The goal of my tool is to have correct manipulation of package 
> > dependencies, and in particular here, I focus on the packages that are 
> > installed but not in the portage tree/a local overlay anymore (the problem 
> > does not occur for other packages).
> > It seems that installed packages do not store which are the actual cpv they 
> > depend on. Correct?
> 
> Right, because unfortunately that's something that changes over time.
> 
> Also, we may not be able to pin it down at any given moment if we have
> inconsistent || preferences as described here:
> 
> https://archives.gentoo.org/gentoo-dev/message/550d3859dea6d0fb0b39064628992634

Hmm, I think I see what you mean.
Storing the cpvs that were used while solving the package's dependencies would 
be too restrictive, since two different packages could provide the exact same 
functionalities/libraries.
And so, during a system update, only looking at the cpv dependencies would 
trigger useless recompilation of the packages that depend on the updated 
packages.
Correct?

Btw, my tool's solver does not have the problem discussed in the thread you're 
mentioning: atom order in lists has no influence in my solver.
Would fixing the inconsistent || preferences make storing cpvs for installed 
packages more realistic?


> > Also, I wanted to use the ebuild tool to install/uninstall package, which 
> > is not possible with this solution apparently.
> 
> Why not? Would the preserve-libs feature solve your problem?

... I'm sorry, I wasn't aware of this feature.
It would definitely solve the issue (except, as described in bug 459038, 
when external tools remove libs).

This discussion is very interesting!
If I take this double layer of dependencies, I have to check how this 
influences the theory underlying my tool.

However, I still doubt that only storing the soname dependencies is enough.
Consider package A (that cannot be recompiled) that depends on package B which 
provides lib L.so.
B is recompiled with different use flags, which put different functionalities 
in L.so.
The dependencies of A are still satisfied (B is installed, L.so is available), 
but since the content of L.so changed, A cannot execute anymore.
Hypothetically, can this scenario occur?
Can this scenario occur in practice?
Is there a way in emerge/portage to avoid it?
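
The scenario can be stated as a small sketch: a check at the soname level
keeps passing while a check at the symbol level starts failing. All names
below (L.so, l_init, l_feature_x) are hypothetical, and the two helper
functions only model the two levels of checking; they are not something
emerge/portage provides.

```python
def soname_satisfied(required, provided):
    """Package-level view: is every required soname provided by someone?"""
    return set(required) <= set(provided)


def missing_symbols(needed, exported):
    """Code-level view: symbols the consumer needs that the library lacks."""
    return sorted(set(needed) - set(exported))


# Hypothetical data: A links against L.so and calls two functions from it.
a_requires = ["L.so"]
a_needs_symbols = ["l_init", "l_feature_x"]

# B built with a USE flag that enables feature x, then rebuilt without it.
b_before = {"sonames": ["L.so"], "symbols": ["l_init", "l_feature_x"]}
b_after = {"sonames": ["L.so"], "symbols": ["l_init"]}

for label, b in (("before", b_before), ("after", b_after)):
    print(label,
          soname_satisfied(a_requires, b["sonames"]),
          missing_symbols(a_needs_symbols, b["symbols"]))
```

In both states the soname check succeeds; only the symbol-level comparison
reveals that A lost something it needs after B was rebuilt.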


> Well, there are a lot of upgrades that can't be performed without
> temporarily breaking something, so the "forbid broken dependencies" idea
> doesn't sound feasible to me.

Could you tell me about several instances of such needed dependency breakage?
You have far more experience than me on this, and it would be nice for me to 
know what I'm up against.

Many thanks!
Michael



Re: Re : Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-23 Thread Zac Medico
On 3/23/20 3:21 PM, michael.lienha...@laposte.net wrote:
> - Zac Medico wrote:
> 
>>>  3. before removing a library, "ebuild unmerge" always checks if it is used 
>>> by another package: this means that installed packages' dependencies are 
>>> never broken.
>>
>> That's true if the package is removed via emerge --depclean, but emerge
>> --unmerge does not account for dependencies.
>>
>> Also, it's possible for dependencies of installed packages to be
>> temporarily broken by upgrades. In cases like this, the breakage will
>> eventually be resolved by a rebuild (which occurs automatically for slot
>> operator := deps), upgraded, or by emerge --depclean (which removes
>> unneeded packages).
> 
> Many thanks for your answers.
> They made me realize that the problem I'm facing is a bit more tricky than I 
> first thought.
> 
> I'll try to explain the problem, tell me if I'm not clear somewhere.
> 
> The goal of my tool is to have correct manipulation of package dependencies, 
> and in particular here, I focus on the packages that are installed but not in 
> the portage tree/a local overlay anymore (the problem does not occur for 
> other packages).
> It seems that installed packages do not store which actual cpvs they 
> depend on. Correct?

Right, because unfortunately that's something that changes over time.

Also, we may not be able to pin it down at any given moment if we have
inconsistent || preferences as described here:

https://archives.gentoo.org/gentoo-dev/message/550d3859dea6d0fb0b39064628992634

> Hence, when an installed package cannot be updated/recompiled because it is 
> not in the tree anymore, like you said, its dependencies can be broken (due 
> to the package it depends on being updated).
> Currently, this issue is circumvented (only using depclean) by keeping the 
> libs: the package's dependencies are broken, but it's ok because it can still 
> run (which, at the end of the day, is what we want).
> However, from your answer, it seems that this fix is not entirely integrated 
> in the emerge/portage toolchain (like you said, emerge --unmerge removes 
> everything, and emerge -u removes the old libs).
> 
> To sum up, the problem I'm facing is that with the current way installed 
> packages are managed, we can break dependencies (and the only way to fix them 
> is to remove the installed package with the broken dependencies, that can 
> never be installed again).
> 
> Hence, for my tool, I have two solutions for that problem: either I forbid 
> dependencies from ever being broken, or I allow it.
> 
> Solution 1: forbid broken dependencies.
> This requires extending the information stored on installed packages with the 
> list of the actual cpvs they depend on (or at least the cp+slot, which is enough 
> to get back the cpvs).
> That way, I can say in the solver "if you want to keep that package, you need 
> to keep these packages as they currently are".
> However, I have no idea on how to do that, and doing this only for my tool 
> would mean that one cannot switch between emerge (quick) and my tool 
> (correct), which is a feature I think is essential.
> Do you think adding this new information to installed packages could be 
> integrated into emerge/portage itself? I could work on it (expect questions 
> ^^), test it on my prototype, and do a pull request when everything's working.

It's possible to store this information in a cache of the most recently
calculated dependency graph.

> Solution 2: allow broken dependencies.
> Here, the idea is to use the same fix as is currently done with depclean, but 
> in my tool's planner (i.e., the part that installs/uninstalls the packages) 
> directly.
> That way, I say in the solver "that installed package has no dependency", but 
> when I upgrade/remove packages, I say "Oh but wait, that other package still 
> needs these libs, let's keep them".
> This solution may not require any change in portage/emerge, but I have no 
> idea on how to know which libs are needed by a package, and how to track 
> these libs' owners without looking at every installed package's files (which 
> are stored in the CONTENTS file, if I'm not mistaken).

For soname dependencies, we've got PROVIDES and REQUIRES metadata.
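
As a rough illustration of how a tool might read that metadata: the sketch
below assumes a PROVIDES or REQUIRES entry is a whitespace-separated string
in which a token ending in ":" names a multilib category (such as x86_64)
and the tokens that follow are sonames in that category. That format is an
assumption to verify against real entries under /var/db/pkg; the function
name and sample string are made up.

```python
def parse_soname_metadata(text):
    """Parse a PROVIDES/REQUIRES-style string into {category: set of sonames}."""
    result = {}
    category = None
    for token in text.split():
        if token.endswith(":"):
            # A token such as "x86_64:" starts a new multilib category.
            category = token[:-1]
            result.setdefault(category, set())
        elif category is None:
            raise ValueError(f"soname {token!r} appears before any category")
        else:
            result[category].add(token)
    return result


sample = "x86_32: libfoo.so.1 x86_64: libfoo.so.1 libbar.so.2"
print(parse_soname_metadata(sample))
```

With both files parsed for every installed package, matching REQUIRES against
PROVIDES gives the soname-level dependency layer without scanning file lists.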

> Also, I wanted to use the ebuild tool to install/uninstall packages, which is 
> not possible with this solution apparently.

Why not? Would the preserve-libs feature solve your problem?

> In case I need to implement this, could you give me some clue on how to 
> achieve it?
> 
> 
> Among these two solutions, I prefer the first one: we stay at the level of 
> package dependencies (and it looks simpler to implement).
> However, it is maybe easier/better to use the second approach, I don't know.
> Do you have some suggestions?

Well, there are a lot of upgrades that can't be performed without
temporarily breaking something, so the "forbid broken dependencies" idea
doesn't sound feasible to me.

> Thanks!
> Michael
-- 
Thanks,
Zac




Re : Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-23 Thread michael . lienhardt
- Zac Medico wrote:

> >  3. before removing a library, "ebuild unmerge" always checks if it is used 
> > by another package: this means that installed packages' dependencies are 
> > never broken.
> 
> That's true if the package is removed via emerge --depclean, but emerge
> --unmerge does not account for dependencies.
> 
> Also, it's possible for dependencies of installed packages to be
> temporarily broken by upgrades. In cases like this, the breakage will
> eventually be resolved by a rebuild (which occurs automatically for slot
> operator := deps), upgraded, or by emerge --depclean (which removes
> unneeded packages).

Many thanks for your answers.
They made me realize that the problem I'm facing is a bit more tricky than I 
first thought.

I'll try to explain the problem, tell me if I'm not clear somewhere.

The goal of my tool is to have correct manipulation of package dependencies, 
and in particular here, I focus on the packages that are installed but not in 
the portage tree/a local overlay anymore (the problem does not occur for other 
packages).
It seems that installed packages do not store which actual cpvs they 
depend on. Correct?
Hence, when an installed package cannot be updated/recompiled because it is not 
in the tree anymore, like you said, its dependencies can be broken (due to the 
package it depends on being updated).
Currently, this issue is circumvented (only using depclean) by keeping the 
libs: the package's dependencies are broken, but it's ok because it can still 
run (which, at the end of the day, is what we want).
However, from your answer, it seems that this fix is not entirely integrated in 
the emerge/portage toolchain (like you said, emerge --unmerge removes 
everything, and emerge -u removes the old libs).

To sum up, the problem I'm facing is that with the current way installed 
packages are managed, we can break dependencies (and the only way to fix them 
is to remove the installed package with the broken dependencies, that can never 
be installed again).

Hence, for my tool, I have two solutions for that problem: either I forbid 
dependencies from ever being broken, or I allow it.

Solution 1: forbid broken dependencies.
This requires extending the information stored on installed packages with the 
list of the actual cpvs they depend on (or at least the cp+slot, which is enough 
to get back the cpvs).
That way, I can say in the solver "if you want to keep that package, you need 
to keep these packages as they currently are".
However, I have no idea on how to do that, and doing this only for my tool 
would mean that one cannot switch between emerge (quick) and my tool (correct), 
which is a feature I think is essential.
Do you think adding this new information to installed packages could be 
integrated into emerge/portage itself? I could work on it (expect questions ^^), 
test it on my prototype, and do a pull request when everything's working.

Solution 2: allow broken dependencies.
Here, the idea is to use the same fix as is currently done with depclean, but 
in my tool's planner (i.e., the part that installs/uninstalls the packages) 
directly.
That way, I say in the solver "that installed package has no dependency", but 
when I upgrade/remove packages, I say "Oh but wait, that other package still 
needs these libs, let's keep them".
This solution may not require any change in portage/emerge, but I have no idea 
on how to know which libs are needed by a package, and how to track these libs' 
owners without looking at every installed package's files (which are stored in 
the CONTENTS file, if I'm not mistaken).
Also, I wanted to use the ebuild tool to install/uninstall packages, which is 
not possible with this solution apparently.
In case I need to implement this, could you give me some clues on how to achieve 
it?
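
The "that other package still needs these libs, let's keep them" decision of
Solution 2 could be sketched as below, assuming the planner already has, for
every installed package, the set of library names it provides and the set it
requires. How those sets are obtained is left open here; the package and
library names are hypothetical.

```python
def libs_to_preserve(removed_pkg, provides, requires):
    """Libraries provided by removed_pkg that another installed package
    still requires and that no other installed package provides."""
    still_provided = set()
    for pkg, libs in provides.items():
        if pkg != removed_pkg:
            still_provided |= set(libs)
    still_required = set()
    for pkg, libs in requires.items():
        if pkg != removed_pkg:
            still_required |= set(libs)
    candidates = set(provides.get(removed_pkg, ()))
    return sorted((candidates & still_required) - still_provided)


# Hypothetical system: A needs L.so.1 and M.so.1; B provides both, C only M.so.1.
provides = {"B-1": {"L.so.1", "M.so.1"}, "C-1": {"M.so.1"}}
requires = {"A-1": {"L.so.1", "M.so.1"}}
print(libs_to_preserve("B-1", provides, requires))  # ['L.so.1']
```

Only L.so.1 has to be kept when B-1 goes away, because M.so.1 remains
available from C-1.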


Among these two solutions, I prefer the first one: we stay at the level of 
package dependencies (and it looks simpler to implement).
However, it is maybe easier/better to use the second approach, I don't know.
Do you have some suggestions?

Thanks!
Michael



Re: [gentoo-portage-dev] precisions on installed packages' dependencies

2020-03-22 Thread Zac Medico
On 3/22/20 5:38 PM, michael.lienha...@laposte.net wrote:
> Dear all,
> 
> Still in the process of improving my solver (and making it a usable tool), I 
> need to have a better idea on how installed packages should be managed.

Great!

> I didn't find anything on that topic in the PMS (if I've missed it, I'm 
> sorry).
> Could you confirm/correct my following understanding:
>  1. installed packages that are still in the portage tree can be 
> unmerged/updated without any restriction (as specified in their .ebuild)

True.

>  2. installed packages that are not in the portage tree can only be kept as 
> is or unmerged

Installed packages may also implement pkg_config and pkg_info phases
that can be executed via emerge --config and emerge --info.

>  3. before removing a library, "ebuild unmerge" always checks if it is used 
> by another package: this means that installed packages' dependencies are 
> never broken.

That's true if the package is removed via emerge --depclean, but emerge
--unmerge does not account for dependencies.

Also, it's possible for dependencies of installed packages to be
temporarily broken by upgrades. In cases like this, the breakage will
eventually be resolved by a rebuild (which occurs automatically for slot
operator := deps), an upgrade, or emerge --depclean (which removes
unneeded packages).
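
The difference between the two removal paths can be sketched in a few lines.
This is a simplified model, not emerge's implementation: a depclean-style
removal only considers packages that nothing in the requested set reaches
through the dependency graph, whereas an unmerge-style removal performs no
such check. The package names and the graph are hypothetical.

```python
def depclean_candidates(world, deps, installed):
    """Installed packages that are not reachable from the world set."""
    needed, stack = set(), list(world)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(deps.get(pkg, ()))
    return sorted(set(installed) - needed)


# "app" was requested by the user; "libb" is only there because app needs it.
world = ["app"]
deps = {"app": ["libb"], "libb": []}
installed = ["app", "libb", "orphan"]
print(depclean_candidates(world, deps, installed))  # ['orphan']
```

Removing libb directly, without this reachability check, is the case in
which the dependencies of an installed package end up broken.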

> 
> Many thanks!
> Michael
>
-- 
Thanks,
Zac


