[Distutils] Re: package management - common storage while keeping the versions straight

2020-07-07 Thread David Mathog
Hi all.

"Python devirtualizer" is a preliminary implementation which manages
shared packages so that only one copy of each package version is
required.  It installs into a virtualenv, then migrates the contents
out into the normal OS environment, and while so doing, replaces what
would be duplicate files with soft links to a single copy.  It is
downloadable from here:

https://sourceforge.net/projects/python-devirtualizer/

It is specific to Linux (or other POSIX-like systems, __maybe__ Mac).
There is no way it will run on Windows at this point, because the main
script is bash and the paths assume POSIX path syntax.  (It might work
in Mingw64, though.)

Anyway,

pdvctrl install packageA
pdvctrl migrate packageA /wherever/packageA
pdvctrl install packageB
pdvctrl migrate packageB /wherever/packageB

will result in a single copy of the shared dependencies on this
system, with both packageA and packageB hooked to them with soft
links.  The import does not go awry because from within each package's
site-packages directory there are only links to the files it needs, so
it never sees any conflicting package versions.
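
For illustration only, the linking idea can be sketched in a few lines
of Python.  This is not pdvctrl's code (pdvctrl is a bash script): the
function names and the choice to key the shared store on a content hash
are assumptions made for the sketch.

```python
import hashlib
import os
import shutil


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def share_file(path, store):
    """Move `path` into `store` and leave a soft link behind.

    Hypothetical helper: the store is keyed by content hash, so an
    identical file that is already stored is reused, not copied again.
    """
    os.makedirs(store, exist_ok=True)
    target = os.path.join(store, file_digest(path))
    if os.path.exists(target):
        os.remove(path)            # duplicate: keep the stored copy
    else:
        shutil.move(path, target)  # first copy becomes the shared one
    os.symlink(target, path)
    return target
```

Calling share_file() on the same file from two different site-packages
directories leaves one real copy in the store and a soft link in each
directory.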

There is also:

pdvctrl preinstall packageC
pdvctrl install packageC
pdvctrl migrate packageC /wherever/packageC

which first uses johnnydep to look up dependencies already on the
system and links those in directly before going on to install any
pieces not so installed.  Unfortunately the johnnydep runs with
"preinstall" have so far been significantly slower than just doing a
normal install and letting the migrate throw out the extra copy.  On
the other hand, the one package I have encountered which has
conflicting requirements (scanpy-scripts) fails in a more
comprehensible manner with "preinstall" than with "install".

Migrate "wraps" the files in the package's "bin" directory, if any, so
that they may be invoked solely by PATH like a regular program.  This
uses libSDL2 to get the absolute path of the wrapper program, and it
defines PYTHONPATH before execve() to the actual target.  So no
messing about with PYTHONPATH in the user's shell or in scripts.  So
far I have not run into a problem with the wrappers, which essentially
just inject a PYTHONPATH into the environment when the program is run.
Well, one package (busco) had a file with no terminal EOL, which
resulted in its last line being dropped while it was being wrapped,
but that case is now handled.  I do expect though at some point to
encounter a package which has several files in its bin, and
first_program will contain some variant of:

python3 /wherever/bin/second_program

The wrapper will break those, since the wrapper is a regular binary
and not a python script.
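
As a rough sketch of what such a wrapper does, here is the same logic
in Python.  The real wrapper is a compiled binary that uses libSDL2 to
find its own path; the TOP/bin and TOP/lib/python3.6/site-packages
layout, the function names, and the `target` argument below are all
assumptions for illustration.

```python
import os


def wrapper_env(wrapper_path, environ):
    """Return a copy of `environ` with PYTHONPATH set for this package.

    Assumes the wrapper lives at TOP/bin/<name> and the package's
    modules at TOP/lib/python3.6/site-packages (hypothetical layout).
    """
    top = os.path.dirname(os.path.dirname(os.path.abspath(wrapper_path)))
    env = dict(environ)
    env["PYTHONPATH"] = os.path.join(top, "lib", "python3.6", "site-packages")
    return env


def run_wrapped(wrapper_path, target, argv):
    """Replace the current process with `target`, using the adjusted environment."""
    os.execve(target, [target] + list(argv), wrapper_env(wrapper_path, os.environ))
```

Because PYTHONPATH is set only in the environment handed to execve(),
the user's shell is left untouched.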

Regards,

David Mathog


--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/WIWLAD3537K7DYNUBZVIMPE7SFEV6E5L/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-29 Thread John Thorvald Wodder II
On 2020 Jun 29, at 16:09, David Mathog  wrote:
> 
> In neither case does the egg-info file reference the corresponding
> directory, but at least the directory in both has the expected package
> name (other than case).  In the examples you cited at the top, were
> any of those "different name" cases from packages with a "file"
> egg-info?

The projects I examined were all in wheel form and thus had *.dist-info 
directories instead of *.egg-info.  I know very little about how eggs work, 
other than that they're deprecated and should be avoided in favor of wheels.

-- John Wodder
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/DMRPHSWPXPEWJOHFZVBKTJMH34KABHTM/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-29 Thread David Mathog
On Fri, Jun 26, 2020 at 2:51 PM John Thorvald Wodder II
 wrote:

> Of the 32,517 non-matching projects, 7,117 were Odoo projects with project 
> names of the form "odoo{version}_addon_{foo}" containing namespace modules of 
> the form "odoo/addons/{foo}", and 3,175 were Django projects with project 
> names of the form "django_{foo}" containing packages named just "{foo}".  No 
> other major patterns seem to stand out.

In CentOS 8 the RPM

   python3-rhnlib-2.8.6-8.module_el8.1.0+211+ad6c0bc7.noarch

has loaded into the directory

  /usr/lib/python3.6/site-packages

two entries

rhn                          # a directory
rhnlib-2.8.6-py3.6.egg-info  # a file

The latter contains just this text:

Metadata-Version: 1.0
Name: rhnlib
Version: 2.8.6
Summary: Python libraries for the Spacewalk project
Home-page: http://rhn.redhat.com
Author: Mihai Ibanescu
Author-email: m...@redhat.com
License: GPL
Description: rhnlib is a collection of python modules used by the
Spacewalk (http://spacewalk.redhat.com) software.
Platform: UNKNOWN

Nor is there a link in the other direction:

grep -iR rhnlib /usr/lib/python3.6/site-packages/rhn
#nothing

So while "rhn" bears a similarity to "rhnlib" it is neither the
package name nor is it listed in the egg-info.
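
To make the point concrete: a "file" egg-info is a set of RFC 822 style
headers, so it can be read with the standard library's email parser.
The sketch below (the function name is made up for this example) shows
that the only identifying fields available are Name and Version;
nothing in them maps back to the "rhn" directory.

```python
from email.parser import Parser


def read_pkg_info(text):
    """Return (name, version) from the text of a "file" egg-info / PKG-INFO."""
    msg = Parser().parsestr(text)
    return msg["Name"], msg["Version"]


sample = (
    "Metadata-Version: 1.0\n"
    "Name: rhnlib\n"
    "Version: 2.8.6\n"
    "Summary: Python libraries for the Spacewalk project\n"
)
print(read_pkg_info(sample))
```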

This was of course installed by dnf (AKA yum) and not by egg.

Is it possible for any python installer (as opposed to dnf, which runs
outside of it) to install an unreferenced directory like this?
Presumably not with a dist-info, but with an egg-info that does not in
any way reference the active part of the installation?  In a small
collection (172 packages) here these were the only two "file" egg-info
entries found, with their associated directories:

busco
BUSCO-4.0.6-py3.6.egg-info
ngs
ngs-1.0-py3.6.egg-info

In neither case does the egg-info file reference the corresponding
directory, but at least the directory in both has the expected package
name (other than case).  In the examples you cited at the top, were
any of those "different name" cases from packages with a "file"
egg-info?

Thanks,

David Mathog
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/DRBMPYMLRGGF2WED7AKWSQS7B7EARIVB/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-27 Thread Paul Moore
On Sat, 27 Jun 2020 at 01:37, David Mathog  wrote:
>
> Thanks for that feedback.  Looks like RECORD is the one to use.
>
> The names of the directories ending in dist-info seem to be uniformly:
>
> package-version.dist-info

Note that if you're doing something like this, you should probably
read PEP 376 (https://www.python.org/dev/peps/pep-0376/) which defines
the standard layout of installed packages.

> but the directory names associated with eggs come in a lot of flavors:
>
> anndata-0.6.19-py3.6.egg
> cutadapt-2.10.dev20+g93fb340-py3.6-linux-x86_64.egg
> scanpy-1.5.2.dev7+ge33a2f33-py3.6.egg
> h5py-2.9.0-py3.6-linux-x86_64.egg
> simplejson-3.17.0-py3.6.egg-info

The egg format is an older format that was never standardised, so
details of that format are likely somewhere in the setuptools
documentation. .egg-info directories are the older equivalent of
dist-info directories, but egg directories are a very different format
(they contain the full distribution plus metadata in one directory).
You'd have to find the setuptools documentation of the egg format for
that. (Note that the egg format is obsolete, so you may need to look
at older documentation - I don't know if the current setuptools docs
describe the format).

I'm not aware what other formats tools like conda use, sorry.

Paul
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/MCM2ZMLXHVYNELLQ2CFGKNC2HJXDL5RN/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread David Mathog
Thanks for that feedback.  Looks like RECORD is the one to use.

The names of the directories ending in dist-info seem to be uniformly:

package-version.dist-info

but the directory names associated with eggs come in a lot of flavors:

anndata-0.6.19-py3.6.egg
cutadapt-2.10.dev20+g93fb340-py3.6-linux-x86_64.egg
scanpy-1.5.2.dev7+ge33a2f33-py3.6.egg
h5py-2.9.0-py3.6-linux-x86_64.egg
simplejson-3.17.0-py3.6.egg-info
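
One way a script could cope with these flavors is a single tolerant
regular expression.  The pattern below is only a sketch fitted to the
names listed above; it assumes the name and version parts never contain
a hyphen, and it is not taken from setuptools or pip.

```python
import re

# name-version[-pyX.Y][-platform].(egg | egg-info | dist-info)
_META_DIR = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(?:-py(?P<pyver>\d+\.\d+))?"
    r"(?:-(?P<platform>.+))?"
    r"\.(?P<kind>egg|egg-info|dist-info)$"
)


def parse_metadata_dir(dirname):
    """Split a metadata directory name into its parts, or return None."""
    m = _META_DIR.match(dirname)
    return m.groupdict() if m else None


for d in ("h5py-2.9.0-py3.6-linux-x86_64.egg",
          "simplejson-3.17.0-py3.6.egg-info",
          "foobar-1.2.3.dist-info"):
    print(parse_metadata_dir(d))
```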

johnnydep does not give any hints that this is coming:

johnnydep --output-format pinned h5py
#relevant part:  h5py==2.10.0

What would be some small examples for other package managers?  I
would like to see what they have as equivalents to dist-info and
egg-info so that the script does not choke on them.

Some progress with the test script.  It can now convert a virtualenv
to a regular directory and migrate the site-packages contents to a
shared area.  A second migration of a copy of the same virtualenv to a
different regular directory correctly makes links to the first set.
(That is, two normal directories both linked to one common set of
packages.)  And the test program (johnnydep) runs in both with
PYTHONPATH set correctly.  But preinstalling, that is, setting links
to the common directory before doing a normal install, is tricky
because of the name inconsistencies.  To do that it must run johnnydep
to get the necessary information, and that is not very fast.  A normal
install of johnnydep itself, complete with downloads, takes less time
than that program's own analysis!

time johnnydep johnnydep
#21s

vs.

rm -rf ~/.cache/pip #force actual downloads
#too fast to measure
time python3 -m venv johnnydep
#2.3s
source johnnydep/bin/activate
#too fast to measure
time python -m pip install -U pip #update 9.0.3 to 20.1.1
#3.4s
time pip3 install johnnydep
#7.8s

Probably a package with a huge amount of compilation would be a win
for a preinstall, but it is at this point definitely not an "always
faster" option.

Thanks,

David Mathog

Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/TT6WZTTBMEWHTZD56HXH42JFKEI5VECK/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread John Thorvald Wodder II
On 2020 Jun 26, at 15:50, David Mathog  wrote:

> Still, how common is that?  Can anybody offer an estimate about what
> fraction of packages use different names like that?

Scanning through the wheelodex.org database (specifically, a dump from earlier 
this week) finds 32,517 projects where the wheel DOES NOT contain a top-level 
module of the same name as the project (after correcting for differences in 
case and hyphen vs. underscore vs. period) and 74,073 projects where the wheel 
DOES contain a module of the same name.  (5,417 projects containing no modules 
were excluded.)  Note that a project named "foo-bar" containing a namespace 
package "foo/bar" is counted in the former group.
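
The comparison described above can be sketched as follows.  How
wheelodex normalizes names is not stated here; this example assumes the
PEP 503 rule (lowercase, with runs of "-", "_" and "." collapsed), and
the function names are made up.

```python
import re


def normalize(name):
    """Normalize a project or module name the way PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_name(project, module):
    """True if the project and top-level module match after normalization."""
    return normalize(project) == normalize(module)


print(same_name("zope.interface", "zope_interface"))
print(same_name("PyYAML", "yaml"))
```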

Of the 32,517 non-matching projects, 7,117 were Odoo projects with project 
names of the form "odoo{version}_addon_{foo}" containing namespace modules of 
the form "odoo/addons/{foo}", and 3,175 were Django projects with project names 
of the form "django_{foo}" containing packages named just "{foo}".  No other 
major patterns seem to stand out.

-- John Wodder
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/V445KCPLKMEVSSEAKX776DMNSPL76JRR/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread John Thorvald Wodder II
(Sending to the list this time.)

On 2020 Jun 26, at 15:43, David Mathog  wrote:
> So by what method could code working outside of python possibly determine that
> "yaml" goes with "PyYAML"?

By checking all *.dist-info/RECORD files to see which one mentions the "yaml" 
directory.  (top_level.txt could also be checked, but I believe that only 
setuptools creates this file — projects built with flit or poetry don't have it 
— and it's not very helpful when namespace packages are involved.)
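
A minimal sketch of that RECORD lookup, for illustration.  RECORD is a
CSV file whose first column is the installed path relative to
site-packages; the function name is made up, and the sketch only covers
*.dist-info directories (not eggs) and only plain directories or
single-file .py modules.

```python
import csv
import glob
import os


def owner_of(site_packages, top_level):
    """Return the *.dist-info directories whose RECORD lists `top_level`."""
    owners = []
    pattern = os.path.join(glob.escape(site_packages), "*.dist-info", "RECORD")
    for record in sorted(glob.glob(pattern)):
        with open(record, newline="") as fh:
            for row in csv.reader(fh):
                if not row:
                    continue
                # First path component of the installed file, e.g. "yaml"
                first = row[0].split("/", 1)[0]
                if first in (top_level, top_level + ".py"):
                    owners.append(os.path.basename(os.path.dirname(record)))
                    break
    return owners
```

For example, owner_of("/usr/lib/python3.6/site-packages", "yaml") would
return the PyYAML dist-info directory name if PyYAML had been installed
from a wheel.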

>  Is this a common situation?

It happens whenever the project "foo" distributes a module named something 
other than "foo".  Other projects like this that I can think of off the top of 
my head are BeautifulSoup4 (module: bs4), python-dateutil (module: dateutil), 
and attrs (module: attr).

> Is pkg_resources actually a package?

pkg_resources is a module distributed by the setuptools project (alongside the 
modules "setuptools" and "easy_install").

> Does it make sense for a common
> package repository to have a single instance of this directory or
> should each installed python based program retain its own version of
> this?

There should be one instance per each version of setuptools stored in the 
repository.

-- John Wodder
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/IP7LRY5ZGDBIGSW4Q4SMJ7WM6WM6ZSVW/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread David Mathog
On Fri, Jun 26, 2020 at 12:43 PM David Mathog  wrote:
> So by what method could code working outside of python possibly determine that
> "yaml" goes with "PyYAML"?

Sorry, I forgot that the information was in
PyYAML-5.3.1-py3.6.egg-info/top_level.txt

Still, how common is that?  Can anybody offer an estimate about what
fraction of packages use different names like that?

Thanks,

David Mathog




[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread David Mathog
Questions about naming conventions.

The vast majority of packages when they install create in
site-packages two directories with names like:

foobar
foobar-1.2.3.dist-info  (or egg-info)

However PyYAML creates:

yaml
PyYAML-5.3.1-py3.6.egg-info

and there is also this:

pkg_resources

which is not associated with a versioned package.

In python3

>>> import yaml
>>> import pkg_resources
>>> print(yaml.__version__)
5.3.1
>>> print(pkg_resources.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pkg_resources' has no attribute '__version__'

So by what method could code working outside of python possibly determine that
"yaml" goes with "PyYAML"?   Is this a common situation?

Is pkg_resources actually a package?  Does it make sense for a common
package repository to have a single instance of this directory or
should each installed python based program retain its own version of
this?

There are some other files that live in site-packages which are not
actually packages. The list so far is:

__pycache__

#some dynamic libraries, like
kiwisolver.cpython-36m-x86_64-linux-gnu.so

#some pth files, but always so far with an explicit version number, like
sphinxcontrib_applehelp-1.0.2-py3.8-nspkg.pth
#or associated with a package with a version number like:
setuptools
setuptools-46.1.3.dist-info
setuptools.pth

#some py files, apparently when that package does not make a corresponding
#directory like:
zipp-3.1.0.dist-info
zipp.py

#initialization file "site" as
site.py
site.pyc

Any others to look out for?  That is, files which might be installed
in site-packages but which should not be shared.

Hopefully this next is an appropriate question for this list, since
the issue arises from how python loads packages.  Is there any way to
avoid collisions between python based programs other than activating
and deactivating their virtualenvs, or redefining PYTHONPATH, before
each is used?  For programs whose library loading is deterministic
(usually the case with C, Fortran, bash scripts, etc.), one can
construct a bash script (for instance) which runs 3 programs in order
like so:

prog1
prog2
prog3  # spawns subprocesses which run prog2 and prog1

and there are not generally any issues.  (Yes, one can create a mess
with LD_PRELOAD and the like.)  But if those 3 are Python programs,
then unless prog1, prog2, and prog3 are all built into the same
virtualenv, which usually means they come from the same software
distribution, I don't see how to avoid conflicts for the first two
cases without activating/deactivating each one, which looks like it
might be tricky in the 3rd case.

If one has a directory like:

TOP/bin/prog
TOP/lib/python3.6/site-packages

Other than using PYTHONPATH to direct to it with an absolute path, is
there any way to force prog to only import from that specific
site-packages?  Let me try that again.  Is there a way to tell prog
via any environmental variable to look in
"../lib/python3.6/site-packages" (and nowhere else) for imports, with
the reference directory being that where prog is installed, not where
the process PWD might happen to be.  Because if that was possible it
might allow a sort of "set it and forget it" method like

export PYTHONRELPATHFROMPROG="../lib/python3.6/site-packages"
prog1  #uses prog1 site-package
prog2  #uses prog2 site-package
prog3  #uses prog3 site-package
#  prog1 subprocess  #uses prog1 site-package
#  prog2 subprocess  #uses prog2 site-package

(None of which would be necessary if python programs could import
specific versions reliably from a common directory containing multiple
versions of each package.)
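
For the TOP/bin/prog layout above, one thing a program can do today
without any environment variable is compute its own site-packages from
its installed location.  The sketch below assumes that layout; the
function names are made up.  It only puts the private directory first
on sys.path; keeping other directories out would additionally need
something like the interpreter's -I (isolated mode) option.

```python
import os
import sys


def private_site_packages(prog_path, version="3.6"):
    """Return TOP/lib/pythonX.Y/site-packages for a program at TOP/bin/<prog>."""
    # realpath() follows soft links, so the result does not depend on
    # the process's current working directory or on how prog was found.
    bin_dir = os.path.dirname(os.path.realpath(prog_path))
    return os.path.normpath(
        os.path.join(bin_dir, "..", "lib", "python" + version, "site-packages")
    )


def use_private_site_packages(prog_path):
    """Put the program's own site-packages first on sys.path."""
    sys.path.insert(0, private_site_packages(prog_path))
```

A script would call use_private_site_packages(__file__) before its
other imports.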

Thanks,

David Mathog



[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-25 Thread David Mathog
On Thu, Jun 25, 2020 at 12:37 AM Paul Moore  wrote:

> I think the key message here is that you won't be *re*-inventing the
> wheel. This is a wheel that still needs to be invented.

It _was_ invented, but it is off round and gives a rough ride.  As
noted in the first post this:

__requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20',
                'loompy <3.0.0,>=2.00', 'h5py <2.10']
import pkg_resources

was able to load the desired set of package-versions for scanpy, but
setting a version number constraint on scanpy itself at the end of
that list, one which matched the version that the preceding commands
successfully loaded, broke it.  So it is not reliable.

And the entire __requires__ kludge is only present because for reasons
beyond my pay grade this:

import pkg_resources
pkg_resources.require("scipy<1.3.0,>=1.2.0;anndata<0.6.20;etc.")
import scipy
import anndata
#etc.

cannot work because by default "import pkg_resources" keeps only the
most recent version rather than making up a tree (or list or hash or
whatever) and waiting to see if there are any version constraints to
be applied at the time of actual package import.

What I'm doing now is basically duct tape and baling wire to work
around those deeper issues.  In terms of language design, a much
better fix would be to modify pkg_resources so that it will always
successfully load the required versions from a designated directory
which contains multiple versions of packages, and modify the package
maintenance tools so that they can maintain such a directory.

Regards,

David Mathog
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/X23JIVPWU74HW3GBMVJEKAC2XUFROKAL/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-25 Thread Jason Madden
On Tue, 2020-06-23 at 15:51 -0700, David Mathog wrote:
> What I am after is some method of keeping exactly one copy of each
> package-version in the common area (ie, one might find foo-1.2,
> foo-1.7, and foo-2.3 there), while also presenting only the one
> version of each (let's say foo-1.7) to a particular installed program.
> On linux it might do that by making soft links to the common
> PYTHONPATH area from another directory for which it sets PYTHONPATH
> for the application. Finally, this has to be usable by any account
> which has read execute access to the main directory.
> 
> Does such a beast exist?  If so, please point me to it!

zc.buildout[1] and zc.recipe.egg[2] can do something very much like this. 
zc.buildout tries hard to maintain reproducibility and isolation, and one of 
the ways it does this is by keeping each package in its own .egg directory. It 
then generates the entry-point scripts and a REPL with ``sys.path`` explicitly 
set to reference exactly the versions specified. There's no virtualenv-like 
activation step that puts all those scripts on the path, though, so one must 
either do that manually or just invoke the generated scripts directly.

For example, here's a buildout configuration that specifies a shared directory 
to store eggs in. It also has some parts using zc.recipe.egg, one that will use 
zope.interface 4 and one that will use zope.interface 5 (egg specifications can 
be arbitrarily complex, of course, dependencies are followed, etc):

  [buildout]
  eggs-directory = //buildout-eggs
  abi-tag-eggs = true
  parts =
      old-interface
      new-interface
      new-interface-plus

  [old-interface]
  recipe = zc.recipe.egg
  eggs =
      zope.interface == 4.0
  interpreter = old-py

  [new-interface]
  recipe = zc.recipe.egg
  eggs =
      zope.interface == 5.1
  interpreter = new-py

  [new-interface-plus]
  recipe = zc.recipe.egg
  eggs =
      zope.interface == 5.1
      zope.component
  interpreter = new-py-plus


After running `buildout`, I have three REPL files (if zope.interface had 
defined any entry point scripts, I could also have had those generated):

  $ head bin/new-py
  #!

  import sys

  sys.path[0:0] = [
  '//buildout-eggs/pypy_73/zope.interface-5.1.0-py2.7.egg',
  '//site-packages',
  ]

  $ head bin/old-py
  #!

  import sys

  sys.path[0:0] = [
  '//buildout-eggs/pypy_73/zope.interface-4.0.0-py2.7.egg',
  '//site-packages',
  ]
  
  $ head bin/new-py-plus
  #!

  import sys

  sys.path[0:0] = [
  '//buildout-eggs/pypy_73/zope.interface-5.1.0-py2.7.egg',
  '//buildout-eggs/pypy_73/zope.component-4.6.1-py2.7.egg',
  '//site-packages',
  ]
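
The shape of those generated files is simple enough to sketch.  The
function below is not zc.buildout's code, just an illustration of
writing a launcher header that pins sys.path to an explicit list of egg
directories; the name render_launcher and the example paths are made up.

```python
def render_launcher(interpreter, paths):
    """Return the text of a launcher header that prepends `paths` to sys.path."""
    lines = ["#!" + interpreter, "", "import sys", "", "sys.path[0:0] = ["]
    lines += ["  %r," % p for p in paths]
    lines += ["  ]", ""]
    return "\n".join(lines)


print(render_launcher("/usr/bin/python3", ["/shared/eggs/foo-1.7-py3.6.egg"]))
```

Each program gets its own launcher, so two programs can point at
different versions of the same package in the one shared directory.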


I've got a collection of zope.interface eggs referenced from a variety of 
different buildouts (at least at one point in time) and from a variety of 
different Python implementations, but only ever one copy of each:

  $ ls -ld buildout-eggs/*/zope.interface*
  Permissions Size User    Date Modified    Name
  drwxr-xr-x     - jmadden 2017-05-04 06:53 buildout-eggs/pypy_41/zope.interface-4.4.0-py2.7.egg/
  drwxr-xr-x     - jmadden 2017-05-04 07:00 buildout-eggs/cp27m/zope.interface-4.4.0-py2.7-macosx-10.12-x86_64.egg/
  drwxr-xr-x     - jmadden 2017-05-09 17:55 buildout-eggs/cp34m/zope.interface-4.4.0-py3.4-macosx-10.12-x86_64.egg/
  drwxr-xr-x     - jmadden 2017-06-08 10:20 buildout-eggs/cp27m/zope.interface-4.4.1-py2.7-macosx-10.12-x86_64.egg/
  drwxr-xr-x     - jmadden 2017-07-11 10:07 buildout-eggs/cp36m/zope.interface-4.4.2-py3.6-macosx-10.12-x86_64.egg/
  drwxr-xr-x     - jmadden 2017-12-08 11:10 buildout-eggs/cp27m/zope.interface-3.6.7-py2.7-macosx-10.13-x86_64.egg/
  drwxr-xr-x     - jmadden 2018-05-07 11:05 buildout-eggs/cp27m/zope.interface-4.1.3-py2.7-macosx-10.13-x86_64.egg/
  drwxr-xr-x     - jmadden 2020-03-13 07:53 buildout-eggs/cp27m/zope.interface-4.6.0-py2.7-macosx-10.15-x86_64.egg/
  drwxr-xr-x     - jmadden 2020-04-08 07:32 buildout-eggs/cp38/zope.interface-5.1.0-py3.8-macosx-10.15-x86_64.egg/
  drwxr-xr-x     - jmadden 2020-05-17 09:30 buildout-eggs/cp27m/zope.interface-5.1.0-py2.7.egg/
  drwxr-xr-x     - jmadden 2020-06-11 07:42 buildout-eggs/pypy_73/zope.interface-5.1.0-py2.7.egg/
  drwxr-xr-x     - jmadden 2020-06-25 11:58 buildout-eggs/pypy_73/zope.interface-4.0.0-py2.7.egg/
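
To see which versions of a project such a shared directory holds per
ABI tag, something like the following sketch could be used. It only
assumes the naming visible in the listing above
("<abi-tag>/<name>-<version>-py<X.Y>[-<platform>].egg"); the function
name is made up for illustration, and the versions are sorted as plain
strings rather than as real version numbers.

```python
import os
from collections import defaultdict


def shared_versions(eggs_dir, project):
    """Map each ABI-tag subdirectory to the versions of project it holds."""
    found = defaultdict(set)
    for abi_tag in os.listdir(eggs_dir):
        tag_dir = os.path.join(eggs_dir, abi_tag)
        if not os.path.isdir(tag_dir):
            continue
        for entry in os.listdir(tag_dir):
            if not entry.endswith(".egg"):
                continue
            # "name-version-pyX.Y[-platform]": the first two fields suffice.
            parts = entry[: -len(".egg")].split("-")
            if len(parts) >= 2 and parts[0] == project:
                found[abi_tag].add(parts[1])
    return {tag: sorted(versions) for tag, versions in found.items()}
```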


~Jason

[1] http://www.buildout.org/ ; I recommend version 3, which uses pip to find 
eggs
[2] https://pypi.org/project/zc.recipe.egg/
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/G52C7F6QHH3TINSOYBMI5CV54SWCZMPT/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-25 Thread Paul Moore
On Thu, 25 Jun 2020 at 00:06, David Mathog wrote:
>
> Thanks for the link.  Unfortunately there was not a reference to a
> completed package that actually did this. As in, I really do not want
> to reinvent the wheel.  Ugh, sorry, that's a pun in this context.

I think the key message here is that you won't be *re*-inventing the
wheel. This is a wheel that still needs to be invented.

Paul

(It was *way* too hard trying to write the above without tripping over
the extended "wheel" pun ;-))
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/JHCOMW4KYYKNHDA5KDNAMOS33OWM5BSM/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread Filipe Laíns
On Tue, 2020-06-23 at 15:51 -0700, David Mathog wrote:
> What I am after is some method of keeping exactly one copy of each
> package-version in the common area (ie, one might find foo-1.2,
> foo-1.7, and foo-2.3 there), while also presenting only the one
> version of each (let's say foo-1.7) to a particular installed program.
> On linux it might do that by making soft links to the common
> PYTHONPATH area from another directory for which it sets PYTHONPATH
> for the application. Finally, this has to be usable by any account
> which has read execute access to the main directory.
> 
> Does such a beast exist?  If so, please point me to it!

I have been meaning to do something like this for a while now! But
unfortunately I can't find the time.

If you do choose to start implementing it, please let me know. I would
be happy to help out.

Cheers,
Filipe Laíns


--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/ZIDMTET7JPOQJGCJR3L6EUDKOLGWYLRW/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread David Mathog
It turned out that the second install was not the cause of the
timestamp change in the original.  On reviewing "history" it turned
out that I had accidentally run the link generation twice.  That
turned up this (for me) unexpected behavior:

mkdir /tmp/foo
ls -al /tmp/foo
total 16
drwxrwxr-x.   2 modules modules     6 Jun 24 16:49 .
drwxrwxrwt. 173 root    root    12288 Jun 24 16:49 ..
ln -s /tmp/foo /tmp/bar
ls -al /tmp/foo
drwxrwxr-x.   2 modules modules     6 Jun 24 16:49 .
drwxrwxrwt. 173 root    root    12288 Jun 24 16:49 ..
ln -s /tmp/foo /tmp/bar
ls -al /tmp/foo
total 16
drwxrwxr-x.   2 modules modules    17 Jun 24 16:51 .
drwxrwxrwt. 173 root    root    12288 Jun 24 16:50 ..
lrwxrwxrwx.   1 modules modules     8 Jun 24 16:51 foo -> /tmp/foo

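The repeated soft link actually put a link inside the target directory.
The second time around /tmp/bar already existed as a link to a
directory, so ln followed it and created /tmp/bar/foo, which is
/tmp/foo/foo.  Strange, but apparently it is expected behavior.  The
problem can be avoided by using this form: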

 ln -sn $TARGET $LINK
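
For a script that may be run more than once, the same safeguard can be
sketched in Python. Unlike "ln -s" without -n, os.symlink() never
descends into an existing link, so the only thing left to handle is the
case where the link is already there. The function name is
hypothetical.

```python
import os


def ensure_symlink(target, link):
    """Create link -> target; return False if it was already in place.

    Raises FileExistsError if link exists and points somewhere else (or
    is not a symlink at all), instead of silently creating a stray link
    inside the target directory.
    """
    if os.path.islink(link):
        current = os.readlink(link)
        if current == target:
            return False
        raise FileExistsError(f"{link} already points to {current}")
    os.symlink(target, link)  # fails with FileExistsError on a real file/dir
    return True
```

Running it twice with the same arguments leaves the target directory
untouched, so its timestamp does not change either.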

The later installs are much faster than the first one, since putting
in the links is very fast and building the packages is not.  This was
the trivial case though, since having done one install all the
prerequisites were just "there".  The johnnydep package will list the
dependencies without doing the install.  Guess I will throw something
together based on that and the above results and see how it goes.

Regards,

David Mathog



On Wed, Jun 24, 2020 at 4:23 PM Filipe Laíns wrote:
>
> On Tue, 2020-06-23 at 15:51 -0700, David Mathog wrote:
> > What I am after is some method of keeping exactly one copy of each
> > package-version in the common area (ie, one might find foo-1.2,
> > foo-1.7, and foo-2.3 there), while also presenting only the one
> > version of each (let's say foo-1.7) to a particular installed program.
> > On linux it might do that by making soft links to the common
> > PYTHONPATH area from another directory for which it sets PYTHONPATH
> > for the application. Finally, this has to be usable by any account
> > which has read execute access to the main directory.
> >
> > Does such a beast exist?  If so, please point me to it!
>
> I have been meaning to do something like this for a while now! But
> unfortunately I can't find the time.
>
> If you do choose to start implementing it, please let me know. I would
> be happy to help out.
>
> Cheers,
> Filipe Laíns
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/63NJKSY7BLPJZXLK5DJFWROGQUKJ7RVF/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread David Mathog
Thanks for the link.  Unfortunately there was not a reference to a
completed package that actually did this. As in, I really do not want
to reinvent the wheel.  Ugh, sorry, that's a pun in this context.

Here is a first shot at this, just installing a moderately complicated
package in a virtualenv and then reinstalling it in another
virtualenv.  Extract and execinput are my own programs (from drm_tools
on sourceforge) but it is obvious from the context what they are
doing. The links had to be soft because linux does not actually allow
a normal user (or maybe even root) to make a hard link to a directory.

cd /usr/common/lib/python3.6/Envs
rm -rf ~/.cache/pip #make download clearer
python3 -m venv scanpy
source scanpy/bin/activate
python -m pip install -U pip #update 9.0.3 to 20.1.1
which python3 #using the one in scanpy
pip3 install scanpy
scanpy -h #seems to start
deactivate
rm -rf ~/.cache/pip #make download clearer
python3 -m venv scanpy2
source scanpy2/bin/activate
python -m pip install -U pip #update 9.0.3 to 20.1.1
export DST=/usr/common/lib/python3.6/Envs/scanpy/lib/python3.6/site-packages
export SRC=/usr/common/lib/python3.6/Envs/scanpy2/lib/python3.6/site-packages
ls -1 $DST \
| grep -v __pycache__ \
| grep -v scanpy \
| grep -v easy_install.py \
| extract -fmt "ln -s $DST/[1,] $SRC/[1,]" \
| execinput
pip3 install scanpy
#downloaded scanpy, "Requirement already satisfied" for all the others
#Installing collected packages: scanpy
# Successfully installed scanpy-1.5.1
scanpy -h #seems to start
deactivate
source scanpy/bin/activate
scanpy -h #seems to start (still)
deactivate
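
The "ls | grep -v | extract | execinput" step above can also be
sketched in Python for readers without drm_tools. This is an
approximation, not a drop-in replacement: it matches the skipped names
as plain substrings rather than grep patterns, and it additionally
leaves alone anything that already exists in the new site-packages
(such as the pip installed by venv). The function and parameter names
are made up for the example.

```python
import os

# Same names the grep -v filters above exclude, matched as substrings.
SKIP_SUBSTRINGS = ("__pycache__", "scanpy", "easy_install.py")


def link_site_packages(shared_dir, new_dir, skip=SKIP_SUBSTRINGS):
    """Symlink entries of shared_dir into new_dir; return the names linked."""
    linked = []
    for name in sorted(os.listdir(shared_dir)):
        if any(fragment in name for fragment in skip):
            continue
        link = os.path.join(new_dir, name)
        if os.path.lexists(link):
            # Already present in the new environment: do not touch it.
            continue
        os.symlink(os.path.join(shared_dir, name), link)
        linked.append(name)
    return linked
```

With the variables from the transcript, the call would be
link_site_packages(os.environ["DST"], os.environ["SRC"]), since DST
there names the already populated site-packages and SRC the new one.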

So that method seems to have some promise.  It saved a considerable
amount of space too:

du -k scanpy | tail -1
457408  scanpy
du -k scanpy2 | tail -1
24900   scanpy2


However, two potential problems are evident on inspection.

The first is that when the 2nd scanpy installation was performed it
updated the dates on all the directories in $DST.  A workaround would
be to copy all of those directories into the virtualenv temporarily,
just for the installation, and then remove them and put the links in
afterwards.  That strikes me as awfully kludgy.  Setting them
read-only would likely break the install.

The second issue is that each package install creates two directories like:

llvmlite
llvmlite-0.33.0.dist-info

where the latter contains top_level.txt which in turn contains one line:
  llvmlite
pointing to the first directory.

If another version must cohabit with it the "llvmlite" directories
will conflict.  For this sort of approach to work easily the llvmlite
directory should be named "llvmlite-0.33.0" and top_level.txt should
reference that too.  It would be possible (probably) to work around it
though by having llvmlite-0.33.0 only in the common area and using:

ln -s $COMMON/llvmlite-0.33.0 $VENVAREA/llvmlite

The top_level.txt in each could then reference the unversioned name.
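
A rough Python sketch of that workaround, under two assumptions that
are not established above: the common area stores the package tree as
"<name>-<version>" next to its "<name>-<version>.dist-info" directory,
and the import name equals the distribution name (true for llvmlite,
not for every project). All names in the code are illustrative.

```python
import os


def link_versioned_package(common, venv_site, name, version):
    """Expose common/<name>-<version> in venv_site under the plain name."""
    versioned = f"{name}-{version}"
    links = {
        # Unversioned import name -> versioned directory in the common area.
        os.path.join(venv_site, name): os.path.join(common, versioned),
        # The dist-info directory is already versioned; keep its name.
        os.path.join(venv_site, versioned + ".dist-info"): os.path.join(
            common, versioned + ".dist-info"
        ),
    }
    for link, target in links.items():
        if not os.path.lexists(link):
            os.symlink(target, link)
    return sorted(links)
```

Two environments can then link "llvmlite" to different versioned
directories while the common area keeps one copy of each version.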

Unknown if this soft link approach will work on Windows.

Regards,

David Mathog

On Wed, Jun 24, 2020 at 1:26 PM Steve Dower wrote:
>
> On 24Jun2020 1923, David Mathog wrote:
> > I think I will experiment a little with pipenv and if necessary after
> > each package install use a script to remove the installed libraries
> > and replace them with a hard link to the one in the common area.
> > Maybe it will be possible to put in those links before installing the
> > package of interest (like for scanpy, see first post), which will
> > hopefully keep it from having to rebuild all those packages too.
>
> Here's a recent discussion about this exact idea (with a link to an
> earlier discussion on this list):
> https://discuss.python.org/t/proposal-sharing-distrbution-installations-in-general/2524
>
> It's totally possible, though it's always a balance of trade-offs. Some
> of the people on that post may be interested in developing a tool to
> automate parts of the process.
>
> Cheers,
> Steve
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/2EMFGUE6QDTWBLPWDPE2TTOZOX3OFAOA/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread Steve Dower

On 24Jun2020 1923, David Mathog wrote:
> I think I will experiment a little with pipenv and if necessary after
> each package install use a script to remove the installed libraries
> and replace them with a hard link to the one in the common area.
> Maybe it will be possible to put in those links before installing the
> package of interest (like for scanpy, see first post), which will
> hopefully keep it from having to rebuild all those packages too.

Here's a recent discussion about this exact idea (with a link to an
earlier discussion on this list):
https://discuss.python.org/t/proposal-sharing-distrbution-installations-in-general/2524

It's totally possible, though it's always a balance of trade-offs. Some
of the people on that post may be interested in developing a tool to
automate parts of the process.

Cheers,
Steve
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/IUGVEOU5U7VYJMEUC6E3VWBE6OFTLPHR/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread David Mathog
On Wed, Jun 24, 2020 at 1:36 AM Thomas Kluyver wrote:
>
> On Tue, 23 Jun 2020, at 23:51, David Mathog wrote:
> > What I am after is some method of keeping exactly one copy of each
> > package-version in the common area (ie, one might find foo-1.2,
> > foo-1.7, and foo-2.3 there), while also presenting only the one
> > version of each (let's say foo-1.7) to a particular installed program.
>
> Conda environments work somewhat like this - all the packages are stored
> in a central place, and the structure of selected ones is replicated
> using hardlinks in a site-packages directory belonging to the
> environment. So if your concern is not to waste disk space by storing
> copies of the same packages, that might be an option.

I experimented with that one a little. It installs its own copies of
python and things like openssl and openblas which are already present
from the linux distribution.  Similarly, if some python script needs
"bwa" it will install its own even though that program is already
available.  Basically it is yet another "replicate everything we might
need whether or not it is already present" type of solution. (The
extreme end of that spectrum is systems like Docker, which
effectively replace the entire OS.)  So there might be only the one
version of each python package (not counting duplicates with the OS's
python3) but now there are also duplicate copies of system libraries
and utilities.

I think I will experiment a little with pipenv and if necessary after
each package install use a script to remove the installed libraries
and replace them with a hard link to the one in the common area.
Maybe it will be possible to put in those links before installing the
package of interest (like for scanpy, see first post), which will
hopefully keep it from having to rebuild all those packages too.

Thanks,

David Mathog
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/QBIRYI767AVZ2FCFHVTP56XIKOX4TTYQ/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread Thomas Kluyver
On Tue, 23 Jun 2020, at 23:51, David Mathog wrote:
> What I am after is some method of keeping exactly one copy of each
> package-version in the common area (ie, one might find foo-1.2,
> foo-1.7, and foo-2.3 there), while also presenting only the one
> version of each (let's say foo-1.7) to a particular installed program.
> On linux it might do that by making soft links to the common
> PYTHONPATH area from another directory for which it sets PYTHONPATH
> for the application. Finally, this has to be usable by any account
> which has read execute access to the main directory.

Conda environments work somewhat like this - all the packages are stored in a 
central place, and the structure of selected ones is replicated using hardlinks 
in a site-packages directory belonging to the environment. So if your concern 
is not to waste disk space by storing copies of the same packages, that might 
be an option.
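
The hardlink approach described above can be illustrated with a short
Python sketch. It is not conda's implementation, only the general
technique: recreate the directory structure in the environment and
hard-link each file to the single stored copy. Hard links only work
within one filesystem, and the function name is hypothetical.

```python
import os


def hardlink_tree(store_pkg_dir, env_pkg_dir):
    """Mirror store_pkg_dir into env_pkg_dir using hard links for files.

    Directories are created for real (they cannot be hard-linked);
    every regular file becomes a second name for the stored copy.
    Returns the number of files linked.
    """
    count = 0
    for root, _dirs, files in os.walk(store_pkg_dir):
        rel = os.path.relpath(root, store_pkg_dir)
        dest_root = os.path.normpath(os.path.join(env_pkg_dir, rel))
        os.makedirs(dest_root, exist_ok=True)
        for fname in files:
            os.link(os.path.join(root, fname), os.path.join(dest_root, fname))
            count += 1
    return count
```

Since the environment then contains ordinary directories and files,
nothing in it depends on symlink handling by the tools that read it.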

Thomas
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/AIMQGCP5TSCYAC2DOGZMUQ36L3MZ7K55/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread Paul Moore
On Wed, 24 Jun 2020 at 00:00, David Mathog wrote:
> Does such a beast exist?  If so, please point me to it!

Basically no, or at least not to my knowledge. The mechanisms exist,
in the form of import hooks and similar, to build something like this,
but it's not proved to be a common enough requirement that there's a
well-known/standard library for it. I believe that setuptools
(pkg_resources) had a mechanism to do something along these lines, but
it never really became popular and I don't know if it's still
considered as supported by the setuptools maintainers.
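
To make the import hook idea concrete, here is a minimal sketch of a
meta path finder that serves selected top-level imports from one pinned
directory each. It is an illustration only, not an existing library:
the class name, install_pins() and the "<common>/<name>-<version>"
directory layout are assumptions for the example.

```python
import importlib.abc
import importlib.machinery
import sys


class PinnedVersionFinder(importlib.abc.MetaPathFinder):
    """Resolve selected top-level imports from one pinned directory each."""

    def __init__(self, pins):
        # pins maps a top-level module name to the directory holding the
        # chosen version, e.g. {"foo": "/common/foo-1.7"} (hypothetical).
        self._pins = dict(pins)

    def find_spec(self, fullname, path=None, target=None):
        # Only top-level imports are redirected; submodules are then found
        # through the package's own __path__ by the normal machinery.
        if path is not None or fullname not in self._pins:
            return None
        return importlib.machinery.PathFinder.find_spec(
            fullname, [self._pins[fullname]]
        )


def install_pins(pins):
    """Register a finder ahead of the default ones and return it."""
    finder = PinnedVersionFinder(pins)
    sys.meta_path.insert(0, finder)
    return finder
```

An application's entry script would call install_pins() before its
first import of a pinned package; names that are not pinned fall
through to the normal sys.path lookup.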

So I think you're going to have to either accept the need for multiple
copies, or write something specific for your situation.

Sorry,
Paul
--
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/4P27HIKDPPEV5J4HQXW2D6N4VBCJJUUF/