[Distutils] Re: pip and missing shared system library

2020-08-09 Thread David Mathog
On Sun, Aug 9, 2020 at 10:21 AM Ned Deily  wrote:
> Just to be clear, pkg-config is not part of any Posix standard, AFAIK, so you 
> cannot depend on it being available.

Understood.  However, if that is not employed, what reasonable method
remains for implementing "Requires-External"?  The only thing I can
think of is to specify exact library or program names, like

Requires-External gcc
Requires-External libpng.so

and those could be found by searching the whole directory tree.  That
might even be efficient if updatedb/locate are available.  However,
going that way, how would one determine version compatibility for a
library?  Doing it through the package manager may be possible, but it
is a multistep process:

1.  lookup libpng.so -> PATHPNG
2.  rpm -q --whatprovides $PATHPNG -> name of package
3.  analyze "name of package" for version information
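
A rough sketch of those three steps in Python (assuming an RPM-based
Linux system; the ldconfig parsing and the step-3 version parse are
naive stand-ins for real analysis):

import subprocess

def library_package_version(libname):
    # step 1: resolve the library name to a full path via ldconfig
    out = subprocess.run(["ldconfig", "-p"],
                         capture_output=True, text=True).stdout
    path = next((line.split("=> ")[1] for line in out.splitlines()
                 if libname in line), None)
    if path is None:
        return None
    # step 2: ask rpm which package provides that file
    pkg = subprocess.run(["rpm", "-q", "--whatprovides", path],
                         capture_output=True, text=True).stdout.strip()
    # step 3: crude version extraction, e.g. "libpng-1.6.34-5.el8.x86_64"
    return pkg.split("-")[1] if "-" in pkg else None

print(library_package_version("libpng"))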

It would be much easier, one suspects, to install pkg-config on
systems which do not yet have it than to completely reimplement it.

Does OS X have something which is equivalent to pkg-config, or is
there just no way to look up this sort of information on that OS?

Regards,

David Mathog


[Distutils] Re: pip and missing shared system library

2020-08-09 Thread David Mathog
On Sat, Aug 8, 2020 at 8:15 PM Jonathan DEKHTIAR wrote:
>
> So do you plan on "managing" which version of GCC or g++ people have and 
> issue a warning if they don't have the good one?

A setup.py will always be written for a particular compiler, or maybe
it will handle a couple, but it will never handle a "general
compiler".  That was why the example in the spec

Requires-External C

never made sense.  It always should have been something like

Requires-External gcc (>4.0)

There is no logic available at that level, as far as I can tell.  So
if a package needed gcc on Posix or an MS compiler on Windows, how
would one specify that?  For that matter, if it could use either gcc
or Intel's compiler on Posix, how would that be indicated?  Maybe
there is some specification-level logic which could be used to wrap
these statements, as sketched below?
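
For illustration only, one could imagine borrowing the
environment-marker syntax that PEP 508 already defines for
Requires-Dist.  Requires-External does not support markers today, so
these lines are purely hypothetical:

Requires-External: gcc (>4.0); sys_platform == "linux"
Requires-External: msvc; sys_platform == "win32"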


>How are you even supposed to find out?

pkg-config, in any Posix environment.  Within a pure Windows
environment or on some obscure OS, I have no idea.  Just skip this
test if it is not supported in a given environment?   Better that it
works in some environments than in none.

>
> Don't get me wrong, it would be awesome if it worked. I just don't see a way to 
> handle all these constraints ...

I would be happy if it handled _any_ of these constraints.  At the
moment adding these lines does nothing.

Regards,

David Mathog


[Distutils] Re: pip and missing shared system library

2020-08-07 Thread David Mathog
> Unfortunately, successfully building C libraries is way, way more
> complicated than that. There are nearly as many ways to detect and
> configure C libraries as there are C libraries; tools like pkg-config
> help a bit but they're far from universal.

Agreed that building a library is more complicated.  (Building a
library, or anything for that matter, which depends on boost is even
worse.)  Nevertheless, to do so, the information provided by
pkg-config will always be required.  It might not be sufficient, of
course.  As for looking up this information, I am only aware of
pkg-config and pkgconf, and on many systems one is just a soft link to
the other.  That is also what is used on Windows within MinGW.  So it
would not be unreasonable to specify that this is the source of the
information in all Posix environments.

> There can be multiple versions of libpng on the same system, with different 
> ABIs.

Requires-External supports version ranges, and pkg-config will show
the version which is installed.  If Requires-External is ever to see
real usage, it would presumably have to be compatible with pkg-config
in Posix environments; how would it ever work otherwise?  Users who
have placed .pc files in odd locations would have to modify
PKG_CONFIG_PATH before running pip, or those files would not be found.
They would also have to specify "libname" or "libname2", as
appropriate, in some cases.

> doesn't even know what compiler the package will want to use (which
> also affects which libraries are available).

I had wondered about that.  In the spec it has an example:

Requires-External C

which seems to be a requirement for a C compiler, but if it does not
specify which one, then the test could pass if it finds the Intel
compiler even though setup.py only knows how to build with gcc.  Or
vice versa.

> day, the only thing pip could do with this information is print a
> slightly nicer error message than you would get otherwise.

In the case that started this thread a simple "The igraph library is
required but not installed on this operating system" and then exit
would have saved a considerable amount of time.  So while it isn't
much, it is more than we have currently.

> What pip *has* done in the last few years is made it possible to pull
> in packages from PyPI when building packages from source, so you can
> make your own pkg-config-handling library and put it on PyPI and
> encourage everyone to use it instead of reinventing the wheel. Or use
> more powerful build systems that have already solved these problems,
> e.g. scikit-build lets you use CMake to build python packages.

I think that is what happened this time: there was no test to see
whether the package it built could be installed where it wanted to put
it, so it failed.  In any case, it did pull igraph down from PyPI, but
the installation failed.

One other point about "Requires-External": as described, it lacks a
special case "none".  ("None" really means "just the Python version
which is running pip".)  That is, there is currently no way to
distinguish between "this package has no external requirements" and
"the external requirement specification is incomplete".  This
information really should be mandatory, even if it is just to tell a
person what must be installed in the OS before running pip.  One can
imagine a utility analogous to "johnnydep" which would traverse a
proposed package install and verify that all the Requires-External
entries are in fact satisfied, or minimally, just list them (see the
sketch below).  Pip should warn when no "Requires-External" entries
are present, and "Requires-External none" would always suppress that
warning.

Regards,

David Mathog


[Distutils] Re: pip and missing shared system library

2020-08-06 Thread David Mathog
On Thu, Aug 6, 2020 at 11:54 AM Nathaniel Smith  wrote:

> If the code that failed to give a good error message is in
> louvain-igraph, then you should probably talk to them about that :-).
> There's no way for the core packaging libraries to guess what this
> kind of arbitrary package-specific code is going to do.

That was the point I was trying to make, albeit not very well, I guess.
Because Requires-External was not supplied, and pip would not have
done anything with it even if it had been, the package had to roll its
own.  The documentation for Requires-External says what it requires,
but it does not indicate that anything else happens besides (I assume)
the installation halting if the condition is not met.  That is, if
there is:

Requires-External: libpng

and pip acts on it, that means it found libpng.so, but there does not
seem to be any requirement that it communicate any further information
about libpng to setup.py in any standard way.  Which is why the
setup.py for louvain rolled its own.  For POSIX-like OSes it would be
sufficient to know that if the "Requires-External" check passed, then
"pkg-config --cflags libpng" and the like will work.  But again, that
pushes the work into setup.py, where it will be neither standardized
nor platform-agnostic.  So for better portability, passing one of
these tests should also set some standard variables like:

   RE_libpng_libs="-lpng16 -lz"
   RE_libpng_includedir="/usr/include"
   RE_libpng_libdir="/usr/lib64"
   (and so forth)

which are then seen in setup.py.  Yes, these are just the various
values already in the libpng.pc file; no reason to reinvent that
wheel.  The result should be simpler setup.py's which are portable,
without requiring all the conditional "if it is this OS then look
here" logic that they must currently contain.

Regards,

David Mathog


[Distutils] Re: pip and missing shared system library

2020-08-06 Thread David Mathog
On Wed, Aug 5, 2020 at 5:05 PM Tzu-ping Chung  wrote:
>
> Exactly. Python actually specifies metadata around this (Requires-External), 
> but I don’t believe pip implements it at all since there’re almost no 
> sensible rules available on how the external libraries can be located in a 
> cross-platform way.

Locating the libraries would have to be platform specific, but pip
could easily know to try pkg-config on Linux and, if that fails, to
run a tiny test which does nothing but attempt to link (see the sketch
below).  If any of that fails, the package in question will likely
fail too.
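
Such a link probe might look like this (a sketch, assuming a "cc"
driver on PATH):

import os
import subprocess
import tempfile

def can_link(libname):
    # compile an empty program against the library; success implies
    # the linker can find it
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "probe.c")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        r = subprocess.run(["cc", src, "-l" + libname,
                            "-o", os.path.join(d, "probe")],
                           capture_output=True)
        return r.returncode == 0

print(can_link("igraph"))   # False means the package build will likely fail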

Neither louvain nor python-igraph contain a Requires-External in their
dist-info files.  Looking at the setup.py for louvain here:

  https://github.com/vtraag/louvain-igraph/blob/master/setup.py

around line 491 is the code for pkg-config and the "core" message.
It looks like it should exit when pkg-config fails, and that is not
what happened.  That is 0.8.0; the installed version is 0.6.1.  I
pulled the latter down with:

  pip3 download louvain==0.6.1

and unpacked it, and found starting at line 416

def detect_from_pkgconfig(self):
    """Detects the igraph include directory, library directory and the
    list of libraries to link to using ``pkg-config``."""
    if not buildcfg.has_pkgconfig:
        print("Cannot find the C core of igraph on this system using pkg-config.")
        return False

So, as observed, it would not immediately abort when it could not find
the installed library.  This shows the problem with leaving
Requires-External to each package's setup.py: doing that means the
warnings will differ from package to package, or possibly even from
version to version of the same package.

> Conda is probably the best bet when you need to deal with tight 
> cross-language package integration like this, by punting the whole idea of 
> system libraries and installing a separate copy of everything you need.

I have been trying very hard NOT to have multiple copies of
everything, hence my prior work on python_devirtualizer, which takes
venv installs, unpacks them, reduces the common pieces to a single
copy, and wraps the "programs" so that they start and run properly
when they are found on PATH:

   https://sourceforge.net/projects/python-devirtualizer/

I suppose an equivalent set of scripts for "conda" would be possible,
but I think much more difficult since it does more.

Anyway, why is Requires-External (apparently) so little used?  Is this
a chicken/egg problem, where nobody specifies it because pip ignores
it, and pip ignores it because nobody uses it?

One can see how the Requires-External entries could be automatically
generated.  For instance, louvain has only one .so, which might be
processed starting with something like this:

ldd _c_louvain.cpython-36m-x86_64-linux-gnu.so | grep -v linux-vdso.so | grep -v ld-linux | grep -v libpython
libigraph.so.0 => /lib64/libigraph.so.0 (0x7f42bb622000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x7f42bad42000)
libm.so.6 => /lib64/libm.so.6 (0x7f42ba9c)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f42ba7a8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7f42ba588000)
libc.so.6 => /lib64/libc.so.6 (0x7f42ba1c6000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x7f42b9e5e000)
libz.so.1 => /lib64/libz.so.1 (0x7f42b9c47000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x7f42b9a2)
libdl.so.2 => /lib64/libdl.so.2 (0x7f42b981c000)
libgmp.so.10 => /lib64/libgmp.so.10 (0x7f42b9584000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x7f42b90a1000)
libutil.so.1 => /lib64/libutil.so.1 (0x7f42b8e9d000)

which is processed to become:

  Requires-External: libigraph
  Requires-External: libstdc++
  (etc)
  Requires-External: libutil

For a more complicated package, run the same method on all dynamic
binaries and libraries and reduce the result to one copy of each.
Determining versions would be harder, though; perhaps impossible to do
automatically.  igraph on my system is 0.8.2, so that is sufficient,
but there would be no way of knowing whether 0.8.1 would also work, or
whether 0.9.0 would break things.
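
A rough sketch of that generation step (the soname-to-name reduction
is simplistic and, as noted, versions cannot be recovered this way):

import re
import subprocess

def requires_external(so_path, skip=("linux-vdso", "ld-linux", "libpython")):
    # run ldd and collect the sonames on the left of each "=>"
    out = subprocess.run(["ldd", so_path],
                         capture_output=True, text=True).stdout
    names = set()
    for line in out.splitlines():
        m = re.match(r"\s*(\S+)\s*=>", line)
        if m and not any(s in m.group(1) for s in skip):
            names.add(m.group(1).split(".so")[0])  # "libigraph.so.0" -> "libigraph"
    return ["Requires-External: " + n for n in sorted(names)]

print("\n".join(requires_external("_c_louvain.cpython-36m-x86_64-linux-gnu.so")))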

Regards,

David Mathog


[Distutils] pip and missing shared system library

2020-08-05 Thread David Mathog
pip install package

often results in compiling (using gcc, g++, whatever) to produce a
binary.  Usually that proceeds without issue.  However, there seems to
be no check that the libraries required to link that binary are
already on the system; or at least the message which results when they
are missing is not at all clear about what is missing.

I discovered that today by wasting several hours figuring out why
scanpy-scripts was failing trying to build dependency "louvain", which
would not install into a venv with pip.  It had something to do with
"igraph", but pip had downloaded python-igraph before it got to
louvain.  When louvain tried to build, there was a mysterious message
about pkgconfig and igraph:

Cannot find the C core of igraph on this system using pkg-config.

(Note that when python-igraph installs, it places an igraph directory
in site-packages, so which "igraph" the message refers to is fairly
ambiguous.)  Then it tried to install a different version of igraph,
and the install failed.  This was very confusing, because the second
igraph install was not (it turned out) a different version of
python-igraph but a system-level igraph library, which it could not
install either, because the process was not privileged and could not
write to the target directories.  Yet it tried to install anyway.
This is discussed in the louvain documentation here (it turns out):

https://github.com/vtraag/louvain-igraph

but since I was actually trying to install a different package, of
course I had not read the louvain documentation.

In short form, the problem was "cannot build a binary because the
required library libigraph.so is not present in the operating system",
but that was less than obvious in the barrage of warnings and error
messages.

Is it possible to tell pip or setup.py to fail immediately when a
required system library like this is not found (here, presumably,
after that "C core" message), rather than confusing the matter further
with a failed partial build and install of the same component?

More generally, is there anything in the python installation methods
which could list system libraries as dependencies and give a more
informative error message when they are missing?

Thanks,

David Mathog


[Distutils] Re: Fwd: Re: Use of "python" shebang an installation error?

2020-07-23 Thread David Mathog
On Wed, Jul 22, 2020 at 4:34 PM Tzu-ping Chung  wrote:
> If the shebang needs to care about compatibility, something is already going 
> very wrong.

We agree there, and it has.

That python3 was not completely backwards compatible with python2
meant that it broke a lot of code.  The EOL of python2 and the
apparent intent of the major distros to drop it mean that unmaintained
python code will become unusable code.  Neither of these outcomes is
common for a major computer language.  For instance, old K&R-style C
or F77 code from the 1990s will still compile with modern compilers
(albeit with a blizzard of warning messages and possibly 32-bit to
64-bit issues).  This matters quite a bit in scientific circles,
because published computational work becomes unreproducible if the
tools break, even when the input data is still available.

When these issues are encountered I notify the program's author,
assuming that there is still somebody maintaining the code.  The most
recent instance of this was "lastz"

   http://www.bx.psu.edu/~rsharris/lastz/

which in addition to the lastz program itself contains a bunch of
python scripts.  The shebangs used "python", the scripts were Python2
code, and so they didn't work.  The author in this case agreed that
was a problem and is currently working on upgrading those scripts.

I think the intent of the first quoted section was to say that if a
script used a feature in Python 3.N that was absent in 3.(N-1) and
below then 3.N should be used.  That is perfectly reasonable.  What
isn't reasonable is the assumption that using just "python" is not a
problem in a language which demonstrably does not maintain backwards
compatibility between major versions (see above).

Perhaps this circle could be squared if python had a "-r" (a
single-letter flag for "required version") command line parameter.
Then this:

#!/usr/bin/env python -r N.M

could conceivably be handled gracefully by the single "python", even
if only to throw an error stating that version "N.M" is not supported.
That would be far better than responding to version incompatibility
with a slew of syntax errors, which is what happens now.  It would
handle both "2.7 is too old" and "3.9 required but this is a 3.8
installation".
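
Lacking such a flag, the closest equivalent today is an explicit check
at the top of the script, which at least fails with one clear message
(a minimal sketch; the required version is arbitrary):

import sys

REQUIRED = (3, 8)   # stands in for the hypothetical "N.M"
if sys.version_info[:2] < REQUIRED:
    sys.exit("This script requires Python %d.%d+; this is %s"
             % (REQUIRED + (sys.version.split()[0],)))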

Regards,

David Mathog



[Distutils] Re: Fwd: Re: Use of "python" shebang an installation error?

2020-07-22 Thread David Mathog
On Wed, Jul 22, 2020 at 1:27 PM Paul Moore  wrote:
>
> On Wed, 22 Jul 2020 at 19:31, David Mathog  wrote:
> > but that shebang has to be corrected when the installation is moved to a 
> > normal
> > environment, which my code is doing now.)
>
> Moving files that are installed by Python packaging tools isn't
> supported. It might work, and you can probably make it work with some
> effort, but it's very much a case of "don't do it unless you know what
> you're doing". Correcting shebang lines is definitely something you
> will need to do.

I understand that moving files is iffy.  However, given that I want
only one copy of each installed python package on the system, and I
need to be able to install different versions of the same package (to
resolve module version conflicts between packages), moving the files
around and replacing most copies with links to a single copy seemed
like the only way to go.

Here:

https://www.python.org/dev/peps/pep-0394/#recommendation

It says:

When packaging third party Python scripts, distributors are encouraged
to change less specific shebangs to more specific ones. This ensures
software is used with the latest version of Python available, and it
can remove a dependency on Python 2. The details on what specifics to
set are left to the distributors, though example specifics could
include:

Changing python shebangs to python3 when Python 3.x is supported.
Changing python shebangs to python2 when Python 3.x is not yet supported.
Changing python3 shebangs to python3.8 if the software is built with Python 3.8.

and then immediately after it says:

When a virtual environment (created by the PEP 405 venv package or a
similar tool such as virtualenv or conda) is active, the python
command should refer to the virtual environment's interpreter and
should always be available. The python3 or python2 command (according
to the environment's interpreter version) should also be available.

Which seems to be exactly the opposite of the preceding stanza.  I.e.,

  "always be as specific as possible"

then

  "be general, and also provide specific"

Personally, I think the generic use of "python", both in shebangs and
when invoking scripts as "python script", should be deprecated, with
warnings from the installers to force developers to strip it out.  It
only works now by chance.  Sure, there is a high probability it will
work, but if one is on the wrong system it fails.  If python4
(whenever it arrives) is not fully backwards compatible with python3,
the generic use of "python" is going to cause untold grief, whereas in
that scenario all the code which uses "python3" should continue to
function normally.

Regards,

David Mathog


[Distutils] Re: Fwd: Re: Use of "python" shebang an installation error?

2020-07-22 Thread David Mathog
On Wed, Jul 22, 2020 at 3:41 AM Thomas Kluyver  wrote:
>
> On Tue, 21 Jul 2020, at 21:50, David Mathog wrote:
> > ./lib/python3.6/site-packages/pip/_vendor/appdirs.py:#!/usr/bin/env python
>
> Python packaging tools like pip generally differentiate between *scripts*, 
> which are installed to be run from the command line, and *modules*, which are 
> imported from other Python code. Files under site-packages are modules. Any 
> special handling for shebangs, execute bits, or Windows .exe wrappers is 
> usually done only for scripts.
>
> It's not unusual to see a shebang in modules - I think some editors put it in 
> whenever you create a new Python file. But it doesn't usually do anything. If 
> you want to run a module directly, the normal way now is with "python -m", 
> which doesn't use the shebang.

So in summary:

1.  Invalid shebangs for modules in site-packages "should" be
harmless: ignore them and hope for the best.

2.  Shebangs for scripts "should" be correct.  (They are while still
inside a venv, but the shebang has to be corrected when the
installation is moved to a normal environment, which my code is doing
now.)

Scripts usually end up in a "bin" directory on Linux.  Is that part of
the installation standard, or could a package put them on an arbitrary
path (other than under "site-packages") under the venv's root, for
instance in a directory named "scripts"?  Fixing the shebangs by
processing only "bin" is easy (see the sketch below); traversing the
entire tree is a bit messier.  It would be good not to have to do so
if that will never find an invalid shebang.
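
A minimal sketch of that bin-only fix (the replacement interpreter
path is an assumption):

import os

def reshebang(bindir, interp="/usr/bin/python3"):
    for name in os.listdir(bindir):
        path = os.path.join(bindir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            first, _, rest = f.read().partition(b"\n")
        # only rewrite text scripts whose first line is a python shebang
        if first.startswith(b"#!") and b"python" in first:
            with open(path, "wb") as f:
                f.write(b"#!" + interp.encode() + b"\n" + rest)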

Thanks,

David Mathog



[Distutils] Fwd: Re: Use of "python" shebang an installation error?

2020-07-21 Thread David Mathog
(oops, had to resend; I forgot to change the destination to the list)

On Mon, Jul 20, 2020 at 12:38 PM John Thorvald Wodder II wrote:
>
> On 2020 Jul 20, at 15:25, David Mathog  wrote:
> > Lately I have been working on a CentOS 8 machine, and it has "python2"
> > and "python3", but no "python".  Many packages install scripts with a
> > shebang like:
> >
> >   #!/usr/bin/env python
> >
> > and those do not work on this OS.  Seems like rather a large missing
> > dependency which goes by without triggering a fatal error.
>
> How exactly are these packages getting installed?  Last time I checked, both 
> pip and setuptools automatically set the shebang in scripts (both 
> console_script entry points and scripts declared with the "scripts" argument 
> to `setup()`) to use the path of the running Python interpreter.  Are these 
> packages installed using your system package manager?  If so, you should take 
> the problem up with its maintainers.

Good point; I have been installing so many packages that I get
confused about which installer was used for which package.  It turned
out that many (but not all) of the files which contained

   #!/usr/bin/env python

shebangs were installed using standard OS-level tools (cmake,
configure, make, and the like).  Example package: hisat2.  I guess
there isn't much choice for those but to scan the directories for
python scripts and fix the shebangs.

Installs that are initially into venvs and used pip3 are still an
issue.  Example:

python3  -m venv johnnydep
cd johnnydep
grep -r '/usr/bin/env python$' .
#finds:
./lib/python3.6/site-packages/pip/_vendor/appdirs.py:#!/usr/bin/env python
./lib/python3.6/site-packages/pip/_vendor/chardet/cli/chardetect.py:#!/usr/bin/env python
./lib/python3.6/site-packages/pip/_vendor/requests/certs.py:#!/usr/bin/env python
./lib/python3.6/site-packages/pkg_resources/_vendor/appdirs.py:#!/usr/bin/env python
./lib/python3.6/site-packages/johnnydep/pipper.py:#!/usr/bin/env python
cd bin
ls -1 | grep python
lrwxrwxrwx. 1 modules modules7 Jul 20 14:09 python -> python3
lrwxrwxrwx. 1 modules modules   16 Jul 20 14:09 python3 -> /usr/bin/python3
source activate
pip3 install johnnydep
head -1 johnnydep
#!/home/common/lib/python3.6/Envs/johnnydep/bin/python
#same for "tabulate" and all other shebangs in bin.
cd ..
grep -r '/usr/bin/env python$' .
#same as before
grep -r '/home/common/lib/python3.6/Envs/johnnydep/bin/python' .
#just the files in the bin directory.

It looks like none of the "#!/usr/bin/env python" shebangs within the
venv are going to be used after the install, so perhaps those are
harmless.

The shebangs like

   #!/home/common/lib/python3.6/Envs/johnnydep/bin/python

are OK within the venv, but once they are "devirtualized" they become
a problem.  That was a known problem though - my devirtualizer code
already patches all of the ones in the bin directory.  I have not seen
any elsewhere (yet) within the venv, but there is probably no rule
that keeps them from appearing in "share" or elsewhere.

The "python" in use in the venv is just a symbolic link to "python3"
which is itself a symbolic link to the actual program
"/usr/bin/python3". It is constructed that way based on "python -m
venv" which uses pieces which come from the CentOS 8
python3-libs-3.6.8-23.el8.x86_64 RPM.  Is there some requirement that
a venv have a "python"?  Odd that RedHat (and so CentOS) provide a
"python" there, but not in the OS itself.

Thanks,

David Mathog


[Distutils] Fwd: Re: Use of "python" shebang an installation error?

2020-07-21 Thread David Mathog
On Mon, Jul 20, 2020 at 4:40 PM John Thorvald Wodder II wrote:
>
> First of all, your last two messages only went to me, not to the list.  The 
> mailing list doesn't set Reply-To on messages or the like, so you have to 
> manually set "To: distutils-sig@python.org" when replying.

Aargh, right.  I use gmail for my home mail, and since I'm stuck
working at home, that is what I used here.  Gmail likes to hide, well,
pretty much everything.  I will repost those responses.

>
> As to your e-mail, though, are any of those files even meant to be executed?  
> They're not in bin/; they just appear to be regular source files that some 
> developer slapped a shebang on.

That, in a sense, is the issue.  I don't know, you don't know; maybe
the developer knows (if he/she still remembers).  I really don't want
to do the work of digging through the code for every package I install
to determine whether a shebang is used or not.  Yet if I don't figure
this out, some end user will run a script (one of a hundred in some
package I installed for their use) which will blow up because of this
issue.

The best I can do now is run

   pdvctrl reshebang $TARGET_DIR

or

   pdvctrl reshebang $ROOT_DIR...

and fix them up after the fact.  (pdvctrl from python_devirtualizer here:
https://sourceforge.net/projects/python-devirtualizer/).  Even then it
usually has to guess that "python" means "python3" and not "python2",
and sometimes it guesses wrong.  Today's version of that recurring
issue:

  https://github.com/lastz/lastz/issues/30

Regards,

David Mathog


[Distutils] Fwd: Re: Use of "python" shebang an installation error?

2020-07-21 Thread David Mathog
(oops, had to resend; I forgot to change the destination to the list)

biopython-1.77, for instance, when installed into a virtualenv with
pip3, has many of these shebangs:

   #!/usr/bin/env python

And they are all over the place.  They are:

./site-packages/Bio/bgzf.py:
./site-packages/Bio/PDB/parse_pdb_header.py:
./site-packages/Bio/PDB/PDBList.py:
./site-packages/Bio/Restriction/__init__.py:
./site-packages/Bio/Restriction/Restriction.py:
./site-packages/Bio/Restriction/PrintFormat.py:
./site-packages/Bio/Restriction/Restriction_Dictionary.py:
./site-packages/Bio/Wise/__init__.py:
./site-packages/Bio/Wise/psw.py:
./site-packages/Bio/Wise/dnal.py:
./site-packages/Bio/UniProt/GOA.py:
./site-packages/Bio/SeqUtils/__init__.py:

Regards,

David Mathog


[Distutils] Use of "python" shebang an installation error?

2020-07-20 Thread David Mathog
Lately I have been working on a CentOS 8 machine, and it has "python2"
and "python3", but no "python".  Many packages install scripts with a
shebang like:

   #!/usr/bin/env python

and those do not work on this OS.  That seems like rather a large
missing dependency which goes by without triggering a fatal error.

In bioinformatics pipelines it is common for one package to invoke a
script from another.  So while the package which supplied a particular
script might have avoided this issue by only invoking it with:

   python3 path/script

that does not prevent another package from doing one of these:

   A  path/script
   B  python path/script

In terms of analysis, it is trivial to find all python scripts
installed by a package and examine the shebang line (if present) to
see if this is an issue.  I am adding a "reshebang" function to my
python_devirtualizer specifically to handle the issue for scripts
which are invoked directly.  It is, however, not at all trivial to
analyze all a package's code to see which scripts are called by other
scripts, and how they are called.  Moreover, they might be called from
perl, or C, or some other language.  So dealing with "B" above is not
trivial.

So, my question is, should the use of "python" (as opposed to
"python2" or "python3") in a shebang be considered an installation
error on a system for which "python" does not exist?

I would argue yes, because we already know that python3 was not fully
backwards compatible with python2, so we have reason to suspect that
python4 (whenever that appears) might also not be fully backwards
compatible with python3.  By being picky about the python version now,
that should prevent a lot of problems later.

Regards,

David Mathog


[Distutils] Re: package management - common storage while keeping the versions straight

2020-07-07 Thread David Mathog
Hi all.

"Python devirtualizer" is a preliminary implementation which manages
shared packages so that only one copy of each package version is
required.  It installs into a virtualenv, then migrates the contents
out into the normal OS environment, and while so doing, replaces what
would be duplicate files with soft links to a single copy.  It is
downloadable from here:

https://sourceforge.net/projects/python-devirtualizer/

It is Linux (or other POSIX-like system, __maybe__ Mac) specific.
There is no way it will run on Windows at this point, because the main
script is bash and the paths assume POSIX path syntax.  (It might work
in Mingw64, though.)

Anyway,

pdvctrl install packageA
pdvctrl migrate packageA /wherever/packageA
pdvctrl install packageB
pdvctrl migrate packageB /wherever/packageB

will result in a single copy of the shared dependencies on this
system, with both packageA and packageB hooked to them with soft
links.  The import does not go awry because from within each package's
site-packages directory there are only links to the files it needs, so
it never sees any conflicting package versions.

There is also:

pdvctrl preinstall packageC
pdvctrl install packageC
pdvctrl migrate packageC /wherever/packageC

which first uses johnnydep to look up dependencies already on the
system and links those in directly before going on to install any
pieces not so installed.  Unfortunately the johnnydep runs with
"preinstall" have so far been significantly slower than just doing a
normal install and letting the migrate throw out the extra copy.  On
the other hand, the one package I have encountered which has
conflicting requirements (scanpy-scripts) fails in a more
comprehensible manner with "preinstall" than with "install".

Migrate "wraps" the files in the package's "bin" directory, if any, so
that they may be invoked solely by PATH like a regular program.  This
uses libSDL2 to get the absolute path of the wrapper program, and it
defines PYTHONPATH before execve() to the actual target.  So no
messing about with PYTHONPATH in the user's shell or in scripts.  So
far I have not run into a problem with the wrappers, which essentially
just inject a PYTHONPATH into the environment when the program is run.
Well, one package (busco) had a file with no terminal EOL, which
resulted in its last line being dropped while it was being wrapped,
but that case is now handled.  I do expect though at some point to
encounter a package which has several files in its bin, and
first_program will contain some variant of:

python3 /wherever/bin/second_program

The wrapper will break those, since the wrapper is a regular binary
and not a python script.
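
In Python pseudocode, purely for illustration (the real wrapper is a
compiled binary, as noted above, and the relative site-packages path
and ".wrapped" target directory are assumptions):

import os
import sys

# absolute directory of the wrapper itself, symlinks resolved
here = os.path.dirname(os.path.realpath(sys.argv[0]))
env = dict(os.environ)
env["PYTHONPATH"] = os.path.normpath(
    os.path.join(here, "..", "lib", "python3.6", "site-packages"))
# hand off to the real program with the augmented environment
os.execve(os.path.join(here, ".wrapped", os.path.basename(sys.argv[0])),
          sys.argv, env)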

Regards,

David Mathog


On Mon, Jun 29, 2020 at 1:43 PM John Thorvald Wodder II wrote:
>
> On 2020 Jun 29, at 16:09, David Mathog  wrote:
> >
> > In neither case does the egg-info file reference the corresponding
> > directory, but at least the directory in both has the expected package
> > name (other than case).  In the examples you cited at the top, were
> > any of those "different name" cases from packages with a "file"
> > egg-info?
>
> The projects I examined were all in wheel form and thus had *.dist-info 
> directories instead of *.egg-info.  I know very little about how eggs work, 
> other than that they're deprecated and should be avoided in favor of wheels.
>
> -- John Wodder


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-29 Thread David Mathog
On Fri, Jun 26, 2020 at 2:51 PM John Thorvald Wodder II wrote:

> Of the 32,517 non-matching projects, 7,117 were Odoo projects with project 
> names of the form "odoo{version}_addon_{foo}" containing namespace modules of 
> the form "odoo/addons/{foo}", and 3,175 were Django projects with project 
> names of the form "django_{foo}" containing packages named just "{foo}".  No 
> other major patterns seem to stand out.

In CentOS 8 the RPM

   python3-rhnlib-2.8.6-8.module_el8.1.0+211+ad6c0bc7.noarch

has loaded into the directory

  /usr/lib/python3.6/site-packages

two entries

rhn                            # a directory
rhnlib-2.8.6-py3.6.egg-info    # a file

The latter contains just this text:

Metadata-Version: 1.0
Name: rhnlib
Version: 2.8.6
Summary: Python libraries for the Spacewalk project
Home-page: http://rhn.redhat.com
Author: Mihai Ibanescu
Author-email: m...@redhat.com
License: GPL
Description: rhnlib is a collection of python modules used by the
Spacewalk (http://spacewalk.redhat.com) software.
Platform: UNKNOWN

Nor is there a link in the other direction:

grep -iR rhnlib /usr/lib/python3.6/site-packages/rhn
#nothing

So while "rhn" bears a similarity to "rhnlib" it is neither the
package name nor is it listed in the egg-info.

This was of course installed by dnf (AKA yum) and not by egg.

Is it possible for any python installer (as opposed to dnf, which
operates outside of Python) to install an unreferenced directory like
this?  Presumably not with a dist-info, but perhaps with an egg-info
that does not in any way reference the active part of the
installation?  In a small collection (172 packages) here, these were
the only two "file" egg-info entries found, with their associated
directories:

busco
BUSCO-4.0.6-py3.6.egg-info
ngs
ngs-1.0-py3.6.egg-info

In neither case does the egg-info file reference the corresponding
directory, but at least the directory in both has the expected package
name (other than case).  In the examples you cited at the top, were
any of those "different name" cases from packages with a "file"
egg-info?

Thanks,

David Mathog


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread David Mathog
Thanks for that feedback.  Looks like RECORD is the one to use.

The names of the directories ending in dist-info seem to be uniformly:

package-version.dist-info

but the directory names associated with eggs come in a lot of flavors:

anndata-0.6.19-py3.6.egg
cutadapt-2.10.dev20+g93fb340-py3.6-linux-x86_64.egg
scanpy-1.5.2.dev7+ge33a2f33-py3.6.egg
h5py-2.9.0-py3.6-linux-x86_64.egg
simplejson-3.17.0-py3.6.egg-info

johnnydep does not give any hints that this is coming:

johnnydep --output-format pinned h5py
#relevant part:  h5py==2.10.0

What would be some small examples for other package managers?  I would
like to see what they have as equivalents to dist-info and egg-info,
so that the script does not choke on them.

Some progress with the test script.  It can now convert a virtualenv
to a regular directory and migrate the site-packages contents to a
shared area.  A second migration of a copy of the same virtualenv to a
different regular directory correctly makes links to the first set.
(That is, two normal directories both linked to one common set of
packages.)  And the test program (johnnydep) runs in both with
PYTHONPATH set correctly.  But preinstalling, that is, setting links
to the common directory before doing a normal install, is tricky
because of the name inconsistencies.  To do that it must run johnnydep
to get the necessary information, and that is not very fast.  A normal
install of johnnydep itself, complete with downloads, takes less time
than that program's own analysis!

time johnnydep johnnydep
#21s

vs.

rm -rf ~/.cache/pip #force actual downloads
#too fast to measure
time python3 -m venv johnnydep
#2.3s
source johnnydep/bin/activate
#too fast to measure
time python -m pip install -U pip #update 9.0.3 to 20.1.1
#3.4s
time pip3 install johnnydep
#7.8s

A package with a huge amount of compilation would probably be a win
for a preinstall, but at this point it is definitely not an "always
faster" option.

Thanks,

David Mathog

On Fri, Jun 26, 2020 at 2:51 PM John Thorvald Wodder II wrote:
>
> On 2020 Jun 26, at 15:50, David Mathog  wrote:
>
> > Still, how common is that?  Can anybody offer an estimate about what
> > fraction of packages use different names like that?
>
> Scanning through the wheelodex.org database (specifically, a dump from 
> earlier this week) finds 32,517 projects where the wheel DOES NOT contain a 
> top-level module of the same name as the project (after correcting for 
> differences in case and hyphen vs. underscore vs. period) and 74,073 projects 
> where the wheel DOES contain a module of the same name.  (5,417 projects 
> containing no modules were excluded.)  Note that a project named "foo-bar" 
> containing a namespace package "foo/bar" is counted in the former group.
>
> Of the 32,517 non-matching projects, 7,117 were Odoo projects with project 
> names of the form "odoo{version}_addon_{foo}" containing namespace modules of 
> the form "odoo/addons/{foo}", and 3,175 were Django projects with project 
> names of the form "django_{foo}" containing packages named just "{foo}".  No 
> other major patterns seem to stand out.
>
> -- John Wodder


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread David Mathog
On Fri, Jun 26, 2020 at 12:43 PM David Mathog  wrote:
> So by what method could code working outside of python possibly determine that
> "yaml" goes with "PyYAML"?

Sorry, I forgot that the information was in
PyYAML-5.3.1-py3.6.egg-info/top_level.txt
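
For code working outside of Python, a sketch of that lookup which
scans the metadata directories directly (the dist-name parse is
simplistic, and the file-style egg-info entries mentioned elsewhere in
this thread are skipped by the isfile test):

import os

def import_name_map(site_packages):
    mapping = {}
    for entry in os.listdir(site_packages):
        if entry.endswith((".dist-info", ".egg-info")):
            top = os.path.join(site_packages, entry, "top_level.txt")
            dist = entry.split("-")[0]  # "PyYAML-5.3.1-py3.6.egg-info" -> "PyYAML"
            if os.path.isfile(top):
                for mod in open(top).read().split():
                    mapping.setdefault(mod, []).append(dist)
    return mapping

# e.g. import_name_map("/usr/lib/python3.6/site-packages").get("yaml") -> ['PyYAML']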

Still, how common is that?  Can anybody offer an estimate about what
fraction of packages use different names like that?

Thanks,

David Mathog


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-26 Thread David Mathog
Questions about naming conventions.

The vast majority of packages, when they install, create two
directories in site-packages with names like:

foobar
foobar-1.2.3.dist-info  (or egg-info)

However PyYAML creates:

yaml
PyYAML-5.3.1-py3.6.egg-info

and there is also this:

pkg_resources

which is not associated with a versioned package.

In python3

>>> import yaml
>>> import pkg_resources
>>> print(yaml.__version__)
5.3.1
>>> print(pkg_resources.__version__)
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: module 'pkg_resources' has no attribute '__version__'

So by what method could code working outside of python possibly determine that
"yaml" goes with "PyYAML"?   Is this a common situation?

Is pkg_resources actually a package?  Does it make sense for a common
package repository to have a single instance of this directory or
should each installed python based program retain its own version of
this?

There are some other files that live in site-packages which are not
actually packages. The list so far is:

__pycache__

#some dynamic libraries, like
kiwisolver.cpython-36m-x86_64-linux-gnu.so

#some pth files, but always so far with an explicit version number, like
sphinxcontrib_applehelp-1.0.2-py3.8-nspkg.pth
#or associated with a package with a version number like:
setuptools
setuptools-46.1.3.dist-info
setuptools.pth

#some py files, apparently when that package does not make a corresponding
#directory like:
zipp-3.1.0.dist-info
zipp.py

#initialization file "site" as
site.py
site.pyc

Any others to look out for?  That is, files which might be installed
in site-packages but which should not be shared.

Hopefully this next question is appropriate for this list, since the
issue arises from how python loads packages.  Is there any way to
avoid collisions between python-based programs other than activating
and deactivating their virtualenvs, or redefining PYTHONPATH, before
each is used?  For programs whose library loading is determinate
(usually the case with C, Fortran, bash scripts, etc.), one can
construct a bash script (for instance) which runs 3 programs in order
like so:

prog1
prog2
prog3  # spawns subprocesses which run prog2 and prog1

and there are not generally any issues.  (Yes, one can create a mess
with LD_PRELOAD and the like.)  But if those 3 are python programs
unless prog1, prog2, prog3 are all built into the same virtualenv,
which usually means they come from the same software distribution, I
don't see how to avoid conflicts for the first two cases without
activating/deactivating each one, which looks like it might be tricky
in the 3rd case.

If one has a directory like:

TOP/bin/prog
TOP/lib/python3.6/site-packages

Other than using PYTHONPATH to direct to it with an absolute path, is
there any way to force prog to import only from that specific
site-packages?  Let me try that again: is there a way to tell prog,
via some environment variable, to look in
"../lib/python3.6/site-packages" (and nowhere else) for imports, with
the reference directory being the one where prog is installed, not
wherever the process PWD might happen to be?  Because if that were
possible it might allow a sort of "set it and forget it" method like

export PYTHONRELPATHFROMPROG="../lib/python3.6/site-packages"
prog1  #uses prog1 site-package
prog2  #uses prog2 site-package
prog3  #uses prog3 site-package
#  prog1 subprocess  #uses prog1 site-package
#  prog2 subprocess  #uses prog2 site-package

(None of which would be necessary if python programs could import
specific versions reliably from a common directory containing multiple
versions of each package.)

Thanks,

David Mathog



[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-25 Thread David Mathog
On Thu, Jun 25, 2020 at 12:37 AM Paul Moore  wrote:

> I think the key message here is that you won't be *re*-inventing the
> wheel. This is a wheel that still needs to be invented.

It _was_ invented, but it is out of round and gives a rough ride.  As
noted in the first post, this:

__requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20', 'loompy
<3.0.0,>=2.00', 'h5py <2.10']
import pkg_resources

was able to load the desired set of package-versions for scanpy, but
setting a version number constraint on scanpy itself at the end of
that list, one which matched the version that the preceding commands
successfully loaded, broke it.  So it is not reliable.

And the entire __requires__ kludge is only present because for reasons
beyond my pay grade this:

import pkg_resources
pkg_resources.require("scipy<1.3.0,>=1.2.0;anndata<0.6.20;etc.")
import scipy
import anndata
#etc.

cannot work because by default "import pkg_resources" keeps only the
most recent version rather than making up a tree (or list or hash or
whatever) and waiting to see if there are any version constraints to
be applied at the time of actual package import.

What I'm doing now is basically duct tape and baling wire to work
around those deeper issues.  In terms of language design, a much
better fix would be to modify pkg_resources so that it will always
successfully load the required versions from a designated directory
which contains multiple versions of packages, and to modify the
package maintenance tools so that they can maintain such a directory.

Regards,

David Mathog


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread David Mathog
It turned out that the second install was not the cause of the
timestamp change in the original.  On reviewing "history", I found
that I had accidentally run the link generation twice.  That turned up
this (for me) unexpected behavior:

mkdir /tmp/foo
ls -al /tmp/foo
total 16
drwxrwxr-x.   2 modules modules 6 Jun 24 16:49 .
drwxrwxrwt. 173 rootroot12288 Jun 24 16:49 ..
ln -s /tmp/foo /tmp/bar
ls -al /tmp/foo
drwxrwxr-x.   2 modules modules 6 Jun 24 16:49 .
drwxrwxrwt. 173 rootroot12288 Jun 24 16:49 ..
ln -s /tmp/foo /tmp/bar
ls -al /tmp/foo
total 16
drwxrwxr-x.   2 modules modules17 Jun 24 16:51 .
drwxrwxrwt. 173 rootroot12288 Jun 24 16:50 ..
lrwxrwxrwx.   1 modules modules 8 Jun 24 16:51 foo -> /tmp/foo

The repeated soft link actually put a file under the target.  Strange,
but apparently it is expected behavior: when $LINK already exists as a
symlink to a directory, ln dereferences it and creates the new link
inside the target instead.  The problem can be avoided with the -n
(--no-dereference) option, which makes ln treat an existing symlink to
a directory as a normal file:

 ln -sn $TARGET $LINK
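
For what it's worth, the same guard comes for free in Python, since
os.symlink() refuses to touch an existing link name rather than
dereferencing it (a sketch):

import os

def ensure_link(target, link):
    try:
        os.symlink(target, link)
    except FileExistsError:
        pass   # the link was already made on an earlier run

ensure_link("/tmp/foo", "/tmp/bar")
ensure_link("/tmp/foo", "/tmp/bar")   # second call is a harmless no-op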

The later installs are much faster than the first one, since putting
in the links is very fast and building the packages is not.  This was
the trivial case though, since having done one install all the
prerequisites were just "there".  The johnnydep package will list the
dependencies without doing the install.  I guess I will throw something
together based on that and the above results and see how it goes.
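
Something like this may be enough to start with (a rough sketch,
assuming johnnydep's JohnnyDist API works as its documentation
describes; I have not hooked it up to the linking step yet):

from johnnydep.lib import JohnnyDist

# Inspect the dependencies of a package without installing it.
dist = JohnnyDist("scanpy")
for req in dist.requires:    # direct dependencies, as requirement strings
    print(req)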

Regards,

David Mathog



On Wed, Jun 24, 2020 at 4:23 PM Filipe Laíns  wrote:
>
> On Tue, 2020-06-23 at 15:51 -0700, David Mathog wrote:
> > What I am after is some method of keeping exactly one copy of each
> > package-version in the common area (ie, one might find foo-1.2,
> > foo-1.7, and foo-2.3 there), while also presenting only the one
> > version of each (let's say foo-1.7) to a particular installed program.
> > On linux it might do that by making soft links to the common
> > PYTHONPATH area from another directory for which it sets PYTHONPATH
> > for the application. Finally, this has to be usable by any account
> > which has read execute access to the main directory.
> >
> > Does such a beast exist?  If so, please point me to it!
>
> I have been meaning to do something like this for a while now! But
> unfortunately I can't find the time.
>
> If you do choose of start implementing it, please let me know. I would
> be happy to help out.
>
> Cheers,
> Filipe Laíns
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/63NJKSY7BLPJZXLK5DJFWROGQUKJ7RVF/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread David Mathog
Thanks for the link.  Unfortunately there was not a reference to a
completed package that actually did this. As in, I really do not want
to reinvent the wheel.  Ugh, sorry, that's a pun in this context.

Here is a first shot at this, just installing a moderately complicated
package in a virtualenv and then reinstalling it in another
virtualenv.  Extract and execinput are my own programs (from drm_tools
on sourceforge) but it is obvious from the context what they are
doing. The links had to be soft because linux does not actually allow
a normal user (or maybe even root) to make a hard link to a directory.

cd /usr/common/lib/python3.6/Envs
rm -rf ~/.cache/pip #make download clearer
python3 -m venv scanpy
source scanpy/bin/activate
python -m pip install -U pip #update 9.0.3 to 20.1.1
which python3 #using the one in scanpy
pip3 install scanpy
scanpy -h #seems to start
deactivate
rm -rf ~/.cache/pip #make download clearer
python3 -m venv scanpy2
source scanpy2/bin/activate
python -m pip install -U pip #update 9.0.3 to 20.1.1
export DST=/usr/common/lib/python3.6/Envs/scanpy/lib/python3.6/site-packages
export SRC=/usr/common/lib/python3.6/Envs/scanpy2/lib/python3.6/site-packages
ls -1 $DST \
| grep -v __pycache__ \
| grep -v scanpy \
| grep -v easy_install.py \
| extract -fmt "ln -s $DST/[1,] $SRC/[1,]" \
| execinput
pip3 install scanpy
#downloaded scanpy, "Requirement already satisfied" for all the others
#Installing collected packages: scanpy
# Successfully installed scanpy-1.5.1
scanpy -h #seems to start
deactivate
source scanpy/bin/activate
scanpy -h #seems to start (still)
deactivate
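
For anyone without extract and execinput, the link step above boils
down to roughly this Python loop (same paths, and the same three
filters as the grep -v stages):

import os

DST = "/usr/common/lib/python3.6/Envs/scanpy/lib/python3.6/site-packages"
SRC = "/usr/common/lib/python3.6/Envs/scanpy2/lib/python3.6/site-packages"
SKIP = ("__pycache__", "scanpy", "easy_install.py")

for entry in os.listdir(DST):
    if any(s in entry for s in SKIP):
        continue  # filtered out, exactly like the grep -v stages
    os.symlink(os.path.join(DST, entry), os.path.join(SRC, entry))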

So that method seems to have some promise.  It saved a considerable
amount of space too:

du -k scanpy | tail -1
457408  scanpy
du -k scanpy2 | tail -1
24900   scanpy2


However, two potential problems are evident on inspection.

The first is that when the 2nd scanpy installation was performed it
updated the dates on all the directories in $DST.  A workaround would
be to copy all of those directories into the virtualenv temporarily,
just for the installation, and then remove them and put the links in
afterwards.  That strikes me as awfully kludgy.  Setting them
read-only would likely break the install.

The second issue is that each package install creates two directories like:

llvmlite
llvmlite-0.33.0.dist-info

where the latter contains top_level.txt which in turn contains one line:
  llvmlite
pointing to the first directory.

If another version must cohabit with it, the "llvmlite" directories
will conflict.  For this sort of approach to work easily, the llvmlite
directory should be named "llvmlite-0.33.0" and top_level.txt should
reference that too.  It would probably be possible to work around it,
though, by keeping llvmlite-0.33.0 only in the common area and using:

ln -s $COMMON/llvmlite-0.33.0 $VENVAREA/llvmlite

The top_level.txt in each could then reference the unversioned name.
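
With that layout the per-virtualenv setup collapses to a loop like
this (a sketch only; the common area path and the pin list are made
up for illustration):

import os

COMMON = "/usr/common/lib/python3.6/shared"   # hypothetical versioned store
VENV = "/usr/common/lib/python3.6/Envs/scanpy/lib/python3.6/site-packages"
PINS = {"llvmlite": "0.33.0", "numba": "0.50.1"}   # illustrative pins

for name, version in PINS.items():
    # expose each pinned package-version under its unversioned
    # import name, as with the ln -s above
    os.symlink(os.path.join(COMMON, f"{name}-{version}"),
               os.path.join(VENV, name))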

Unknown if this soft link approach will work on Windows.

Regards,

David Mathog

On Wed, Jun 24, 2020 at 1:26 PM Steve Dower  wrote:
>
> On 24Jun2020 1923, David Mathog wrote:
> > I think I will experiment a little with pipenv and if necessary after
> > each package install use a script to remove the installed libraries
> > and replace them with a hard link to the one in the common area.
> > Maybe it will be possible to put in those links before installing the
> > package of interest (like for scanpy, see first post), which will
> > hopefully keep it from having to rebuild all those packages too.
>
> Here's a recent discussion about this exact idea (with a link to an
> earlier discussion on this list):
> https://discuss.python.org/t/proposal-sharing-distrbution-installations-in-general/2524
>
> It's totally possible, though it's always a balance of trade-offs. Some
> of the people on that post may be interested in developing a tool to
> automate parts of the process.
>
> Cheers,
> Steve
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/2EMFGUE6QDTWBLPWDPE2TTOZOX3OFAOA/


[Distutils] Re: package management - common storage while keeping the versions straight

2020-06-24 Thread David Mathog
On Wed, Jun 24, 2020 at 1:36 AM Thomas Kluyver  wrote:
>
> On Tue, 23 Jun 2020, at 23:51, David Mathog wrote:
> > What I am after is some method of keeping exactly one copy of each
> > package-version in the common area (ie, one might find foo-1.2,
> > foo-1.7, and foo-2.3 there), while also presenting only the one
> > version of each (let's say foo-1.7) to a particular installed program.
> Conda environments work somewhat like this - all the packages are stored in a 
> central place, and the structure of selected ones is replicated using 
> hardlinks in a site-packages directory belonging to the environment. So if 
> your concern is not to waste disk space by storing copies of the same 
> packages, that might be an option.

I experimented with that one a little. It installs its own copies of
python and things like openssl and openblas which are already present
from the linux distribution.  Similarly, if some python script needs
"bwa" it will install its own even though that program is already
available.  Basically it is yet another "replicate everything we might
need whether or not it is already present" type of solution.
(Systems like docker sit at the extreme end of that spectrum,
effectively replacing the entire OS.)  So there might be only the one
version of each python package (not counting duplicates with the OS's
python3) but now there are also duplicate copies of system libraries
and utilities.

I think I will experiment a little with pipenv and if necessary after
each package install use a script to remove the installed libraries
and replace them with a hard link to the one in the common area.
Maybe it will be possible to put in those links before installing the
package of interest (like for scanpy, see first post), which will
hopefully keep it from having to rebuild all those packages too.

Thanks,

David Mathog
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/QBIRYI767AVZ2FCFHVTP56XIKOX4TTYQ/


[Distutils] package management - common storage while keeping the versions straight

2020-06-23 Thread David Mathog
Hi all,

First post here.

I have a cluster where the common software is NFS shared from the file
server to other nodes.  All the python packages are kept in a
directory which is referenced by PYTHONPATH. The good part of that is
that there is just one copy of each package-version.  The bad part, as
you have all no doubt guessed, is that python by itself is really bad
at specifying and loading a set of particular library versions (see
below), so upgrading one program will break another due to conflicting
installed versions.  Hence the common use of virtualenv's.  But as far
as I can tell each virtualenv installs a copy of each package-version
it needs, resulting in multiple copies of the same package-version for
common packages on the same disk.

What I am after is some method of keeping exactly one copy of each
package-version in the common area (ie, one might find foo-1.2,
foo-1.7, and foo-2.3 there), while also presenting only the one
version of each (let's say foo-1.7) to a particular installed program.
On linux it might do that by making soft links to the common
PYTHONPATH area from another directory for which it sets PYTHONPATH
for the application. Finally, this has to be usable by any account
which has read execute access to the main directory.

Does such a beast exist?  If so, please point me to it!

The limitations of python version handling to which I refer above can
be illustrated with the dependencies of "scanpy-scripts".  Given all
the needed libraries in one place (along with incompatible versions of
them), the right set can be loaded (and verified) like this:

export PYTHONPATH=/path/to_common_area
python3
__requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20', 'loompy
<3.0.0,>=2.00', 'h5py <2.10']
import pkg_resources
import scipy
import anndata
import loompy
import h5py
import scanpy
print(scipy.__version__)
print(anndata.__version__)
print(loompy.__version__)
print(h5py.__version__)
print(scanpy.__version__)
quit()

which emits exactly the versions scanpy-scripts needs:

1.2.3
0.6.19
2.0.17
2.9.0
1.4.3

However, adding

  , 'scanpy <1.4.4,>=1.4.2'

at the end of __requires__ makes the whole thing fail at

import pkg_resources

with

(many lines deleted)
 792, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (scipy 1.2.3
(/home/common/lib/python3.6/site-packages/scipy-1.2.3-py3.6-linux-x86_64.egg),
Requirement.parse('scipy>=1.3.1'), {'umap-learn'})

even though the scanpy it loaded in the first case was within the
desired range.  Moreover, specifying the desired versions after

import pkg_resources

has already run (for example via pkg_resources.require()) does not
work at all, since pkg_resources only keeps the highest
version of each package it finds when imported.  (A limitation that
never made the least bit of sense to me.)
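
One quick way to see which single version pkg_resources settled on is
to walk its working set after the import, e.g.:

import pkg_resources

# working_set holds the one Distribution pkg_resources kept per package
for dist in pkg_resources.working_set:
    if dist.project_name == "scipy":
        print(dist.project_name, dist.version, dist.location)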

The test system is CentOS 8 with python 3.6.8.

Thanks,

David Mathog
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/C44E6LUGKGNKKXCEZJMOJUG3HMZKUYG2/