[Distutils] Re: pip and missing shared system library
On Sun, Aug 9, 2020 at 10:21 AM Ned Deily wrote:

> Just to be clear, pkg-config is not part of any Posix standard, AFAIK, so you
> cannot depend on it being available.

Understood. However, if pkg-config is not employed, what reasonable method remains for implementing "Requires-External"? The only thing I can think of is to specify exact library or program names, like

    Requires-External: gcc
    Requires-External: libpng.so

and those could be found by searching the whole directory tree. That might even be efficient if updatedb/locate are available. However, going that way, how would one determine version compatibility on a library? Doing it through the package manager may be possible, but it is a multistep process:

1. look up libpng.so -> PATHPNG
2. rpm -q --whatprovides $PATHPNG -> name of package
3. analyze "name of package" for version information

Much easier, one suspects, to install pkg-config on systems which do not yet have it than to completely reimplement it.

Does OS X have something which is equivalent to pkg-config, or is there just no way to look up this sort of information on that OS?

Regards,

David Mathog
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/BCYVPMEGXLU7YQJUCCQDV5BT7E22EH7M/
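[Editorial aside: step 1 of the multistep lookup above — finding the on-disk path of a named library — could be sketched in Python by parsing `ldconfig -p` style output. The function name, and the idea of feeding it captured text rather than running ldconfig directly, are illustrative assumptions, not anything the metadata spec provides.]

```python
def find_library_path(ldconfig_output: str, libname: str):
    """Scan `ldconfig -p` style output for the first entry whose soname
    starts with `libname`, and return its path (or None).

    In practice `ldconfig_output` would come from something like
    subprocess.run(["ldconfig", "-p"], ...); it is a plain argument here
    so the parsing logic is easy to test.
    """
    for line in ldconfig_output.splitlines():
        # lines look like: "\tlibz.so.1 (libc6,x86-64) => /lib64/libz.so.1"
        if "=>" in line:
            name, _, path = line.partition("=>")
            if name.strip().split()[0].startswith(libname):
                return path.strip()
    return None
```

The returned path would then feed step 2 (`rpm -q --whatprovides $PATHPNG`).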
[Distutils] Re: pip and missing shared system library
On Sat, Aug 8, 2020 at 8:15 PM Jonathan DEKHTIAR wrote:

> So do you plan on "managing" which version of GCC or g++ people have and
> issue a warning if they don't have the good one?

A setup.py will always be written for a particular compiler, or maybe it will handle a couple, but they never handle a "general compiler". That is why the example in the spec

    Requires-External: C

never made sense. It always should have been something like

    Requires-External: gcc (>4.0)

There is no logic available at that level, as far as I can tell. So if a package needed gcc on Posix or an MS compiler on Windows, how would one specify that? For that matter, if it could use either gcc or Intel's compiler on Posix, how would that be indicated? Maybe there is some specification-level logic which can be used to wrap these statements?

> How are you even supposed to find out?

pkg-config, in any Posix environment. Within a pure Windows environment or on some obscure OS, I have no idea. Just skip this test if it is not supported in a given environment? Better that it works in some environments than in none.

> Don't get me wrong, it would be awesome if it worked. I just don't see a way to
> handle all these constraints ...

I would be happy if it handled _any_ of these constraints. At the moment adding these lines does nothing.

Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/MPHOUVI5WOEX7J5HWFBK5JQBQS46T3NX/
[Distutils] Re: pip and missing shared system library
> Unfortunately, successfully building C libraries is way, way more
> complicated than that. There are nearly as many ways to detect and
> configure C libraries as there are C libraries; tools like pkg-config
> help a bit but they're far from universal.

Agreed that building a library is more complicated. (Building a library, or anything for that matter, which depends on boost is even worse.) Nevertheless, to do so the information provided by pkg-config will always be required. It might not be sufficient, of course.

As for looking up this information, I am only aware of pkg-config and pkgconf, and on many systems one is just a soft link to the other. That is also what is used on Windows within Mingw. So it would not be unreasonable to specify that this is the source of the information in all Posix environments.

> There can be multiple versions of libpng on the same system, with different
> ABIs.

Requires-External supports version ranges, and pkg-config will show the version which is installed. If Requires-External is ever to have real usage, presumably it would have to be compatible with pkg-config in Posix environments; how would it ever work otherwise? Users who have placed .pc files in odd locations would have to modify PKG_CONFIG_PATH before running pip or these would not be found. They would also have to specify "libname" or "libname2", as appropriate, in some cases.

> doesn't even know what compiler the package will want to use (which
> also affects which libraries are available).

I had wondered about that. The spec has an example:

    Requires-External: C

which seems to be a requirement for a C compiler, but if it does not specify which one, then the test could pass if it finds the Intel compiler even though setup.py only knows how to build with gcc. Or vice versa.

> day, the only thing pip could do with this information is print a
> slightly nicer error message than you would get otherwise.
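[Editorial aside on the version-range point above: if the installed version comes from `pkg-config --modversion`, checking it against a Requires-External style range is simple tuple comparison. A hypothetical sketch — the function names and the inclusive min/max range form are assumptions, not part of any spec.]

```python
def parse_version(v: str) -> tuple:
    """Turn '1.6.37' into (1, 6, 37) so tuples compare numerically."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def satisfies(installed: str, minimum: str = None, maximum: str = None) -> bool:
    """True when `installed` lies inside the (inclusive) min/max range.

    In practice `installed` would come from something like
    `pkg-config --modversion libpng`; here it is a plain argument.
    """
    got = parse_version(installed)
    if minimum is not None and got < parse_version(minimum):
        return False
    if maximum is not None and got > parse_version(maximum):
        return False
    return True
```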
In the case that started this thread, a simple "The igraph library is required but not installed on this operating system" followed by an exit would have saved a considerable amount of time. So while it isn't much, it is more than we have currently.

> What pip *has* done in the last few years is made it possible to pull
> in packages from PyPI when building packages from source, so you can
> make your own pkg-config-handling library and put it on PyPI and
> encourage everyone to use it instead of reinventing the wheel. Or use
> more powerful build systems that have already solved these problems,
> e.g. scikit-build lets you use CMake to build python packages.

I think that is what happened this time, but there was no test to see if the package it built could be installed where it wanted to put it, so it failed. At least I think that is what happened. In any case, it did pull igraph down from PyPI but the installation failed.

One other point about "Requires-External": as described, it lacks a special case "none". (None really means "just the python version which is running pip".) That is, there is currently no way to distinguish between "this package has no external requirements" and "the external requirement specification is incomplete". This information really should be mandatory, even if it is just to tell a person what must be installed in the OS before running pip. One can imagine a utility analogous to "johnnydep" which would traverse a proposed package install and verify that all the Requires-External entries are in fact satisfied, or minimally, just list them. Pip should warn when no "Requires-External" entries are present, and "Requires-External: none" would always suppress that warning.
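[Editorial aside: the fail-fast message suggested at the top of this reply could be sketched as a small helper called near the top of a setup.py. This is a hypothetical illustration, not anything pip or louvain actually provides; it relies only on the documented pkg-config behaviour that `--exists` returns a nonzero status for an unknown module.]

```python
import shutil
import subprocess
import sys

def require_c_library(name: str) -> None:
    """Exit with a clear message if `name` is not visible to pkg-config.

    Intended to run before any compilation is attempted, so the user sees
    one sentence instead of a barrage of build errors.
    """
    if shutil.which("pkg-config") is None:
        sys.exit("pkg-config is required to locate the %s C library" % name)
    probe = subprocess.run(["pkg-config", "--exists", name])
    if probe.returncode != 0:
        sys.exit("The %s library is required but not installed "
                 "on this operating system" % name)
```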
Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/BMLRH2JXM2B5EQAJ5NA44LGWBRTX64HY/
[Distutils] Re: pip and missing shared system library
On Thu, Aug 6, 2020 at 11:54 AM Nathaniel Smith wrote:

> If the code that failed to give a good error message is in
> louvain-igraph, then you should probably talk to them about that :-).
> There's no way for the core packaging libraries to guess what this
> kind of arbitrary package-specific code is going to do.

That was the point I was trying to make, albeit not very well, I guess. Because Requires-External was not supplied, and pip would not have done anything with it even if it had been, the package had to roll its own.

The documentation for Requires-External says what it requires, but it does not indicate that anything else happens besides (I assume) the installation halting if the condition is not met. That is, if there is:

    Requires-External: libpng

and pip acts on it, that means it found libpng.so, but there does not seem to be any requirement that it communicate any further information about libpng to setup.py in any standard way. Which is why the setup.py for louvain rolled its own.

For posixy OS's it would be sufficient to know that if the "Requires-External" test passed, then "pkg-config --cflags libpng" and the like will work. But again, that pushes the work into setup.py, where it will not be standardized nor platform agnostic. So for better portability, passing one of these tests should also set some standard variables like:

    RE_libpng_cflags="-lpng16 -lz"
    RE_libpng_includedir="/usr/include"
    RE_libpng_libdir="/usr/lib64"

(and so forth), which are then seen in setup.py. Yes, these are just the various values already in the libpng.pc file; no reason to reinvent that wheel. The result should be simpler setup.py's which are portable without requiring all the conditional "if it is this OS then look here" logic that they must currently contain.
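[Editorial aside: the proposed RE_* variables map directly onto pkg-config output. A sketch of how they might be derived — the flag splitting here is naive (it ignores shell quoting), and the variable-naming convention is just the one proposed in this message.]

```python
def requires_external_vars(name: str, cflags: str, libs: str) -> dict:
    """Build RE_<name>_* style entries from pkg-config flag strings.

    `cflags` and `libs` would normally come from running
    `pkg-config --cflags <name>` and `pkg-config --libs <name>`.
    """
    def pick(flags, prefix):
        # collect the value part of every flag starting with `prefix`
        return [f[len(prefix):] for f in flags.split() if f.startswith(prefix)]

    return {
        "RE_%s_cflags" % name: cflags,
        "RE_%s_includedir" % name: pick(cflags, "-I"),
        "RE_%s_libdir" % name: pick(libs, "-L"),
        "RE_%s_libs" % name: pick(libs, "-l"),
    }
```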
Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/VLAYCVDATGUMOH2E7GJNIOKQ3LOZVFDN/
[Distutils] Re: pip and missing shared system library
On Wed, Aug 5, 2020 at 5:05 PM Tzu-ping Chung wrote:

> Exactly. Python actually specifies metadata around this (Requires-External),
> but I don't believe pip implements it at all since there're almost no
> sensible rules available on how the external libraries can be located in a
> cross-platform way.

Locating the libraries would have to be platform specific, but pip could easily know to try pkg-config on Linux, and if that fails, run a tiny test which does nothing but attempt to link. If any of that fails then the package in question will likely fail too.

Neither louvain nor python-igraph contains a Requires-External in their dist-info files. Looking at the setup.py for louvain here:

https://github.com/vtraag/louvain-igraph/blob/master/setup.py

around line 491 is the code for pkg-config and the "core" message. It looks like it should exit when pkg-config fails, and that is not what happened. That is 0.8.0; installed is 0.6.1. Pulled the latter down with:

    pip3 download louvain==0.6.1

and unpacked it, and found starting at line 416:

    def detect_from_pkgconfig(self):
        """Detects the igraph include directory, library directory and the
        list of libraries to link to using ``pkg-config``."""
        if not buildcfg.has_pkgconfig:
            print("Cannot find the C core of igraph on this system using pkg-config.")
            return False

So as observed, it would not immediately abort when it could not find the installed library. This shows the problem with leaving Requires-External to each package's setup.py. Doing that means the warnings will differ from package to package, or possibly even version to version of the same package.

> Conda is probably the best bet when you need to deal with tight
> cross-language package integration like this, by punting the whole idea of
> system libraries and installing a separate copy of everything you need.
I have been trying very hard NOT to have multiple copies of everything, hence my prior work on python_devirtualizer, which allows venv installs which are then unpacked, the common pieces reduced to a single copy, and the "programs" wrapped so that they will start and run properly when they are found on PATH:

https://sourceforge.net/projects/python-devirtualizer/

I suppose an equivalent set of scripts for "conda" would be possible, but I think much more difficult since it does more.

Anyway, why is Requires-External (apparently) so little used? Is this a chicken/egg problem, where nobody specifies it because pip ignores it, and pip ignores it because nobody uses it?

One can see how the Requires-External entries could be automatically generated. For instance, louvain has only one .so, which might be processed starting something like this:

    ldd _c_louvain.cpython-36m-x86_64-linux-gnu.so | grep -v linux-vdso.so | grep -v ld-linux | grep -v libpython
        libigraph.so.0 => /lib64/libigraph.so.0 (0x7f42bb622000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x7f42bad42000)
        libm.so.6 => /lib64/libm.so.6 (0x7f42ba9c)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f42ba7a8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x7f42ba588000)
        libc.so.6 => /lib64/libc.so.6 (0x7f42ba1c6000)
        libxml2.so.2 => /lib64/libxml2.so.2 (0x7f42b9e5e000)
        libz.so.1 => /lib64/libz.so.1 (0x7f42b9c47000)
        liblzma.so.5 => /lib64/liblzma.so.5 (0x7f42b9a2)
        libdl.so.2 => /lib64/libdl.so.2 (0x7f42b981c000)
        libgmp.so.10 => /lib64/libgmp.so.10 (0x7f42b9584000)
        libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x7f42b90a1000)
        libutil.so.1 => /lib64/libutil.so.1 (0x7f42b8e9d000)

which is processed to become:

    Requires-External: libigraph
    Requires-External: libstdc++
    (etc)
    Requires-External: libutil

For a more complicated package, run the same method on all dynamic binaries and libraries and reduce the result to one copy of each. Determining versions would be harder, though, perhaps impossible to do automatically.
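[Editorial aside: the ldd-to-Requires-External processing described above could be sketched as a small parser. The skip list and output format come from the message; everything else (the function name, the ".so" suffix stripping) is an assumption for illustration.]

```python
SKIP_PREFIXES = ("linux-vdso", "ld-linux", "libpython")

def requires_external_from_ldd(ldd_output: str) -> list:
    """Reduce `ldd` output lines to deduplicated Requires-External entries."""
    entries = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line or line.startswith(SKIP_PREFIXES):
            continue
        soname = line.split()[0]          # e.g. "libigraph.so.0"
        base = soname.split(".so")[0]     # drop the ".so.N" suffix
        entry = "Requires-External: " + base
        if entry not in entries:          # one entry per library
            entries.append(entry)
    return entries
```

For a multi-binary package, concatenate the ldd output of every shared object before calling this, and the dedup step collapses the result to one entry per library.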
igraph on my system is 0.8.2, so that is sufficient, but there would be no way of knowing if 0.8.1 would also work, or if 0.9.0 would break things.

Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/A7UND4KBY3NVCNU5WIZ6YNKIV46ILUPO/
[Distutils] pip and missing shared system library
A pip install of a package often results in compiling (using gcc, g++, whatever) to produce a binary. Usually that proceeds without issue. However, there seems to be no checking that the libraries required to link that binary are already on the system. Or at least the message which results when they are not is not at all clear about what is missing.

I discovered that today by wasting several hours figuring out why scanpy-scripts was failing trying to build dependency "louvain", which would not install into a venv with pip. It had something to do with "igraph", but pip had downloaded python-igraph before it got to louvain. When louvain tried to build, there was a mysterious message about pkg-config and igraph:

    Cannot find the C core of igraph on this system using pkg-config.

(Note that when python-igraph installs, it places an igraph directory in site-packages, so which igraph it is referring to is fairly ambiguous.) Then it tried to install a different version number of igraph, failed, and the install failed. This was very confusing because the second igraph install was not (it turned out) a different version of python-igraph, but a system-level igraph library, which it could not install either, because the process was not privileged and could not write to the target directories. Yet it tried to install anyway. This is discussed in the louvain documentation here (it turns out):

https://github.com/vtraag/louvain-igraph

but since I was actually trying to install a different package, of course I had not read the louvain documentation. In short form the problem was "cannot build a binary because required library libigraph.so is not present in the operating system", but that was less than obvious in the barrage of warnings and error messages.
Is it possible to tell pip or setup.py to fail immediately when a required system library like this is not found, here presumably after that "C core" message, rather than confusing the matter further with a failed partial build and install of the same component? More generally, is there anything in the python installation methods which could list system libraries as dependencies and give a more informative error message when they are missing?

Thanks,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/MSS42UYQ7FJWHID54FXSW5M5KCMK7ZQI/
[Distutils] Re: Fwd: Re: Use of "python" shebang an installation error?
On Wed, Jul 22, 2020 at 4:34 PM Tzu-ping Chung wrote:

> If the shebang needs to care about compatibility, something is already going
> very wrong.

We agree there, and it has. That python3 was not completely backwards compatible with python2 meant that it broke a lot of code. The EOL of python2 and the apparent intent of the major distros to drop it means that unmaintained python code will become unusable code. Neither of these outcomes is common for a major computer language. For instance, old K&R style C or F77 code from the 1990's will still compile with modern compilers (albeit with a blizzard of warning messages, and possibly 32-bit to 64-bit issues). This matters quite a bit in scientific circles, because published computational work becomes unreproducible if the tools break, even when the input data is still available.

When these issues are encountered I notify the program's author, assuming that there is still somebody maintaining the code. The most recent instance of this was "lastz":

http://www.bx.psu.edu/~rsharris/lastz/

which, in addition to the lastz program itself, contains a bunch of python scripts. The shebangs used "python", they were Python2 code, and so they didn't work. The author in this case agreed that was a problem and is currently working on upgrading those scripts.

I think the intent of the first quoted section was to say that if a script used a feature in Python 3.N that was absent in 3.(N-1) and below, then 3.N should be used. That is perfectly reasonable. What isn't reasonable is the assumption that using just "python" is not a problem in a language which demonstrably does not maintain backwards compatibility between major versions (see above).

Perhaps this circle could be squared if python had a "-r" (single letter for "required") command line parameter, so that this:

    #!/usr/bin/env python -r N.M

could conceivably be handled gracefully by the single "python", even if only to throw an error and state that version "N.M" is not supported.
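[Editorial aside: absent such a "-r" flag, a script can approximate the graceful failure today with an explicit guard executed before any version-specific syntax is reached (which is why such a guard has to live in a small stub using only ancient syntax). A sketch; the minimum version here is just a placeholder.]

```python
import sys

def check_interpreter(required=(3, 6)):
    """Fail with one clear message instead of a slew of syntax errors.

    `required` is the minimum (major, minor) this script supports;
    (3, 6) is only an example value.
    """
    if sys.version_info[:2] < required:
        raise SystemExit(
            "this script requires Python %d.%d or later; "
            "running under %d.%d"
            % (required + sys.version_info[:2]))
```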
That would be far better than responding to version incompatibility with a slew of syntax errors, which is what happens now. It would handle both "2.7 is too old" and "3.9 required but this is a 3.8 installation".

Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/YZB5UZ4UUBQZMAFQEYENA3BY4JNASZND/
[Distutils] Re: Fwd: Re: Use of "python" shebang an installation error?
On Wed, Jul 22, 2020 at 1:27 PM Paul Moore wrote:

> On Wed, 22 Jul 2020 at 19:31, David Mathog wrote:
> > but that shebang has to be corrected when the installation is moved to a
> > normal environment, which my code is doing now.)
>
> Moving files that are installed by Python packaging tools isn't
> supported. It might work, and you can probably make it work with some
> effort, but it's very much a case of "don't do it unless you know what
> you're doing". Correcting shebang lines is definitely something you
> will need to do.

I understand that moving files is iffy. However, given that I want only one copy of each installed python package on the system, and I need to be able to install different versions of the same package (to resolve module version number conflicts between packages), moving the files around and replacing most copies with links to the single copy seemed like the only way to go.

Here:

https://www.python.org/dev/peps/pep-0394/#recommendation

it says:

    When packaging third party Python scripts, distributors are encouraged to
    change less specific shebangs to more specific ones. This ensures software
    is used with the latest version of Python available, and it can remove a
    dependency on Python 2. The details on what specifics to set are left to
    the distributors; though. Example specifics could include:

    Changing python shebangs to python3 when Python 3.x is supported.
    Changing python shebangs to python2 when Python 3.x is not yet supported.
    Changing python3 shebangs to python3.8 if the software is built with Python 3.8.

and then immediately after it says:

    When a virtual environment (created by the PEP 405 venv package or a
    similar tool such as virtualenv or conda) is active, the python command
    should refer to the virtual environment's interpreter and should always be
    available. The python3 or python2 command (according to the environment's
    interpreter version) should also be available.

Which seems to be exactly the opposite of the preceding stanza.
I.e., "always be as specific as possible" and then "be general, and also provide specific".

Personally, I think the generic use of "python", both in shebangs and when invoking scripts as "python script", should be deprecated, with warnings from the installers to force developers to strip it out. It only works now by chance. Sure, there is a high probability it will work, but if one is on the wrong system it fails. If python4 (whenever it arrives) is not fully backwards compatible with python3, the generic use of "python" is going to cause untold grief. Whereas in that scenario all the code which uses "python3" should continue to function normally.

Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/HAZUEGH7D7Y3PDMSYVNXHLYT6YMQLYUW/
[Distutils] Re: Fwd: Re: Use of "python" shebang an installation error?
On Wed, Jul 22, 2020 at 3:41 AM Thomas Kluyver wrote:

> On Tue, 21 Jul 2020, at 21:50, David Mathog wrote:
> > ./lib/python3.6/site-packages/pip/_vendor/appdirs.py:#!/usr/bin/env python
>
> Python packaging tools like pip generally differentiate between *scripts*,
> which are installed to be run from the command line, and *modules*, which are
> imported from other Python code. Files under site-packages are modules. Any
> special handling for shebangs, execute bits, or Windows .exe wrappers is
> usually done only for scripts.
>
> It's not unusual to see a shebang in modules - I think some editors put it in
> whenever you create a new Python file. But it doesn't usually do anything. If
> you want to run a module directly, the normal way now is with "python -m",
> which doesn't use the shebang.

So in summary:

1. Invalid shebangs for modules in site-packages "should" be harmless - ignore them and hope for the best.
2. Shebangs for scripts "should" be correct. (They are while still inside a venv, but that shebang has to be corrected when the installation is moved to a normal environment, which my code is doing now.)

Scripts usually end up in a "bin" directory on linux. Is that part of the installation standard, or could a package put them in an arbitrary path (other than under "site-packages") under the venv's root, for instance in a directory named "scripts"? Fixing the shebangs by processing only "bin" is easy; traversing the entire tree is a bit messier. It would be good not to have to do so if that will never find an invalid shebang.
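[Editorial aside: rewriting the easy case — the first line of each file under "bin" — can be sketched as pure text manipulation. This is an illustration, not the actual pdvctrl "reshebang" implementation; the target interpreter path is an example.]

```python
def rewrite_shebang(text: str, interpreter: str = "/usr/bin/python3") -> str:
    """Replace a shebang ending in bare 'python' with a specific interpreter.

    '#!/usr/bin/env python' and '#!/usr/bin/python' are rewritten;
    '#!/usr/bin/env python3' is already specific and left alone.
    """
    lines = text.splitlines(keepends=True)
    first = lines[0].rstrip() if lines else ""
    if first.startswith("#!") and first.endswith(("/python", " python")):
        lines[0] = "#!%s\n" % interpreter
    return "".join(lines)
```

Applied over the files in a venv's bin directory, this covers the scripts case; whether the site-packages modules need the same treatment is exactly the open question above.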
Thanks,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/UL3LLIEDEG34ZNAUAXWI5KJHKU2NS3O6/
[Distutils] Fwd: Re: Use of "python" shebang an installation error?
(Oops, had to resend; forgot to change the destination to the list.)

On Mon, Jul 20, 2020 at 12:38 PM John Thorvald Wodder II wrote:

> On 2020 Jul 20, at 15:25, David Mathog wrote:
> > Lately I have been working on a CentOS 8 machine, and it has "python2"
> > and "python3", but no "python". Many packages install scripts with a
> > shebang like:
> >
> > #!/usr/bin/env python
> >
> > and those do not work on this OS. Seems like rather a large missing
> > dependency which goes by without triggering a fatal error.
>
> How exactly are these packages getting installed? Last time I checked, both
> pip and setuptools automatically set the shebang in scripts (both
> console_script entry points and scripts declared with the "scripts" argument
> to `setup()`) to use the path of the running Python interpreter. Are these
> packages installed using your system package manager? If so, you should take
> the problem up with its maintainers.

Good point. I have been installing so many packages I get confused about which installer was used for which package. It turned out that many (but not all) of the files which contained

    #!/usr/bin/env python

shebangs were installed using standard OS-level tools (cmake, configure, make and the like). Example package: hisat2. I guess there isn't much choice for those but to scan the directories for python scripts and fix the shebangs.

Installs that go initially into venvs and used pip3 are still an issue. Example:

    python3 -m venv johnnydep
    cd johnnydep
    grep -r '/usr/bin/env python$' .
    # finds:
    ./lib/python3.6/site-packages/pip/_vendor/appdirs.py:#!/usr/bin/env python
    ./lib/python3.6/site-packages/pip/_vendor/chardet/cli/chardetect.py:#!/usr/bin/env python
    ./lib/python3.6/site-packages/pip/_vendor/requests/certs.py:#!/usr/bin/env python
    ./lib/python3.6/site-packages/pkg_resources/_vendor/appdirs.py:#!/usr/bin/env python
    ./lib/python3.6/site-packages/johnnydep/pipper.py:#!/usr/bin/env python
    cd bin
    ls -1 | grep python
    lrwxrwxrwx. 1 modules modules  7 Jul 20 14:09 python -> python3
    lrwxrwxrwx. 1 modules modules 16 Jul 20 14:09 python3 -> /usr/bin/python3
    source activate
    pip3 install johnnydep
    head -1 johnnydep
    #!/home/common/lib/python3.6/Envs/johnnydep/bin/python
    # same for "tabulate" and all other shebangs in bin.
    cd ..
    grep -r '/usr/bin/env python$' .
    # same as before
    grep -r '/home/common/lib/python3.6/Envs/johnnydep/bin/python' .
    # just the files in the bin directory.

It looks like none of the "#!/usr/bin/env python" shebangs within the venv are going to be used after the install, so perhaps those are harmless. The shebangs like

    #!/home/common/lib/python3.6/Envs/johnnydep/bin/python

are OK within the venv, but once they are "devirtualized" they become a problem. That was a known problem, though - my devirtualizer code already patches all of the ones in the bin directory. I have not seen any elsewhere (yet) within the venv, but there is probably no rule that keeps them from appearing in "share" or elsewhere.

The "python" in use in the venv is just a symbolic link to "python3", which is itself a symbolic link to the actual program /usr/bin/python3. It is constructed that way based on "python -m venv", which uses pieces which come from the CentOS 8 python3-libs-3.6.8-23.el8.x86_64 RPM. Is there some requirement that a venv have a "python"? Odd that RedHat (and so CentOS) provide a "python" there, but not in the OS itself.

Thanks,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/XI7PKGVQFW63ZFMWMLR554VPAGWDHWZ6/
[Distutils] Fwd: Re: Use of "python" shebang an installation error?
On Mon, Jul 20, 2020 at 4:40 PM John Thorvald Wodder II wrote:

> First of all, your last two messages only went to me, not to the list. The
> mailing list doesn't set Reply-To on messages or the like, so you have to
> manually set "To: distutils-sig@python.org" when replying.

Aargh, right. I use gmail for my home mail, and since I'm stuck working at home, that is what I used here. Gmail likes to hide, well, pretty much everything. I will repost those responses.

> As to your e-mail, though, are any of those files even meant to be executed?
> They're not in bin/; they just appear to be regular source files that some
> developer slapped a shebang on.

That, in a sense, is the issue. I don't know, you don't know; maybe the developer knows (if he/she still remembers). I really don't want to do the work to dig through the code for every package I install to determine whether a shebang is used or not. Yet if I don't figure this out, some end user will run a script (one of a hundred in some package I installed for their use) which will blow up because of this issue. The best I can do now is run

    pdvctrl reshebang $TARGET_DIR

or

    pdvctrl reshebang $ROOT_DIR...

and fix them up after the fact (pdvctrl is from python_devirtualizer, here: https://sourceforge.net/projects/python-devirtualizer/). Even then it usually has to guess that "python" means "python3" and not "python2", and sometimes it guesses wrong. Today's version of that recurring issue:

https://github.com/lastz/lastz/issues/30

Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/LRLC63NI26ZGSCPIOYPA4MCKSM6JHXKS/
[Distutils] Fwd: Re: Use of "python" shebang an installation error?
(Oops, had to resend; forgot to change the destination to the list.)

biopython-1.77, for instance, when installed into a virtualenv with pip3, has many of these shebangs:

    #!/usr/bin/env python

And they are all over the place. They are:

    ./site-packages/Bio/bgzf.py
    ./site-packages/Bio/PDB/parse_pdb_header.py
    ./site-packages/Bio/PDB/PDBList.py
    ./site-packages/Bio/Restriction/__init__.py
    ./site-packages/Bio/Restriction/Restriction.py
    ./site-packages/Bio/Restriction/PrintFormat.py
    ./site-packages/Bio/Restriction/Restriction_Dictionary.py
    ./site-packages/Bio/Wise/__init__.py
    ./site-packages/Bio/Wise/psw.py
    ./site-packages/Bio/Wise/dnal.py
    ./site-packages/Bio/UniProt/GOA.py
    ./site-packages/Bio/SeqUtils/__init__.py

Regards,

David Mathog

Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/EPFOIYHVK62GYUWUVFMFGYFHZZDOHVRW/
[Distutils] Use of "python" shebang an installation error?
Lately I have been working on a CentOS 8 machine, and it has "python2" and "python3", but no "python". Many packages install scripts with a shebang like:

    #!/usr/bin/env python

and those do not work on this OS. That seems like rather a large missing dependency which goes by without triggering a fatal error.

In bioinformatics pipelines it is common for one package to invoke a script from another. So while the package which supplied a particular script might have avoided this issue by only invoking it with:

    python3 path/script

that does not prevent another package from doing one of these:

    A. path/script
    B. python path/script

In terms of analysis, it is trivial to find all python scripts installed by a package and examine the shebang line (if present) to see if this is an issue. I am adding a "reshebang" function to my python_devirtualizer specifically to handle the issue for scripts which are invoked directly. It is, however, not at all trivial to analyze all of a package's code to see which scripts are called by other scripts, and how they are called. Moreover, they might be called from perl, or C, or some other language. So dealing with "B" above is not trivial.

So, my question is: should the use of "python" (as opposed to "python2" or "python3") in a shebang be considered an installation error on a system for which "python" does not exist? I would argue yes, because we already know that python3 was not fully backwards compatible with python2, so we have reason to suspect that python4 (whenever that appears) might also not be fully backwards compatible with python3. Being picky about the python version now should prevent a lot of problems later.
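[Editorial aside: the "trivial to find" analysis step above could be sketched as a predicate applied to each script's first line. The helper name is hypothetical; it handles both direct and env-style shebangs.]

```python
def shebang_is_ambiguous(first_line: str) -> bool:
    """True when a shebang invokes bare 'python' (no 2/3 suffix)."""
    line = first_line.strip()
    if not line.startswith("#!"):
        return False
    tokens = line[2:].split()
    if not tokens:
        return False
    # for "#!/usr/bin/env python" the interpreter is the second token;
    # for "#!/usr/bin/python" it is the first
    interp = tokens[1] if tokens[0].endswith("/env") and len(tokens) > 1 else tokens[0]
    return interp.rsplit("/", 1)[-1] == "python"
```

Walking a package's installed files and applying this to each first line would flag exactly the scripts that break on a system without a "python" command.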
Regards, David Mathog -- Distutils-SIG mailing list -- distutils-sig@python.org To unsubscribe send an email to distutils-sig-le...@python.org https://mail.python.org/mailman3/lists/distutils-sig.python.org/ Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/CBKBFRY3CIRSHTY7LF2D5NF5XPVQPERV/
[Distutils] Re: package management - common storage while keeping the versions straight
Hi all. "Python devirtualizer" is a preliminary implementation which manages shared packages so that only one copy of each package version is required. It installs into a virtualenv, then migrates the contents out into the normal OS environment, and while doing so replaces what would be duplicate files with soft links to a single copy. It is downloadable from here:

https://sourceforge.net/projects/python-devirtualizer/

It is linux (or other POSIX-like system, __maybe__ Mac) specific. There is no way it will run on Windows at this point because the main script is bash and the paths assume POSIX path syntax. (It might work in Mingw64 though.) Anyway,

pdvctrl install packageA
pdvctrl migrate packageA /wherever/packageA
pdvctrl install packageB
pdvctrl migrate packageB /wherever/packageB

will result in a single copy of the shared dependencies on this system, with both packageA and packageB hooked to them with soft links. The import does not go awry because from within each package's site-packages directory there are only links to the files it needs, so it never sees any conflicting package versions. There is also:

pdvctrl preinstall packageC
pdvctrl install packageC
pdvctrl migrate packageC /wherever/packageC

which first uses johnnydep to look up dependencies already on the system and links those in directly before going on to install any pieces not so installed. Unfortunately the johnnydep runs with "preinstall" have so far been significantly slower than just doing a normal install and letting the migrate throw out the extra copy. On the other hand, the one package I have encountered which has conflicting requirements (scanpy-scripts) fails in a more comprehensible manner with "preinstall" than with "install".

Migrate "wraps" the files in the package's "bin" directory, if any, so that they may be invoked solely by PATH like a regular program. This uses libSDL2 to get the absolute path of the wrapper program, and it defines PYTHONPATH before execve() to the actual target.
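The wrapper trick (resolve everything relative to the wrapper binary itself, then define PYTHONPATH and execve the real target) can be sketched in Python. This is an illustration of the idea only, not pdvctrl's actual implementation, and the relative path is just the layout used elsewhere in this thread:

```python
import os

def wrapper_env(wrapper_path, rel_site="../lib/python3.6/site-packages"):
    """Build the environment a wrapper would hand to execve(): PYTHONPATH
    points at the site-packages directory located relative to the wrapper
    itself, so the caller's working directory is irrelevant."""
    here = os.path.dirname(os.path.abspath(wrapper_path))
    env = dict(os.environ)
    env["PYTHONPATH"] = os.path.normpath(os.path.join(here, rel_site))
    return env

# A wrapper would then finish with something like (paths hypothetical):
#   os.execve("/wherever/packageA/hidden/real_prog", sys.argv,
#             wrapper_env(sys.argv[0]))
```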
So no messing about with PYTHONPATH in the user's shell or in scripts. So far I have not run into a problem with the wrappers, which essentially just inject a PYTHONPATH into the environment when the program is run. Well, one package (busco) had a file with no terminal EOL, which resulted in its last line being dropped while it was being wrapped, but that case is now handled. I do expect though at some point to encounter a package which has several files in its bin, and first_program will contain some variant of:

python3 /wherever/bin/second_program

The wrapper will break those, since the wrapper is a regular binary and not a python script.

Regards,

David Mathog

On Mon, Jun 29, 2020 at 1:43 PM John Thorvald Wodder II wrote:
>
> On 2020 Jun 29, at 16:09, David Mathog wrote:
> >
> > In neither case does the egg-info file reference the corresponding
> > directory, but at least the directory in both has the expected package
> > name (other than case). In the examples you cited at the top, were
> > any of those "different name" cases from packages with a "file"
> > egg-info?
>
> The projects I examined were all in wheel form and thus had *.dist-info
> directories instead of *.egg-info. I know very little about how eggs work,
> other than that they're deprecated and should be avoided in favor of wheels.
>
> -- John Wodder
> --
> Distutils-SIG mailing list -- distutils-sig@python.org
> To unsubscribe send an email to distutils-sig-le...@python.org
> https://mail.python.org/mailman3/lists/distutils-sig.python.org/
> Message archived at
> https://mail.python.org/archives/list/distutils-sig@python.org/message/DMRPHSWPXPEWJOHFZVBKTJMH34KABHTM/
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/WIWLAD3537K7DYNUBZVIMPE7SFEV6E5L/
[Distutils] Re: package management - common storage while keeping the versions straight
On Fri, Jun 26, 2020 at 2:51 PM John Thorvald Wodder II wrote:

> Of the 32,517 non-matching projects, 7,117 were Odoo projects with project
> names of the form "odoo{version}_addon_{foo}" containing namespace modules of
> the form "odoo/addons/{foo}", and 3,175 were Django projects with project
> names of the form "django_{foo}" containing packages named just "{foo}". No
> other major patterns seem to stand out.

In CentOS 8 the RPM python3-rhnlib-2.8.6-8.module_el8.1.0+211+ad6c0bc7.noarch has loaded into the directory /usr/lib/python3.6/site-packages two entries:

rhn                          # a directory
rhnlib-2.8.6-py3.6.egg-info  # a file

The latter contains just this text:

Metadata-Version: 1.0
Name: rhnlib
Version: 2.8.6
Summary: Python libraries for the Spacewalk project
Home-page: http://rhn.redhat.com
Author: Mihai Ibanescu
Author-email: m...@redhat.com
License: GPL
Description: rhnlib is a collection of python modules used by the Spacewalk (http://spacewalk.redhat.com) software.
Platform: UNKNOWN

Nor is there a link in the other direction:

grep -iR rhnlib /usr/lib/python3.6/site-packages/rhn   # nothing

So while "rhn" bears a similarity to "rhnlib" it is neither the package name nor is it listed in the egg-info. This was of course installed by dnf (AKA yum) and not by egg. Is it possible for any python installer (as opposed to dnf, which runs outside of it) to install an unreferenced directory like this? Presumably not with a dist-info, but with an egg-info that does not in any way reference the active part of the installation?

In a small collection (172 packages) here these were the only two "file" egg-info entries found, with their associated directories:

busco   BUSCO-4.0.6-py3.6.egg-info
ngs     ngs-1.0-py3.6.egg-info

In neither case does the egg-info file reference the corresponding directory, but at least the directory in both has the expected package name (other than case).
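Incidentally, these egg-info "files" are in the RFC 822 header format, so the stdlib email parser can pull Name and Version out of them without any regex guessing. A sketch, using a shortened copy of the rhnlib text above:

```python
import email

# Example egg-info metadata, copied (shortened) from the rhnlib file
# quoted above; the real file has more fields.
RHNLIB_EGG_INFO = """\
Metadata-Version: 1.0
Name: rhnlib
Version: 2.8.6
Summary: Python libraries for the Spacewalk project
"""

def egg_info_name_version(text):
    """Parse a PKG-INFO style egg-info file (RFC 822 headers) and
    return its Name and Version fields."""
    msg = email.message_from_string(text)
    return msg["Name"], msg["Version"]
```

Here egg_info_name_version(RHNLIB_EGG_INFO) returns ('rhnlib', '2.8.6'), and nothing in those fields points at the sibling "rhn" directory, which is exactly the problem described above.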
In the examples you cited at the top, were any of those "different name" cases from packages with a "file" egg-info? Thanks, David Mathog -- Distutils-SIG mailing list -- distutils-sig@python.org To unsubscribe send an email to distutils-sig-le...@python.org https://mail.python.org/mailman3/lists/distutils-sig.python.org/ Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/DRBMPYMLRGGF2WED7AKWSQS7B7EARIVB/
[Distutils] Re: package management - common storage while keeping the versions straight
Thanks for that feedback. Looks like RECORD is the one to use.

The names of the directories ending in dist-info seem to be uniformly:

package-version.dist-info

but the directory names associated with eggs come in a lot of flavors:

anndata-0.6.19-py3.6.egg
cutadapt-2.10.dev20+g93fb340-py3.6-linux-x86_64.egg
scanpy-1.5.2.dev7+ge33a2f33-py3.6.egg
h5py-2.9.0-py3.6-linux-x86_64.egg
simplejson-3.17.0-py3.6.egg-info

johnnydep does not give any hints that this is coming:

johnnydep --output-format pinned h5py
#relevant part:
h5py==2.10.0

What would be some small examples for other package managers? I would like to see what they have as equivalents to dist-info and egg-info so that the script does not choke on them.

Some progress with the test script. It can now convert a virtualenv to a regular directory and migrate the site-packages contents to a shared area. A second migration of a copy of the same virtualenv to a different regular directory correctly makes links to the first set. (That is, two normal directories both linked to one common set of packages.) And the test program (johnnydep) runs in both with PYTHONPATH set correctly.

But preinstalling, that is, setting links to the common directory before doing a normal install, is tricky because of the name inconsistencies. To do that it must run johnnydep to get the necessary information, and that is not very fast. A normal install of johnnydep itself, complete with downloads, takes less time than that program's own analysis!

time johnnydep johnnydep           #21s

vs.

rm -rf ~/.cache/pip                #force actual downloads; too fast to measure
time python3 -m venv johnnydep     #2.3s
source johnnydep/bin/activate      #too fast to measure
time python -m pip install -U pip  #update 9.0.3 to 20.1.1; 3.4s
time pip3 install johnnydep        #7.8s

Probably a package with a huge amount of compilation would be a win for a preinstall, but at this point it is definitely not an "always faster" option.
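As a side note on RECORD: it is plain CSV with one installed path per row (path, hash, size), so a script can list exactly what a wheel installed. A sketch, with a made-up directory name:

```python
import csv
import os

def installed_files(dist_info_dir):
    """Read RECORD from a *.dist-info directory and return the installed
    file paths (first CSV column); the hash and size columns are ignored."""
    with open(os.path.join(dist_info_dir, "RECORD"), newline="") as fh:
        return [row[0] for row in csv.reader(fh) if row]
```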
Thanks, David Mathog On Fri, Jun 26, 2020 at 2:51 PM John Thorvald Wodder II wrote: > > On 2020 Jun 26, at 15:50, David Mathog wrote: > > > Still, how common is that? Can anybody offer an estimate about what > > fraction of packages use different names like that? > > Scanning through the wheelodex.org database (specifically, a dump from > earlier this week) finds 32,517 projects where the wheel DOES NOT contain a > top-level module of the same name as the project (after correcting for > differences in case and hyphen vs. underscore vs. period) and 74,073 projects > where the wheel DOES contain a module of the same name. (5,417 projects > containing no modules were excluded.) Note that a project named "foo-bar" > containing a namespace package "foo/bar" is counted in the former group. > > Of the 32,517 non-matching projects, 7,117 were Odoo projects with project > names of the form "odoo{version}_addon_{foo}" containing namespace modules of > the form "odoo/addons/{foo}", and 3,175 were Django projects with project > names of the form "django_{foo}" containing packages named just "{foo}". No > other major patterns seem to stand out. > > -- John Wodder > -- > Distutils-SIG mailing list -- distutils-sig@python.org > To unsubscribe send an email to distutils-sig-le...@python.org > https://mail.python.org/mailman3/lists/distutils-sig.python.org/ > Message archived at > https://mail.python.org/archives/list/distutils-sig@python.org/message/V445KCPLKMEVSSEAKX776DMNSPL76JRR/ -- Distutils-SIG mailing list -- distutils-sig@python.org To unsubscribe send an email to distutils-sig-le...@python.org https://mail.python.org/mailman3/lists/distutils-sig.python.org/ Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/TT6WZTTBMEWHTZD56HXH42JFKEI5VECK/
[Distutils] Re: package management - common storage while keeping the versions straight
On Fri, Jun 26, 2020 at 12:43 PM David Mathog wrote:

> So by what method could code working outside of python possibly determine that
> "yaml" goes with "PyYAML"?

Sorry, I forgot that the information was in

PyYAML-5.3.1-py3.6.egg-info/top_level.txt

Still, how common is that? Can anybody offer an estimate about what fraction of packages use different names like that?

Thanks,

David Mathog

> Is this a common situation?
>
> Is pkg_resources actually a package? Does it make sense for a common
> package repository to have a single instance of this directory or
> should each installed python based program retain its own version of
> this?
>
> There are some other files that live in site-packages which are not
> actually packages. The list so far is:
>
> __pycache__
>
> #some dynamic libraries, like
> kiwisolver.cpython-36m-x86_64-linux-gnu.so
>
> #some pth files, but always so far with an explicit version number, like
> sphinxcontrib_applehelp-1.0.2-py3.8-nspkg.pth
> #or associated with a package with a version number like:
> setuptools
> setuptools-46.1.3.dist-info
> setuptools.pth
>
> #some py files, apparently when that package does not make a corresponding
> #directory like:
> zipp-3.1.0.dist-info
> zipp.py
>
> #initialization file "site" as
> site.py
> site.pyc
>
> Any others to look out for? That is, files which might be installed
> in site-packages but which should not be shared.
>
> Hopefully this next is an appropriate question for this list, since
> the issue arises from how python loads packages. Is there any way to
> avoid collisions between python based programs other than activating
> and deactivating their virtualenvs, or redefining PYTHONPATH, before
> each is used?
>
> Programs that have the property that their library loading is
> determinate (usually the case with C, fortran, bash scripts, etc.)
> one can construct a bash script (for instance) which runs 3 programs
> in order like so:
>
> prog1
> prog2
> prog3 # spawns subprocesses which run prog2 and prog1
>
> and there are not generally any issues. (Yes, one can create a mess
> with LD_PRELOAD and the like.) But if those 3 are python programs
> unless prog1, prog2, prog3 are all built into the same virtualenv,
> which usually means they come from the same software distribution, I
> don't see how to avoid conflicts for the first two cases without
> activating/deactivating each one, which looks like it might be tricky
> in the 3rd case.
>
> If one has a directory like:
>
> TOP/bin/prog
> TOP/lib/python3.6/site-packages
>
> Other than using PYTHONPATH to direct to it with an absolute path, is
> there any way to force prog to only import from that specific
> site-packages? Let me try that again. Is there a way to tell prog
> via any environmental variable to look in
> "../lib/python3.6/site-packages" (and nowhere else) for imports, with
> the reference directory being that where prog is installed, not where
> the process PWD might happen to be. Because if that was possible it
> might allow a sort of "set it and forget it" method like
>
> export PYTHONRELPATHFROMPROG="../lib/python3.6/site-packages
> prog1 #uses prog1 site-package
> prog2 #uses prog2 site-package
> prog3 #uses prog3 site-package
> # prog1 subprocess #uses prog1 site-package
> # prog2 subprocess #uses prog2 site-package
>
> (None of which would be necessary if python programs could import
> specific versions reliably from a common directory containing multiple
> versions of each package.)
>
> Thanks,
>
> David Mathog
>
>
> On Thu, Jun 25, 2020 at 10:46 AM David Mathog wrote:
> >
> > On Thu, Jun 25, 2020 at 12:37 AM Paul Moore wrote:
> >
> > > I think the key message here is that you won't be *re*-inventing the
> > > wheel. This is a wheel that still needs to be invented.
> >
> > It _was_ invented, but it is off round and gives a rough ride. As
> > noted in the first post this:
> >
> > __requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20', 'loompy
> > <3.0.0,>=2.00', 'h5py <2.10']
> > import pkg_resources
> >
> > was able to load the desired set of package-versions for scanpy, but
> > setting a version number constraint on scanpy itself at the end of
> > that list, one which matched the version that the preceding commands
> > successfully loaded, broke it. So it is not reliable.
> >
> > And the entire __requires__ kludge is only present because for reasons
> > beyond my pay grade this:
> >
> > import pkg_resources
> > pkg_resources.require("scipy<1.3.0,>=1.2.0;anndata<0.
[Distutils] Re: package management - common storage while keeping the versions straight
Questions about naming conventions.

The vast majority of packages when they install create in site-packages two directories with names like:

foobar
foobar-1.2.3.dist-info (or egg-info)

However PyYAML creates:

yaml
PyYAML-5.3.1-py3.6.egg-info

and there is also this:

pkg_resources

which is not associated with a versioned package. In python3:

>>> import yaml
>>> import pkg_resources
>>> print(yaml.__version__)
5.3.1
>>> print(pkg_resources.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pkg_resources' has no attribute '__version__'

So by what method could code working outside of python possibly determine that "yaml" goes with "PyYAML"? Is this a common situation?

Is pkg_resources actually a package? Does it make sense for a common package repository to have a single instance of this directory or should each installed python based program retain its own version of this?

There are some other files that live in site-packages which are not actually packages. The list so far is:

__pycache__

#some dynamic libraries, like
kiwisolver.cpython-36m-x86_64-linux-gnu.so

#some pth files, but always so far with an explicit version number, like
sphinxcontrib_applehelp-1.0.2-py3.8-nspkg.pth
#or associated with a package with a version number like:
setuptools
setuptools-46.1.3.dist-info
setuptools.pth

#some py files, apparently when that package does not make a corresponding
#directory like:
zipp-3.1.0.dist-info
zipp.py

#initialization file "site" as
site.py
site.pyc

Any others to look out for? That is, files which might be installed in site-packages but which should not be shared.

Hopefully this next is an appropriate question for this list, since the issue arises from how python loads packages. Is there any way to avoid collisions between python based programs other than activating and deactivating their virtualenvs, or redefining PYTHONPATH, before each is used?
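On the "yaml goes with PyYAML" question above: for installs that ship a top_level.txt in their *-info directory (most egg-info installs do), a scan of site-packages can recover the mapping from outside python. A sketch; it simply skips installs without a top_level.txt:

```python
import glob
import os

def import_to_dist(site_packages):
    """Map top-level import names to the *.dist-info / *.egg-info entry
    that claims them, using each install's top_level.txt when present."""
    mapping = {}
    for info in glob.glob(os.path.join(site_packages, "*-info")):
        top = os.path.join(info, "top_level.txt")
        if not os.path.isfile(top):
            continue
        with open(top) as fh:
            for line in fh:
                name = line.strip()
                if name:
                    mapping[name] = os.path.basename(info)
    return mapping
```

Note that "file" style egg-info entries have no top_level.txt at all, so those remain unmapped by this approach.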
For programs whose library loading is determinate (usually the case with C, fortran, bash scripts, etc.) one can construct a bash script (for instance) which runs 3 programs in order like so:

prog1
prog2
prog3 # spawns subprocesses which run prog2 and prog1

and there are generally no issues. (Yes, one can create a mess with LD_PRELOAD and the like.) But if those 3 are python programs, then unless prog1, prog2, prog3 are all built into the same virtualenv, which usually means they come from the same software distribution, I don't see how to avoid conflicts for the first two cases without activating/deactivating each one, which looks like it might be tricky in the 3rd case.

If one has a directory like:

TOP/bin/prog
TOP/lib/python3.6/site-packages

then, other than using PYTHONPATH to direct to it with an absolute path, is there any way to force prog to only import from that specific site-packages? Let me try that again: is there a way to tell prog via any environment variable to look in "../lib/python3.6/site-packages" (and nowhere else) for imports, with the reference directory being the one where prog is installed, not wherever the process PWD might happen to be? Because if that were possible it might allow a sort of "set it and forget it" method like:

export PYTHONRELPATHFROMPROG="../lib/python3.6/site-packages"
prog1 #uses prog1 site-package
prog2 #uses prog2 site-package
prog3 #uses prog3 site-package
# prog1 subprocess #uses prog1 site-package
# prog2 subprocess #uses prog2 site-package

(None of which would be necessary if python programs could import specific versions reliably from a common directory containing multiple versions of each package.)

Thanks,

David Mathog

On Thu, Jun 25, 2020 at 10:46 AM David Mathog wrote:
>
> On Thu, Jun 25, 2020 at 12:37 AM Paul Moore wrote:
>
> > I think the key message here is that you won't be *re*-inventing the
> > wheel. This is a wheel that still needs to be invented.
>
> It _was_ invented, but it is off round and gives a rough ride. As
> noted in the first post this:
>
> __requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20', 'loompy <3.0.0,>=2.00', 'h5py <2.10']
> import pkg_resources
>
> was able to load the desired set of package-versions for scanpy, but
> setting a version number constraint on scanpy itself at the end of
> that list, one which matched the version that the preceding commands
> successfully loaded, broke it. So it is not reliable.
>
> And the entire __requires__ kludge is only present because for reasons
> beyond my pay grade this:
>
> import pkg_resources
> pkg_resources.require("scipy<1.3.0,>=1.2.0;anndata<0.6.20;etc.")
> import scipy
> import anndata
> #etc.
>
> cannot work because by default "import pkg_resources" keeps only the
> most recent version
[Distutils] Re: package management - common storage while keeping the versions straight
On Thu, Jun 25, 2020 at 12:37 AM Paul Moore wrote:

> I think the key message here is that you won't be *re*-inventing the
> wheel. This is a wheel that still needs to be invented.

It _was_ invented, but it is off round and gives a rough ride. As noted in the first post this:

__requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20', 'loompy <3.0.0,>=2.00', 'h5py <2.10']
import pkg_resources

was able to load the desired set of package-versions for scanpy, but setting a version number constraint on scanpy itself at the end of that list, one which matched the version that the preceding commands successfully loaded, broke it. So it is not reliable.

And the entire __requires__ kludge is only present because, for reasons beyond my pay grade, this:

import pkg_resources
pkg_resources.require("scipy<1.3.0,>=1.2.0;anndata<0.6.20;etc.")
import scipy
import anndata
#etc.

cannot work, because by default "import pkg_resources" keeps only the most recent version rather than making up a tree (or list or hash or whatever) and waiting to see if there are any version constraints to be applied at the time of actual package import.

What I'm doing now is basically duct tape and baling wire to work around those deeper issues. In terms of language design, a much better fix would be to modify pkg_resources so that it will always successfully load the required versions from a designated directory which contains multiple versions of packages, and to modify the package maintenance tools so that they can maintain such a directory.

Regards,

David Mathog
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/X23JIVPWU74HW3GBMVJEKAC2XUFROKAL/
[Distutils] Re: package management - common storage while keeping the versions straight
It turned out that the second install was not the cause of the timestamp change in the original. On reviewing "history" it turned out that I had accidentally run the link generation twice. That turned up this (for me) unexpected behavior:

mkdir /tmp/foo
ls -al /tmp/foo
total 16
drwxrwxr-x.   2 modules modules     6 Jun 24 16:49 .
drwxrwxrwt. 173 root    root    12288 Jun 24 16:49 ..
ln -s /tmp/foo /tmp/bar
ls -al /tmp/foo
drwxrwxr-x.   2 modules modules     6 Jun 24 16:49 .
drwxrwxrwt. 173 root    root    12288 Jun 24 16:49 ..
ln -s /tmp/foo /tmp/bar
ls -al /tmp/foo
total 16
drwxrwxr-x.   2 modules modules    17 Jun 24 16:51 .
drwxrwxrwt. 173 root    root    12288 Jun 24 16:50 ..
lrwxrwxrwx.   1 modules modules     8 Jun 24 16:51 foo -> /tmp/foo

The repeated soft link actually put a file under the target. Strange. Apparently it is expected behavior. The problem can be avoided by using this form:

ln -sn $TARGET $LINK

The later installs are much faster than the first one, since putting in the links is very fast and building the packages is not. This was the trivial case though, since having done one install all the prerequisites were just "there". The johnnydep package will list the dependencies without doing the install. Guess I will throw something together based on that and the above results and see how it goes.

Regards,

David Mathog

On Wed, Jun 24, 2020 at 4:23 PM Filipe Laíns wrote:
>
> On Tue, 2020-06-23 at 15:51 -0700, David Mathog wrote:
> > What I am after is some method of keeping exactly one copy of each
> > package-version in the common area (ie, one might find foo-1.2,
> > foo-1.7, and foo-2.3 there), while also presenting only the one
> > version of each (let's say foo-1.7) to a particular installed program.
> > On linux it might do that by making soft links to the common
> > PYTHONPATH area from another directory for which it sets PYTHONPATH
> > for the application. Finally, this has to be usable by any account
> > which has read execute access to the main directory.
> >
> > Does such a beast exist? If so, please point me to it!
>
> I have been meaning to do something like this for a while now! But
> unfortunately I can't find the time.
>
> If you do choose to start implementing it, please let me know. I would
> be happy to help out.
>
> Cheers,
> Filipe Laíns
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/63NJKSY7BLPJZXLK5DJFWROGQUKJ7RVF/
[Distutils] Re: package management - common storage while keeping the versions straight
Thanks for the link. Unfortunately there was not a reference to a completed package that actually did this. As in, I really do not want to reinvent the wheel. Ugh, sorry, that's a pun in this context.

Here is a first shot at this, just installing a moderately complicated package in a virtualenv and then reinstalling it in another virtualenv. Extract and execinput are my own programs (from drm_tools on sourceforge) but it is obvious from the context what they are doing. The links had to be soft because linux does not actually allow a normal user (or maybe even root) to make a hard link to a directory.

cd /usr/common/lib/python3.6/Envs
rm -rf ~/.cache/pip                #make download clearer
python3 -m venv scanpy
source scanpy/bin/activate
python -m pip install -U pip       #update 9.0.3 to 20.1.1
which python3                      #using the one in scanpy
pip3 install scanpy
scanpy -h                          #seems to start
deactivate
rm -rf ~/.cache/pip                #make download clearer
python3 -m venv scanpy2
source scanpy2/bin/activate
python -m pip install -U pip       #update 9.0.3 to 20.1.1
export DST=/usr/common/lib/python3.6/Envs/scanpy/lib/python3.6/site-packages
export SRC=/usr/common/lib/python3.6/Envs/scanpy2/lib/python3.6/site-packages
ls -1 $DST \
  | grep -v __pycache__ \
  | grep -v scanpy \
  | grep -v easy_install.py \
  | extract -fmt "ln -s $DST/[1,] $SRC/[1,]" \
  | execinput
pip3 install scanpy
#downloaded scanpy, "Requirement already satisfied" for all the others
#Installing collected packages: scanpy
#Successfully installed scanpy-1.5.1
scanpy -h                          #seems to start
deactivate
source scanpy/bin/activate
scanpy -h                          #seems to start (still)
deactivate

So that method seems to have some promise. It saved a considerable amount of space too:

du -k scanpy | tail -1
457408  scanpy
du -k scanpy2 | tail -1
24900   scanpy2

However, two potential problems are evident on inspection. The first is that when the 2nd scanpy installation was performed it updated the dates on all the directories in $DST. A workaround would be to copy all of those directories into the virtualenv temporarily, just for the installation, and then remove them and put the links in afterwards. That strikes me as awfully cludgy. Setting them read only would likely break the install.

The second issue is that each package install creates two directories like:

llvmlite
llvmlite-0.33.0.dist-info

where the latter contains top_level.txt which in turn contains one line:

llvmlite

pointing to the first directory. If another version must cohabit with it the "llvmlite" directories will conflict. For this sort of approach to work easily the llvmlite directory should be named "llvmlite-0.33.0" and top_level.txt should reference that too. It would be possible (probably) to work around it though by having llvmlite-0.33.0 only in the common area and using:

ln -s $COMMON/llvmlite-0.33.0 $VENVAREA/llvmlite

The top_level.txt in each could then reference the unversioned name. Unknown if this soft link approach will work on Windows.

Regards,

David Mathog

On Wed, Jun 24, 2020 at 1:26 PM Steve Dower wrote:
>
> On 24Jun2020 1923, David Mathog wrote:
> > I think I will experiment a little with pipenv and if necessary after
> > each package install use a script to remove the installed libraries
> > and replace them with a hard link to the one in the common area.
> > Maybe it will be possible to put in those links before installing the
> > package of interest (like for scanpy, see first post), which will
> > hopefully keep it from having to rebuild all those packages too.
>
> Here's a recent discussion about this exact idea (with a link to an
> earlier discussion on this list):
> https://discuss.python.org/t/proposal-sharing-distrbution-installations-in-general/2524
>
> It's totally possible, though it's always a balance of trade-offs. Some
> of the people on that post may be interested in developing a tool to
> automate parts of the process.
>
> Cheers,
> Steve
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/2EMFGUE6QDTWBLPWDPE2TTOZOX3OFAOA/
[Distutils] Re: package management - common storage while keeping the versions straight
On Wed, Jun 24, 2020 at 1:36 AM Thomas Kluyver wrote: > > On Tue, 23 Jun 2020, at 23:51, David Mathog wrote: > > What I am after is some method of keeping exactly one copy of each > > package-version in the common area (ie, one might find foo-1.2, > > foo-1.7, and foo-2.3 there), while also presenting only the one > > version of each (let's say foo-1.7) to a particular installed program. > Conda environments work somewhat like this - all the packages are stored in a > central place, and the structure of selected ones is replicated using > hardlinks in a site-packages directory belonging to the environment. So if > your concern is not to waste disk space by storing copies of the same > packages, that might be an option. I experimented with that one a little. It installs its own copies of python and things like openssl and openblas which are already present from the linux distribution. Similarly, if some python script needs "bwa" it will install its own even though that program is already available. Basically it is yet another "replicate everything we might need whether or not it is already present" type of solution. (The extreme end of that spectrum are systems like docker, which effectively replaces the entire OS.) So there might be only the one version of each python package (not counting duplicates with the OS's python3) but now there are also duplicate copies of system libraries and utilities. I think I will experiment a little with pipenv and if necessary after each package install use a script to remove the installed libraries and replace them with a hard link to the one in the common area. Maybe it will be possible to put in those links before installing the package of interest (like for scanpy, see first post), which will hopefully keep it from having to rebuild all those packages too. 
Thanks, David Mathog -- Distutils-SIG mailing list -- distutils-sig@python.org To unsubscribe send an email to distutils-sig-le...@python.org https://mail.python.org/mailman3/lists/distutils-sig.python.org/ Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/QBIRYI767AVZ2FCFHVTP56XIKOX4TTYQ/
[Distutils] package management - common storage while keeping the versions straight
Hi all, First post here. I have a cluster where the common software is NFS shared from the file server to other nodes. All the python packages are kept in a directory which is referenced by PYTHONPATH. The good part of that is that there is just one copy of each package-version. The bad part, as you have all no doubt guessed, is that python by itself is really bad at specifying and loading a set of particular library versions (see below), so upgrading one program will break another due to conflicting installed versions. Hence the common use of virtualenv's. But as far as I can tell each virtualenv installs a copy of each package-version it needs, resulting in multiple copies of the same package-version for common packages on the same disk. What I am after is some method of keeping exactly one copy of each package-version in the common area (ie, one might find foo-1.2, foo-1.7, and foo-2.3 there), while also presenting only the one version of each (let's say foo-1.7) to a particular installed program. On linux it might do that by making soft links to the common PYTHONPATH area from another directory for which it sets PYTHONPATH for the application. Finally, this has to be usable by any account which has read execute access to the main directory. Does such a beast exist? If so, please point me to it! The limitations of python version handling to which I refer above can be illustrated for "scanpy-scripts"'s dependencies. 
Given all the needed libraries in one place (plus incompatible versions) the right set can be loaded (and verified) like this:

export PYTHONPATH=/path/to_common_area
python3

__requires__ = ['scipy <1.3.0,>=1.2.0', 'anndata <0.6.20', 'loompy <3.0.0,>=2.00', 'h5py <2.10']
import pkg_resources
import scipy
import anndata
import loompy
import h5py
import scanpy
print(scipy.__version__)
print(anndata.__version__)
print(loompy.__version__)
print(h5py.__version__)
print(scanpy.__version__)
quit()

which emits exactly the versions scanpy-scripts needs:

1.2.3
0.6.19
2.0.17
2.9.0
1.4.3

However, adding

, 'scanpy <1.4.4,>=1.4.2'

at the end of __requires__ makes the whole thing fail at "import pkg_resources" with (many lines deleted):

  792, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (scipy 1.2.3 (/home/common/lib/python3.6/site-packages/scipy-1.2.3-py3.6-linux-x86_64.egg), Requirement.parse('scipy>=1.3.1'), {'umap-learn'})

even though the scanpy it loaded in the first case was within the desired range. Moreover, specifying the desired versions as parameters to "import pkg_resources" does not work at all, since pkg_resources only keeps the highest version of each package it finds when imported. (A limitation that never made the least bit of sense to me.)

The test system is CentOS 8 with python 3.6.8.

Thanks,

David Mathog
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/C44E6LUGKGNKKXCEZJMOJUG3HMZKUYG2/