[issue16112] platform.architecture does not correctly escape argument to /usr/bin/file
Marc-Andre Lemburg added the comment:

Jesús Cea Avión wrote:
> Jesús Cea Avión added the comment:
>
> Thanks for the heads-up, Victor.
>
> I have added Marc-Andre Lemburg to the nosy list, so he can know about
> this issue and can provide feedback (or request a backout for 2.7).
>
> Marc-Andre?

The comment that Victor posted still stands for Python 2.7. You can use
subprocess in platform for Python 2.7, but only if it's available.
Otherwise the module must fall back to the portable popen() that comes
with the platform module. It may be worth adding that selection process
to the popen() function in platform itself.

For Python 3.x, you can use subprocess.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Oct 05 2012)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

2012-09-27: Released eGenix PyRun 1.1.0 ...       http://egenix.com/go35
2012-09-26: Released mxODBC.Connect 2.0.1 ...     http://egenix.com/go34
2012-09-25: Released mxODBC 3.2.1 ...             http://egenix.com/go33
2012-10-23: Python Meeting Duesseldorf ...                18 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

----------

___
Python tracker <http://bugs.python.org/issue16112>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16047] Tools/freeze no longer works in Python 3
New submission from Marc-Andre Lemburg:

The freeze tool used for compiling Python binaries with frozen modules no
longer works with Python 3.x. It looks like it was never updated for the
various path and symbol changes introduced with PEP 3149 (ABI tags) in
Python 3.2. Even with lots of symlinks to restore the non-ABI flagged
names, freezing fails with a linker error in Python 3.3:

Tools/freeze> python3 freeze.py hello.py
Tools/freeze> make
config.o:(.data+0x38): undefined reference to `PyInit__imp'
collect2: ld returned 1 exit status
make: *** [hello] Error 1

----------
components: Demos and Tools
messages: 171295
nosy: lemburg
priority: normal
severity: normal
status: open
title: Tools/freeze no longer works in Python 3
versions: Python 3.2, Python 3.3, Python 3.4

___
Python tracker <http://bugs.python.org/issue16047>
___
[issue16027] pkgutil doesn't support frozen modules
Marc-Andre Lemburg added the comment:

Nick Coghlan wrote:
> Nick Coghlan added the comment:
>
> Can you confirm this problem still exists on 3.3? The pkgutil emulation
> isn't used by runpy any more - with the migration to importlib, the
> interface that runpy invokes fails outright if no loader is found rather
> than falling back to the emulation (we only retained the emulation for
> backwards compatibility - it's a public API, so others may be using it
> directly).

That's difficult to test, since the Tools/freeze/ tool no longer works in
Python 3.3. I'll open a separate issue for that.

> I have a feeling that there may still be a couple of checks which are
> restricted to PY_SOURCE and PY_COMPILED that really should be allowing
> PY_FROZEN as well.

Same here.

--
Marc-Andre Lemburg
eGenix.com

----------

___
Python tracker <http://bugs.python.org/issue16027>
___
[issue16027] pkgutil doesn't support frozen modules
Marc-Andre Lemburg added the comment:

Here's the fix we're applying in pyrun to make -m imports work at least
for top-level modules:

--- /home/lemburg/orig/Python-2.7.3/Lib/pkgutil.py	2012-04-10 01:07:30.0 +0200
+++ pkgutil.py	2012-09-24 22:53:30.982526065 +0200
@@ -273,10 +273,21 @@ class ImpLoader:
     def is_package(self, fullname):
         fullname = self._fix_name(fullname)
         return self.etc[2]==imp.PKG_DIRECTORY

     def get_code(self, fullname=None):
+        if self.code is not None:
+            return self.code
+        fullname = self._fix_name(fullname)
+        mod_type = self.etc[2]
+        if mod_type == imp.PY_FROZEN:
+            self.code = imp.get_frozen_object(fullname)
+            return self.code
+        else:
+            return self._get_code(fullname)
+
+    def _get_code(self, fullname=None):
         fullname = self._fix_name(fullname)
         if self.code is None:
             mod_type = self.etc[2]
             if mod_type==imp.PY_SOURCE:
                 source = self.get_source(fullname)

This makes runpy work for top-level frozen modules, but it's really only
a partial solution, since pkgutil would need to get such support in more
places.

We also found that for some reason, runpy/pkgutil does not work for
frozen package imports, e.g. wsgiref.util. The reasons for this appear to
be deeper than just in the pkgutil module. We don't have a solution for
this yet. It is also not clear whether the problem still exists in Python
3.x. The __path__ attribute of frozen modules was changed in 3.0 to be a
list like for all other modules; however, applying that change to 2.x
lets runpy/pkgutil fail altogether (not even the above fix works
anymore).

----------

___
Python tracker <http://bugs.python.org/issue16027>
___
[issue16027] pkgutil doesn't support frozen modules
Marc-Andre Lemburg added the comment:

Correction: the helper function is called imp.get_frozen_object().

----------

___
Python tracker <http://bugs.python.org/issue16027>
___
[issue16027] pkgutil doesn't support frozen modules
New submission from Marc-Andre Lemburg:

pkgutil is used by runpy to run Python modules that are loaded via the -m
command line switch. Unfortunately, this doesn't work for frozen modules,
since pkgutil doesn't know how to load their code object (this can be had
via imp.get_code_object() for frozen modules).

We found the problem while working on eGenix PyRun (see
http://www.egenix.com/products/python/PyRun/) which uses frozen modules
extensively. We currently only target Python 2.x, so will have to work
around the problem with a patch, but Python 3.x still has the same
problem.

----------
components: Library (Lib)
messages: 171163
nosy: lemburg
priority: normal
severity: normal
status: open
title: pkgutil doesn't support frozen modules
versions: Python 3.2, Python 3.3

___
Python tracker <http://bugs.python.org/issue16027>
___
[issue15443] datetime module has no support for nanoseconds
Marc-Andre Lemburg added the comment:

[Roundup's email interface again...]

>>> x = 86400.0
>>> x == x + 1e-9
False
>>> x == x + 1e-10
False
>>> x == x + 1e-11
False
>>> x == x + 1e-12
True

----------

___
Python tracker <http://bugs.python.org/issue15443>
___
[issue15443] datetime module has no support for nanoseconds
Marc-Andre Lemburg added the comment:

Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
>
> On Wed, Jul 25, 2012 at 4:17 AM, Marc-Andre Lemburg wrote:
>> ... full C double precision for the time part of a timestamp,
>> which covers nanoseconds just fine.
>
> No, it does not:
>
> >>> import time
> >>> t = time.time()
> >>> t + 5e-9 == t
> True
>
> In fact, C double precision is barely enough to cover microseconds:
>
> >>> t + 1e-6 == t
> False
> >>> t + 1e-7 == t
> True

I was referring to the use of a C double to store the time part in
mxDateTime. mxDateTime uses the C double to store the number of seconds
since midnight, so you don't run into the Unix ticks value range problem
you showcased above.

----------

___
Python tracker <http://bugs.python.org/issue15443>
___
[issue15443] datetime module has no support for nanoseconds
Marc-Andre Lemburg added the comment:

Marc-Andre Lemburg wrote:
>> Alexander Belopolsky added the comment:
>>
>> On Wed, Jul 25, 2012 at 4:17 AM, Marc-Andre Lemburg wrote:
>>> ... full C double precision for the time part of a timestamp,
>>> which covers nanoseconds just fine.
>>
>> No, it does not:
>>
>> >>> import time
>> >>> t = time.time()
>> >>> t + 5e-9 == t
>> True
>>
>> In fact, C double precision is barely enough to cover microseconds:
>>
>> >>> t + 1e-6 == t
>> False
>> >>> t + 1e-7 == t
>> True
>
> I was referring to the use of a C double to store the time part
> in mxDateTime. mxDateTime uses the C double to store the number of
> seconds since midnight, so you don't run into the Unix ticks value
> range problem you showcased above.

There's enough room to even store 1/100th of a nanosecond, which may be
needed for some physics experiments :-)

>>> x = 86400.0
>>> x == x + 1e-9
False
>>> x == x + 1e-10
False
>>> x == x + 1e-11
False
>>> x == x + 1e-12
True

----------

___
Python tracker <http://bugs.python.org/issue15443>
___
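The cut-off between 1e-11 and 1e-12 in this session is exactly what IEEE-754 doubles predict: near 86400.0 (one day in seconds) the spacing between adjacent doubles is 2**-36, about 1.46e-11. A quick check (math.ulp requires Python 3.9+, so this sketch would not run on the interpreters discussed in the thread):

```python
import math

x = 86400.0  # seconds in a day, the largest value mxDateTime's time part holds

# Spacing between adjacent doubles at this magnitude:
print(math.ulp(x))       # 2**-36, about 1.4551915228366852e-11

# An increment larger than ulp/2 rounds away from x and is preserved;
# a smaller one rounds back to x and is absorbed:
print(x + 1e-11 == x)    # False: 1e-11 > ulp/2
print(x + 1e-12 == x)    # True:  1e-12 < ulp/2
```

So seconds-since-midnight in a double resolves about 1.5e-11 s at worst, comfortably below a nanosecond, while seconds-since-1970 (around 1.3e9) only resolves about 2.4e-7 s, which is why the time.time() examples in this thread fail at nanosecond scale.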
[issue15444] Incorrectly written contributor's names
Marc-Andre Lemburg added the comment:

Thank you for taking the initiative.

Regarding use of UTF-8 for text files: I think we ought to acknowledge
that UTF-8 has become the de facto standard for non-ASCII text files by
now, and with Python 3 being all Unicode, it feels silly not to make use
of it in Python source files.

Regarding my name: I have no issue with the accent missing on the e. I've
long given up using it in source code or emails :-)

----------

___
Python tracker <http://bugs.python.org/issue15444>
___
[issue15443] datetime module has no support for nanoseconds
Marc-Andre Lemburg added the comment:

Vincenzo Ampolo wrote:
> Vincenzo Ampolo added the comment:
>
> This is a real use case I'm working with that needs nanosecond precision
> and led me to submit this request:
>
> Most OSes let users capture network packets (using tools like tcpdump or
> wireshark) and store them using file formats like pcap or pcap-ng. These
> formats include a timestamp for each of the captured packets, and this
> timestamp usually has nanosecond precision. The reason is that on
> gigabit and 10 gigabit networks the frame rate is so high that
> microsecond precision is not enough to tell two frames apart.
> pcap (and now pcap-ng) are extremely popular file formats, with millions
> of files stored around the world. Support for nanoseconds in datetime
> would make it possible to properly parse these files inside python to
> compute precise statistics, for example network delays or round trip
> times.
>
> Another case is stock markets. In that field information is timed in
> nanoseconds, and the ability to easily deal with this kind of
> representation natively in datetime would make the standard module even
> more powerful.
>
> The company I work for is in the data networking field, and we use
> python extensively. Currently we rely on custom code to process
> timestamps; a nanosecond datetime would let us avoid that and use the
> standard python datetime module.

Thanks for the two use cases.

You might want to look at mxDateTime and use that for your timestamps. It
does provide full C double precision for the time part of a timestamp,
which covers nanoseconds just fine.

----------

___
Python tracker <http://bugs.python.org/issue15443>
___
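The limitation behind these use cases can be demonstrated directly: datetime's finest representable tick is one microsecond, so the last three digits of a nanosecond-precision capture timestamp have nowhere to go. A small sketch, reusing the example timestamp quoted elsewhere in this thread:

```python
from datetime import datetime, timedelta, timezone

# datetime/timedelta bottom out at one microsecond:
print(timedelta.resolution)  # 0:00:00.000001

# A pcap-style nanosecond timestamp (value taken from the thread's example):
ns_since_epoch = 1343158163471209049

# Split into whole microseconds plus a nanosecond remainder; only the
# microseconds can be stored in a datetime, exactly (integer arithmetic
# avoids the float rounding discussed in the other messages):
micros, ns_remainder = divmod(ns_since_epoch, 1000)
dt = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)
print(dt.microsecond)   # 471209 -- kept
print(ns_remainder)     # 49     -- the sub-microsecond part is lost
```

This is why tools parsing pcap/pcap-ng files end up carrying the nanosecond remainder in a separate integer alongside the datetime.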
[issue15443] datetime module has no support for nanoseconds
Marc-Andre Lemburg added the comment:

Vincenzo Ampolo wrote:
> As computers evolve, time management becomes more precise and more
> granular. Unfortunately the standard datetime module is not able to deal
> with nanoseconds even though OSes are able to. For example, if I do:
>
> print "%.9f" % time.time()
> 1343158163.471209049
>
> I have an actual timestamp from the epoch with nanosecond granularity.
>
> Thus support for nanoseconds in datetime would really be appreciated.

I would be interested in an actual use case for this.

----------

___
Python tracker <http://bugs.python.org/issue15443>
___
[issue15369] pybench and test.pystone poorly documented
Marc-Andre Lemburg added the comment:

Brett Cannon wrote:
> Brett Cannon added the comment:
>
> I disagree. They are outdated benchmarks and probably should either be
> removed or left undocumented. Proper testing of performance is with the
> Unladen Swallow benchmarks.

I disagree with your statement.

Just like every benchmark, they serve their purpose in their particular
field of use, e.g. pybench may not be useful for the JIT approach
originally taken by the Unladen Swallow project, but it's still useful to
test/check changes in the non-JIT CPython interpreter, and it's
extensible to take new developments into account.

pystone is useful to get a quick feel for the performance of Python on a
machine.

----------
nosy: +lemburg

___
Python tracker <http://bugs.python.org/issue15369>
___
[issue1294959] Problems with /usr/lib64 builds.
Marc-Andre Lemburg added the comment:

Éric Araujo wrote:
> Éric Araujo added the comment:
>
> On Mar 29, 2011, at 10:12 PM, Matthias Klose wrote:
>> no, it looks for headers and libraries in more directories. But really,
>> this whole testing for paths is wrong. Just use the compiler to search
>> for headers and libraries, no need to check these on your own.
>
> Do all compilers provide this info, including Windows ones? If so, that
> would be a nice feature for distutils2.

This only works for a handful of system library paths, not the extra ones
that you may need to search for local installations of libraries and
which you have to inform the compiler about :-)

Many gcc installations, for example, don't include the /usr/local or
/opt/local dir trees in the search. On Windows, you have to run the
correct vc*.bat files to have the paths set up, and optional software
rarely adds the correct paths to LIB and INCLUDE.

The compiler also won't help with the problem Sean originally pointed to:
building software on systems that can run both 32-bit and 64-bit code and
finding the right set of libs to link against.

Another problem is finding the paths to the right version of a library
(both include files and corresponding libraries).

While it would be great to have a system tool take care of setting things
up correctly, I don't know of any such tool, so searching paths and
inspecting files using REs appears to be the only way to build a general
purpose detection scheme. mxSetup.py (included in egenix-mx-base) uses
such a scheme; distutils has one too.

----------

___
Python tracker <http://bugs.python.org/issue1294959>
___
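The kind of path probing described here might look roughly like the following sketch. The directory and suffix lists are purely illustrative (they are not what distutils or mxSetup.py actually use), and a real scheme would also have to pick lib64 vs lib based on the target architecture:

```python
import os


def find_library_file(name, extra_dirs=()):
    """Return the first matching library file for *name*, or None.

    Illustrative sketch of directory probing: user-supplied dirs are
    searched before the usual system dirs, and /usr/lib64 is tried
    before /usr/lib to prefer 64-bit libraries on mixed systems.
    """
    system_dirs = ['/usr/lib64', '/usr/lib', '/usr/local/lib', '/opt/local/lib']
    for libdir in list(extra_dirs) + system_dirs:
        for suffix in ('.so', '.dylib', '.a'):
            candidate = os.path.join(libdir, 'lib%s%s' % (name, suffix))
            if os.path.exists(candidate):
                return candidate
    return None
```

Version detection, as noted above, needs more than this: matching `libfoo.so.N.M` names and inspecting headers with REs to confirm the include files correspond to the library found.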
[issue14572] 2.7.3: sqlite module does not build on centos 5 and Mac OS X 10.4
Marc-Andre Lemburg added the comment:

Mac OS X 10.4 is also affected, and for the same reason. SQLite builds
fine for Python 2.5 and 2.6, but not for 2.7.

----------
nosy: +lemburg
title: 2.7.3: sqlite module does not build on centos 5 -> 2.7.3: sqlite module does not build on centos 5 and Mac OS X 10.4

___
Python tracker <http://bugs.python.org/issue14572>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Marc-Andre Lemburg wrote:
> Nick Coghlan wrote:
>> Nick Coghlan added the comment:
>>
>> At the very least, failing to regenerate importlib.h shouldn't be a
>> fatal build error. It should just run with what it's got, and hopefully
>> you will get a working interpreter out the other end, such that you can
>> regenerate the frozen module on the next pass.
>>
>> If we change that, then I'm OK with keeping the automatic rebuild.
>
> I fixed that already today.

See http://bugs.python.org/issue14605 and
http://hg.python.org/lookup/acfdf46b8de1 +
http://hg.python.org/cpython/rev/5fea362b92fc

> You now get a warning message from make, but no build error across
> all buildbots like I had run into yesterday when working on the code.

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Nick Coghlan wrote:
> Nick Coghlan added the comment:
>
> At the very least, failing to regenerate importlib.h shouldn't be a
> fatal build error. It should just run with what it's got, and hopefully
> you will get a working interpreter out the other end, such that you can
> regenerate the frozen module on the next pass.
>
> If we change that, then I'm OK with keeping the automatic rebuild.

I fixed that already today. You now get a warning message from make, but
no build error across all buildbots like I had run into yesterday when
working on the code.

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Antoine Pitrou wrote:
> Antoine Pitrou added the comment:
>
>> The question pybuilddir.txt apparently tries to solve is whether Python
>> is running from the build dir or not. It's not whether Python was
>> installed or not.
>
> That's the same, for all we're concerned.
> But pybuilddir.txt does not only solve that problem. It also contains
> the path to extension modules generated by setup.py, so that sys.path
> can be set up appropriately at startup.

It would be easier to tell distutils to install the extensions in a
fixed-name dir (instead of using a platform and version in the name) and
then use that dir in getpath.c. distutils is pretty flexible at that :-)

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Antoine Pitrou wrote:
> Antoine Pitrou added the comment:
>
>>> Look for "pybuilddir.txt".
>>
>> Oh dear. Another one of those hacks... why wasn't this done using
>> constants passed in by the configure script and simple string
>> comparison ?
>
> How would that help distinguish between an installed Python and a
> non-installed Python? If you have an idea about that, please open an
> issue and explain it precisely :)

The question pybuilddir.txt apparently tries to solve is whether Python
is running from the build dir or not. It's not whether Python was
installed or not.

Checking for the build dir can be done by looking at the argv[0] of the
executable and comparing that to the build dir. This can be compiled into
the interpreter using a constant, say BUILDDIR. At runtime, you'd simply
compare the current argv[0] to BUILDDIR. If it matches, you know that you
can assume the build dir layout with reasonable certainty and proceed
accordingly. No need for extra joins, file reads, etc.

But given the enormous startup time of Python 3.3, those few stats won't
make a difference anyway. This would need a completely different holistic
approach. Perhaps something for a SoC project.

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
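The argv[0] comparison suggested here can be sketched in a few lines. This is a Python illustration of logic that would live in C (getpath.c); the BUILDDIR value is hypothetical and would really be baked in by the configure script:

```python
import os
import sys

# Hypothetical constant substituted at configure time (assumption for
# illustration -- not an actual CPython build variable):
BUILDDIR = '/home/buildbot/cpython'


def running_from_build_dir(argv0=None):
    """Return True if the interpreter appears to run from the build dir.

    A single normalize-and-compare of the executable's directory against
    the compiled-in build directory -- no extra joins, file reads, or
    stat() calls, which is the point of the suggestion above.
    """
    exe = argv0 if argv0 is not None else sys.executable
    return os.path.dirname(os.path.abspath(exe)) == BUILDDIR
```

For example, `running_from_build_dir('/home/buildbot/cpython/python')` is True, while an installed `/usr/bin/python3` is not.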
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Antoine Pitrou wrote:
> Antoine Pitrou added the comment:
>
>> Code to detect whether you're running off a checkout vs. a normal
>> installation by looking at even more directories ? I don't
>> see any in getpath.c (and that's good).
>
> Look for "pybuilddir.txt".

Oh dear. Another one of those hacks... why wasn't this done using
constants passed in by the configure script and simple string
comparison ?

BTW: The startup time of python3.3 is 113ms on my machine; that's more
than twice as long as python2.7. Given the history, it looks like no one
cares about these things anymore... :-(

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14605] Make import machinery explicit
Marc-Andre Lemburg added the comment:

Brett Cannon wrote:
> You can see a little discussion in http://bugs.python.org/issue14642,
> but it has been discussed elsewhere and the automatic rebuilding was
> preferred (but it is not a requirement to build, as importlib.h is in
> hg).

An automatic rebuild is fine, but only as long as the local ./python
actually exists.

I was unaware of the make rule, so did not run make to check things
before the checkin. As a result, the bootstrap module received a more
recent timestamp than importlib.h, and this caused all the buildbots to
force a rebuild of importlib.h - which failed, since they didn't have a
built ./python at that stage.

I checked in a fix and added a warning to the bootstrap script.

----------

___
Python tracker <http://bugs.python.org/issue14605>
___
[issue14605] Make import machinery explicit
Marc-Andre Lemburg added the comment:

Marc-Andre Lemburg wrote:
> Looking further I found this rule in the Makefile:
>
> # Importlib
> Python/importlib.h: $(srcdir)/Lib/importlib/_bootstrap.py $(srcdir)/Python/freeze_importlib.py
> 	./$(BUILDPYTHON) $(srcdir)/Python/freeze_importlib.py \
> 		$(srcdir)/Lib/importlib/_bootstrap.py Python/importlib.h
>
> Since the patch modified _bootstrap.py, make wants to recreate
> importlib.h, but at that time $(BUILDPYTHON) doesn't yet exist.

I now ran 'make' after applying the patches to have importlib.h
recreated.

This setup looks a bit fragile to me. I think it would be better to make
creation of importlib.h an explicit operation that has to be done in case
the Python code changes (e.g. by creating a make target
build-importlib.h), with the Makefile only warning about a needed update
instead of failing completely.

----------

___
Python tracker <http://bugs.python.org/issue14605>
___
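A warn-instead-of-fail variant of that rule could look roughly like the following. This is a hypothetical sketch, not the fix that was actually committed; it simply guards the recipe on the freshly built interpreter existing:

```make
# Hypothetical non-fatal variant: regenerate importlib.h when
# ./$(BUILDPYTHON) is available, otherwise keep the checked-in copy
# and warn, so a first build from a clean tree cannot fail here.
Python/importlib.h: $(srcdir)/Lib/importlib/_bootstrap.py $(srcdir)/Python/freeze_importlib.py
	@if test -x ./$(BUILDPYTHON); then \
		./$(BUILDPYTHON) $(srcdir)/Python/freeze_importlib.py \
			$(srcdir)/Lib/importlib/_bootstrap.py Python/importlib.h; \
	else \
		echo "WARNING: importlib.h may be out of date; rerun make once ./$(BUILDPYTHON) exists"; \
	fi
```

The trade-off is that a stale importlib.h can silently survive one build pass, which is why the recipe prints a warning telling the developer to rerun make.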
[issue14605] Make import machinery explicit
Marc-Andre Lemburg added the comment:

R. David Murray wrote:
> R. David Murray added the comment:
>
> Hmm. Some at least of the buildbots have failed to build after that
> patch:
>
> ./python ./Python/freeze_importlib.py \
> 	./Lib/importlib/_bootstrap.py Python/importlib.h
> make: ./python: Command not found
> make: *** [Python/importlib.h] Error 127
> program finished with exit code 2
>
> (http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.x/builds/3771)

Thanks for mentioning this. I've reverted the change for now and will
have a look tomorrow.

The logs of the failing bots are not very informative about what is going
on:

gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/dynamic_annotations.o Python/dynamic_annotations.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/errors.o Python/errors.c
./python ./Python/freeze_importlib.py \
	./Lib/importlib/_bootstrap.py Python/importlib.h
make: ./python: Command not found
make: *** [Python/importlib.h] Error 127
program finished with exit code 2

vs.

gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/dynamic_annotations.o Python/dynamic_annotations.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/errors.o Python/errors.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/frozen.o Python/frozen.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/frozenmain.o Python/frozenmain.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/future.o Python/future.c

I guess some commands are not printed to stdout.

Looking at the buildbots again: reverting the patch has not caused the
lights to go green again. Very strange indeed.

Looking further I found this rule in the Makefile:

# Importlib
Python/importlib.h: $(srcdir)/Lib/importlib/_bootstrap.py $(srcdir)/Python/freeze_importlib.py
	./$(BUILDPYTHON) $(srcdir)/Python/freeze_importlib.py \
		$(srcdir)/Lib/importlib/_bootstrap.py Python/importlib.h

Since the patch modified _bootstrap.py, make wants to recreate
importlib.h, but at that time $(BUILDPYTHON) doesn't yet exist.

----------

___
Python tracker <http://bugs.python.org/issue14605>
___
[issue14605] Make import machinery explicit
Marc-Andre Lemburg added the comment:

Brett Cannon wrote:
> I documented it explicitly so people can use it if they so choose (e.g.
> look at sys._getframe()). If you want to change this that's fine, but I
> am personally not going to put the effort in to rename the class, update
> the tests, and change the docs for this (we almost stopped allowing the
> importation of bytecode directly not so long ago but got push-back so we
> backed off).

I renamed the loader and reworded the notice in the docs.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

----------

___
Python tracker <http://bugs.python.org/issue14605>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Antoine Pitrou wrote:
>> Adding more cruft to getpath.c or similar routines is just going to
>> slow down startup time even more...
>
> The code is already there.

Code to detect whether you're running off a checkout vs. a normal
installation by looking at even more directories ? I don't see any in
getpath.c (and that's good).

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Brett Cannon wrote:
> Modules/getpath.c seems to be where the C code does it when getting
> paths for sys.path. So it would be possible to use that same algorithm
> to set some sys attribute (e.g. in_checkout or something) much like
> sys.gettotalrefcount is optional and only shown when built with
> --with-pydebug. Otherwise some directory structure check could be done
> (e.g. find importlib/_bootstrap.py off of sys.path, and then see if
> ../Modules/Setup or something also exists that would never show up in an
> installed CPython).

Why not simply use a flag that gets set based on an environment variable,
say PYTHONDEVMODE ?

Adding more cruft to getpath.c or similar routines is just going to slow
down startup time even more... Python 2.7 has a startup time of 70ms on
my machine; compare that to Python 2.1 with 10ms and Perl 5 with just
4ms.

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Brett Cannon wrote:
> Brett Cannon added the comment:
>
> So basically if you are running in a checkout, grab the source file and
> compile it manually since its location is essentially hard-coded and
> thus you don't need to care about sys.path and all the other stuff
> required to do an import, while using the frozen code for when you are
> running an installed module since you would otherwise need to do the
> search for importlib's source file to do a load at startup properly.

Right.

> That's an interesting idea. How do we currently tell that the
> interpreter is running in a checkout? Is that exposed in any way to
> Python code?

There's some magic happening in site.py for checkouts, but I'm not sure
whether any of that is persistent or even available at the time these
particular imports would happen.

Then again, I'm not sure you need to know whether you have a checkout or
not. You just need some flag to identify whether you want the search for
external module code to take place or not. sys.flags could be used for
that.

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment:

Brett Cannon wrote:
> Brett Cannon added the comment:
>
> I don't quite follow what you are suggesting, MAL. Are you saying to
> freeze importlib.__init__ and importlib._bootstrap and somehow have
> importlib.__init__ choose what to load, frozen or source?

No, it always loads and runs the frozen code, but at the start of the
module code it branches between the frozen bytecode and the code read
from an external file.

Pseudo-code in every module you wish to be able to host externally:

#
# MyModule
#
if operating_in_dev_mode and '<frozen>' in __file__:
    exec(open('dev-area/MyModule.py', 'r').read(), globals(), globals())
else:
    # Normal module code
    class MyClass:
        ...
    # hundreds of lines of code...

Aside: With a module scope "break", the code would look more elegant:

#
# MyModule
#
if operating_in_dev_mode and '<frozen>' in __file__:
    exec(open('dev-area/MyModule.py', 'r').read(), globals(), globals())
    break

# Normal module code
class MyClass:
    ...
# hundreds of lines of code...

----------

___
Python tracker <http://bugs.python.org/issue14657>
___
[issue14605] Make import machinery explicit
Marc-Andre Lemburg added the comment: Brett Cannon wrote: > > That initial comment is out-of-date. If you look that the commit I made I > documented importlib.machinery._SourcelessFileLoader. I am continuing the > discouragement of using bytecode files as an obfuscation technique (because > it's a bad one), but I decided to at least document the class so people can > use it at their own peril and know about it if they happen to come across the > object during execution. It's not a perfect obfuscation technique, but a pretty simple and (legally) effective one to use. FWIW, I don't think the comment in the check-in is appropriate: """ 1.127 + It is **strongly** suggested you do not rely on this loader (hence the 1.128 + leading underscore of the class). Direct use of bytecode files (and thus not 1.129 + source code files) inhibits your modules from being usable by all Python 1.130 + implementations. It also runs the risk of your bytecode files not being 1.131 + usable by new versions of Python which change the bytecode format. This 1.132 + class is only documented as it is directly used by import and thus can 1.133 + potentially have instances show up as a module's ``__loader__`` attribute. """ The "risks" you mention there are really up to the application developers to decide how to handle, not the Python developers. Python has a long tradition of being friendly to commercial applications and I don't see any reason why we should stop that. If you do want this to change, please write a PEP. This may appear to be a small change in direction, but it does in fact have quite some impact on the usefulness of CPython in commercial settings. I also think that the SourcelessFileLoader loader should be first class citizen without the leading underscore if the importlib is to completely replace the current import mechanism. Why force developers to write their own loader instead of using the standard one just because of the leading underscore, when it's only 20 lines of code ? 
Thanks, -- Marc-Andre Lemburg eGenix.com 2012-04-28: PythonCamp 2012, Cologne, Germany 4 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ -- ___ Python tracker <http://bugs.python.org/issue14605> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
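In later Python 3 releases the loader did in fact become public as importlib.machinery.SourcelessFileLoader; a minimal sketch of loading a bytecode-only module with it (the module name and contents here are made up):

```python
import importlib.machinery
import importlib.util
import os
import py_compile
import tempfile

# Create a throwaway module, compile it to bytecode, and remove the
# source so only the .pyc file remains.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "secret.py")
with open(src, "w") as f:
    f.write("ANSWER = 42\n")
pyc = py_compile.compile(src, cfile=os.path.join(tmpdir, "secret.pyc"))
os.remove(src)

# Load the module from the bytecode file alone.
loader = importlib.machinery.SourcelessFileLoader("secret", pyc)
spec = importlib.util.spec_from_loader("secret", loader)
mod = importlib.util.module_from_spec(spec)
loader.exec_module(mod)
print(mod.ANSWER)  # 42
```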
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment: > ... test method. Another option is we hide the source as _importlib or something to allow direct importation w/o any tricks under a protected name. Using the freeze everything approach you make things easier for the implementation, since you don't have to think about whether certain pieces of code are already available or not. For development, you can also have the package load bytecode or source from an external package instead of running (all of) the module's bytecode that was compiled into the binary. This is fairly easy to do, since the needed exec() does not depend on the import machinery. The only downside is the big if statement to isolate the frozen version from the loaded one - it would be great if we had a command to stop module execution or code execution for a block to make that more elegant, e.g. "break" at module scope :-) -- ___ Python tracker <http://bugs.python.org/issue14657> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
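The exec()-based external loading described above needs nothing from the import machinery; a toy sketch (the source text and names are made up):

```python
# Execute "development" source into a fresh namespace with plain
# exec(), the way a frozen module could defer to an on-disk copy
# of itself.
dev_source = "def greet():\n    return 'from dev area'\n"
namespace = {}
exec(compile(dev_source, "<dev-area/mymodule.py>", "exec"), namespace)
print(namespace["greet"]())  # from dev area
```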
[issue14605] Make import machinery explicit
Marc-Andre Lemburg added the comment: Brett Cannon wrote: > I am not exposing SourcelessFileLoader because importlib publicly tries to > discourage the shipping of .pyc files w/o their corresponding source files. > Otherwise all objects as used by importlib for performing imports will become > public. What's the reasoning behind this idea ? Is Python 3.3 no longer meant to be used for closed source applications ? -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue14605> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14657] Avoid two importlib copies
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> This would also mean that changes to importlib._bootstrap would >> actually take effect for user code almost immediately, *without* >> rebuilding Python, as the frozen version would *only* be used to get >> hold of the pure Python version. > > Actually, _io, encodings and friends must be loaded before importlib > gets imported from Python code, so you will still have __loader__ > entries referencing the frozen importlib, unless you also rewrite these > attributes. > > My desire here is not to hide _frozen_importlib, rather to avoid subtle > issues with two instances of a module living in memory with separate > global states. Whether it's the frozen version or the on-disk Python > version that gets the preference is another question (a less important > one in my mind). Why don't you freeze the whole importlib package to avoid all these issues ? As side effect, it will also load a little faster. -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue14657> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14423] Getting the starting date of iso week from a week number and a year.
Marc-Andre Lemburg added the comment: Mark Dickinson wrote: > > By the way, I don't think the algorithm used in the current patch is correct. > For 'date.from_iso_week(2009, 1)' I get 2009/1/1, which was a Thursday. The > documentation seems to indicate that a Monday should be returned. True, the correct date is 2008-12-29. -- ___ Python tracker <http://bugs.python.org/issue14423> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
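The correct Monday can be computed with a few lines of stdlib code (a sketch; the function name is mine, and it relies on the ISO 8601 rule that week 1 is the week containing January 4th):

```python
from datetime import date, timedelta

def iso_week_start(year, week):
    # ISO 8601: week 1 is the week containing January 4th.
    # Back up to that week's Monday, then step forward week-1 weeks.
    jan4 = date(year, 1, 4)
    monday_of_week1 = jan4 - timedelta(days=jan4.isoweekday() - 1)
    return monday_of_week1 + timedelta(weeks=week - 1)

print(iso_week_start(2009, 1))  # 2008-12-29
```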
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: ink it is not unlikely that you *are* the only ones affected by it. With "in the wild" I'm referring to the function being released in the ccompiler not only in alpha releases but also in the beta releases, the 2.7, 2.7.1 and 2.7.2 release - in every release since early in 2010. We were unaware of the reversal of the changes by Tarek and the way we coded things in mxSetup.py did not show that things were removed again, simply because we support more than just Python 2.7 and have proper fallback solutions for most things. Only in this particular case, we were using different strategies based on the Python version number and so there is no fallback. > Nevertheless, what are the alternatives? We could add a wrapper function > into distutils.ccompiler that just calls the distutils.sysconfig version. > Here's a patch that attempts to do so. That should fix that breakage for the > eGenix packages. It would be great if you could test it. The fix is easy: simply import the customize_compiler() API in the ccompiler module to maintain compatibility with what had already been release. No need to add a wrapper function, a single from distutils.sysconfig import customize_compiler() in ccompile.py will do just fine. > It's up to the 2.7 release manager to decide what action to take, i.e. > whether the patch is needed and, if so, how quickly to schedule a new > release. As a practical matter, regardless of whether the patch is applied > in Python or not, I would assume that a faster solution for your end users > would be to ship a version of the eGenix packages that reverts the changes(s) > there. By the way, it looks like you'll need to eventually do that anyway > since the code in mxSetup.py incorrectly assumes that the corresponding > changes were also made to Python 3.2. We don't support Python 3.x yet, so that's a non-issue at the moment. 
But yes, we will have to release new patch level releases for all our packages to get this fixed for our users. -- ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: Marc-Andre Lemburg wrote: > >> Ned Deily added the comment: >> >> That's unfortunate. But the documented location for customize_compiler is >> and, AFAIK, had always been in distutils.sysconfig. It was an inadvertent >> consequence of the bad revert during the 2.7 development cycle that a second >> copy was made available in distutils.ccompiler. That change was not >> supposed to be released in 2.7 and was never documented. So I don't think >> there is anything that can or needs to be done at this point in Python >> itself. Other opinions? > > Excuse me, Ned, but that's not how we approach dot releases in Python. > > Regardless of whether the documentation was fixed or not, you cannot > simply remove a non-private function without making sure that at least > the import continues to work. Turns out, the "fix" broke all our packages for Python 2.7.3 and I can hardly believe we're the only ones affected by this. -- ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: Ned Deily wrote: > > And to recap the history here, there was a change in direction for Distutils > during the 2.7 development cycle, as decided at the 2010 language summit, in > particular to revert feature changes in Distutils for 2.7 to its 2.6.x state > and, going forward, "Distutils in Python will be feature-frozen". > > http://mail.python.org/pipermail/python-dev/2010-March/098135.html I know that distutils development was stopped (even though I don't consider that a good thing), but since the code changes were let into the wild, we have to deal with it properly now. -- ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Changes by Marc-Andre Lemburg : -- resolution: fixed -> ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: Ned Deily wrote: > > Ned Deily added the comment: > > That's unfortunate. But the documented location for customize_compiler is > and, AFAIK, had always been in distutils.sysconfig. It was an inadvertent > consequence of the bad revert during the 2.7 development cycle that a second > copy was made available in distutils.ccompiler. That change was not supposed > to be released in 2.7 and was never documented. So I don't think there is > anything that can or needs to be done at this point in Python itself. Other > opinions? Excuse me, Ned, but that's not how we approach dot releases in Python. Regardless of whether the documentation was fixed or not, you cannot simply remove a non-private function without making sure that at least the import continues to work. -- status: pending -> open ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: Éric Araujo wrote: > > Sorry for not thinking about this. I’ll be more careful. No need to be sorry; these things can happen. What I don't understand is this line in the news section: "Complete the revert back to only having one in distutils.sysconfig as is the case in 3.x." Back when I discussed these changes with Tarek, we both agreed that customize_compiler() is better placed into the ccompiler module than the sysconfig module, so I think the one in the sysconfig module should be replaced with a reference to the version in the ccompiler module - in both 2.7 and 3.x. -- ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: Here's the quote from mxSetup.py:

    # distutils changed a lot in Python 2.7 due to many
    # distutils.sysconfig APIs having been moved to the new
    # (top-level) sysconfig module.
    from sysconfig import \
        get_config_h_filename, parse_config_h, get_path, \
        get_config_vars, get_python_version, get_platform

    # This API was moved from distutils.sysconfig to distutils.ccompiler
    # in Python 2.7
    from distutils.ccompiler import customize_compiler

So in 2.7 the function was moved from sysconfig to ccompiler (where it belongs), and now you're reverting the change in the third dot release. -- ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler
Marc-Andre Lemburg added the comment: The patch broke egenix-mx-base, since it relies on the customize_compiler() being available in distutils.ccompiler: https://www.egenix.com/mailman-archives/egenix-users/2012-April/114838.html If you make such changes to dot releases, please make absolutely sure that when you move functions from one module to another, you keep backwards compatibility aliases around. -- nosy: +lemburg resolution: fixed -> status: closed -> open ___ Python tracker <http://bugs.python.org/issue13994> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
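A backwards-compatibility alias of the kind asked for here amounts to one re-exporting import in the old module; simulated below with synthetic modules so it is runnable stand-alone (all names are hypothetical):

```python
import sys
import types

# A function "moves" from old_module to new_module...
new_module = types.ModuleType("new_module")
def customize_compiler(compiler):
    return "customized %s" % compiler
new_module.customize_compiler = customize_compiler

# ...and old_module keeps an alias, equivalent to putting
# "from new_module import customize_compiler" in its source.
old_module = types.ModuleType("old_module")
old_module.customize_compiler = new_module.customize_compiler

sys.modules["new_module"] = new_module
sys.modules["old_module"] = old_module

# Code written against the old location keeps working:
from old_module import customize_compiler as legacy
print(legacy("unixccompiler"))  # customized unixccompiler
```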
[issue14428] Implementation of the PEP 418
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> Please leave the pybench default timers unchanged in case the >> new APIs are not available. > > Ok, done in the new patch: perf_counter_process_time-2.patch. Thanks. -- ___ Python tracker <http://bugs.python.org/issue14428> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14619] Enhanced variable substitution for databases
Marc-Andre Lemburg added the comment: Raymond, the variable substitution is normally done by the database and not the Python database modules, so you'd have to ask the database maintainers for assistance. The qmark ('?') parameter style is part of the ODBC standard, so it's unlikely that this will get changed any time soon unless you have good contacts with Microsoft :-) The ODBC standard also doesn't support multi-value substitutions in the API, so there's no way to pass the array to the database driver. BTW: Such things are better discussed on the DB-SIG mailing list than the Python tracker. -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue14619> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
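Since the qmark style binds exactly one value per placeholder, the usual workaround for multi-value substitution is to generate one '?' per element on the Python side; a sketch with sqlite3, which happens to use the qmark style (table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO codes VALUES (?, ?)",
                 [(1, "one"), (2, "two"), (3, "three")])

# Expand one '?' per value for the IN clause; the values themselves
# still travel as bound parameters, never as interpolated SQL text.
ids = [1, 3]
placeholders = ", ".join(["?"] * len(ids))
rows = conn.execute(
    "SELECT name FROM codes WHERE id IN (%s) ORDER BY id" % placeholders,
    ids).fetchall()
print(rows)  # [('one',), ('three',)]
```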
[issue14428] Implementation of the PEP 418
Marc-Andre Lemburg added the comment: Please leave the pybench default timers unchanged in case the new APIs are not available. The perf_counter_process_time.patch currently changes them, even though the new APIs are not available on older Python releases, thus breaking pybench for e.g. Python 3.2 or earlier releases. Ditto for the resolution changes: these need to be optional and not cause a break when used in Python 3.1/3.2. Thanks, -- Marc-Andre Lemburg eGenix.com 2012-04-28: PythonCamp 2012, Cologne, Germany 10 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ -- ___ Python tracker <http://bugs.python.org/issue14428> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14428] Implementation of the PEP 418
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > perf_counter_process_time.patch: replace "time.clock if windows else > time.time" with time.perf_counter, and getrusage/clock with time.process_time. > > pybench and timeit now use time.perf_counter() by default. profile uses > time.proces_time() by default. > > pybench uses time.get_clock_info() to display the precision and the > underlying C function (or the resolution if the precision is not available). > > Tools/pybench/systimes.py and Tools/pybench/clockres.py may be removed: these > features are now available directly in the time module. No changes to the pybench defaults, please. It has to stay backwards compatible with older releases. Adding optional new timers is fine, though. Thanks, -- Marc-Andre Lemburg eGenix.com 2012-04-28: PythonCamp 2012, Cologne, Germany 15 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ -- ___ Python tracker <http://bugs.python.org/issue14428> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14423] Getting the starting date of iso week from a week number and a year.
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote: > > Alexander Belopolsky added the comment: > > On Mon, Apr 9, 2012 at 6:20 PM, Marc-Andre Lemburg > wrote: >> Which is wrong, since the start of the first ISO week of a year >> can in fact start in the preceding year... > > Hmm, the dateutil documentation seems to imply that relativedelta > takes care of this: > > http://labix.org/python-dateutil#head-72c4689ec5608067d118b9143cef6bdffb6dad4e > > (Search the page for "ISO") That's not relativedelta taking care of it, it's the way it is used: the week containing January 4th is the first ISO week of a year; it then goes back to the previous Monday and adds 14 weeks from there to go to the Monday of the 15th week. This works fine as long as January 4th doesn't fall on a Monday... You don't really expect anyone to remember such rules, do you ? :-) -- ___ Python tracker <http://bugs.python.org/issue14423> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14423] Getting the starting date of iso week from a week number and a year.
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote: > > Alexander Belopolsky added the comment: > > Before you invest in a C version, let's discuss whether this feature is > desirable. The proposed function implements a very simple and not very > common calculation. Note that even dateutil does not provide direct support > for this: you are instructed to use relativedelta to add weeks to January 1st > of the given year. Which is wrong, since the first ISO week of a year can in fact start in the preceding year... http://en.wikipedia.org/wiki/ISO_week_date and it's not a simple calculation. ISO weeks are in common use throughout Europe; they're part of the ISO 8601 standard. mxDateTime has had such constructors for ages: http://www.egenix.com/products/python/mxBase/mxDateTime/doc/#_Toc293683820 -- ___ Python tracker <http://bugs.python.org/issue14423> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14428] Implementation of the PEP 418
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> I think you need to reconsider the time.steady() name you're using >> in the PEP. For practical purposes, it's better to call it >> time.monotonic() > > I opened a new thread on python-dev to discuss this topic. > >> and only make the function available if the OS provides >> a monotonic clock. > > Oh, I should explain this choice in the PEP. Basically, the idea is to > provide a best-effort portable function. > >> The fallback to time.time() is not a good idea, since then the programmer >> has to check whether the timer really provides the features she's after >> every time it gets used. > > Nope, time.get_clock_info('steady') does not change at runtime. So it > can only be checked once. With "every time" I meant: in every application you use the function. That pretty much spoils the idea of a best effort portable function. It's better to use a try-except to test for availability of functions than to have to (remember to) call a separate function to find out the characteristics of the best effort approach. >> Instead of trying to tweak all the different clocks and timers into >> a single function, wouldn't it be better to expose each kind as a >> different function and then let the programmer decide which fits >> best ?! > > This is a completely different approach. It should be discussed on > python-dev, not in the bug tracker please. I think that Python can > help the developer to write portable code by providing high-level > functions because clock properties are well known (e.g. see > time.get_clock_info). Fair enough. BTW: Are you aware of the existing systimes.py module in pybench, which already provides interfaces to high resolution timers usable for benchmarking in a portable way ? Perhaps worth mentioning in the PEP. 
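The try-except availability check mentioned above might look like this (a sketch; time.monotonic() is the name the monotonic clock eventually shipped under in Python 3.3):

```python
import time

# Prefer a clock that cannot jump backwards; fall back to the wall
# clock on interpreters that do not provide one.
try:
    _clock = time.monotonic
except AttributeError:
    _clock = time.time  # may jump if the system clock is adjusted

start = _clock()
time.sleep(0.01)
elapsed = _clock() - start
print(elapsed > 0)  # True
```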
-- ___ Python tracker <http://bugs.python.org/issue14428> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14428] Implementation of the PEP 418
Marc-Andre Lemburg added the comment: Hi Victor, I think you need to reconsider the time.steady() name you're using in the PEP. For practical purposes, it's better to call it time.monotonic() and only make the function available if the OS provides a monotonic clock. The fallback to time.time() is not a good idea, since then the programmer has to check whether the timer really provides the features she's after every time it gets used. Regardless of this functional problem, I'm also not sure what you want to imply by the term "steady". A steady beat would mean that the timer never stops and keeps a constant pace, but that's not the case for the timers you're using to implement time.steady(). If you're after a mathematical term, "continuous" would be a better term, but again, time.time() is not always continuous. Instead of trying to tweak all the different clocks and timers into a single function, wouldn't it be better to expose each kind as a different function and then let the programmer decide which fits best ?! BTW: Thanks for the research you've done on the different clocks and timers. That's very useful information. Thanks, -- Marc-Andre Lemburg eGenix.com 2012-04-03: Python Meeting Duesseldorf today ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue14428> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13608] remove born-deprecated PyUnicode_AsUnicodeAndSize
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > The Py_UNICODE* type is deprecated but since Python 3.3, Py_UNICODE=wchar_t > and wchar_t* is a common type on Windows. PyUnicode_AsUnicodeAndSize() is > used to encode Python strings to call Windows functions. > > PyUnicode_AsUnicodeAndSize() is preferred over PyUnicode_AsWideCharString() > because PyUnicode_AsWideCharString() stores the result in the Unicode string > and the Unicode string releases the memory automatically later. Calling > PyUnicode_AsWideCharString() twice on the same string avoids also the need of > encoding the string twice because the result is cached. > > I proposed to add a new function using wchar_*t and storing the result in the > Unicode string, but the idea was rejected. I don't remember why. Could you please clarify what you actually intend to do ? Which function do you want to remove and why ? The title and description of this ticket don't match :-) -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue13608> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14397] Use GetTickCount/GetTickCount64 instead of QueryPerformanceCounter for monotonic clock
Marc-Andre Lemburg added the comment: Yury Selivanov wrote: > > Yury Selivanov added the comment: > >> A monotonic clock is not suitable for measuring durations, as it may still >> jump forward. A steady clock will not. > > Well, Victor's implementation of 'steady()' is just a tiny wrapper, which > uses 'monotonic()' or 'time()' if the former is not available. Hence > 'steady()' is a misleading name. Agreed. I think time.monotonic() is a better name. -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue14397> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14309] Deprecate time.clock()
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> There's no other single function providing the same functionality > > time.clock() is not portable: it is a different clock depending on the OS. To > write portable code, you have to use the right function: > > - time.time() > - time.steady() > - os.times(), resource.getrusage() time.clock() does exactly what the docs say: you get access to a CPU timer. It's normal that CPU timers work differently on different OSes. > On Windows, time.clock() should be replaced by time.steady(). What for ? time.clock() uses the same timer as time.steady() on Windows, AFAICT, so all you change is the name of the function. > On UNIX, time.clock() can be replaced with "usage=os.times(); > usage[0]+usage[1]" for example. And what's the advantage of that over using time.clock() directly ? -- ___ Python tracker <http://bugs.python.org/issue14309> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
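The os.times() replacement quoted above, wrapped into a helper (the function name is mine):

```python
import os

def cpu_time():
    # User plus system CPU time of the current process -- roughly
    # what time.clock() reported on Unix.
    t = os.times()
    return t.user + t.system

before = cpu_time()
sum(i * i for i in range(200000))  # burn a little CPU
after = cpu_time()
print(after >= before)  # True
```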
[issue14309] Deprecate time.clock()
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> time.clock() has been in use for ages in many many scripts. >> We don't want to carelessly break all those. > > I don't want to remove the function, just mark it as deprecated to > avoid confusion. It will only be removed from the next major Python. Why ? There's no other single function providing the same functionality, so it's not even a candidate for deprecation. Similar functionality is available via several different functions, but that's true for a lot of functions in the stdlib. -- ___ Python tracker <http://bugs.python.org/issue14309> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14309] Deprecate time.clock()
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > New submission from STINNER Victor : > > Python 3.3 has 3 functions to get time: > > - time.clock() > - time.steady() > - time.time() > > Antoine Pitrou suggested to deprecated time.clock() in msg120149 (issue > #10278). > > "The problem is time.clock(), since it does two wildly different things > depending on the OS. I would suggest to deprecate time.clock() at the same > time as we add time.wallclock(). For the Unix-specific definition of > time.clock(), there is already os.times() (which gives even richer > information)." > > (time.wallclock was the old name of time.steady) Strong -1 on this idea. time.clock() has been in use for ages in many many scripts. We don't want to carelessly break all those. -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue14309> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7652] Merge C version of decimal into py3k.
Marc-Andre Lemburg added the comment: Does the C version have a C API importable as a capsule ? If not, could you add one and a decimal.h to go with it ? This makes integration in 3rd party modules a lot easier. Thanks, -- Marc-Andre Lemburg eGenix.com 2012-02-13: Released eGenix pyOpenSSL 0.13 http://egenix.com/go26 2012-02-09: Released mxODBC.Zope.DA 2.0.2 http://egenix.com/go25 ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ -- nosy: +lemburg ___ Python tracker <http://bugs.python.org/issue7652> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
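For illustration, the datetime module already follows the pattern requested here: it publishes its C API as a capsule (plus a datetime.h header for the struct layouts), and the capsule is visible from Python:

```python
import datetime

# C extensions import this with PyCapsule_Import("datetime.datetime_CAPI", 0)
# and include datetime.h for the function table's layout.
capsule = datetime.datetime_CAPI
print(type(capsule).__name__)  # PyCapsule
```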
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> Question: Should sys.flags.hash_randomization be True (1) when >> PYTHONHASHSEED=0? It is now. >> >> Saying yes "working as intended" is fine by me. > > It is documented that PYTHONHASHSEED=0 disables the randomization, so > sys.flags.hash_randomization must be False (0). PYTHONHASHSEED=1 will disable randomization as well :-) Only setting PYTHONHASHSEED=random actually enables randomization. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
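The behaviour is easy to check by spawning fresh interpreters with different PYTHONHASHSEED values (a sketch):

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(seed):
    # Report hash('abc') from a new process started with the given seed.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env)
    return out.strip()

# Any fixed seed value is reproducible from run to run; only
# PYTHONHASHSEED=random draws a fresh seed per process.
print(hash_in_fresh_interpreter("1") == hash_in_fresh_interpreter("1"))  # True
```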
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Gregory P. Smith wrote: > > Gregory P. Smith added the comment: > > Question: Should sys.flags.hash_randomization be True (1) when > PYTHONHASHSEED=0? It is now. The flag should probably be removed - simply because the env var is not a flag, it's a configuration parameter. Exposing the seed value as sys.hashseed would be better and more useful to applications. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Dave Malcolm wrote: > [new patch] Please change how the env vars work as discussed earlier on this ticket. Quick summary: We only need one env var for the randomization logic: PYTHONHASHSEED. If not set, 0 is used as seed. If set to a number, a fixed seed is used. If set to "random", a random seed is generated at interpreter startup. Same for the -R cmd line option. Thanks, -- Marc-Andre Lemburg eGenix.com ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Dave Malcolm wrote: > > If anyone is aware of an attack via numeric hashing that's actually > possible, please let me know (privately). I believe only specific apps > could be affected, and I'm not aware of any such specific apps. I'm not sure what you'd like to see. Any application reading user-provided data from a file, database, the web, etc. is vulnerable to the attack if it uses the numeric data it reads as keys in a dictionary. The most common use case for this is a dictionary mapping codes or IDs to strings or objects, e.g. for caching purposes, finding a list of unique IDs, checking for duplicates, etc. This also works indirectly on 32-bit platforms, e.g. via date/time or IP address values that get converted to integer keys. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
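The integer attack is easy to reproduce on any current CPython, since int hashing reduces modulo a fixed modulus (2**61 - 1 on 64-bit builds; the 2.7-era examples later in this thread use 2**64 - 1 instead): every positive multiple of the modulus hashes to the same value. A minimal sketch:

```python
import sys

# CPython reduces int hashes modulo sys.hash_info.modulus,
# so every positive multiple of the modulus hashes to 0:
M = sys.hash_info.modulus
keys = [k * M for k in range(1, 100)]
assert all(hash(k) == 0 for k in keys)

# 99 distinct keys sharing one hash value: each insert into this dict
# walks an ever longer collision chain -- the O(n**2) behaviour
# described in this thread.
d = dict.fromkeys(keys)
assert len(d) == 99
```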
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Alex Gaynor wrote: > There's no need to cover any container types, because if their constituent > types are securely hashable then they will be as well. And of course if > the constituent types are unsecure then they're directly vulnerable. I wouldn't necessarily take that for granted: since container types usually calculate their hash based on the hashes of their elements, it's possible that a clever combination of elements could lead to a neutralization of the hash seed used by the elements, thereby re-enabling the original attack on the unprotected interpreter. Still, because we have far more vulnerable hashable types out there, trying to find such an attack doesn't really make practical sense, so protecting containers is indeed not as urgent :-) -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Alex Gaynor wrote: > Can't randomization just be applied to integers as well? A simple seed xor'ed with the hash won't work, since the attacks I posted will continue to work (just colliding on a different hash value). Using a more elaborate hash algorithm would slow down uses of numbers as dictionary keys and also be difficult to implement for non-integer types such as float, longs and complex numbers. The reason is that Python applications expect x == y => hash(x) == hash(y), e.g. hash(3) == hash(3L) == hash(3.0) == hash(3+0j). AFAIK, the randomization patch also doesn't cover tuples, which are rather common as dictionary keys as well, nor any of the other more esoteric Python built-in hashable data types (e.g. frozenset) or hashable data types defined by 3rd party extensions or applications (simply because it can't). -- ___ Python tracker <http://bugs.python.org/issue13703> ___
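The invariant mentioned here is easy to verify, and it is the reason equal keys of different numeric types collapse into a single dict entry:

```python
# Equal numbers must hash equal across int, float and complex,
# so they are interchangeable as dict keys:
assert hash(3) == hash(3.0) == hash(3 + 0j)

# All three literals denote "the same" key; the first-inserted key
# object (the int 3) is kept, the last-assigned value wins:
d = {3: 1, 3.0: 2, 3 + 0j: 3}
assert d == {3: 3}
assert len(d) == 1
assert type(next(iter(d))) is int
```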
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Gregory P. Smith wrote: > > Gregory P. Smith added the comment: > >> >>> The release managers have pronounced: >>> http://mail.python.org/pipermail/python-dev/2012-January/115892.html >>> Quoting that email: >>>> 1. Simple hash randomization is the way to go. We think this has the >>>> best chance of actually fixing the problem while being fairly >>>> straightforward such that we're comfortable putting it in a stable >>>> release. >>>> 2. It will be off by default in stable releases and enabled by an >>>> envar at runtime. This will prevent code breakage from dictionary >>>> order changing as well as people depending on the hash stability. >> >> Right, but that doesn't contradict what I wrote about adding >> env vars to fix a seed and optionally enable using a random >> seed, or adding collision counting as extra protection for >> cases that are not addressed by the hash seeding, such as >> e.g. collisions caused by 3rd types or numbers. > > We won't be back-porting anything more than the hash randomization for > 2.6/2.7/3.1/3.2 but we are free to do more in 3.3 if someone can > demonstrate it working well and a need for it. > > For me, things like collision counting and tree based collision > buckets when the types are all the same and known comparable make > sense but are really sounding like a lot of additional complexity. I'd > *like* to see active black-box design attack code produced that goes > after something like a wsgi web app written in Python with hash > randomization *enabled* to demonstrate the need before we accept > additional protections like this for 3.3+. I posted several examples for the integer collision attack on this ticket. The current randomization patch does not address this at all, the collision counting patch does, which is why I think both are needed. 
Note that my comment was more about the desire to *not* recommend using random hash seeds per default, but instead advocate using a random but fixed seed, or at least document that using random seeds that are set during interpreter startup will cause problems with repeatability of application runs. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >>> Right, but that doesn't contradict what I wrote about adding >>> env vars to fix a seed and optionally enable using a random >>> seed, or adding collision counting as extra protection for >>> cases that are not addressed by the hash seeding, such as >>> e.g. collisions caused by 3rd types or numbers. >> >> ... at least I hope not :-) > > I think the env var part is a good idea (except that -1 as a magic value > to enable randomization isn't great). Agreed. Since it's an env var, using "random" would be a better choice. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Marc-Andre Lemburg wrote: > Dave Malcolm wrote: >> The release managers have pronounced: >> http://mail.python.org/pipermail/python-dev/2012-January/115892.html >> Quoting that email: >>> 1. Simple hash randomization is the way to go. We think this has the >>> best chance of actually fixing the problem while being fairly >>> straightforward such that we're comfortable putting it in a stable >>> release. >>> 2. It will be off by default in stable releases and enabled by an >>> envar at runtime. This will prevent code breakage from dictionary >>> order changing as well as people depending on the hash stability. > > Right, but that doesn't contradict what I wrote about adding > env vars to fix a seed and optionally enable using a random > seed, or adding collision counting as extra protection for > cases that are not addressed by the hash seeding, such as > e.g. collisions caused by 3rd types or numbers. ... at least I hope not :-) -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Dave Malcolm wrote: > >>> So the overhead in startup time is not an issue? >> >> It is an issue. Not only in terms of startup time, but also >... >> because randomization per default makes Python behave in >> non-deterministic ways - which is not what you want from a >> programming language or interpreter (unless you explicitly >> tell it to behave like that). > > The release managers have pronounced: > http://mail.python.org/pipermail/python-dev/2012-January/115892.html > Quoting that email: >> 1. Simple hash randomization is the way to go. We think this has the >> best chance of actually fixing the problem while being fairly >> straightforward such that we're comfortable putting it in a stable >> release. >> 2. It will be off by default in stable releases and enabled by an >> envar at runtime. This will prevent code breakage from dictionary >> order changing as well as people depending on the hash stability. Right, but that doesn't contradict what I wrote about adding env vars to fix a seed and optionally enable using a random seed, or adding collision counting as extra protection for cases that are not addressed by the hash seeding, such as e.g. collisions caused by 3rd party types or numbers. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Jim Jewett wrote: > >> BTW: If you set the limit N to e.g. 100 (which is reasonable given >> Victor's and my tests), > > Agreed. Frankly, I think 5 would be more than reasonable so long as > there is a fallback. > >> the time it takes to process one of those >> sets only takes 0.3 ms on my machine. That's hardly usable as basis >> for an effective DoS attack. > > So it would take around 3Mb to cause a minute's delay... I'm not sure how you calculated that number. Here's what I get: take a dictionary with 100 integer collisions:

d = dict((x*(2**64 - 1), 1) for x in xrange(1, 100))

The repr(d) has 2713 bytes, which is a good approximation of how much (string) data you have to send in order to trigger the problem case. If you can create distinct integer sequences, you'll get a processing time of about 1 second on my slow dev machine. The resulting dict will likely have a repr() of around 60**2713 = 517MB. So you need to send 517MB to cause my slow dev machine to consume 1 minute of CPU time. Today's servers are at least 10 times as fast as my aging machine. If you then take into account that the integer collision dictionary is a very efficient collision example (size vs. effect), the attack doesn't really sound practical anymore. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Jim Jewett wrote: > > Jim Jewett added the comment: > > On Mon, Feb 6, 2012 at 8:12 AM, Marc-Andre Lemburg > wrote: >> >> Marc-Andre Lemburg added the comment: >> >> Antoine Pitrou wrote: >>> >>> The simple collision counting approach leaves a gaping hole open, as >>> demonstrated by Frank. > >> Could you elaborate on this ? > >> Note that I've updated the collision counting patch to cover both >> possible attack cases I mentioned in >> http://bugs.python.org/issue13703#msg150724. >> If there's another case I'm unaware of, please let me know. > > The problematic case is, roughly, > > (1) Find out what N will trigger collision-counting countermeasures. > (2) Insert N-1 colliding entries, to make it as slow as possible. > (3) Keep looking up (or updating) the N-1th entry, so that the > slow-as-possible-without-countermeasures path keeps getting rerun. Since N is constant, I don't see how such an "attack" could be used to trigger the O(n^2) worst-case behavior. Even if you can create n sets of entries that each fill up N-1 positions, the overall performance will still be O(n*N*(N-1)/2) = O(n). So in the end, we're talking about a regular brute force DoS attack, which requires different measures than dictionary implementation tricks :-) BTW: If you set the limit N to e.g. 100 (which is reasonable given Victor's and my tests), the time it takes to process one of those sets only takes 0.3 ms on my machine. That's hardly usable as basis for an effective DoS attack. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > The simple collision counting approach leaves a gaping hole open, as > demonstrated by Frank. Could you elaborate on this? Note that I've updated the collision counting patch to cover both possible attack cases I mentioned in http://bugs.python.org/issue13703#msg150724. If there's another case I'm unaware of, please let me know. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> In a security fix release, we shouldn't change the linkage procedures, >> so I recommend that the LoadLibrary dance remains. > > So the overhead in startup time is not an issue? It is an issue. Not only in terms of startup time, but also because randomization per default makes Python behave in non-deterministic ways - which is not what you want from a programming language or interpreter (unless you explicitly tell it to behave like that). I think it would be much better to just let the user define a hash seed using environment variables for Python to use and then forget about how this variable value is determined. If it's not set, Python uses 0 as seed, thereby disabling the seeding logic. This approach would have Python behave in a deterministic way per default and still allow users who wish to use a different seed, set this to a different value - even on a case by case basis. If you absolutely want to add a feature to have the seed set randomly, you could make a seed value of -1 trigger the use of a random number source as seed. I also still firmly believe that the collision counting scheme should be made available via an environment variable as well. The user could then set the variable to e.g. 1000 to have it enabled with limit 1000, or leave it undefined to disable the collision counting. With those two tools, users could then choose the method they find most attractive for their purposes. By default, they would be disabled, but applications which are exposed to untrusted user data and use dictionaries for managing such data could check whether the protections are enabled and trigger a startup error if needed. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
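The configuration scheme proposed here (unset -> seed 0, a number -> fixed seed, a special value -> random seed) is essentially what PYTHONHASHSEED later became, with the string "random" instead of -1 as the trigger. A hypothetical parser for such a variable (the function name and the non-negative check are ours, for illustration only):

```python
import os
import random

def get_hash_seed(varname="PYTHONHASHSEED"):
    # Unset or empty -> 0 (seeding disabled, deterministic hashing);
    # "random" -> a fresh random seed; anything else -> a fixed integer seed.
    value = os.environ.get(varname, "")
    if value == "":
        return 0
    if value == "random":
        return random.SystemRandom().getrandbits(64)
    seed = int(value)          # raises ValueError on garbage input
    if seed < 0:
        raise ValueError("hash seed must be non-negative")
    return seed

os.environ["PYTHONHASHSEED"] = "1000"
assert get_hash_seed() == 1000
os.environ["PYTHONHASHSEED"] = "random"
assert 0 <= get_hash_seed() < 2**64
```

Note that this only models the selection logic; actually applying the seed has to happen inside the interpreter before any strings are hashed.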
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: > To see the collision counting, enable the DEBUG_DICT_COLLISIONS > macro variable. Running (part of (*)) the test suite with debugging enabled on a 64-bit machine shows that slot collisions are much more frequent than hash collisions, which only account for less than 0.01% of all collisions. It also shows that slot collisions in the low 1-10 range are most frequent, with very few instances of a dict lookup reaching 20 slot collisions (less than 0.0002% of all collisions). The great number of cases with 1 or 2 slot collisions surprised me. It seems that there's still potential for improving the perturbation formula. Due to the large number of 1 or 2 slot collisions, the patch is going to cause a minor hit to dict lookup performance. It may make sense to unroll the slot search loop and only start counting after the third round of misses. (*) I stopped the run after several hours of run-time, producing some 148GB of log data. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: > I've also added a test script which demonstrates both types of > collisions using integer objects (since it's trivial to calculate > their hashes). I forgot to mention: the test script is for 64-bit platforms. It's easy to adapt it to 32-bit if needed. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Here's a version of the collision counting patch that takes both hash and slot collisions into account. I've also added a test script which demonstrates both types of collisions using integer objects (since it's trivial to calculate their hashes). To see the collision counting, enable the DEBUG_DICT_COLLISIONS macro variable. -- Added file: http://bugs.python.org/file24299/hash-attack-3.patch Added file: http://bugs.python.org/file24300/integercollision.py ___ Python tracker <http://bugs.python.org/issue13703> ___

Index: Objects/dictobject.c
===
--- Objects/dictobject.c (revision 88933)
+++ Objects/dictobject.c (working copy)
@@ -9,7 +9,13 @@
 #include "Python.h"
 
+/* Maximum number of allowed collisions. */
+#define Py_MAX_DICT_HASH_COLLISIONS 1000
+#define Py_MAX_DICT_SLOT_COLLISIONS 1000
+/* Debug collision detection */
+#define DEBUG_DICT_COLLISIONS 0
+
 /* Set a key error with the specified argument, wrapping it in a
  * tuple automatically so that tuple keys are not unpacked as the
  * exception arguments. */
@@ -327,6 +333,7 @@
     register PyDictEntry *ep;
     register int cmp;
     PyObject *startkey;
+    size_t hash_collisions, slot_collisions;
 
     i = (size_t)hash & mask;
     ep = &ep0[i];
@@ -361,6 +368,8 @@
     /* In the loop, me_key == dummy is by far (factor of 100s) the
        least likely outcome, so test for that last. */
+    hash_collisions = 1;
+    slot_collisions = 1;
     for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
         i = (i << 2) + i + perturb + 1;
         ep = &ep0[i & mask];
@@ -387,9 +396,27 @@
              */
             return lookdict(mp, key, hash);
         }
+        #if DEBUG_DICT_COLLISIONS
+        printf("hash collisions = %zu (i=%zu)\n", hash_collisions, i);
+        #endif
+        if (++hash_collisions > Py_MAX_DICT_HASH_COLLISIONS) {
+            PyErr_SetString(PyExc_KeyError,
+                            "too many hash collisions");
+            return NULL;
+        }
         }
-        else if (ep->me_key == dummy && freeslot == NULL)
-            freeslot = ep;
+        else {
+            if (ep->me_key == dummy && freeslot == NULL)
+                freeslot = ep;
+            #if DEBUG_DICT_COLLISIONS
+            printf("slot collisions = %zu (i=%zu)\n", slot_collisions, i);
+            #endif
+            if (++slot_collisions > Py_MAX_DICT_SLOT_COLLISIONS) {
+                PyErr_SetString(PyExc_KeyError,
+                                "too many slot collisions");
+                return NULL;
+            }
+        }
     }
     assert(0); /* NOT REACHED */
     return 0;
@@ -413,6 +440,7 @@
     register size_t mask = (size_t)mp->ma_mask;
     PyDictEntry *ep0 = mp->ma_table;
     register PyDictEntry *ep;
+    size_t hash_collisions, slot_collisions;
 
     /* Make sure this function doesn't have to handle non-string keys,
        including subclasses of str; e.g., one reason to subclass
@@ -439,18 +467,39 @@
     /* In the loop, me_key == dummy is by far (factor of 100s) the
        least likely outcome, so test for that last. */
+    hash_collisions = 1;
+    slot_collisions = 1;
     for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
         i = (i << 2) + i + perturb + 1;
         ep = &ep0[i & mask];
         if (ep->me_key == NULL)
             return freeslot == NULL ? ep : freeslot;
-        if (ep->me_key == key
-            || (ep->me_hash == hash
-                && ep->me_key != dummy
-                && _PyString_Eq(ep->me_key, key)))
+        if (ep->me_key == key)
             return ep;
-        if (ep->me_key == dummy && freeslot == NULL)
-            freeslot = ep;
+        if (ep->me_hash == hash && ep->me_key != dummy) {
+            if (_PyString_Eq(ep->me_key, key))
+                return ep;
+            #if DEBUG_DICT_COLLISIONS
+            printf("hash collisions = %zu (i=%zu)\n", hash_collisions, i);
+            #endif
+            if (++hash_collisions > Py_MAX_DICT_HASH_COLLISIONS) {
+                PyErr_SetString(PyExc_KeyError,
+                                "too many hash collisions");
+                return NULL;
+            }
+        }
+        else {
+            if (ep->me_key == dummy && freeslot == NULL)
+                freeslot = ep;
+            #if DEBUG_DICT_COLLISIONS
+            printf("slot collisions = %zu (i=%zu)\n", slot_collisions, i);
+            #endif
+            if (++slot_collisions > Py_MAX_DICT_SLOT_COLLISIONS) {
+                PyErr_SetString(PyExc_KeyError,
+
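The collision-limit idea in the hash-attack patch can be sketched in pure Python: an open-addressing table using CPython's probe sequence (i = i*5 + perturb + 1, with perturb shifted right by 5 each step) that gives up after a fixed number of probe collisions. The class and constant names are ours, and the limit is far lower than the patch's 1000 to keep the demo small:

```python
PERTURB_SHIFT = 5
MAX_COLLISIONS = 100  # illustrative; the patch uses 1000

class Evil:
    """Key whose hash always collides (models the integer attack)."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return isinstance(other, Evil) and self.n == other.n

class LimitedDict:
    def __init__(self, size=1024):        # size must be a power of two
        self.mask = size - 1
        self.slots = [None] * size        # slot: None or (hash, key, value)

    def _find_slot(self, key):
        h = hash(key)
        perturb = h & (2**64 - 1)         # treat the hash as unsigned
        i = h & self.mask
        collisions = 0
        while True:
            j = i & self.mask
            entry = self.slots[j]
            if entry is None or (entry[0] == h and entry[1] == key):
                return j
            collisions += 1
            if collisions > MAX_COLLISIONS:
                raise KeyError("too many hash collisions")
            # CPython's probe sequence: i = i*5 + perturb + 1
            i = (i << 2) + i + perturb + 1
            perturb >>= PERTURB_SHIFT

    def __setitem__(self, key, value):
        self.slots[self._find_slot(key)] = (hash(key), key, value)

    def __getitem__(self, key):
        entry = self.slots[self._find_slot(key)]
        if entry is None:
            raise KeyError(key)
        return entry[2]

d = LimitedDict()
for n in range(101):            # the k-th colliding key needs k probes
    d[Evil(n)] = n
assert d[Evil(7)] == 7
raised = False
try:
    d[Evil(101)] = 101          # exceeds the collision limit
except KeyError:
    raised = True
assert raised
```

Resizing is omitted, so this only models the lookup path; well-behaved keys never come near the limit, while a crafted colliding set trips it after MAX_COLLISIONS probes.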
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Alex Gaynor wrote: > I'm able to put N pieces of data into the database on successive requests, > but then *rendering* that data puts it in a dictionary, which renders that > page unviewable by anyone. I think you're asking a bit much here :-) A broken app is a broken app, no matter how nice Python tries to work around it. If an app puts too much trust into user data, it will be vulnerable one way or another and regardless of how the user data enters the app. These are the collision counting possibilities we've discussed so far: With an collision counting exception you'd get a clear notice that something in your data and your application is wrong and needs fixing. The rest of your web app will continue to work fine and you won't run into a DoS problem taking down all of your web server. With the proposed enhancement of collision counting + universal hash function for Python 3.3, you'd get a warning printed to the logs, the dict implementation would self-heal and your page is viewable nonetheless. The admin would then see the log entry and get a chance to fix the problem. Note: Even if Python works around the problem successfully, there's no guarantee that the data doesn't end up being processed by some other tool in the chain with similar problems. All this is a work-around for an application bug, nothing more. Silencing the problem by e.g. using randomization in the string hash algorithm doesn't really help in identifying the bug. Overall, I don't think we should make Python's hash function non-deterministic. Even with the universal hash function idea, the dict implementation should use a predefined way of determining the next hash parameter to use, so that running the application twice against attack data will still result in the same data output. 
-- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Dave Malcolm wrote: > > Dave Malcolm added the comment: > > On Fri, 2012-01-06 at 12:52 +, Marc-Andre Lemburg wrote: >> Marc-Andre Lemburg added the comment: >> >> Demo patch implementing the collision limit idea for Python 2.7. >> >> -- >> Added file: http://bugs.python.org/file24151/hash-attack.patch >> > > Marc: is this the latest version of your patch? Yes. As mentioned in the above message, it's just a demo of how the collision limit idea can be implemented. > Whether or not we go with collision counting and/or adding a random salt > to hashes and/or something else, I've had a go at updating your patch > > Although debate on python-dev seems to have turned against the > collision-counting idea, based on flaws reported by Frank Sievertsen > http://mail.python.org/pipermail/python-dev/2012-January/115726.html > it seemed to me to be worth at least adding some test cases to flesh out > the approach. Note that the test cases deliberately avoid containing > "hostile" data. Martin's example is really just a red herring: it doesn't matter where the hostile data originates or how it gets into the application. There are many ways an attacker can get the O(n^2) worst case timing triggered. Frank's example is an attack on the second possible way to trigger the O(n^2) behavior. See msg150724 further above where I listed the two possibilities: """ An attack can be based on trying to find many objects with the same hash value, or trying to find many objects that, as they get inserted into a dictionary, very often cause collisions due to the collision resolution algorithm not finding a free slot. """ My demo patch only addresses the first variant. In order to cover the second variant as well, you'd have to count and limit the number of iterations in the perturb for-loop of the lookdict() functions where the hash value of the slot does not match the key's hash value. 
Note that the second variant is both a lot less likely to trigger (due to the dict getting resized on a regular basis) and the code involved is a lot faster than the code for the first variant (which requires a costly object comparison), so the limit for the second variant would have to be somewhat higher than for the first. BTW: The collision counting patch chunk for the string dicts in my demo patch is wrong. I've attached a corrected version. In the original patch it was counting both collision variants with the same counter and limit. -- Added file: http://bugs.python.org/file24295/hash-attack-2.patch ___ Python tracker <http://bugs.python.org/issue13703> ___

Index: Objects/dictobject.c
===
--- Objects/dictobject.c (revision 88933)
+++ Objects/dictobject.c (working copy)
@@ -9,6 +9,8 @@
 #include "Python.h"
 
+/* Maximum number of allowed hash collisions. */
+#define Py_MAX_DICT_COLLISIONS 1000
 
 /* Set a key error with the specified argument, wrapping it in a
  * tuple automatically so that tuple keys are not unpacked as the
@@ -327,6 +329,7 @@
     register PyDictEntry *ep;
     register int cmp;
     PyObject *startkey;
+    size_t collisions;
 
     i = (size_t)hash & mask;
     ep = &ep0[i];
@@ -361,6 +364,7 @@
     /* In the loop, me_key == dummy is by far (factor of 100s) the
        least likely outcome, so test for that last. */
+    collisions = 1;
     for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
         i = (i << 2) + i + perturb + 1;
         ep = &ep0[i & mask];
@@ -387,6 +391,11 @@
              */
             return lookdict(mp, key, hash);
         }
+        if (++collisions > Py_MAX_DICT_COLLISIONS) {
+            PyErr_SetString(PyExc_KeyError,
+                            "too many hash collisions");
+            return NULL;
+        }
         }
         else if (ep->me_key == dummy && freeslot == NULL)
             freeslot = ep;
@@ -413,6 +422,7 @@
     register size_t mask = (size_t)mp->ma_mask;
     PyDictEntry *ep0 = mp->ma_table;
     register PyDictEntry *ep;
+    size_t collisions;
 
     /* Make sure this function doesn't have to handle non-string keys,
        including subclasses of str; e.g., one reason to subclass
@@ -439,17 +449,24 @@
     /* In the loop, me_key == dummy is by far (factor of 100s) the
        least likely outcome, so test for that last. */
+    collisions = 1;
     for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
         i = (i << 2) + i + perturb + 1;
         ep = &ep0[i & mask];
         if (ep->me_key == NULL)
             return freeslot == NULL ? ep : f
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Charles-François Natali wrote: > > Anyway, I still think that the hash randomization is the right way to > go, simply because it does solve the problem, whereas the collision > counting doesn't: Martin made a very good point on python-dev with his > database example. For completeness, I quote Martin here: """ The main issue with that approach is that it allows a new kind of attack. An attacker now needs to find 1000 colliding keys, and submit them one-by-one into a database. The limit will not trigger, as those are just database insertions. Now, if the applications also as a need to read the entire database table into a dictionary, that will suddenly break, and not for the attacker (which would be ok), but for the regular user of the application or the site administrator. So it may be that this approach actually simplifies the attack, making the cure worse than the disease. """ Martin is correct in that it is possible to trick an application into building some data pool which can then be used as indirect input for an attack. What I don't see is what's wrong with the application raising an exception in case it finds such data in an untrusted source (reading arbitrary amounts of user data from a database is just as dangerous as reading such data from any other source). The exception will tell the programmer to be more careful and patch the application not to read untrusted data without additional precautions. It will also tell the maintainer of the application that there was indeed an attack on the system which may need to be tracked down. Note that the collision counting demo patch is trivial - I just wanted to demonstrate how it works. As already mentioned, there's room for improvement: If Python objects were to provide an additional method for calculating a universal hash value (based on an integer input parameter), the dictionary in question could use this to rehash itself and avoid the attack. 
Think of this as "randomization when needed". (*) Since the dict would still detect the problem, it could also raise a warning to inform the maintainer of the application. So you get the best of both worlds and randomization would only kick in when it's really needed to keep the application running. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Frank Sievertsen wrote: > > Frank Sievertsen added the comment: > >> The suffix only introduces a constant change in all hash values >> output, so even if you don't know the suffix, you can still >> generate data sets with collisions by just having the prefix. > > That's true. But without the suffix, I can pretty easy and efficient guess > the prefix by just seeing the result of a few well-chosen and short > repr(dict(X)). I suppose that's harder with the suffix. Since the hash function is known, it doesn't make things much harder. Without suffix you just need hash('') to find out what the prefix is. With suffix, two values are enough. Say P is your prefix and S your suffix. Let's say you can get the hash values of A = hash('') and B = hash('\x00'). With Victor's hash function you have (IIRC): A = hash('') = P ^ (0<<7) ^ 0 ^ S = P ^ S B = hash('\x00') = ((P ^ (0<<7)) * 103) ^ 0 ^ 1 ^ S = (P * 103) ^ 1 ^ S Let X = A ^ B, then X = P ^ (P * 103) ^ 1 since S ^ S = 0 and 0 ^ Y = Y (for any Y), i.e. the suffix doesn't make any difference. For P < 50, you can then easily calculate P from X using: P = X // 102 (things obviously get tricky once overflow kicks in) Note that for number hashes the randomization doesn't work at all, since there's no length or feedback loop involved. With Victor's approach hash(0) would output the whole seed, but even if the seed is not known, creating an attack data set is trivial, since hash(x) = P ^ x ^ S. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
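Marc-Andre's algebra can be replayed with a toy model of the randomized hash he describes (prefix P, multiplier 103, per-character XOR, length XOR, suffix S). This is our simplification for illustration, not CPython's actual hash function:

```python
def toy_hash(s, P, S):
    # Simplified randomized string hash: start from the prefix XOR'ed
    # with the first character shifted left by 7, multiply by 103 and
    # XOR each character, then XOR the length and the secret suffix.
    x = P ^ ((ord(s[0]) << 7) if s else 0)
    for c in s:
        x = (x * 103) ^ ord(c)
    return x ^ len(s) ^ S

P, S = 37, 0xDEADBEEF        # secret prefix and suffix
A = toy_hash("", P, S)       # = P ^ S
B = toy_hash("\x00", P, S)   # = (P * 103) ^ 1 ^ S

X = A ^ B                    # the suffix cancels: X = P ^ (P * 103) ^ 1
recovered = X // 102         # works for small P, as noted in the message
assert recovered == P
```

So the suffix drops out of any XOR of two hash values, which is exactly why it adds no protection against recovering the prefix.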
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: [Reposting, since roundup removed part of the Python output] M.-A. Lemburg wrote: > Note that the integer attack also applies to other number types > in Python: > > --> (hash(3), hash(3.0), hash(3+0j)) > (3, 3, 3) > > See Tim's post I referenced earlier on for the reasons. Here's > a quick summary ;-) ... > > --> {3:1, 3.0:2, 3+0j:3} > {3: 3} -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > I tried the collision counting with a low number of collisions: > ... no false positives with a limit of 50 collisions ... Thanks for running those tests. Looks like a limit lower than 1000 would already do just fine. Some timings showing how long it would take to hit a limit:

# 100
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 100))"
100 loops, best of 3: 297 usec per loop

# 250
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 250))"
100 loops, best of 3: 1.46 msec per loop

# 500
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 500))"
100 loops, best of 3: 5.73 msec per loop

# 750
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 750))"
100 loops, best of 3: 12.7 msec per loop

# 1000
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 1000))"
100 loops, best of 3: 22.4 msec per loop

These timings have to be matched against the size of the payload needed to trigger those limits. In any case, the limit needs to be configurable like the hash seed in the randomization patch. -- ___ Python tracker <http://bugs.python.org/issue13703> ___
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> Please note, that you'd have to extend the randomization to >> all other Python data types as well in order to reach the same level >> of security as the collision counting approach. > > You also have to extend the collision counting to sets, by the way. Indeed, but that's easy, since the set implementation derives from the dict implementation. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > ... > So I expect something similar in applications: no change in the > applications, but a lot of hacks/tricks in tests. Tests usually check output of an application given a certain input. If those fail with the randomization, then it's likely that real-world uses of the application will show the same kinds of failures due to the application changing from deterministic to non-deterministic via the randomization. >> BTW: The patch still includes the unnecessary _Py_unicode_hash_secret.suffix >> which needlessly complicates the code and doesn't add any additional >> protection against hash value collisions > > How does it complicate the code? It adds an extra XOR to hash(str) and > 4 or 8 bytes in memory, that's all. It is more difficult to compute > the secret from hash(str) output if there is a prefix *and* a suffix. > If there is only a prefix, knowing a single hash(str) value is just > enough to retrieve directly the secret. The suffix only introduces a constant change in all hash values output, so even if you don't know the suffix, you can still generate data sets with collisions by just having the prefix. >> I don't think it affects more than 0.01% of applications/users :) > > It would help to try a patched Python on a real world application like > Django to realize how much code is broken (or not) by a randomized > hash function. That would help for both approaches, indeed. Please note that you'd have to extend the randomization to all other Python data types as well in order to reach the same level of security as the collision counting approach. As-is the randomization patch does not solve the integer key attack and even though parsers such as JSON and XML-RPC aren't directly affected, it is quite possible that stringified integers such as IDs are converted back to integers later during processing, thereby triggering the attack. 
Note that the integer attack also applies to other number types in Python:

>>> (hash(3), hash(3.0), hash(3+0j))
(3, 3, 3)

See Tim's post I referenced earlier on for the reasons. Here's a quick summary ;-) ...

>>> {3:1, 3.0:2, 3+0j:3}
{3: 3}

-- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
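[Editorial note: the integer-key collisions discussed above use the 2**64 - 1 multiplier, which targets Python 2's long hash. On Python 3 the same attack can be reproduced without hard-coding a constant, since the int hash modulus is exposed via sys.hash_info. A sketch, not part of the original thread:]

```python
import sys

# On Python 3, hash(n) for positive n is n modulo sys.hash_info.modulus
# (2**61 - 1 on 64-bit CPython builds), so every positive multiple of
# that modulus hashes to 0. Note that PYTHONHASHSEED randomization does
# not apply to int hashing, which is exactly Marc-Andre's point.
M = sys.hash_info.modulus

colliding = [k * M for k in range(1, 6)]
print([hash(n) for n in colliding])  # [0, 0, 0, 0, 0]
```

All of these distinct keys probe the same dict slot, so inserting many of them degrades to the quadratic behavior described in this thread.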
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > Patch version 7: > - Make PyOS_URandom() private (renamed to _PyOS_URandom) > - os.urandom() releases the GIL for I/O operation for its implementation > reading /dev/urandom > - move _Py_unicode_hash_secret_t documentation into unicode_hash() > > I moved also fixes for tests in a separated patch: random_fix-tests.patch. Don't you think that the number of corrections you have to apply in order to get the tests working again shows how much impact such a change would have in real-world applications ? Perhaps we should start to think about a compromise: make both the collision counting and the hash seeding optional and let the user decide which option is best. BTW: The patch still includes the unnecessary _Py_unicode_hash_secret.suffix which needlessly complicates the code and doesn't add any additional protection against hash value collisions. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Eric Snow wrote: > > Eric Snow added the comment: > >> The vulnerability is known since 2003 (Usenix 2003): read "Denial of >> Service via Algorithmic Complexity Attacks" by Scott A. Crosby and Dan >> S. Wallach. > > Crosby started a meaningful thread on python-dev at that time similar to the > current one: > > http://mail.python.org/pipermail/python-dev/2003-May/035874.html > > It includes a some good insight into the problem. Thanks for the pointer. Some interesting postings... Vulnerability of applications: http://mail.python.org/pipermail/python-dev/2003-May/035887.html Speed of hashing, portability and practical aspects: http://mail.python.org/pipermail/python-dev/2003-May/035902.html Changing the hash function: http://mail.python.org/pipermail/python-dev/2003-May/035911.html http://mail.python.org/pipermail/python-dev/2003-May/035915.html -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Frank Sievertsen wrote: > > I don't want my software to stop working because someone managed to enter > 1000 bad strings into it. Think of a software that handles names of customers > or filenames. We don't want it to break completely just because someone > entered a few clever names. Collision counting is just a simple way to trigger an action. As I mentioned in my proposal on this ticket, raising an exception is just one way to deal with the problem in case excessive collisions are found. A better way is to add a universal hash method, so that the dict can adapt to the data and modify the hash functions for just that dict (without breaking other dicts or changing the standard hash functions). Note that raising an exception doesn't completely break your software. It just signals a severe problem with the input data and a likely attack on your software. As such, it's no different than turning on DOS attack prevention in your router. In case you do get an exception, a web server will simply return a 500 error and continue working normally. For other applications, you may see a failure notice in your logs. If you're sure that there are no possible ways to attack the application using such data, then you can simply disable the feature to prevent such exceptions. > Randomization fixes most of these problems. See my list of issues with this approach (further up on this ticket). > However, it breaks the steadiness of hash(X) between two runs of the same > software. There's probably code out there that assumes that hash(X) always > returns the same value: database- or serialization-modules, for example. > > There might be good reasons to also have a steady hash-function available. > The broken code is hard to fix if no such a function is available at all. > Maybe it's possible to add a second steady hash-functions later again? This is one of the issues I mentioned. 
> For the moment I think the best way is to turn on randomization of hash() by > default, but having a way to turn it off. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> On my slow dev machine 1000 collisions run in around 22ms: >> >> python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, >> 1000))" >> 100 loops, best of 3: 22.4 msec per loop >> >> Using this for a DOS attack would be rather noisy, much unlike >> sending a single POST. > > Note that sending one POST is not enough, unless the attacker is content > with blocking *one* worker process for a couple of seconds or minutes > (which is a rather tiny attack if you ask me :-)). Also, you can combine > many dicts in a single JSON list, so that the 1000 limit isn't > overreached for any of the dicts. Right, but such an approach only scales linearly and doesn't exhibit the quadratic nature of the collision resolution. The above with 1 items takes 5 seconds on my machine. The same with 10 items is still running after 16 minutes. > So in all cases the attacker would have to send many of these POST > requests in order to overwhelm the target machine. That's how DOS > attacks work AFAIK. Depends :-) Hiding a few tens of such requests in the input stream of a busy server is easy. Doing the same with thousands of requests is a lot harder. FWIW: The above dict string version just has some 263kB for the 10 case, 114kB if gzip compressed. >> Yes, which is why the patch should be disabled by default (using >> an env var) in dot-releases. It's probably also a good idea to >> make the limit configurable to adjust to ones needs. > > Agreed if it's disabled by default then it's not a problem, but then > Python is vulnerable by default... Yes, but at least the user has an option to switch on the added protection. We'd need some field data to come to a decision. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
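[Editorial note: the payload sizes discussed above are easy to sanity-check. Since JSON object keys must be strings, an attacker has to ship colliding integers as decimal strings and rely on the application converting them back. A rough sketch using the thread's x*(2**64 - 1) construction; payload_size() is an illustrative helper, not code from the thread:]

```python
import json

# Rough estimate of the JSON payload size needed to carry a given
# number of colliding integer keys (the same x*(2**64 - 1) construction
# as in the timeit runs above). JSON cannot carry integer keys directly,
# so they travel as decimal strings.
def payload_size(n_keys):
    payload = {str(x * (2 ** 64 - 1)): 1 for x in range(1, n_keys)}
    return len(json.dumps(payload))

print(payload_size(1000))  # roughly 30 kB of JSON for ~1000 keys
```

This supports the point made above: hitting a limit of 1000 collisions needs only a modest payload, while scaling the attack further makes the traffic correspondingly noisier.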
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Mark Dickinson wrote: > > Mark Dickinson added the comment: > > [Antoine] >> Also, how about false positives? Having legitimate programs break >> because of legitimate data would be a disaster. > > This worries me, too. > > [MAL] >> Yes, which is why the patch should be disabled by default (using >> an env var) in dot-releases. > > Are you proposing having it enabled by default in Python 3.3? Possibly, yes. Depends on whether anyone comes up with a problem in the alpha, beta, RC release cycle. It would be great to have the universal hash method approach for Python 3.3. That way Python could self-heal itself in case it finds too many collisions. My guess is that it's still better to raise an exception, though, since it would uncover either attacks or programming errors. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> OTOH, the collision counting patch is very simple, doesn't have >> the performance issues and provides real protection against the >> attack. > > I don't know about real protection: you can still slow down dict > construction by 1000x (the number of allowed collisions per lookup), > which can be enough combined with a brute-force DOS. On my slow dev machine 1000 collisions run in around 22ms:

python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 1000))"
100 loops, best of 3: 22.4 msec per loop

Using this for a DOS attack would be rather noisy, much unlike sending a single POST. Note that the choice of 1000 as limit is rather arbitrary. I just chose it because it's high enough that it's very unlikely to be hit by an application that is not written to trigger it, and it's low enough to still provide a good run-time behavior. Perhaps an even lower figure would be better. > Also, how about false positives? Having legitimate programs break > because of legitimate data would be a disaster. Yes, which is why the patch should be disabled by default (using an env var) in dot-releases. It's probably also a good idea to make the limit configurable to adjust to one's needs. Still, it is *very* unlikely that you run into real data causing more than 1000 collisions for a single insert. For full protection the universal hash method idea would have to be implemented (adding a parameter to the hash methods, so that they can be parametrized). This would then allow switching the dict to an alternative hash implementation resolving the collision problem, in case the implementation detects a high number of collisions. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Mark Shannon wrote: > > Mark Shannon added the comment: > >>>> * the method would need to be implemented for all hashable Python types >>> It was already discussed, and it was said that only hash(str) need to >>> be modified. >> >> Really ? What about the much simpler attack on integer hash values ? >> >> You only have to send a specially crafted JSON dictionary with integer >> keys to a Python web server providing JSON interfaces in order to >> trigger the integer hash attack. > > JSON objects are decoded as dicts with string keys, integers keys are > not possible. > > >>> json.loads(json.dumps({1:2})) > {'1': 2} Thanks for the correction. Looks like XML-RPC also doesn't accept integers as dict keys. That's good :-) However, as Paul already noted, such attacks can also occur in other places or parsers in an application, e.g. when decoding FORM parameters that use integers to signal a line or parameter position (example: value_1=2&value_2=3...) which are then converted into a dictionary mapping the position integer to the data. marshal and pickle are vulnerable, but then you normally don't expose those to untrusted data. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
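[Editorial note: the FORM-parameter pattern described above is worth spelling out, since the dangerous step is the application's own int() conversion, not the query parser. A hypothetical illustration; the value_N naming comes from the message above, and positions() is an invented helper:]

```python
from urllib.parse import parse_qsl

# The query parser itself only yields string names and values, but an
# application that maps positional fields like value_1=2&value_2=3 onto
# integer dict keys reopens the integer hash attack surface, even though
# the JSON and XML-RPC parsers only ever produce string keys.
def positions(query):
    return {int(name.split("_", 1)[1]): value
            for name, value in parse_qsl(query)
            if name.startswith("value_")}

print(positions("value_1=2&value_2=3"))  # {1: '2', 2: '3'}
```

An attacker who controls the numeric suffixes controls the integer keys of the resulting dict, which is all the integer collision attack needs.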
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> * it is exceedingly complex > > Which part exactly? For hash(str), it just add two extra XOR. I'm not talking specifically about your patch, but the whole idea and the needed changes in general. >> * the method would need to be implemented for all hashable Python types > > It was already discussed, and it was said that only hash(str) need to > be modified. Really ? What about the much simpler attack on integer hash values ? You only have to send a specially crafted JSON dictionary with integer keys to a Python web server providing JSON interfaces in order to trigger the integer hash attack. The same goes for the other Python data types. >> * it causes startup time to increase (you need urandom data for >> every single hashable Python data type) > > My patch reads 8 or 16 bytes from /dev/urandom which doesn't block. Do > you have a benchmark showing a difference? > > I didn't try my patch on Windows yet. Your patch only implements the simple idea of adding an init vector and a fixed suffix vector (which you don't need since it doesn't prevent hash collisions). I don't think that's good enough, since it doesn't change how the hash algorithm works on the actual data, but instead just shifts the algorithm to a different sequence. If you apply the same logic to the integer hash function, you'll see that more clearly. Paul's algorithm is much more secure in this respect, but it requires more random startup data. >> * it causes run-time to increase due to changes in the hash >> algorithm (more operations in the tight loop) > > I posted a micro-benchmark on hash(str) on python-dev: the overhead is > nul. Did you have numbers showing that the overhead is not nul? For the simple solution, that's an expected result, but if you want more safety, then you'll see a hit due to the random data getting XOR'ed in every single loop. 
>> * causes different processes in a multi-process setup to use different >> hashes for the same object > > Correct. If you need to get the same hash, you can disable the > randomized hash (PYTHONHASHSEED=0) or use a fixed seed (e.g. > PYTHONHASHSEED=42). So you have the choice of being able to work in a multi-process environment and be vulnerable to the attack or not. I think we can do better :-) Note that web servers written in Python tend to be long running processes, so an attacker has lots of time to test various seeds. >> * doesn't appear to work well in embedded interpreters that >> regularly restarted interpreters (AFAIK, some objects persist across >> restarts and those will have wrong hash values in the newly started >> instances) > > test_capi runs _testembed which restarts an embedded interpreter 3 > times, and the test passes (with my patch version 5). Can you write a > script showing the problem if there is a real problem? > > In an older version of my patch, the hash secret was recreated at each > initialization. I changed my patch to only generate the secret once. Ok, that should fix the case. Two more issues that I forgot:

* enabling randomized hashing can make debugging a lot harder, since it's rather difficult to reproduce the same state in a controlled way (unless you record the hash seed somewhere in the logs) and even though applications should not rely on the order of dict repr()s or str()s, they do often enough

* randomized hashing will result in repr() and str() of dictionaries being random as well

>> The most important issue, though, is that it doesn't really >> protect Python against the attack - it only makes it less >> likely that an adversary will find the init vector (or a way > around having to find it via cryptanalysis). > > I agree that the patch is not perfect. As written in the patch, it > just makes the attack more complex. I consider that it is enough. 
Wouldn't you rather see a fix that works for all hash functions and Python objects ? One that doesn't cause performance issues ? The collision counting idea has this potential. > Perl has a simpler protection than the one proposed in my patch. Is > Perl vulnerable to the hash collision vulnerability? I don't know what Perl did or how hashing works in Perl, so cannot comment on the effect of their fix. FWIW, I don't think that we should use Perl or Java as reference here. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > Patch version 5 fixes test_unicode for 64-bit system. Victor, I don't think the randomization idea is going anywhere. The code has many issues:

* it is exceedingly complex
* the method would need to be implemented for all hashable Python types
* it causes startup time to increase (you need urandom data for every single hashable Python data type)
* it causes run-time to increase due to changes in the hash algorithm (more operations in the tight loop)
* causes different processes in a multi-process setup to use different hashes for the same object
* doesn't appear to work well in embedded setups that regularly restart interpreters (AFAIK, some objects persist across restarts and those will have wrong hash values in the newly started instances)

The most important issue, though, is that it doesn't really protect Python against the attack - it only makes it less likely that an adversary will find the init vector (or a way around having to find it via cryptanalysis). OTOH, the collision counting patch is very simple, doesn't have the performance issues and provides real protection against the attack. Even better still, it can detect programming errors in hash method implementations. IMO, it would be better to put efforts into refining the collision detection patch (perhaps adding support for the universal hash method slot I mentioned) and run some real life tests with it. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Marc-Andre Lemburg wrote: > > Marc-Andre Lemburg added the comment: > > Christian Heimes wrote: >> Marc-Andre: >> Have you profiled your suggestion? I'm interested in the speed implications. >> My gut feeling is that your idea could be slower, since you have added more >> instructions to a tight loop, that is executed on every lookup, insert, >> update and deletion of a dict key. The hash modification could have a >> smaller impact, since the hash is cached. I'm merely speculating here until >> we have some numbers to compare. > > I haven't done any profiling on this yet, but will run some > tests. I ran pybench and pystone: neither shows a significant change. I wish we had a simple to run benchmark based on Django to allow checking such changes against real world applications. Not that I expect different results from such a benchmark... To check the real world impact, I guess it would be best to run a few websites with the patch for a week and see whether the collision exception gets raised. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Christian Heimes wrote: > Marc-Andre: > Have you profiled your suggestion? I'm interested in the speed implications. > My gut feeling is that your idea could be slower, since you have added more > instructions to a tight loop, that is executed on every lookup, insert, update > and deletion of a dict key. The hash modification could have a smaller > impact, since the hash is cached. I'm merely speculating here until we have > some numbers to compare. I haven't done any profiling on this yet, but will run some tests. The lookup functions in the dict implementation are optimized to make the first non-collision case fast. The patch doesn't touch this loop. The only change is in the collision case, where an increment and comparison is added (and then only after the comparison which is the real cost factor in the loop). I did add a printf() to see how often this case occurs - it's a surprisingly rare case, which suggests that Tim, Christian and all the others that have invested considerable time into the implementation have done a really good job here. BTW: I noticed that a rather obvious optimization appears to be missing from the Python dict initialization code: when passing in a list of (key, value) pairs, the implementation doesn't make use of the available length information and still starts with an empty (small) dict table and then iterates over the pairs, increasing the table size as necessary. It would be better to start with a table that is presized to O(len(data)). The dict implementation already provides such a function, but it's not being used for the dict(pair_list) case. Anyway, just an aside. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Tim Peters wrote: > > Tim Peters added the comment: > > [Marc-Andre] >> BTW: I wonder how long it's going to take before >> someone figures out that our merge sort based >> list.sort() is vulnerable as well... its worst- >> case performance is O(n log n), making attacks >> somewhat harder. > > I wouldn't worry about that, because nobody could stir up anguish > about it by writing a paper ;-) > > 1. O(n log n) is enormously more forgiving than O(n**2). > > 2. An attacker need not be clever at all: O(n log n) is not only > sort()'s worst case, it's also its _expected_ case when fed randomly > ordered data. > > 3. It's provable that no comparison-based sorting algorithm can have > better worst-case asymptotic behavior when fed randomly ordered data. > > So if anyone whines about this, tell 'em to go do something useful instead :-) Right on all accounts :-) -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Paul McMillan wrote: > >> I'll upload a patch that demonstrates the collisions counting >> strategy to show that detecting the problem is easy. Whether >> just raising an exception is a good idea, is another issue. > > I'm in cautious agreement that collision counting is a better > strategy. The dict implementation performance would suffer from > randomization. > >> The dict implementation could then alter the hash parameter >> and recreate the dict table in case the number of collisions >> exceeds a certain limit, thereby actively taking action >> instead of just relying on randomness solving the issue in >> most cases. > > This is clever. You basically neuter the attack as you notice it but > everything else is business as usual. I'm concerned that this may end > up being costly in some edge cases (e.g. look up how many collisions > it takes to force the recreation, and then aim for just that many > collisions many times). Unfortunately, each dict object has to > discover for itself that it's full of offending hashes. Another > approach would be to neuter the offending object by changing its hash, > but this would require either returning multiple values, or fixing up > existing dictionaries, neither of which seems feasible. I ran some experiments with the collision counting patch and could not trigger it in normal applications, not even in cases that are documented in the dict implementation to have a poor collision resolution behavior (integers with zeros in the low bits). The probability of having to deal with dictionaries that create over a thousand collisions for one of the key objects in a real life application appears to be very very low. Still, it may cause problems with existing applications for the Python dot releases, so it's probably safer to add it in a disabled-per-default form there (using an environment variable to adjust the setting). 
For 3.3 it could be enabled per default and it would also make sense to allow customizing the limit using a sys module setting. The idea with adding a parameter to the hash method/slot in order to have objects provide a hash family function instead of a fixed unparametrized hash function would probably have to be implemented as additional hash method, e.g. .__uhash__() and tp_uhash ("u" for universal). The builtin types should then grow such methods in order to make hashing safe against such attacks. For objects defined in 3rd party extensions, we would need to encourage implementing the slot/method as well. If it's not implemented, the dict implementation would have to fallback to raising an exception. Please note that I'm just sketching things here. I don't have time to work on a full-blown patch, just wanted to show what I meant with the collision counting idea and demonstrate that it actually works as intended. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
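[Editorial note: the hash family idea sketched above can be illustrated in a few lines. This is not the proposed __uhash__()/tp_uhash API, just a toy parametrized string hash. FNV-1a with the seed folded into the offset basis is an assumption made purely for the sketch:]

```python
# Toy parametrized hash family: 32-bit FNV-1a with the seed XORed into
# the offset basis. Each per-byte step is a bijection on 32-bit states
# (XOR, then multiply by an odd prime), so different seeds necessarily
# yield different hash values for the same key.
def uhash(s, seed=0):
    h = (2166136261 ^ seed) & 0xFFFFFFFF
    for byte in s.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h

# The same key hashes differently under different parameters, so a
# collision set crafted against one parameter is useless after a re-key.
print(uhash("namea", seed=0) == uhash("namea", seed=1))  # False
# It stays deterministic for a fixed parameter.
print(uhash("namea", seed=7) == uhash("namea", seed=7))  # True
```

A dict that detects excessive collisions could pick a fresh parameter and rebuild its table with it, without touching the hashes used by any other dict, which is the self-healing behavior described above.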
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Here's an example of hash-attack.patch finding an on-purpose programming error (hashing all objects to the same value): http://stackoverflow.com/questions/4865325/counting-collisions-in-a-python-dictionary (see the second example on the page for @Winston Ewert's solution) With the patch you get:

Traceback (most recent call last):
  File "testcollisons.py", line 20, in <module>
    d[o] = 1
KeyError: 'too many hash collisions'

-- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > hash-attack.patch does never decrement the collision counter. Why should it ? It's only used as local variable in the lookup function. Note that the limit only triggers on a per-key basis. It's not a limit on the total number of collisions in the table, so you don't need to keep the number of collisions stored on the object. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Stupid email interface again... here's the full text: The hash-attack.patch solves the problem for the integer case I posted earlier on and doesn't cause any problems with the test suite.

>>> d = dict((x*(2**64 - 1), hash(x*(2**64 - 1))) for x in xrange(1, 100))
>>> d = dict((x*(2**64 - 1), hash(x*(2**64 - 1))) for x in xrange(1, 1000))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'too many hash collisions'

It also doesn't change the hashing or dict repr in existing applications. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: The hash-attack.patch solves the problem for the integer case I posted earlier on and doesn't cause any problems with the test suite.

>>> d = dict((x*(2**64 - 1), hash(x*(2**64 - 1))) for x in xrange(1, 100))
>>> d = dict((x*(2**64 - 1), hash(x*(2**64 - 1))) for x in xrange(1, 1000))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'too many hash collisions'

It also doesn't change the hashing or dict repr in existing applications. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Demo patch implementing the collision limit idea for Python 2.7. -- Added file: http://bugs.python.org/file24151/hash-attack.patch ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Before continuing down the road of adding randomness to hash functions, please have a good read of the existing dictionary implementation:

"""
Major subtleties ahead: Most hash schemes depend on having a "good" hash function, in the sense of simulating randomness. Python doesn't: its most important hash functions (for strings and ints) are very regular in common cases:

>>> map(hash, (0, 1, 2, 3))
[0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
[-1658398457, -1658398460, -1658398459, -1658398462]
>>>

This isn't necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are "consecutive" strings. So this gives better-than-random behavior in common cases, and that's very desirable.
...
"""

There's also a file called dictnotes.txt which has more interesting details about how the implementation is designed. Please note that the term "collision" is used in a slightly different way: it refers to trying to find an empty slot in the dictionary table. Having a collision implies that the hash values of two distinct objects are the same, but you also get collisions in case two distinct objects with different hash values get mapped to the same table entry. An attack can be based on trying to find many objects with the same hash value, or trying to find many objects that, as they get inserted into a dictionary, very often cause collisions due to the collision resolution algorithm not finding a free slot. In both cases, the (slow) object comparisons needed to find an empty slot is what makes the attack practical, if the application puts too much trust into large blobs of input data - which is the actual security issue we're trying to work around here... 
Given the dictionary implementation notes, I'm even less certain that the randomization change is a good idea. It will likely introduce a performance hit, due both to the added complexity of calculating the hash and to the reduced cache locality of the data in the dict table.

I'll upload a patch that demonstrates the collision counting strategy, to show that detecting the problem is easy. Whether just raising an exception is a good idea is another issue.

It may be better to change the tp_hash slot in Python 3.3 to take an argument, so that the dict implementation can use the hash function as a universal hash family (see http://en.wikipedia.org/wiki/Universal_hash). The dict implementation could then alter the hash parameter and recreate the dict table in case the number of collisions exceeds a certain limit, thereby actively taking action instead of just relying on randomness to solve the issue in most cases.
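The collision counting strategy can be sketched in a few lines. This is a toy open-addressing insert for illustration only, not CPython's actual code; the names (`insert`, `CollisionLimitExceeded`) and the default limit are assumptions, though the probe sequence mimics the one in dictobject.c:

```python
# Toy open-addressing table with a collision cap: once probing has hit
# too many occupied slots for one insertion, bail out with an exception
# instead of burning CPU on further (slow) key comparisons.
class CollisionLimitExceeded(Exception):
    pass

def insert(table, key, value, max_collisions=1000):
    """Insert into a power-of-two sized table; table slots are None or (key, value)."""
    mask = len(table) - 1
    i = hash(key) & mask
    perturb = hash(key)
    collisions = 0
    while table[i] is not None and table[i][0] != key:
        collisions += 1
        if collisions > max_collisions:
            raise CollisionLimitExceeded(key)
        # CPython-style probe sequence: mix in higher hash bits over time
        i = (5 * i + perturb + 1) & mask
        perturb >>= 5
    table[i] = (key, value)
    return i

table = [None] * 8
insert(table, "a", 1)
insert(table, "b", 2)
```

With a limit in the hundreds or thousands, legitimate data essentially never trips the exception, while an attacker's deliberately colliding keys trip it almost immediately.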
[issue13703] Hash collision security issue
Marc-Andre Lemburg added the comment: Paul McMillan wrote:
>
> This is not something that can be fixed by limiting the size of POST/GET.
> Parsing documents (even offline) can generate these problems. I can create
> books that calibre (a Python-based ebook format shifting tool) can't convert,
> but are otherwise perfectly valid for non-python devices. If I'm allowed to
> insert usernames into a database and you ever retrieve those in a dict,
> you're vulnerable. If I can post things one at a time that eventually get
> parsed into a dict (like the tag example), you're vulnerable. I can generate
> web traffic that creates log files that are unparsable (even offline) in
> Python if dicts are used anywhere. Any application that accepts data from
> users needs to be considered.
>
> Even if the web framework has a dictionary implementation that randomizes the
> hashes so it's not vulnerable, the entire python standard library uses dicts
> all over the place. If this is a problem which must be fixed by the
> framework, they must reinvent every standard library function they hope to
> use.
>
> Any non-trivial python application which parses data needs the fix. The
> entire standard library needs the fix if it is to be relied upon by
> applications which accept data. It makes sense to fix Python.

Agreed: Limiting the size of POST requests only applies to *web* applications. Other applications will need other fixes.

Trying to fix the problem in general by tweaking the hash function to (apparently) make it hard for an attacker to guess a good set of colliding strings/integers/etc. is not really a good solution. You'd only be making it harder for script kiddies, but as soon as someone cryptanalyzes the hash algorithm used, you're lost again.
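The cost model behind these attacks can be demonstrated without any knowledge of CPython internals. The class below is hypothetical (not a real exploit payload): when every key reports the same hash value, each insertion must compare against all previously inserted keys, so building the dict takes O(n**2) equality tests instead of O(n):

```python
# Hypothetical worst-case keys: all instances share one hash value,
# forcing the dict's collision resolution to call __eq__ against every
# previously stored key on each insertion.
class Colliding:
    comparisons = 0                  # counts the slow __eq__ calls

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 42                    # every instance collides

    def __eq__(self, other):
        Colliding.comparisons += 1
        return isinstance(other, Colliding) and self.n == other.n

d = {Colliding(i): i for i in range(200)}
assert len(d) == 200
# Roughly n*(n-1)/2 equality tests were needed - the quadratic blowup
# that makes large attacker-controlled inputs so expensive.
print(Colliding.comparisons)
```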
You'd need to use crypto hash functions or universal hash functions if you want to achieve good security, but that's not an option for Python objects, since the hash functions need to be as fast as possible (which rules out crypto hash functions) and cannot easily drop the invariant "a=b => hash(a)=hash(b)" (which rules out universal hash functions, AFAICT).

IMO, the strategy of simply capping the number of allowed collisions is a better way to achieve protection against this particular resource attack. The probability of valid data reaching such a limit is low and, if the limit is configurable, can be made 0.

> Of course we must fix all the basic hashing functions in python, not just the
> string hash. There aren't that many.

... not in Python itself, but if you consider all the types in Python extensions and classes implementing __hash__ in user code, the number of hash functions to fix quickly becomes unmanageable.

> Marc-Andre:
> If you look at my proposed code, you'll notice that we do more than simply
> shift the period of the hash. It's not trivial for an attacker to create
> colliding hash functions without knowing the key.

Could you post it on the ticket?

BTW: I wonder how long it's going to take before someone figures out that our merge sort based list.sort() is vulnerable as well... its worst-case performance is O(n log n), making attacks somewhat harder. The popular quicksort, which Python used for a long time, has an O(n²) worst case, making it much easier to attack, but fortunately we replaced it with merge sort in Python 2.3, before anyone noticed ;-)
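The sorting remark can be illustrated with a naive quicksort. This is a sketch for exposition only, not Python's actual list.sort() (which is timsort, a merge sort variant with an O(n log n) worst case); the first-element pivot choice is what makes sorted input pathological here:

```python
import sys

def naive_quicksort(a):
    """First-element-pivot quicksort: degrades to O(n**2) on sorted input,
    because every partition is maximally unbalanced."""
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return naive_quicksort(smaller) + [pivot] + naive_quicksort(larger)

# Recursion depth equals len(data) on sorted input for this pivot choice.
sys.setrecursionlimit(5000)
data = list(range(1000))             # already sorted: the worst case
assert naive_quicksort(data) == data
assert sorted(data) == data          # timsort handles the same input in O(n log n)
```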