[issue16112] platform.architecture does not correctly escape argument to /usr/bin/file

2012-10-04 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Jesús Cea Avión wrote:
> 
> Jesús Cea Avión added the comment:
> 
> Thanks for the heads-up, Victor.
> 
> I have added Marc-Andre Lemburg to the nosy list, so he can know about this 
> issue and can provide feedback (or request a backout for 2.7).
> 
> Marc-Andre?

The comment that Victor posted still stands for Python 2.7.

You can use subprocess in platform for Python 2.7, but only if
it's available. Otherwise the module must fall back to the
portable popen() that comes with the platform module.

It may be worth adding that selection process to the popen()
function in platform itself.
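
A minimal sketch of that selection process (the helper name
_syscmd_output is made up for illustration; subprocess.check_output()
exists in 2.7):

def _syscmd_output(cmd, args):
    try:
        import subprocess
    except ImportError:
        # Fall back to the portable popen() shipped with platform
        # (this keeps the old quoting behavior and its escaping problem)
        import platform
        f = platform.popen('%s %s' % (cmd, args))
        try:
            return f.read()
        finally:
            f.close()
    else:
        # Argument list form: no shell involved, so no escaping problem
        return subprocess.check_output([cmd] + args.split())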

For Python 3.x, you can use subprocess.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


--




[issue16047] Tools/freeze no longer works in Python 3

2012-09-25 Thread Marc-Andre Lemburg

New submission from Marc-Andre Lemburg:

The freeze tool used for compiling Python binaries with frozen modules no 
longer works with Python 3.x.

It looks like it was never updated for the path and symbol changes 
introduced with PEP 3149 (ABI tags) in Python 3.2.

Even with lots of symlinks to restore the non-ABI flagged names, freezing fails 
with a linker error in Python 3.3:

Tools/freeze> python3 freeze.py hello.py
Tools/freeze> make
config.o:(.data+0x38): undefined reference to `PyInit__imp'
collect2: ld returned 1 exit status
make: *** [hello] Error 1

--
components: Demos and Tools
messages: 171295
nosy: lemburg
priority: normal
severity: normal
status: open
title: Tools/freeze no longer works in Python 3
versions: Python 3.2, Python 3.3, Python 3.4




[issue16027] pkgutil doesn't support frozen modules

2012-09-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Nick Coghlan wrote:
> 
> Nick Coghlan added the comment:
> 
> Can you confirm this problem still exists on 3.3? The pkgutil emulation isn't 
> used by runpy any more - with the migration to importlib, the interface that 
> runpy invokes fails outright if no loader is found rather than falling back 
> to the emulation (we only retained the emulation for backwards compatibility 
> - it's a public API, so others may be using it directly).

That's difficult to test, since the Tools/freeze/ tool no longer works
in Python 3.3. I'll open a separate issue for that.

> I have a feeling that there may still be a couple of checks which are 
> restricted to PY_SOURCE and PY_COMPILED that really should be allowing 
> PY_FROZEN as well.

Same here.

-- 
Marc-Andre Lemburg
eGenix.com


--




[issue16027] pkgutil doesn't support frozen modules

2012-09-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Here's the fix we're applying in pyrun to make -m imports work at least for 
top-level modules:

--- /home/lemburg/orig/Python-2.7.3/Lib/pkgutil.py  2012-04-10 01:07:30.0 +0200
+++ pkgutil.py  2012-09-24 22:53:30.982526065 +0200
@@ -273,10 +273,21 @@ class ImpLoader:
     def is_package(self, fullname):
         fullname = self._fix_name(fullname)
         return self.etc[2]==imp.PKG_DIRECTORY
 
     def get_code(self, fullname=None):
+        if self.code is not None:
+            return self.code
+        fullname = self._fix_name(fullname)
+        mod_type = self.etc[2]
+        if mod_type == imp.PY_FROZEN:
+            self.code = imp.get_frozen_object(fullname)
+            return self.code
+        else:
+            return self._get_code(fullname)
+
+    def _get_code(self, fullname=None):
         fullname = self._fix_name(fullname)
         if self.code is None:
             mod_type = self.etc[2]
             if mod_type==imp.PY_SOURCE:
                 source = self.get_source(fullname)
This makes runpy work for top-level frozen modules, but it's really only a 
partial solution, since pkgutil would need to get such support in more places.

We also found that for some reason, runpy/pkgutil does not work for frozen 
package imports, e.g. wsgiref.util. The reasons for this appear to be deeper 
than just in the pkgutil module. We don't have a solution for this yet. It is 
also not clear whether the problem still exists in Python 3.x. The __path__ 
attribute of frozen modules was changed in 3.0 to be a list, as for all other 
modules; however, applying that change to 2.x makes runpy/pkgutil fail 
altogether (not even the above fix works anymore).
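
For reference, a minimal illustration (Python 2.x) of the frozen-module
branch above; the module name 'hello' is made up and assumed to be
frozen into the interpreter:

import imp

if imp.is_frozen('hello'):
    # Same call the patched get_code() uses
    code = imp.get_frozen_object('hello')
    exec code in {'__name__': '__main__'}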

--




[issue16027] pkgutil doesn't support frozen modules

2012-09-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Correction: the helper function is called imp.get_frozen_object().

--




[issue16027] pkgutil doesn't support frozen modules

2012-09-24 Thread Marc-Andre Lemburg

New submission from Marc-Andre Lemburg:

pkgutil is used by runpy to run Python modules that are loaded via the -m 
command line switch.

Unfortunately, this doesn't work for frozen modules, since pkgutil doesn't know 
how to load their code object (this can be had via imp.get_code_object() for 
frozen modules).

We found the problem while working on eGenix PyRun (see 
http://www.egenix.com/products/python/PyRun/) which uses frozen modules 
extensively. We currently only target Python 2.x, so we will have to work 
around the problem with a patch, but Python 3.x still has the same problem.

--
components: Library (Lib)
messages: 171163
nosy: lemburg
priority: normal
severity: normal
status: open
title: pkgutil doesn't support frozen modules
versions: Python 3.2, Python 3.3




[issue15443] datetime module has no support for nanoseconds

2012-07-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

[Roundup's email interface again...]

>>> x = 86400.0
>>> x == x + 1e-9
False
>>> x == x + 1e-10
False
>>> x == x + 1e-11
False
>>> x == x + 1e-12
True

--




[issue15443] datetime module has no support for nanoseconds

2012-07-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky  added the comment:
> 
> On Wed, Jul 25, 2012 at 4:17 AM, Marc-Andre Lemburg  
> wrote:
>> ... full C double precision for the time part of a timestamp,
>> which covers nanoseconds just fine.
> 
> No, it does not:
> 
> >>> import time
> >>> t = time.time()
> >>> t + 5e-9 == t
> True
> 
> In fact, C double precision is barely enough to cover microseconds:
> 
> >>> t + 1e-6 == t
> False
> 
> >>> t + 1e-7 == t
> True

I was referring to the use of a C double to store the time part
in mxDateTime. mxDateTime uses the C double to store the number of
seconds since midnight, so you don't run into the Unix ticks value
range problem you showcased above.
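
A quick back-of-the-envelope check of the available resolution (the
second value is the time.time() ticks value quoted earlier in this
thread):

import sys

eps = sys.float_info.epsilon       # ~2.2e-16 for a C double
print(86400.0 * eps)               # ~1.9e-11 s: well below a nanosecond
print(1343158163.0 * eps)          # ~3.0e-07 s: barely microseconds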

--




[issue15443] datetime module has no support for nanoseconds

2012-07-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Marc-Andre Lemburg wrote:
> 
>> Alexander Belopolsky  added the comment:
>>
>> On Wed, Jul 25, 2012 at 4:17 AM, Marc-Andre Lemburg  
>> wrote:
>>> ... full C double precision for the time part of a timestamp,
>>> which covers nanoseconds just fine.
>>
>> No, it does not:
>>
>> >>> import time
>> >>> t = time.time()
>> >>> t + 5e-9 == t
>> True
>>
>> In fact, C double precision is barely enough to cover microseconds:
>>
>> >>> t + 1e-6 == t
>> False
>>
>> >>> t + 1e-7 == t
>> True
> 
> I was referring to the use of a C double to store the time part
> in mxDateTime. mxDateTime uses the C double to store the number of
> seconds since midnight, so you don't run into the Unix ticks value
> range problem you showcased above.

There's enough room to even store 1/100th of a nanosecond, which may
be needed for some physics experiments :-)

>>> x = 86400.0
>>> x == x + 1e-9
False
>>> x == x + 1e-10
False
>>> x == x + 1e-11
False
>>> x == x + 1e-12
True

--




[issue15444] Incorrectly written contributor's names

2012-07-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Thank you for taking the initiative. Regarding use of UTF-8 for text files:

I think we ought to acknowledge that UTF-8 has become the de facto standard
for non-ASCII text files by now, and with Python 3 being all Unicode, it
feels silly not to make use of it in Python source files.

Regarding my name: I have no issue with the accent missing on the e.
I've long given up using it in source code or emails :-)

--




[issue15443] datetime module has no support for nanoseconds

2012-07-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Vincenzo Ampolo wrote:
> 
> Vincenzo Ampolo  added the comment:
> 
> This is a real use case I'm working with that needs nanosecond precision
> and lead me in submitting this request:
> 
> most OSes let users capture network packets (using tools like tcpdump or
> wireshark) and store them using file formats like pcap or pcap-ng. These
> formats include a timestamp for each of the captured packets, and this
> timestamp usually has nanosecond precision. The reason is that on
> gigabit and 10 gigabit networks the frame rate is so high that
> microsecond precision is not enough to tell two frames apart.
> pcap (and now pcap-ng) are extremely popular file formats, with millions
> of files stored around the world. Support for nanoseconds in datetime
> would make it possible to properly parse these files inside python to
> compute precise statistics, for example network delays or round trip times.
> 
> Another case is stock markets. In that field, information is timed in
> nanoseconds, and the ability to easily deal with this kind of
> representation natively with datetime would make the standard module even
> more powerful.
> 
> The company I work for is in the data networking field, and we use
> Python extensively. Currently we rely on custom code to process
> timestamps; a nanosecond datetime would let us avoid that and use the
> standard Python datetime module.

Thanks for the two use cases.

You might want to look at mxDateTime and use that for your timestamps.
It does provide full C double precision for the time part of a timestamp,
which covers nanoseconds just fine.
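
A small sketch of what that looks like (requires the egenix-mx-base
package; the values are made up):

from mx import DateTime

# The seconds part is stored as a C double, so sub-microsecond
# fractions survive.
t = DateTime.DateTime(2012, 7, 25, 10, 30, 0.123456789)
print(t.second)   # 0.123456789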

--




[issue15443] datetime module has no support for nanoseconds

2012-07-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Vincenzo Ampolo wrote:
> 
> As computers evolve, time management becomes more precise and more
> granular. Unfortunately, the standard datetime module is not able to deal
> with nanoseconds even if OSes are able to. For example, if I do:
> 
> print "%.9f" % time.time()
> 1343158163.471209049
> 
> I have an actual timestamp from the epoch with nanosecond granularity.
> 
> Thus support for nanoseconds in datetime would really be appreciated

I would be interested in an actual use case for this.

--




[issue15369] pybench and test.pystone poorly documented

2012-07-17 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> Brett Cannon  added the comment:
> 
> I disagree. They are outdated benchmarks and probably should either be 
> removed or left undocumented. Proper testing of performance is with the 
> Unladen Swallow benchmarks.

I disagree with your statement. Just like every benchmark, they serve
their purpose in their particular field of use, e.g. pybench may not
be useful for the JIT approach originally taken by the Unladen Swallow
project, but it's still useful to test/check changes in the non-JIT
CPython interpreter, and it's extensible to take new developments
into account. pystone is useful to get a quick feel for the performance
of Python on a machine.
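
For example, getting that quick feel amounts to (Python 2.x layout of
the test package):

from test import pystone

# Runs the benchmark and prints the Pystones/second rating
pystone.main()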

--
nosy: +lemburg




[issue1294959] Problems with /usr/lib64 builds.

2012-05-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Éric Araujo wrote:
> 
> Éric Araujo  added the comment:
> 
> On Mar 29, 2011, at 10:12 PM, Matthias Klose wrote:
>> no, it looks for headers and libraries in more directories.  But really, this
>> whole testing for paths is wrong. Just use the compiler to search for headers
>> and libraries, no need to check these on your own.
> 
> Do all compilers provide this info, including Windows ones?  If so, that 
> would be a nice feature for distutils2.

This only works for a handful of system library paths, not the extra
ones that you may need to search for local installations of
libraries and which you have to inform the compiler about :-)

Many gcc installations, for example, don't include the /usr/local
or /opt/local dir trees in the search. On Windows, you have to
run the correct vc*.bat files to have the paths setup and optional
software rarely adds the correct paths to LIB and INCLUDE.

The compiler also won't help with the problem Sean originally
pointed to: building software on systems that can run both
32-bit and 64-bit code and finding the right set of libs to
link against.

Another problem is finding the paths to the right version of a
library (both include files and corresponding libraries).

While it would be great to have a system tool take care of setting
things up correctly, I don't know of any such tool, so searching
paths and inspecting files using REs appears to be the only way
to build a general purpose detection scheme.

mxSetup.py (included in egenix-mx-base) uses such a scheme, distutils
has one too.
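
A rough sketch of the path-probing idea, assuming common Unix library
locations (the helper name and directory list are illustrative only):

import os, re

def find_library(name, dirs=('/usr/lib64', '/usr/lib',
                             '/usr/local/lib', '/opt/local/lib')):
    # Match libNAME.so with an optional version suffix
    pattern = re.compile(r'^lib%s\.so(\.\d+)*$' % re.escape(name))
    for d in dirs:
        if not os.path.isdir(d):
            continue
        for entry in sorted(os.listdir(d)):
            if pattern.match(entry):
                return os.path.join(d, entry)
    return None

print(find_library('z'))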

--




[issue14572] 2.7.3: sqlite module does not build on centos 5 and Mac OS X 10.4

2012-05-04 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Mac OS X 10.4 is also affected and for the same reason. SQLite builds fine for 
Python 2.5 and 2.6, but not for 2.7.

--
nosy: +lemburg
title: 2.7.3: sqlite module does not build on centos 5 -> 2.7.3: sqlite module 
does not build on centos 5 and Mac OS X 10.4




[issue14657] Avoid two importlib copies

2012-04-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Marc-Andre Lemburg wrote:
> 
> Marc-Andre Lemburg  added the comment:
> 
> Nick Coghlan wrote:
>>
>> Nick Coghlan  added the comment:
>>
>> At the very least, failing to regenerate importlib.h shouldn't be a fatal 
>> build error. It should just run with what its got, and hopefully you will 
>> get a working interpreter out the other end, such that you can regenerate 
>> the frozen module on the next pass.
>>
>> If we change that, then I'm OK with keeping the automatic rebuild.
> 
> I fixed that already today.

See http://bugs.python.org/issue14605 and
http://hg.python.org/lookup/acfdf46b8de1 +
http://hg.python.org/cpython/rev/5fea362b92fc

> You now get a warning message from make, but no build error across
> all buildbots like I had run into yesterday when working on the code.

--




[issue14657] Avoid two importlib copies

2012-04-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Nick Coghlan wrote:
> 
> Nick Coghlan  added the comment:
> 
> At the very least, failing to regenerate importlib.h shouldn't be a fatal 
> build error. It should just run with what its got, and hopefully you will get 
> a working interpreter out the other end, such that you can regenerate the 
> frozen module on the next pass.
> 
> If we change that, then I'm OK with keeping the automatic rebuild.

I fixed that already today.

You now get a warning message from make, but no build error across
all buildbots like I had run into yesterday when working on the code.

--




[issue14657] Avoid two importlib copies

2012-04-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>> The question pybuilddir.txt apparently tries to solve is whether Python
>> is running from the build dir or not. It's not whether Python was
>> installed or not.
> 
> That's the same, for all we're concerned.
> But pybuilddir.txt does not only solve that problem. It also contains
> the path to extension modules generated by setup.py, so that sys.path
> can be setup appropriately at startup.

It would be easier to tell distutils to install the extensions
in a fixed-name dir (instead of using a platform and version
in the name) and then use that in getpath.c. distutils is pretty
flexible at that :-)

--




[issue14657] Avoid two importlib copies

2012-04-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>>> Look for "pybuilddir.txt".
>>
>> Oh dear. Another one of those hacks... why wasn't this done using
>> constants passed in by the configure script and simple string
>> comparison ?
> 
> How would that help distinguish between an installed Python and a
> non-installed Python? If you have an idea about that, please open an
> issue and explain it precisely :)

The question pybuilddir.txt apparently tries to solve is whether Python
is running from the build dir or not. It's not whether Python was
installed or not. Checking for the build dir can be done by looking
at the argv[0] of the executable and comparing that to the build dir.
This can be compiled into the interpreter using a constant, say
BUILDIR. At runtime, you'd simply compare the current argv[0] to
the BUILDDIR. If it matches, you know that you can assume the
build dir layout with reasonable certainty and proceed accordingly.
No need for extra joins, file reads, etc.
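
In Python terms (the real check would live in C in getpath.c; BUILDDIR
is a hypothetical constant baked in by configure):

import os, sys

BUILDDIR = '/home/user/cpython'   # illustrative value only

def running_from_build_dir():
    exe_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    return exe_dir == BUILDDIR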

But given the enormous startup time of Python 3.3, those few stats
won't make a difference anyway. This would need a completely different,
holistic approach. Perhaps something for a SoC project.

--




[issue14657] Avoid two importlib copies

2012-04-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>> Code to detect whether you're running off a checkout vs. a normal
>> installation by looking at even more directories ? I don't
>> see any in getpath.c (and that's good).
> 
> Look for "pybuilddir.txt".

Oh dear. Another one of those hacks... why wasn't this done using
constants passed in by the configure script and simple string
comparison ?

BTW: The startup time of python3.3 is 113ms on my machine; that's
more than twice as long as python2.7. Given the history, it
looks like no one cares about these things anymore... :-(

--




[issue14605] Make import machinery explicit

2012-04-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> You can see a little discussion in http://bugs.python.org/issue14642, but it 
> has been discussed elsewhere and the automatic rebuilding was preferred (but 
> it is not a requirement to build as importlib.h is in hg).

An automatic rebuild is fine, but only as long as the local ./python
actually exists.

I was unaware of the make rule, so I did not run make to check things
before the check-in. As a result, the bootstrap module received a more recent
timestamp than importlib.h and this caused all the buildbots to
force a rebuild of importlib.h - which failed, since they didn't
have a built ./python at that stage.

I checked in a fix and added a warning to the bootstrap script.

--




[issue14605] Make import machinery explicit

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Marc-Andre Lemburg wrote:
> Looking further I found this line in the Makefile:
> 
> 
> # Importlib
> 
> Python/importlib.h: $(srcdir)/Lib/importlib/_bootstrap.py \
>         $(srcdir)/Python/freeze_importlib.py
>         ./$(BUILDPYTHON) $(srcdir)/Python/freeze_importlib.py \
>                 $(srcdir)/Lib/importlib/_bootstrap.py Python/importlib.h
> 
> Since the patch modified _bootstrap.py, make wants to recreate importlib.h,
> but at that time $(BUILDPYTHON) doesn't yet exist.

I now ran 'make' after applying the patches to have the importlib.h
recreated.

This setup looks a bit fragile to me.

I think it would be better to make creation of importlib.h an
explicit operation that has to be done in case the Python code
changes (e.g. by creating a make target build-importlib.h),
with the Makefile only warning about a needed update instead
of failing completely.

--




[issue14605] Make import machinery explicit

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

R. David Murray wrote:
> 
> R. David Murray  added the comment:
> 
> Hmm.  Some at least of the buildbots have failed to build after that patch:
> 
> ./python ./Python/freeze_importlib.py \
> ./Lib/importlib/_bootstrap.py Python/importlib.h
> make: ./python: Command not found
> make: *** [Python/importlib.h] Error 127
> program finished with exit code 2
> 
> (http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.x/builds/3771)

Thanks for mentioning this. I've reverted the change for now and
will have a look tomorrow.

The logs of the failing bots are not very informative about what
is going on:

gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/dynamic_annotations.o Python/dynamic_annotations.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/errors.o Python/errors.c
./python ./Python/freeze_importlib.py \
./Lib/importlib/_bootstrap.py Python/importlib.h
make: ./python: Command not found
make: *** [Python/importlib.h] Error 127
program finished with exit code 2

vs.

gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/dynamic_annotations.o Python/dynamic_annotations.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/errors.o Python/errors.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/frozen.o Python/frozen.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/frozenmain.o Python/frozenmain.c
gcc -pthread -c -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Python/future.o Python/future.c

I guess some commands are not printed to stdout.

Looking at the buildbots again: reverting the patch has not caused
the lights to go green again. Very strange indeed.

Looking further I found this line in the Makefile:


# Importlib

Python/importlib.h: $(srcdir)/Lib/importlib/_bootstrap.py \
        $(srcdir)/Python/freeze_importlib.py
        ./$(BUILDPYTHON) $(srcdir)/Python/freeze_importlib.py \
                $(srcdir)/Lib/importlib/_bootstrap.py Python/importlib.h

Since the patch modified _bootstrap.py, make wants to recreate importlib.h,
but at that time $(BUILDPYTHON) doesn't yet exist.

--




[issue14605] Make import machinery explicit

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> I documented it explicitly so people can use it if they so choose (e.g. look 
> at sys._getframe()). If you want to change this that's fine, but I am 
> personally not going to put the effort in to rename the class, update the 
> tests, and change the docs for this (we almost stopped allowing the 
> importation of bytecode directly not so long ago but got push-back so we 
> backed off).

I renamed the loader and reworded the notice in the docs.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



--




[issue14657] Avoid two importlib copies

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
>> Adding more cruft to getpath.c or similar routines is just going to
>> slow down startup time even more...
> 
> The code is already there.

Code to detect whether you're running off a checkout vs. a normal
installation by looking at even more directories ? I don't
see any in getpath.c (and that's good).

--




[issue14657] Avoid two importlib copies

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> Modules/getpath.c seems to be where the C code does it when getting paths for 
> sys.path. So it would be possible to use that same algorithm to set some sys 
> attribute (e.g. in_checkout or something) much like sys.gettotalrefcount is 
> optional and only shown when built with --with-pydebug. Otherwise some 
> directory structure check could be done (e.g. find importlib/_bootstrap.py 
> off of sys.path, and then see if ../Modules/Setup or something also exists 
> that would never show up in an installed CPython).

Why not simply use a flag that gets set based on an environment
variable, say PYTHONDEVMODE?

Adding more cruft to getpath.c or similar routines is just going to
slow down startup time even more...

Python 2.7 has a startup time of 70ms on my machine; compare that to
Python 2.1 with 10ms and Perl 5 with just 4ms.

--




[issue14657] Avoid two importlib copies

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> Brett Cannon  added the comment:
> 
> So basically if you are running in a checkout, grab the source file and 
> compile it manually since its location is essentially hard-coded and thus you 
> don't need to care about sys.path and all the other stuff required to do an 
> import, while using the frozen code for when you are running an installed 
> module since you would otherwise need to do the search for importlib's source 
> file to do a load at startup properly.

Right.

> That's an interesting idea. How do we currently tell that the interpreter is 
> running in a checkout? Is that exposed in any way to Python code?

There's some magic happening in site.py for checkouts, but I'm not sure
whether any of that is persistent or even available at the time these
particular imports would happen.

Then again, I'm not sure you need to know whether you have a checkout
or not. You just need some flag to identify whether you want the
search for external module code to take place or not. sys.flags
could be used for that.

--




[issue14657] Avoid two importlib copies

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> Brett Cannon  added the comment:
> 
> I don't quite follow what you are suggesting, MAL. Are you saying to freeze 
> importlib.__init__ and importlib._bootstrap and somehow have 
> improtlib.__init__ choose what to load, frozen or source?

No, it always loads and runs the frozen code, but at the start of
the module code it branches between the frozen bytecode and the code
read from an external file.

Pseudo-code in every module you wish to be able to host externally:

#
# MyModule
#
if operating_in_dev_mode and '<frozen>' in __file__:
    exec(open('dev-area/MyModule.py', 'r').read(), globals(), globals())
else:
    # Normal module code
    class MyClass: ...
    # hundreds of lines of code...

Aside: With a module scope "break", the code would look more elegant:

#
# MyModule
#
if operating_in_dev_mode and '<frozen>' in __file__:
    exec(open('dev-area/MyModule.py', 'r').read(), globals(), globals())
    break

# Normal module code
class MyClass: ...
# hundreds of lines of code...

--




[issue14605] Make import machinery explicit

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> 
> That initial comment is out-of-date. If you look that the commit I made I  
> documented importlib.machinery._SourcelessFileLoader. I am continuing the 
> discouragement of using bytecode files as an obfuscation technique (because 
> it's a bad one), but I decided to at least document the class so people can 
> use it at their own peril and know about it if they happen to come across the 
> object during execution.

It's not a perfect obfuscation technique, but a pretty simple and
(legally) effective one to use.

FWIW, I don't think the comment in the check-in is appropriate:

"""
   1.127 +   It is **strongly** suggested you do not rely on this loader (hence 
the
   1.128 +   leading underscore of the class). Direct use of bytecode files 
(and thus not
   1.129 +   source code files) inhibits your modules from being usable by all 
Python
   1.130 +   implementations. It also runs the risk of your bytecode files not 
being
   1.131 +   usable by new versions of Python which change the bytecode format. 
This
   1.132 +   class is only documented as it is directly used by import and thus 
can
   1.133 +   potentially have instances show up as a module's ``__loader__`` 
attribute.
"""

The "risks" you mention there are really up to the application developers
to decide how to handle, not the Python developers. Python has a long
tradition of being friendly to commercial applications and I don't see
any reason why we should stop that.

If you do want this to change, please write a PEP. This may appear
to be a small change in direction, but it does in fact have quite
some impact on the usefulness of CPython in commercial settings.

I also think that the SourcelessFileLoader loader should be a first-class
citizen without the leading underscore if importlib is to completely
replace the current import mechanism. Why force developers to write their
own loader instead of using the standard one, just because of the leading
underscore, when it's only 20 lines of code?

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



--




[issue14657] Avoid two importlib copies

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

> [...] test method. Another option is we hide the source as _importlib or
> something to allow direct importation w/o any tricks under a protected
> name.

Using the freeze everything approach you make things easier for the
implementation, since you don't have to think about whether certain
pieces of code are already available or not.

For development, you can also have the package load bytecode
or source from an external package instead of running (all of)
the module's bytecode that was compiled into the binary.

This is fairly easy to do, since the needed exec() does not
depend on the import machinery.

The only downside is the big if statement needed to isolate the frozen
version from the loaded one - it would be great if we had a
statement to stop module execution or code execution for a block to
make that more elegant, e.g. "break" at module scope :-)

--




[issue14605] Make import machinery explicit

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Brett Cannon wrote:
> I am not exposing SourcelessFileLoader because importlib publicly tries to 
> discourage the shipping of .pyc files w/o their corresponding source files. 
> Otherwise all objects as used by importlib for performing imports will become 
> public.

What's the reasoning behind this idea ? Is Python 3.3 no longer meant to
be used for closed source applications ?

--
nosy: +lemburg




[issue14657] Avoid two importlib copies

2012-04-24 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>> This would also mean that changes to importlib._bootstrap would
>> actually take effect for user code almost immediately, *without*
>> rebuilding Python, as the frozen version would *only* be used to get
>> hold of the pure Python version.
> 
> Actually, _io, encodings and friends must be loaded before importlib
> gets imported from Python code, so you will still have __loader__
> entries referencing the frozen importlib, unless you also rewrite these
> attributes.
> 
> My desire here is not to hide _frozen_importlib, rather to avoid subtle
> issues with two instances of a module living in memory with separate
> global states. Whether it's the frozen version or the on-disk Python
> version that gets the preference is another question (a less important
> one in my mind).

Why don't you freeze the whole importlib package to avoid all these
issues? As a side effect, it will also load a little faster.

--
nosy: +lemburg




[issue14423] Getting the starting date of iso week from a week number and a year.

2012-04-22 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Mark Dickinson wrote:
> 
> By the way, I don't think the algorithm used in the current patch is correct. 
>  For 'date.from_iso_week(2009, 1)' I get 2009/1/1, which was a Thursday.  The 
> documentation seems to indicate that a Monday should be returned.

True, the correct date is 2008-12-29.
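
For reference, a correct construction needs to anchor on January 4th,
which by definition lies in ISO week 1 (the helper name is illustrative):

import datetime

def iso_week_start(year, week):
    # ISO 8601: week 1 is the week containing January 4th
    jan4 = datetime.date(year, 1, 4)
    week1_monday = jan4 - datetime.timedelta(days=jan4.isoweekday() - 1)
    return week1_monday + datetime.timedelta(weeks=week - 1)

print(iso_week_start(2009, 1))   # 2008-12-29, a Monday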

--




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

> [...] I think it
> is not unlikely that you *are* the only ones affected by it.

With "in the wild" I'm referring to the function being released in
the ccompiler not only in alpha releases but also in the beta
releases, the 2.7, 2.7.1 and 2.7.2 release - in every release
since early in 2010.

We were unaware of the reversal of the changes by Tarek and
the way we coded things in mxSetup.py did not show that things
were removed again, simply because we support more than just
Python 2.7 and have proper fallback solutions for most things.

Only in this particular case, we were using different strategies
based on the Python version number and so there is no fallback.

> Nevertheless, what are the alternatives?  We could add a wrapper function 
> into distutils.ccompiler that just calls the distutils.sysconfig version.  
> Here's a patch that attempts to do so. That should fix that breakage for the 
> eGenix packages.  It would be great if you could test it.

The fix is easy: simply import the customize_compiler() API in
the ccompiler module to maintain compatibility with what had
already been released. No need to add a wrapper function,
a single

from distutils.sysconfig import customize_compiler

in ccompiler.py will do just fine.

> It's up to the 2.7 release manager to decide what action to take, i.e. 
> whether the patch is needed and, if so, how quickly to schedule a new 
> release.  As a practical matter, regardless of whether the patch is applied 
> in Python or not, I would assume that a faster solution for your end users 
> would be to ship a version of the eGenix packages that reverts the changes(s) 
> there.  By the way, it looks like you'll need to eventually do that anyway 
> since the code in mxSetup.py incorrectly assumes that the corresponding 
> changes were also made to Python 3.2.

We don't support Python 3.x yet, so that's a non-issue at the moment.

But yes, we will have to release new patch level releases for all
our packages to get this fixed for our users.

--




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Marc-Andre Lemburg wrote:
> 
>> Ned Deily  added the comment:
>>
>> That's unfortunate.  But the documented location for customize_compiler is 
>> and, AFAIK, had always been in distutils.sysconfig.  It was an inadvertent 
>> consequence of the bad revert during the 2.7 development cycle that a second 
>> copy was made available in distutils.ccompiler.  That change was not 
>> supposed to be released in 2.7 and was never documented.  So I don't think 
>> there is anything that can or needs to be done as this point in Python 
>> itself.  Other opinions?
> 
> Excuse me, Ned, but that's not how we do approach dot releases in Python.
> 
> Regardless of whether the documentation was fixed or not, you cannot
> simply remove a non-private function without making sure that at least
> the import continues to work.

Turns out, the "fix" broke all our packages for Python 2.7.3 and
I can hardly believe we're the only ones affected by this.

--




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Ned Deily wrote:
> 
> And to recap the history here, there was a change in direction for Distutils 
> during the 2.7 development cycle, as decided at the 2010 language summit, in 
> particular to revert feature changes in Distutils for 2.7 to its 2.6.x state 
> and, going forward, "Distutils in Python will be feature-frozen".
> 
> http://mail.python.org/pipermail/python-dev/2010-March/098135.html

I know that distutils development was stopped (even though I don't
consider that a good thing), but since the code changes were let
into the wild, we have to deal with it properly now.

--




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-20 Thread Marc-Andre Lemburg

Changes by Marc-Andre Lemburg :


--
resolution: fixed -> 




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Ned Deily wrote:
> 
> Ned Deily  added the comment:
> 
> That's unfortunate.  But the documented location for customize_compiler is 
> and, AFAIK, had always been in distutils.sysconfig.  It was an inadvertent 
> consequence of the bad revert during the 2.7 development cycle that a second 
> copy was made available in distutils.ccompiler.  That change was not supposed 
> to be released in 2.7 and was never documented.  So I don't think there is 
> anything that can or needs to be done as this point in Python itself.  Other 
> opinions?

Excuse me, Ned, but that's not how we do approach dot releases in Python.

Regardless of whether the documentation was fixed or not, you cannot
simply remove a non-private function without making sure that at least
the import continues to work.

--
status: pending -> open




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Éric Araujo wrote:
> 
> Sorry for not thinking about this.  I’ll be more careful.

No need to be sorry; these things can happen.

What I don't understand is this line in the news section:

"Complete the revert back to only having one in distutils.sysconfig as
 7.12 +  is the case in 3.x."

Back when I discussed these changes with Tarek, we both agreed that
customize_compiler() is better placed into the ccompiler module
than the sysconfig module, so I think the one in the sysconfig
module should be replaced with a reference to the version in the
ccompiler module - in both 2.7 and 3.x.

--




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Here's the quote from mxSetup.py:

# distutils changed a lot in Python 2.7 due to many
# distutils.sysconfig APIs having been moved to the new
# (top-level) sysconfig module.
from sysconfig import \
     get_config_h_filename, parse_config_h, get_path, \
     get_config_vars, get_python_version, get_platform

# This API was moved from distutils.sysconfig to distutils.ccompiler
# in Python 2.7
from distutils.ccompiler import customize_compiler

So in 2.7 the function was moved from sysconfig to ccompiler (where it 
belongs), and now you're reverting the change in the third dot release.

--




[issue13994] incomplete revert in 2.7 Distutils left two copies of customize_compiler

2012-04-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

The patch broke egenix-mx-base, since it relies on the customize_compiler() 
being available in distutils.ccompiler:

https://www.egenix.com/mailman-archives/egenix-users/2012-April/114838.html

If you make such changes to dot releases, please make absolutely sure that when 
you move functions from one module to another, you keep backwards compatibility 
aliases around.

--
nosy: +lemburg
resolution: fixed -> 
status: closed -> open




[issue14428] Implementation of the PEP 418

2012-04-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> Please leave the pybench default timers unchanged in case the
>> new APIs are not available.
> 
> Ok, done in the new patch: perf_counter_process_time-2.patch.

Thanks.

--




[issue14619] Enhanced variable substitution for databases

2012-04-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Raymond, the variable substitution is normally done by the database and not the 
Python database modules, so you'd have to ask the database maintainers for 
assistance.

The qmark ('?') parameter style is part of the ODBC standard, so it's unlikely 
that this will get changed any time soon unless you have good contacts with 
Microsoft :-)

The ODBC standard also doesn't support multi-value substitutions in the API, so 
there's no way to pass the array to the database driver.
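
For illustration, with sqlite3 (a qmark-style DB-API module; the table
and values are made up) - single values are bound by the driver, while
IN-lists have to be expanded client-side:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO items VALUES (?, ?)',
                 [(1, 'a'), (2, 'b'), (3, 'c')])

ids = (1, 3)
marks = ', '.join('?' * len(ids))   # one marker per value
rows = conn.execute('SELECT name FROM items WHERE id IN (%s)' % marks,
                    ids).fetchall()
print(rows)   # [('a',), ('c',)]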

BTW: Such things are better discussed on the DB-SIG mailing list than the 
Python tracker.

--
nosy: +lemburg




[issue14428] Implementation of the PEP 418

2012-04-18 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Please leave the pybench default timers unchanged in case the
new APIs are not available.

The perf_counter_process_time.patch currently changes them, even
though the new APIs are not available on older Python releases,
thus breaking pybench for e.g. Python 3.2 or earlier releases.

Ditto for the resolution changes: these need to be optional and
not cause a break when used in Python 3.1/3.2.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



--




[issue14428] Implementation of the PEP 418

2012-04-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
> perf_counter_process_time.patch: replace "time.clock if windows else 
> time.time" with time.perf_counter, and getrusage/clock with time.process_time.
> 
> pybench and timeit now use time.perf_counter() by default. profile uses 
> time.process_time() by default.
> 
> pybench uses time.get_clock_info() to display the precision and the 
> underlying C function (or the resolution if the precision is not available).
> 
> Tools/pybench/systimes.py and Tools/pybench/clockres.py may be removed: these 
> features are now available directly in the time module.

No changes to the pybench defaults, please. It has to stay backwards
compatible with older releases. Adding optional new timers is fine,
though.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



--




[issue14423] Getting the starting date of iso week from a week number and a year.

2012-04-09 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky  added the comment:
> 
> On Mon, Apr 9, 2012 at 6:20 PM, Marc-Andre Lemburg
>  wrote:
>> Which is wrong, since the start of the first ISO week of a year
>> can in fact start in the preceeding year...
> 
> Hmm, the dateutil documentation seems to imply that relativedelta
> takes care of this:
> 
> http://labix.org/python-dateutil#head-72c4689ec5608067d118b9143cef6bdffb6dad4e
> 
> (Search the page for "ISO")

That's not relativedelta taking care of it, it's the way it is
used: the week containing January 4th is the first ISO week of a year;
it then goes back to the previous Monday and adds 14 weeks from
there to go to the Monday of the 15th week. This works fine as
long as January 4th doesn't fall on a Monday...

You don't really expect anyone to remember such rules, do you ? :-)

--




[issue14423] Getting the starting date of iso week from a week number and a year.

2012-04-09 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky  added the comment:
> 
> Before you invest in a C version, let's discuss whether this feature is 
> desirable.  The proposed function implements a very simple and not very 
> common calculation.  Note that even dateutil does not provide direct support 
> for this: you are instructed to use relativedelta to add weeks to January 1st 
> of the given year.

Which is wrong, since the first ISO week of a year
can in fact start in the preceding year...

http://en.wikipedia.org/wiki/ISO_week_date

and it's not a simple calculation.

ISO weeks are in common use throughout Europe, it's part of the
ISO 8601 standard. mxDateTime has had such constructors for ages:

http://www.egenix.com/products/python/mxBase/mxDateTime/doc/#_Toc293683820

--




[issue14428] Implementation of the PEP 418

2012-04-03 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> I think you need to reconsider the time.steady() name you're using
>> in the PEP. For practical purposes, it's better to call it
>> time.monotonic()
> 
> I opened a new thread on python-dev to discuss this topic.
> 
>> and only make the function available if the OS provides
>> a monotonic clock.
> 
> Oh, I should explain this choice in the PEP. Basically, the idea is to
> provide a best-effort portable function.
> 
>> The fallback to time.time() is not a good idea, since then the programmer
>> has to check whether the timer really provides the features she's after
>> every time it gets used.
> 
> Nope, time.get_clock_info('steady') does not change at runtime. So it
> can only be checked once.

With "every time" I meant: in every application you use the function.
That pretty much spoils the idea of a best effort portable function.

It's better to use a try-except to test for availability of
functions than to have to (remember to) call a separate function
to find out the characteristics of the best effort approach.
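
A minimal sketch of that style (assuming the function is simply
absent when the OS provides no monotonic clock):

import time

try:
    timer = time.monotonic
except AttributeError:
    # no monotonic clock on this platform: fail loudly instead of
    # silently falling back to the non-monotonic time.time()
    raise RuntimeError("no monotonic clock available")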

>> Instead of trying to tweak all the different clocks and timers into
>> a single function, wouldn't it be better to expose each kind as a
>> different function and then let the programmer decide which fits
>> best ?!
> 
> This is a completly different approach. It should be discussed on
> python-dev, not in the bug tracker please. I think that Python can
> help the developer to write portable code by providing high-level
> functions because clock properties are well known (e.g. see
> time.get_clock_info).

Fair enough.

BTW: Are you aware of the existing systimes.py module in pybench,
which already provides interfaces to high-resolution timers usable
for benchmarking in a portable way ? Perhaps worth mentioning in
the PEP.

--

___
Python tracker 
<http://bugs.python.org/issue14428>
___



[issue14428] Implementation of the PEP 418

2012-04-03 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Hi Victor,

I think you need to reconsider the time.steady() name you're using
in the PEP. For practical purposes, it's better to call it
time.monotonic() and only make the function available if the OS provides
a monotonic clock.

The fallback to time.time() is not a good idea, since then the programmer
has to check whether the timer really provides the features she's after
every time it gets used.

Regardless of this functional problem, I'm also not sure what you want
to imply by the term "steady". A steady beat would mean that the timer
never stops and keeps a constant pace, but that's not the case for
the timers you're using to implement time.steady(). If you're after
a mathematical term, "continuous" would be a better term, but
again, time.time() is not always continuous.

Instead of trying to tweak all the different clocks and timers into
a single function, wouldn't it be better to expose each kind as a
different function and then let the programmer decide which fits
best ?!

BTW: Thanks for the research you've done on the different clocks and
timers. That's very useful information.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


2012-04-03: Python Meeting Duesseldorf today

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

--
nosy: +lemburg

___
Python tracker 
<http://bugs.python.org/issue14428>
___



[issue13608] remove born-deprecated PyUnicode_AsUnicodeAndSize

2012-03-27 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
> The Py_UNICODE* type is deprecated but since Python 3.3, Py_UNICODE=wchar_t 
> and wchar_t* is a common type on Windows. PyUnicode_AsUnicodeAndSize() is 
> used to encode Python strings to call Windows functions.
> 
> PyUnicode_AsUnicodeAndSize() is preferred over PyUnicode_AsWideCharString() 
> because PyUnicode_AsUnicodeAndSize() stores the result in the Unicode string 
> and the Unicode string releases the memory automatically later. Calling 
> PyUnicode_AsUnicodeAndSize() twice on the same string also avoids the need of 
> encoding the string twice because the result is cached.
> 
> I proposed to add a new function using wchar_*t and storing the result in the 
> Unicode string, but the idea was rejected. I don't remember why.

Could you please clarify what you actually intend to do ? Which
function do you want to remove and why ?

The title and description of this ticket don't match :-)

--
nosy: +lemburg

___
Python tracker 
<http://bugs.python.org/issue13608>
___



[issue14397] Use GetTickCount/GetTickCount64 instead of QueryPerformanceCounter for monotonic clock

2012-03-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Yury Selivanov wrote:
> 
> Yury Selivanov  added the comment:
> 
>> A monotonic clock is not suitable for measuring durations, as it may still 
>> jump forward. A steady clock will not.
> 
> Well, Victor's implementation of 'steady()' is just a tiny wrapper, which 
> uses 'monotonic()' or 'time()' if the former is not available.  Hence 
> 'steady()' is a misleading name.

Agreed.

I think time.monotonic() is a better name.

--
nosy: +lemburg

___
Python tracker 
<http://bugs.python.org/issue14397>
___



[issue14309] Deprecate time.clock()

2012-03-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> There's no other single function providing the same functionality
> 
> time.clock() is not portable: it is a different clock depending on the OS. To 
> write portable code, you have to use the right function:
> 
>  - time.time()
>  - time.steady()
>  - os.times(), resource.getrusage()

time.clock() does exactly what the docs say: you get access to
a CPU timer. It's normal that CPU timers work differently on
different OSes.

> On Windows, time.clock() should be replaced by time.steady().

What for ? time.clock() uses the same timer as time.steady() on Windows,
AFAICT, so all you change is the name of the function.

> On UNIX, time.clock() can be replaced with "usage=os.times(); 
> usage[0]+usage[1]" for example.

And what's the advantage of that over using time.clock() directly ?

--

___
Python tracker 
<http://bugs.python.org/issue14309>
___



[issue14309] Deprecate time.clock()

2012-03-16 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> time.clock() has been in use for ages in many many scripts.
>> We don't want to carelessly break all those.
> 
> I don't want to remove the function, just mark it as deprecated to
> avoid confusion. It will only be removed from the next major Python.

Why ? There's no other single function providing the same functionality,
so it's not even a candidate for deprecation.

Similar functionality is available via several different functions,
but that's true for a lot of functions in the stdlib.

--

___
Python tracker 
<http://bugs.python.org/issue14309>
___



[issue14309] Deprecate time.clock()

2012-03-14 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> New submission from STINNER Victor :
> 
> Python 3.3 has 3 functions to get time:
> 
>  - time.clock()
>  - time.steady()
>  - time.time()
> 
> Antoine Pitrou suggested to deprecated time.clock() in msg120149 (issue 
> #10278).
> 
> "The problem is time.clock(), since it does two wildly different things
> depending on the OS. I would suggest to deprecate time.clock() at the same 
> time as we add time.wallclock(). For the Unix-specific definition of 
> time.clock(), there is already os.times() (which gives even richer 
> information)."
> 
> (time.wallclock was the old name of time.steady)

Strong -1 on this idea.

time.clock() has been in use for ages in many many scripts. We don't
want to carelessly break all those.

--
nosy: +lemburg

___
Python tracker 
<http://bugs.python.org/issue14309>
___



[issue7652] Merge C version of decimal into py3k.

2012-03-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Does the C version have a C API importable as capsule ?
If not, could you add one and a decimal.h to go with it ?

This makes integration in 3rd party modules a lot easier.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


2012-02-13: Released eGenix pyOpenSSL 0.13http://egenix.com/go26
2012-02-09: Released mxODBC.Zope.DA 2.0.2 http://egenix.com/go25

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

--
nosy: +lemburg

___
Python tracker 
<http://bugs.python.org/issue7652>
___



[issue13703] Hash collision security issue

2012-02-21 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> Question: Should sys.flags.hash_randomization be True (1) when 
>> PYTHONHASHSEED=0?  It is now.
>>
>> Saying yes "working as intended" is fine by me.
> 
> It is documented that PYTHONHASHSEED=0 disables the randomization, so
> sys.flags.hash_randomization must be False (0).

PYTHONHASHSEED=1 will disable randomization as well :-)

Only setting PYTHONHASHSEED=random actually enables randomization.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___




[issue13703] Hash collision security issue

2012-02-21 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Gregory P. Smith wrote:
> 
> Gregory P. Smith  added the comment:
> 
> Question: Should sys.flags.hash_randomization be True (1) when 
> PYTHONHASHSEED=0?  It is now.

The flag should probably be removed - simply because
the env var is not a flag, it's a configuration parameter.

Exposing the seed value as sys.hashseed would be better and more useful
to applications.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Dave Malcolm wrote:
> [new patch]

Please change how the env vars work as discussed earlier on this ticket.

Quick summary:

We only need one env var for the randomization logic: PYTHONHASHSEED.
If not set, 0 is used as seed. If set to a number, a fixed seed
is used. If set to "random", a random seed is generated at
interpreter startup.

Same for the -R cmd line option.
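
In shell terms, the proposed semantics would look roughly like this
(illustrative):

PYTHONHASHSEED=0 python -c "print(hash('abc'))"   # seed 0: seeding disabled
PYTHONHASHSEED=42 python -c "print(hash('abc'))"  # fixed seed: reproducible
PYTHONHASHSEED=random python -c "print(hash('abc'))"  # new seed per startup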

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Dave Malcolm wrote:
> 
> If anyone is aware of an attack via numeric hashing that's actually
> possible, please let me know (privately).  I believe only specific apps
> could be affected, and I'm not aware of any such specific apps.

I'm not sure what you'd like to see.

Any application reading user-provided data from a file, database,
the web, etc. is vulnerable to the attack if it uses the numeric
data it reads as keys in a dictionary.

The most common use case for this is a dictionary mapping codes or
IDs to strings or objects, e.g. for caching purposes, to find a list
of unique IDs, checking for duplicates, etc.

This also works indirectly on 32-bit platforms, e.g. via date/time
or IP address values that get converted to key integers.
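
For illustration, the colliding integer keys used elsewhere on this
ticket (Python 2, 64-bit build):

hash(1*(2**64 - 1)) == hash(2*(2**64 - 1)) == hash(3*(2**64 - 1))  # -> True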

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alex Gaynor wrote:
> There's no need to cover any container types, because if their constituent
> types are securely hashable then they will be as well.  And of course if
> the constituent types are insecure then they're directly vulnerable.

I wouldn't necessarily take that for granted: since container
types usually calculate their hash based on the hashes of their
elements, it's possible that a clever combination of elements
could lead to a neutralization of the the hash seed used by
the elements, thereby reenabling the original attack on the
unprotected interpreter.

Still, because we have far more vulnerable hashable types out there,
trying to find such an attack doesn't really make practical
sense, so protecting containers is indeed not as urgent :-)

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alex Gaynor wrote:
> Can't randomization just be applied to integers as well?

A simple seed xor'ed with the hash won't work, since the attacks
I posted will continue to work (just colliding on a different hash
value).

Using a more elaborate hash algorithm would slow down uses of
numbers as dictionary keys and also be difficult to implement for
non-integer types such as float, longs and complex numbers. The
reason is that Python applications expect x == y => hash(x) == hash(y),
e.g. hash(3) == hash(3L) == hash(3.0) == hash(3+0j).

AFAIK, the randomization patch also doesn't cover tuples, which are
rather common as dictionary keys as well, nor any of the other
more esoteric Python built-in hashable data types (e.g. frozenset)
or hashable data types defined by 3rd party extensions or
applications (simply because it can't).

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Gregory P. Smith wrote:
> 
> Gregory P. Smith  added the comment:
> 
>>
>>> The release managers have pronounced:
>>> http://mail.python.org/pipermail/python-dev/2012-January/115892.html
>>> Quoting that email:
>>>> 1. Simple hash randomization is the way to go. We think this has the
>>>> best chance of actually fixing the problem while being fairly
>>>> straightforward such that we're comfortable putting it in a stable
>>>> release.
>>>> 2. It will be off by default in stable releases and enabled by an
>>>> envar at runtime. This will prevent code breakage from dictionary
>>>> order changing as well as people depending on the hash stability.
>>
>> Right, but that doesn't contradict what I wrote about adding
>> env vars to fix a seed and optionally enable using a random
>> seed, or adding collision counting as extra protection for
>> cases that are not addressed by the hash seeding, such as
>> e.g. collisions caused by 3rd party types or numbers.
> 
> We won't be back-porting anything more than the hash randomization for
> 2.6/2.7/3.1/3.2 but we are free to do more in 3.3 if someone can
> demonstrate it working well and a need for it.
> 
> For me, things like collision counting and tree based collision
> buckets when the types are all the same and known comparable make
> sense but are really sounding like a lot of additional complexity. I'd
> *like* to see active black-box design attack code produced that goes
> after something like a wsgi web app written in Python with hash
> randomization *enabled* to demonstrate the need before we accept
> additional protections like this  for 3.3+.

I posted several examples for the integer collision attack on this
ticket. The current randomization patch does not address this at all,
the collision counting patch does, which is why I think both are
needed.

Note that my comment was more about the desire to *not* recommend
using random hash seeds per default, but instead advocate using
a random but fixed seed, or at least document that using random
seeds that are set during interpreter startup will cause
problems with repeatability of application runs.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>>> Right, but that doesn't contradict what I wrote about adding
>>> env vars to fix a seed and optionally enable using a random
>>> seed, or adding collision counting as extra protection for
>>> cases that are not addressed by the hash seeding, such as
>>> e.g. collisions caused by 3rd party types or numbers.
>>
>> ... at least I hope not :-)
> 
> I think the env var part is a good idea (except that -1 as a magic value
> to enable randomization isn't great).

Agreed. Since it's an env var, using "random" would be a better choice.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Marc-Andre Lemburg wrote:
> Dave Malcolm wrote:
>> The release managers have pronounced:
>> http://mail.python.org/pipermail/python-dev/2012-January/115892.html
>> Quoting that email:
>>> 1. Simple hash randomization is the way to go. We think this has the
>>> best chance of actually fixing the problem while being fairly
>>> straightforward such that we're comfortable putting it in a stable
>>> release.
>>> 2. It will be off by default in stable releases and enabled by an
>>> envar at runtime. This will prevent code breakage from dictionary
>>> order changing as well as people depending on the hash stability.
> 
> Right, but that doesn't contradict what I wrote about adding
> env vars to fix a seed and optionally enable using a random
> seed, or adding collision counting as extra protection for
> cases that are not addressed by the hash seeding, such as
> e.g. collisions caused by 3rd party types or numbers.

... at least I hope not :-)

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Dave Malcolm wrote:
> 
>>> So the overhead in startup time is not an issue?
>>
>> It is an issue. Not only in terms of startup time, but also
>... 
>> because randomization per default makes Python behave in
>> non-deterministc ways - which is not what you want from a
>> programming language or interpreter (unless you explicitly
>> tell it to behave like that).
> 
> The release managers have pronounced:
> http://mail.python.org/pipermail/python-dev/2012-January/115892.html
> Quoting that email:
>> 1. Simple hash randomization is the way to go. We think this has the
>> best chance of actually fixing the problem while being fairly
>> straightforward such that we're comfortable putting it in a stable
>> release.
>> 2. It will be off by default in stable releases and enabled by an
>> envar at runtime. This will prevent code breakage from dictionary
>> order changing as well as people depending on the hash stability.

Right, but that doesn't contradict what I wrote about adding
env vars to fix a seed and optionally enable using a random
seed, or adding collision counting as extra protection for
cases that are not addressed by the hash seeding, such as
e.g. collisions caused by 3rd party types or numbers.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Jim Jewett wrote:
> 
>> BTW: If you set the limit N to e.g. 100 (which is reasonable given
>> Victor's and my tests),
> 
> Agreed.  Frankly, I think 5 would be more than reasonable so long as
> there is a fallback.
> 
>> the time it takes to process one of those
>> sets only takes 0.3 ms on my machine. That's hardly usable as basis
>> for an effective DoS attack.
> 
> So it would take around 3Mb to cause a minute's delay...

I'm not sure how you calculated that number.

Here's what I get: take a dictionary with 100 integer collisions:
d = dict((x*(2**64 - 1), 1) for x in xrange(1, 100))

The repr(d) has 2713 bytes, which is a good approximation of how
much (string) data you have to send in order to trigger the
problem case.

If you can create  distinct integer sequences, you'll get a
processing time of about 1 second on my slow dev machine. The
resulting dict will likely have a repr() of around
60**2713 = 517MB.

So you need to send 517MB to cause my slow dev machine to consume
1 minute of CPU time. Today's servers are at least 10 times as fast as
my aging machine.

If you then take into account that the integer collision dictionary
is a very efficient collision example (size vs. effect), the attack
doesn't really sound practical anymore.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Jim Jewett wrote:
> 
> Jim Jewett  added the comment:
> 
> On Mon, Feb 6, 2012 at 8:12 AM, Marc-Andre Lemburg
>  wrote:
>>
>> Marc-Andre Lemburg  added the comment:
>>
>> Antoine Pitrou wrote:
>>>
>>> The simple collision counting approach leaves a gaping hole open, as
>>> demonstrated by Frank.
> 
>> Could you elaborate on this ?
> 
>> Note that I've updated the collision counting patch to cover both
>> possible attack cases I mentioned in 
>> http://bugs.python.org/issue13703#msg150724.
>> If there's another case I'm unaware of, please let me know.
> 
> The problematic case is, roughly,
> 
> (1)  Find out what N will trigger collision-counting countermeasures.
> (2)  Insert N-1 colliding entries, to make it as slow as possible.
> (3)  Keep looking up (or updating) the N-1th entry, so that the
> slow-as-possible-without-countermeasures path keeps getting rerun.

Since N is constant, I don't see how such an "attack" could be used
to trigger the O(n^2) worst-case behavior. Even if you can create n sets
of entries that each fill up N-1 positions, the overall performance
will still be O(n*N*(N-1)/2) = O(n).

So in the end, we're talking about a regular brute force DoS attack,
which requires different measures than dictionary implementation
tricks :-)

BTW: If you set the limit N to e.g. 100 (which is reasonable given
Victor's and my tests), processing one of those sets only takes
0.3 ms on my machine. That's hardly usable as the basis
for an effective DoS attack.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> The simple collision counting approach leaves a gaping hole open, as
> demonstrated by Frank.

Could you elaborate on this ?

Note that I've updated the collision counting patch to cover both
possible attack cases I mentioned in 
http://bugs.python.org/issue13703#msg150724.
If there's another case I'm unaware of, please let me know.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> In a security fix release, we shouldn't change the linkage procedures,
>> so I recommend that the LoadLibrary dance remains.
> 
> So the overhead in startup time is not an issue?

It is an issue. Not only in terms of startup time, but also
because randomization per default makes Python behave in
non-deterministic ways - which is not what you want from a
programming language or interpreter (unless you explicitly
tell it to behave like that).

I think it would be much better to just let the user
define a hash seed using environment variables for Python
to use and then forget about how this variable value is
determined. If it's not set, Python uses 0 as seed, thereby
disabling the seeding logic.

This approach would have Python behave in a deterministic way
per default and still allow users who wish to use a different
seed, set this to a different value - even on a case by case
basis.

If you absolutely want to add a feature to have the seed set
randomly, you could make a seed value of -1 trigger the use
of a random number source as seed.

I also still firmly believe that the collision counting scheme
should be made available via an environment variable as well.
The user could then set the variable to e.g. 1000 to have it
enabled with limit 1000, or leave it undefined to disable the
collision counting.

With those two tools, users could then choose the method they
find most attractive for their purposes.

By default, they would be disabled, but applications which are
exposed to untrusted user data and use dictionaries for managing
such data could check whether the protections are enabled and
trigger a startup error if needed.
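
Such a startup check could look roughly like this
(PYTHONDICTCOLLISIONLIMIT is a hypothetical name for the proposed
collision limit variable):

import os, sys

if (os.environ.get('PYTHONHASHSEED') is None
    and os.environ.get('PYTHONDICTCOLLISIONLIMIT') is None):
    sys.exit('refusing to start: no hash seed and no collision '
             'limit configured for untrusted input')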

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-23 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

> To see the collision counting, enable the DEBUG_DICT_COLLISIONS
> macro variable.

Running (part of (*)) the test suite with debugging enabled on a 64-bit
machine shows that slot collisions are much more frequent than
hash collisions, which only account for less than 0.01% of all
collisions.

It also shows that slot collisions in the low 1-10 range are
most frequent, with very few instances of a dict lookup
reaching 20 slot collisions (less than 0.0002% of all
collisions).

The great number of cases with 1 or 2 slot collisions surprised
me. It seems there is still potential for improving
the perturbation formula.

Due to the large number of 1 or 2 slot collisions, the patch
is going to cause a minor hit to dict lookup performance.
It may make sense to unroll the slot search loop and only
start counting after the third round of misses.

(*) I stopped the run after several hours run-time, producing
some 148GB log data.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-23 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

> I've also added a test script which demonstrates both types of
> collisions using integer objects (since it's trivial to calculate
> their hashes).

I forgot to mention: the test script is for 64-bit platforms. It's
easy to adapt it to 32-bit if needed.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-23 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Here's a version of the collision counting patch that takes both hash
and slot collisions into account.

I've also added a test script which demonstrates both types of
collisions using integer objects (since it's trivial to calculate
their hashes).

To see the collision counting, enable the DEBUG_DICT_COLLISIONS
macro variable.

--
Added file: http://bugs.python.org/file24299/hash-attack-3.patch
Added file: http://bugs.python.org/file24300/integercollision.py

___
Python tracker 
<http://bugs.python.org/issue13703>
___
Index: Objects/dictobject.c
===================================================================
--- Objects/dictobject.c(revision 88933)
+++ Objects/dictobject.c(working copy)
@@ -9,7 +9,13 @@
 
 #include "Python.h"
 
+/* Maximum number of allowed collisions. */
+#define Py_MAX_DICT_HASH_COLLISIONS 1000
+#define Py_MAX_DICT_SLOT_COLLISIONS 1000
 
+/* Debug collision detection */
+#define DEBUG_DICT_COLLISIONS 0
+
 /* Set a key error with the specified argument, wrapping it in a
  * tuple automatically so that tuple keys are not unpacked as the
  * exception arguments. */
@@ -327,6 +333,7 @@
 register PyDictEntry *ep;
 register int cmp;
 PyObject *startkey;
+size_t hash_collisions, slot_collisions;
 
 i = (size_t)hash & mask;
 ep = &ep0[i];
@@ -361,6 +368,8 @@
 
 /* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
+hash_collisions = 1;
+slot_collisions = 1;
 for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
 i = (i << 2) + i + perturb + 1;
 ep = &ep0[i & mask];
@@ -387,9 +396,27 @@
  */
 return lookdict(mp, key, hash);
 }
+   #if DEBUG_DICT_COLLISIONS
+   printf("hash collisions = %zu (i=%zu)\n", hash_collisions, i);
+   #endif
+   if (++hash_collisions > Py_MAX_DICT_HASH_COLLISIONS) {
+   PyErr_SetString(PyExc_KeyError,
+   "too many hash collisions");
+   return NULL;
+   }
 }
-else if (ep->me_key == dummy && freeslot == NULL)
-freeslot = ep;
+else {
+   if (ep->me_key == dummy && freeslot == NULL)
+   freeslot = ep;
+   #if DEBUG_DICT_COLLISIONS
+   printf("slot collisions = %zu (i=%zu)\n", slot_collisions, i);
+   #endif
+   if (++slot_collisions > Py_MAX_DICT_SLOT_COLLISIONS) {
+   PyErr_SetString(PyExc_KeyError,
+   "too many slot collisions");
+   return NULL;
+   }
+   }
 }
 assert(0);  /* NOT REACHED */
 return 0;
@@ -413,6 +440,7 @@
 register size_t mask = (size_t)mp->ma_mask;
 PyDictEntry *ep0 = mp->ma_table;
 register PyDictEntry *ep;
+size_t hash_collisions, slot_collisions;
 
 /* Make sure this function doesn't have to handle non-string keys,
including subclasses of str; e.g., one reason to subclass
@@ -439,18 +467,39 @@
 
 /* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
+hash_collisions = 1;
+slot_collisions = 1;
 for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
 i = (i << 2) + i + perturb + 1;
 ep = &ep0[i & mask];
 if (ep->me_key == NULL)
 return freeslot == NULL ? ep : freeslot;
-if (ep->me_key == key
-|| (ep->me_hash == hash
-&& ep->me_key != dummy
-&& _PyString_Eq(ep->me_key, key)))
+if (ep->me_key == key)
 return ep;
-if (ep->me_key == dummy && freeslot == NULL)
-freeslot = ep;
+if (ep->me_hash == hash && ep->me_key != dummy) {
+   if (_PyString_Eq(ep->me_key, key))
+   return ep;
+   #if DEBUG_DICT_COLLISIONS
+   printf("hash collisions = %zu (i=%zu)\n", hash_collisions, i);
+   #endif
+   if (++hash_collisions > Py_MAX_DICT_HASH_COLLISIONS) {
+   PyErr_SetString(PyExc_KeyError,
+   "too many hash collisions");
+   return NULL;
+   }
+   }
+else {
+   if (ep->me_key == dummy && freeslot == NULL)
+   freeslot = ep;
+   #if DEBUG_DICT_COLLISIONS
+   printf("slot collisions = %zu (i=%zu)\n", slot_collisions, i);
+   #endif
+   if (++slot_collisions > Py_MAX_DICT_SLOT_COLLISIONS) {
+   PyErr_SetString(PyExc_KeyError,
+   "too many slot collisions");
+   return NULL;
+   }
+   }

[issue13703] Hash collision security issue

2012-01-23 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alex Gaynor wrote:
> I'm able to put N pieces of data into the database on successive requests,
> but then *rendering* that data puts it in a dictionary, which renders that
> page unviewable by anyone.

I think you're asking a bit much here :-) A broken app is a broken
app, no matter how nice Python tries to work around it. If an
app puts too much trust into user data, it will be vulnerable
one way or another and regardless of how the user data enters
the app.

These are the collision counting possibilities we've discussed
so far:

With a collision counting exception you'd get a clear notice that
something in your data and your application is wrong and needs
fixing. The rest of your web app will continue to work fine and
you won't run into a DoS problem taking down all of your web
server.

With the proposed enhancement of collision counting + universal hash
function for Python 3.3, you'd get a warning printed to the logs, the
dict implementation would self-heal and your page is viewable nonetheless.
The admin would then see the log entry and get a chance to fix the
problem.
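
To make the idea more concrete, here's a toy sketch of such a
parametrized string hash (illustrative only, not the actual patch;
the dict would vary the parameter k when rehashing):

def parametrized_hash(s, k=0):
    # multiplicative string hash; the multiplier is perturbed by k
    # and kept odd, mod 2**64 to mimic C unsigned arithmetic
    h = len(s)
    for ch in s:
        h = (h * (1000003 + 2*k) + ord(ch)) % 2**64
    return h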

Note: Even if Python works around the problem successfully, there's no
guarantee that the data doesn't end up being processed by some other
tool in the chain with similar problems. All this is a work-around
for an application bug, nothing more. Silencing the problem
by e.g. using randomization in the string hash algorithm
doesn't really help in identifying the bug.

Overall, I don't think we should make Python's hash function
non-deterministic. Even with the universal hash function idea,
the dict implementation should use a predefined way of determining
the next hash parameter to use, so that running the application
twice against attack data will still result in the same data
output.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-23 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Dave Malcolm wrote:
> 
> Dave Malcolm  added the comment:
> 
> On Fri, 2012-01-06 at 12:52 +, Marc-Andre Lemburg wrote:
>> Marc-Andre Lemburg  added the comment:
>>
>> Demo patch implementing the collision limit idea for Python 2.7.
>>
>> --
>> Added file: http://bugs.python.org/file24151/hash-attack.patch
>>
> 
> Marc: is this the latest version of your patch?

Yes. As mentioned in the above message, it's just a demo of how
the collision limit idea can be implemented.

> Whether or not we go with collision counting and/or adding a random salt
> to hashes and/or something else, I've had a go at updating your patch
> 
> Although debate on python-dev seems to have turned against the
> collision-counting idea, based on flaws reported by Frank Sievertsen
> http://mail.python.org/pipermail/python-dev/2012-January/115726.html
> it seemed to me to be worth at least adding some test cases to flesh out
> the approach.  Note that the test cases deliberately avoid containing
> "hostile" data.

Martin's example is really just a red herring: it doesn't matter
where the hostile data originates or how it gets into the application.
There are many ways an attacker can get the O(n^2) worst case
timing triggered.

Frank's example is an attack on the second possible way to
trigger the O(n^2) behavior. See msg150724 further above where I
listed the two possibilities:

"""
An attack can be based on trying to find many objects with the same
hash value, or trying to find many objects that, as they get inserted
into a dictionary, very often cause collisions due to the collision
resolution algorithm not finding a free slot.
"""

My demo patch only addresses the first variant. In order to cover
the second variant as well, you'd have to count and limit the
number of iterations in the perturb for-loop of the lookdict()
functions where the hash value of the slot does not match the
key's hash value.

Note that the second variant is a lot less likely to trigger
(due to the dict getting resized on a regular basis), and the
code involved is a lot faster than the code for the first
variant (which requires a costly object comparison), so the
limit for the second variant would have to be somewhat higher
than for the first.

BTW: The collision counting patch chunk for the string dicts in my
demo patch is wrong. I've attached a corrected version. In the
original patch it was counting both collision variants with the
same counter and limit.

--
Added file: http://bugs.python.org/file24295/hash-attack-2.patch

___
Python tracker 
<http://bugs.python.org/issue13703>
___
Index: Objects/dictobject.c
===================================================================
--- Objects/dictobject.c(revision 88933)
+++ Objects/dictobject.c(working copy)
@@ -9,6 +9,8 @@
 
 #include "Python.h"
 
+/* Maximum number of allowed hash collisions. */
+#define Py_MAX_DICT_COLLISIONS 1000
 
 /* Set a key error with the specified argument, wrapping it in a
  * tuple automatically so that tuple keys are not unpacked as the
@@ -327,6 +329,7 @@
 register PyDictEntry *ep;
 register int cmp;
 PyObject *startkey;
+size_t collisions;
 
 i = (size_t)hash & mask;
 ep = &ep0[i];
@@ -361,6 +364,7 @@
 
 /* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
+collisions = 1;
 for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
 i = (i << 2) + i + perturb + 1;
 ep = &ep0[i & mask];
@@ -387,6 +391,11 @@
  */
 return lookdict(mp, key, hash);
 }
+   if (++collisions > Py_MAX_DICT_COLLISIONS) {
+   PyErr_SetString(PyExc_KeyError,
+   "too many hash collisions");
+   return NULL;
+   }
 }
 else if (ep->me_key == dummy && freeslot == NULL)
 freeslot = ep;
@@ -413,6 +422,7 @@
 register size_t mask = (size_t)mp->ma_mask;
 PyDictEntry *ep0 = mp->ma_table;
 register PyDictEntry *ep;
+size_t collisions;
 
 /* Make sure this function doesn't have to handle non-string keys,
including subclasses of str; e.g., one reason to subclass
@@ -439,17 +449,24 @@
 
 /* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
+collisions = 1;
 for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
 i = (i << 2) + i + perturb + 1;
 ep = &ep0[i & mask];
 if (ep->me_key == NULL)
return freeslot == NULL ? ep : freeslot;

[issue13703] Hash collision security issue

2012-01-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Charles-François Natali wrote:
> 
> Anyway, I still think that the hash randomization is the right way to
> go, simply because it does solve the problem, whereas the collision
> counting doesn't: Martin made a very good point on python-dev with his
> database example.

For completeness, I quote Martin here:

"""
The main issue with that approach is that it allows a new kind of attack.

An attacker now needs to find 1000 colliding keys, and submit them
one-by-one into a database. The limit will not trigger, as those are
just database insertions.

Now, if the application also has a need to read the entire database
table into a dictionary, that will suddenly break, and not for the
attacker (which would be ok), but for the regular user of the
application or the site administrator.

So it may be that this approach actually simplifies the attack, making
the cure worse than the disease.
"""

Martin is correct in that it is possible to trick an application
into building some data pool which can then be used as indirect
input for an attack.

What I don't see is what's wrong with the application raising
an exception in case it finds such data in an untrusted source
(reading arbitrary amounts of user data from a database is just
as dangerous as reading such data from any other source).

The exception will tell the programmer to be more careful and
patch the application not to read untrusted data without
additional precautions.

It will also tell the maintainer of the application that there
was indeed an attack on the system which may need to be
tracked down.

Note that the collision counting demo patch is trivial - I just
wanted to demonstrate how it works. As already mentioned, there's
room for improvement:

If Python objects were to provide an additional
method for calculating a universal hash value (based on an
integer input parameter), the dictionary in question could
use this to rehash itself and avoid the attack. Think of this
as "randomization when needed". (*)

Since the dict would still detect the problem, it could also
raise a warning to inform the maintainer of the application.

So you get the best of both worlds and randomization would only
kick in when it's really needed to keep the application running.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Frank Sievertsen wrote:
> 
> Frank Sievertsen  added the comment:
> 
>> The suffix only introduces a constant change in all hash values
>> output, so even if you don't know the suffix, you can still
>> generate data sets with collisions by just having the prefix.
> 
> That's true. But without the suffix, I can pretty easily and efficiently guess 
> the prefix by just seeing the result of a few well-chosen and short 
> repr(dict(X)). I suppose that's harder with the suffix.

Since the hash function is known, it doesn't make things much
harder. Without the suffix you just need hash('') to find out what
the prefix is. With the suffix, two values are enough.

Say P is your prefix and S your suffix. Let's say you can get the
hash values of A = hash('') and B = hash('\x00').

With Victor's hash function you have (IIRC):

A = hash('') = P ^ (0<<7) ^ 0 ^ S = P ^ S
B = hash('\x00') = ((P ^ (0<<7)) * 103) ^ 0 ^ 1 ^ S = (P * 103) ^ 1 ^ S

Let X = A ^ B, then

X = P ^ (P * 103) ^ 1

since S ^ S = 0 and 0 ^ Y = Y (for any Y), i.e. the suffix doesn't
make any difference.

For P < 50, you can then easily calculate P from X
using:

P = X // 102

(things obviously get tricky once overflow kicks in)
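
A quick check of that recovery formula, using the equations above:

all(((p ^ (p*103) ^ 1) // 102) == p for p in range(50))   # -> True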

Note that for number hashes the randomization doesn't work at all,
since there's no length or feedback loop involved.

With Victor's approach hash(0) would output the whole seed,
but even if the seed is not known, creating an attack data
set is trivial, since hash(x) = P ^ x ^ S.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

[Reposting, since roundup removed part of the Python output]

M.-A. Lemburg wrote:
> Note that the integer attack also applies to other number types
> in Python:
> 
--> (hash(3), hash(3.0), hash(3+0j))
> (3, 3, 3)
> 
> See Tim's post I referenced earlier on for the reasons. Here's
> a quick summary ;-) ...
> 
> --> {3:1, 3.0:2, 3+0j:3}
> {3: 3}

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> I tried the collision counting with a low number of collisions:
> ... no false positives with a limit of 50 collisions ...

Thanks for running those tests. Looks like a limit lower than 1000
would already do just fine.

Some timings showing how long it would take to hit a limit:

# 100
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 100))"
100 loops, best of 3: 297 usec per loop

# 250
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 250))"
100 loops, best of 3: 1.46 msec per loop

# 500
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 500))"
100 loops, best of 3: 5.73 msec per loop

# 750
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 750))"
100 loops, best of 3: 12.7 msec per loop

# 1000
python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 1000))"
100 loops, best of 3: 22.4 msec per loop

These timings have to be matched against the size of the payload
needed to trigger those limits.

In any case, the limit needs to be configurable like the hash seed
in the randomization patch.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>> Please note that you'd have to extend the randomization to
>> all other Python data types as well in order to reach the same level
>> of security as the collision counting approach.
> 
> You also have to extend the collision counting to sets, by the way.

Indeed, but that's easy, since the set implementation derives from
the dict implementation.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> ...
> So I expect something similar in applications: no change in the
> applications, but a lot of hacks/tricks in tests.

Tests usually check the output of an application given a certain
input. If those fail with the randomization, then it's likely that
real-world applications will show the same kinds of failures
due to the application changing from deterministic to
non-deterministic via the randomization.

>> BTW: The patch still includes the unnecessary _Py_unicode_hash_secret.suffix
>> which needlessly complicates the code and doesn't add any additional
>> protection against hash value collisions
> 
> How does it complicate the code? It adds an extra XOR to hash(str) and
> 4 or 8 bytes in memory, that's all. It is more difficult to compute
> the secret from hash(str) output if there is a prefix *and* a suffix.
> If there is only a prefix, knowing a single hash(str) value is
> enough to retrieve the secret directly.

The suffix only introduces a constant change in all hash values
output, so even if you don't know the suffix, you can still
generate data sets with collisions by just having the prefix.

>> I don't think it affects more than 0.01% of applications/users :)
> 
> It would help to try a patched Python on a real world application like
> Django to realize how much code is broken (or not) by a randomized
> hash function.

That would help for both approaches, indeed.

Please note that you'd have to extend the randomization to
all other Python data types as well in order to reach the same level
of security as the collision counting approach.

As-is the randomization patch does not solve the integer key attack and
even though parsers such as JSON and XML-RPC aren't directly affected,
it is quite possible that stringified integers such as IDs are converted
back to integers later during processing, thereby triggering the
attack.

Note that the integer attack also applies to other number types
in Python:

--> (hash(3), hash(3.0), hash(3+0j))
(3, 3, 3)

See Tim's post I referenced earlier on for the reasons. Here's
a quick summary ;-) ...

--> {3:1, 3.0:2, 3+0j:3}
{3: 3}

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-18 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> Patch version 7:
>  - Make PyOS_URandom() private (renamed to _PyOS_URandom)
>  - os.urandom() releases the GIL for I/O operation for its implementation 
> reading /dev/urandom
>  - move _Py_unicode_hash_secret_t documentation into unicode_hash()
> 
> I moved also fixes for tests in a separated patch: random_fix-tests.patch.

Don't you think that the number of corrections you have to apply in order
to get the tests working again shows how much impact such a change would
have in real-world applications ?

Perhaps we should start to think about a compromise: make both the
collision counting and the hash seeding optional and let the user
decide which option is best.

BTW: The patch still includes the unnecessary _Py_unicode_hash_secret.suffix
which needlessly complicates the code and doesn't add any additional
protection against hash value collisions.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-16 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Eric Snow wrote:
> 
> Eric Snow  added the comment:
> 
>> The vulnerability is known since 2003 (Usenix 2003): read "Denial of
>> Service via Algorithmic Complexity Attacks" by Scott A. Crosby and Dan
>> S. Wallach.
> 
> Crosby started a meaningful thread on python-dev at that time similar to the 
> current one:
> 
>   http://mail.python.org/pipermail/python-dev/2003-May/035874.html
> 
> It includes a some good insight into the problem.

Thanks for the pointer. Some interesting postings...

Vulnerability of applications:
http://mail.python.org/pipermail/python-dev/2003-May/035887.html

Speed of hashing, portability and practical aspects:
http://mail.python.org/pipermail/python-dev/2003-May/035902.html

Changing the hash function:
http://mail.python.org/pipermail/python-dev/2003-May/035911.html
http://mail.python.org/pipermail/python-dev/2003-May/035915.html

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-12 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Frank Sievertsen wrote:
> 
> I don't want my software to stop working because someone managed to enter 
> 1000 bad strings into it. Think of a software that handles names of customers 
> or filenames. We don't want it to break completely just because someone 
> entered a few clever names.

Collision counting is just a simple way to trigger an action. As I mentioned
in my proposal on this ticket, raising an exception is just one way to deal
with the problem in case excessive collisions are found. A better way is to
add a universal hash method, so that the dict can adapt to the data and
modify the hash functions for just that dict (without breaking other
dicts or changing the standard hash functions).

Note that raising an exception doesn't completely break your software.
It just signals a severe problem with the input data and a likely
attack on your software. As such, it's no different than turning on DOS
attack prevention in your router.

In case you do get an exception, a web server will simply return a 500 error
and continue working normally.

For other applications, you may see a failure notice in your logs. If
you're sure that there are no possible ways to attack the application using
such data, then you can simply disable the feature to prevent such
exceptions.

> Randomization fixes most of these problems.

See my list of issues with this approach (further up on this ticket).

> However, it breaks the steadiness of hash(X) between two runs of the same 
> software. There's probably code out there that assumes that hash(X) always 
> returns the same value: database- or serialization-modules, for example.
> 
> There might be good reasons to also have a steady hash-function available. 
> The broken code is hard to fix if no such function is available at all. 
> Maybe it's possible to add a second steady hash-functions later again?

This is one of the issues I mentioned.

> For the moment I think the best way is to turn on randomization of hash() by 
> default, but having a way to turn it off.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>> On my slow dev machine 1000 collisions run in around 22ms:
>>
>> python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 
>> 1000))"
>> 100 loops, best of 3: 22.4 msec per loop
>>
>> Using this for a DOS attack would be rather noisy, much unlike
>> sending a single POST.
> 
> Note that sending one POST is not enough, unless the attacker is content
> with blocking *one* worker process for a couple of seconds or minutes
> (which is a rather tiny attack if you ask me :-)). Also, you can combine
> many dicts in a single JSON list, so that the 1000 limit isn't
> overreached for any of the dicts.

Right, but such an approach only scales linearly and doesn't
exhibit the quadratic nature of the collision resolution.

The above with 1 items takes 5 seconds on my machine.
The same with 10 items is still running after 16 minutes.

> So in all cases the attacker would have to send many of these POST
> requests in order to overwhelm the target machine. That's how DOS
> attacks work AFAIK.

Depends :-) Hiding a few tens of such requests in the input stream
of a busy server is easy. Doing the same with thousands of requests
is a lot harder.

FWIW: The above dict string version just has some 263kB for the 10
case, 114kB if gzip compressed.

>> Yes, which is why the patch should be disabled by default (using
>> an env var) in dot-releases. It's probably also a good idea to
>> make the limit configurable to adjust to ones needs.
> 
> Agreed if it's disabled by default then it's not a problem, but then
> Python is vulnerable by default...

Yes, but at least the user has an option to switch on the added
protection. We'd need some field data to come to a decision.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Mark Dickinson wrote:
> 
> Mark Dickinson  added the comment:
> 
> [Antoine]
>> Also, how about false positives? Having legitimate programs break
>> because of legitimate data would be a disaster.
> 
> This worries me, too.
> 
> [MAL]
>> Yes, which is why the patch should be disabled by default (using
>> an env var) in dot-releases.
> 
> Are you proposing having it enabled by default in Python 3.3?

Possibly, yes. Depends on whether anyone comes up with a problem in
the alpha, beta, RC release cycle.

It would be great to have the universal hash method approach for
Python 3.3. That way Python could heal itself in case it
finds too many collisions. My guess is that it's still better
to raise an exception, though, since it would uncover either
attacks or programming errors.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou  added the comment:
> 
>> OTOH, the collision counting patch is very simple, doesn't have
>> the performance issues and provides real protection against the
>> attack.
> 
> I don't know about real protection: you can still slow down dict
> construction by 1000x (the number of allowed collisions per lookup),
> which can be enough combined with a brute-force DOS.

On my slow dev machine 1000 collisions run in around 22ms:

python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 1000))"
100 loops, best of 3: 22.4 msec per loop

Using this for a DOS attack would be rather noisy, much unlike
sending a single POST.

Note that the choice of 1000 as limit is rather arbitrary. I just
chose it because it's high enough that it's very unlikely to be
hit by an application that is not written to trigger it, and low
enough to still provide good run-time behavior. Perhaps an
even lower figure would be better.

> Also, how about false positives? Having legitimate programs break
> because of legitimate data would be a disaster.

Yes, which is why the patch should be disabled by default (using
an env var) in dot-releases. It's probably also a good idea to
make the limit configurable to adjust to ones needs.

Still, it is *very* unlikely that you run into real data causing
more than 1000 collisions for a single insert.

For full protection the universal hash method idea would have
to be implemented (adding a parameter to the hash methods, so
that they can be parametrized). This would then allow switching
the dict to an alternative hash implementation resolving the collision
problem, in case the implementation detects high number of
collisions.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Mark Shannon wrote:
> 
> Mark Shannon  added the comment:
> 
>>>>  * the method would need to be implemented for all hashable Python types
>>> It was already discussed, and it was said that only hash(str) need to
>>> be modified.
>>
>> Really ? What about the much simpler attack on integer hash values ?
>>
>> You only have to send a specially crafted JSON dictionary with integer
>> keys to a Python web server providing JSON interfaces in order to
>> trigger the integer hash attack.
> 
> JSON objects are decoded as dicts with string keys, integer keys are 
> not possible.
> 
>  >>> json.loads(json.dumps({1:2}))
> {'1': 2}

Thanks for the correction. Looks like XML-RPC also doesn't accept
integers as dict keys. That's good :-)

However, as Paul already noted, such attacks can also occur in other
places or parsers in an application, e.g. when decoding FORM parameters
that use integers to signal a line or parameter position (example:
value_1=2&value_2=3...) which are then converted into a dictionary
mapping the position integer to the data.
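
A sketch of such a decoder (the field naming scheme is made up for
illustration, but the pattern is common):

def parse_positional_form(qs):
    # 'value_1=2&value_2=3' -> {1: '2', 2: '3'}
    result = {}
    for field in qs.split('&'):
        name, _, data = field.partition('=')
        result[int(name.rpartition('_')[2])] = data
    return result

An attacker who controls the numeric suffixes thereby controls the
integer keys of the resulting dictionary.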

marshal and pickle are vulnerable, but then you normally don't expose
those to untrusted data.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>>  * it is exceedingly complex
> 
> Which part exactly? For hash(str), it just adds two extra XORs.

I'm not talking specifically about your patch, but the whole idea
and the needed changes in general.

>>  * the method would need to be implemented for all hashable Python types
> 
> It was already discussed, and it was said that only hash(str) need to
> be modified.

Really ? What about the much simpler attack on integer hash values ?

You only have to send a specially crafted JSON dictionary with integer
keys to a Python web server providing JSON interfaces in order to
trigger the integer hash attack.

The same goes for the other Python data types.
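
For illustration, on a 64-bit build of Python 2.x the hash of a long
reduces modulo 2**64 - 1, so all multiples of that value share a
single hash value:

>>> len(set(hash(x*(2**64 - 1)) for x in xrange(1, 1000)))
1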

>>  * it causes startup time to increase (you need urandom data for
>>   every single hashable Python data type)
> 
> My patch reads 8 or 16 bytes from /dev/urandom which doesn't block. Do
> you have a benchmark showing a difference?
> 
> I didn't try my patch on Windows yet.

Your patch only implements the simple idea of adding an init
vector and a fixed suffix vector (which you don't need since
it doesn't prevent hash collisions).

I don't think that's good enough, since
it doesn't change how the hash algorithm works on the actual
data, but instead just shifts the algorithm to a different
sequence. If you apply the same logic to the integer hash
function, you'll see that more clearly.
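
To make that concrete for the integer case: if a random key only
enters the result from the outside, colliding inputs keep colliding
for every key. A minimal sketch of that (hypothetical) scheme:

def randomized_int_hash(n, key, modulus=2**64 - 1):
    # hypothetical: XOR a per-process random key onto the
    # Python 2.x long hash; not any actual patch
    return (n % modulus) ^ key

a, b = 1 * (2**64 - 1), 2 * (2**64 - 1)   # collide for key == 0
for key in (0, 12345, 2**63 - 1):
    assert randomized_int_hash(a, key) == randomized_int_hash(b, key)

The key merely shifts every hash by the same amount, so the attack
set doesn't change at all.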

Paul's algorithm is much more secure in this respect, but it
requires more random startup data.

>>  * it causes run-time to increase due to changes in the hash
>>   algorithm (more operations in the tight loop)
> 
> I posted a micro-benchmark on hash(str) on python-dev: the overhead is
> nul. Did you have numbers showing that the overhead is not nul?

For the simple solution, that's an expected result, but if you want
more safety, then you'll see a hit due to the random data getting
XOR'ed in every single loop.

>>  * causes different processes in a multi-process setup to use different
>>   hashes for the same object
> 
> Correct. If you need to get the same hash, you can disable the
> randomized hash (PYTHONHASHSEED=0) or use a fixed seed (e.g.
> PYTHONHASHSEED=42).

So you have the choice of being able to work in a multi-process
environment and be vulnerable to the attack or not. I think we
can do better :-)

Note that web servers written in Python tend to be long running
processes, so an attacker has lots of time to test various
seeds.

>>  * doesn't appear to work well in embedded interpreters that
>>   regularly restarted interpreters (AFAIK, some objects persist across
>>   restarts and those will have wrong hash values in the newly started
>>   instances)
> 
> test_capi runs _testembed which restarts a embedded interpreters 3
> times, and the test pass (with my patch version 5). Can you write a
> script showing the problem if there is a real problem?
> 
> In an older version of my patch, the hash secret was recreated at each
> initiliazation. I changed my patch to only generate the secret once.

Ok, that should fix the case.

Two more issue that I forgot:

 * enabling randomized hashing can make debugging a lot harder, since
   it's rather difficult to reproduce the same state in a controlled
   way (unless you record the hash seed somewhere in the logs)

and even though applications should not rely on the order of dict
repr()s or str()s, they do often enough:

 * randomized hashing will result in repr() and str() of dictionaries
   being random as well
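
For example, with an interpreter that has hash randomization enabled,
two runs of the same one-liner will usually print the keys in a
different order (the exact output depends on the seed):

PYTHONHASHSEED=1 python -c "print({'a': 1, 'b': 2, 'c': 3})"
PYTHONHASHSEED=2 python -c "print({'a': 1, 'b': 2, 'c': 3})"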

>> The most important issue, though, is that it doesn't really
>> protect Python against the attack - it only makes it less
>> likely that an adversary will find the init vector (or a way
>> around having to find it via cryptanalysis).
> 
> I agree that the patch is not perfect. As written in the patch, it
> just makes the attack more complex. I consider that it is enough.

Wouldn't you rather see a fix that works for all hash functions
and Python objects ? One that doesn't cause performance
issues ?

The collision counting idea has this potential.

> Perl has a simpler protection than the one proposed in my patch. Is
> Perl vulnerable to the hash collision vulnerability?

I don't know what Perl did or how hashing works in Perl, so I cannot
comment on the effect of their fix. FWIW, I don't think that we
should use Perl or Java as reference here.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> Patch version 5 fixes test_unicode for 64-bit systems.

Victor, I don't think the randomization idea is going anywhere. The
code has many issues:

 * it is exceedingly complex
 * the method would need to be implemented for all hashable
   Python types
 * it causes startup time to increase (you need urandom data for
   every single hashable Python data type)
 * it causes run-time to increase due to changes in the hash
   algorithm (more operations in the tight loop)
 * causes different processes in a multi-process setup to use different
   hashes for the same object
 * doesn't appear to work well in embedded interpreters that
   regularly restarted interpreters (AFAIK, some objects persist across
   restarts and those will have wrong hash values in the newly started
   instances)

The most important issue, though, is that it doesn't really
protect Python against the attack - it only makes it less
likely that an adversary will find the init vector (or a way
around having to find it via cryptanalysis).

OTOH, the collision counting patch is very simple, doesn't have
the performance issues and provides real protection against the
attack. Even better still, it can detect programming errors in
hash method implementations.

IMO, it would be better to put efforts into refining the collision
detection patch (perhaps adding support for the universal hash
method slot I mentioned) and run some real life tests with it.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-09 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Marc-Andre Lemburg wrote:
> 
> Marc-Andre Lemburg  added the comment:
> 
> Christian Heimes wrote:
>> Marc-Andre:
>> Have you profiled your suggestion? I'm interested in the speed implications. 
>> My gut feeling is that your idea could be slower, since you have added more 
>> instructions to a tight loop, that is executed on every lookup, insert, 
>> update and deletion of a dict key. The hash modification could have a 
>> smaller impact, since the hash is cached. I'm merely speculating here until 
>> we have some numbers to compare.
> 
> I haven't done any profiling on this yet, but will run some
> tests.

I ran pybench and pystone: neither shows a significant change.
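
For reference, both micro-benchmarks ship with the source tree
(paths as of Python 2.7) and can be run directly:

python Tools/pybench/pybench.py
python Lib/test/pystone.py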

I wish we had a simple to run benchmark based on Django to allow
checking such changes against real world applications. Not that I
expect different results from such a benchmark...

To check the real world impact, I guess it would be best to
run a few websites with the patch for a week and see whether the
collision exception gets raised.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Christian Heimes wrote:
> Marc-Andre:
> Have you profiled your suggestion? I'm interested in the speed implications. 
> My gut feeling is that your idea could be slower, since you have added more 
> instructions to a tight loop, that is executed on every lookup, insert, update 
> and deletion of a dict key. The hash modification could have a smaller 
> impact, since the hash is cached. I'm merely speculating here until we have 
> some numbers to compare.

I haven't done any profiling on this yet, but will run some
tests.

The lookup functions in the dict implementation are optimized
to make the first non-collision case fast. The patch doesn't touch this
loop. The only change is in the collision case, where an increment
and a comparison are added (and then only after the comparison, which
is the real cost factor in the loop). I did add a printf() to
see how often this case occurs - it's a surprisingly rare case,
which suggests that Tim, Christian and all the others that have
invested considerable time into the implementation have done
a really good job here.

BTW: I noticed that a rather obvious optimization appears to be
missing from the Python dict initialization code: when passing in
a list of (key, value) pairs, the implementation doesn't make
use of the available length information and still starts with an
empty (small) dict table and then iterates over the pairs, increasing
the table size as necessary. It would be better to start with a
table that is presized to O(len(data)). The dict implementation
already provides such a function, but it's not being used
in the dict(pair_list) case. Anyway, just an aside.
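
The effect is easy to observe from pure Python - every time the
table grows, the dict's allocated size changes (the exact sizes and
the number of steps are implementation details and will vary):

import sys

d = {}
sizes = []
for i in range(100000):
    d[i] = None
    if not sizes or sys.getsizeof(d) != sizes[-1]:
        sizes.append(sys.getsizeof(d))
print("%d table allocations while growing" % len(sizes))

Each of those intermediate allocations could be skipped by presizing
to the known input length.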

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Tim Peters wrote:
> 
> Tim Peters  added the comment:
> 
> [Marc-Andre]
>> BTW: I wonder how long it's going to take before
>> someone figures out that our merge sort based
>> list.sort() is vulnerable as well... its worst-
>> case performance is O(n log n), making attacks
>> somewhat harder.
> 
> I wouldn't worry about that, because nobody could stir up anguish
> about it by writing a paper ;-)
> 
> 1. O(n log n) is enormously more forgiving than O(n**2).
> 
> 2. An attacker need not be clever at all:  O(n log n) is not only
> sort()'s worst case, it's also its _expected_ case when fed randomly
> ordered data.
> 
> 3. It's provable that no comparison-based sorting algorithm can have
> better worst-case asymptotic behavior when fed randomly ordered data.
> 
> So if anyone whines about this, tell 'em to go do something useful instead :-)

Right on all accounts :-)

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Paul McMillan wrote:
> 
>> I'll upload a patch that demonstrates the collisions counting
>> strategy to show that detecting the problem is easy. Whether
>> just raising an exception is a good idea, is another issue.
> 
> I'm in cautious agreement that collision counting is a better
> strategy. The dict implementation performance would suffer from
> randomization.
> 
>> The dict implementation could then alter the hash parameter
>> and recreate the dict table in case the number of collisions
>> exceeds a certain limit, thereby actively taking action
>> instead of just relying on randomness solving the issue in
>> most cases.
> 
> This is clever. You basically neuter the attack as you notice it but
> everything else is business as usual. I'm concerned that this may end
> up being costly in some edge cases (e.g. look up how many collisions
> it takes to force the recreation, and then aim for just that many
> collisions many times). Unfortunately, each dict object has to
> discover for itself that it's full of offending hashes. Another
> approach would be to neuter the offending object by changing its hash,
> but this would require either returning multiple values, or fixing up
> existing dictionaries, neither of which seems feasible.

I ran some experiments with the collision counting patch and
could not trigger it in normal applications, not even in cases
that are documented in the dict implementation to have a poor
collision resolution behavior (integers with zeros in the low bits).
The probability of having to deal with dictionaries that create
over a thousand collisions for one of the key objects in a
real life application appears to be very very low.

Still, it may cause problems with existing applications for the
Python dot releases, so it's probably safer to add it in a
disabled-per-default form there (using an environment variable
to adjust the setting). For 3.3 it could be enabled per default
and it would also make sense to allow customizing the limit
using a sys module setting.

The idea with adding a parameter to the hash method/slot in order
to have objects provide a hash family function instead of a fixed
unparametrized hash function would probably have to be implemented
as an additional hash method, e.g. .__uhash__() and tp_uhash ("u"
for universal).

The builtin types should then grow such methods
in order to make hashing safe against such attacks. For objects
defined in 3rd party extensions, we would need to encourage
implementing the slot/method as well. If it's not implemented,
the dict implementation would have to fall back to raising an
exception.
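
At the Python level, such a parametrized hash could look roughly
like this (the name __uhash__ and the mixing are purely
illustrative):

class Name(str):
    def __uhash__(self, seed):
        # member of a hash family: different seeds yield different,
        # but internally consistent, hash values for equal strings
        h = seed
        for ch in self:
            h = (h * 1000003 + ord(ch) + seed) % (2**61 - 1)
        return h

n = Name("example")
n.__uhash__(0), n.__uhash__(1)   # two members of the family

The dict would start out with seed 0 and, on detecting too many
collisions, pick a new seed and rehash the table.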

Please note that I'm just sketching things here. I don't have
time to work on a full-blown patch, just wanted to show what
I meant with the collision counting idea and demonstrate that
it actually works as intended.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Here's an example of hash-attack.patch finding an on-purpose
programming error (hashing all objects to the same value):

http://stackoverflow.com/questions/4865325/counting-collisions-in-a-python-dictionary
(see the second example on the page for @Winston Ewert's solution)
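
The pattern in question boils down to something like this
(hypothetical reconstruction, not the code from the page):

class Token(object):
    def __hash__(self):
        return 42            # constant hash: every instance collides
    # no __eq__, so all instances are distinct dictionary keys

d = {}
for i in range(2000):
    d[Token()] = 1           # each insert probes more and more slots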

With the patch you get:

Traceback (most recent call last):
  File "testcollisons.py", line 20, in 
d[o] = 1
KeyError: 'too many hash collisions'

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
> hash-attack.patch does never decrement the collision counter.

Why should it ? It's only used as a local variable in the lookup function.

Note that the limit only triggers on a per-key basis. It's not
a limit on the total number of collisions in the table, so you don't
need to keep the number of collisions stored on the object.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Stupid email interface again... here's the full text:

The hash-attack.patch solves the problem for the integer case
I posted earlier on and doesn't cause any problems with the
test suite.

>>> d = dict((x*(2**64 - 1), hash(x*(2**64 - 1))) for x in xrange(1, 100))
>>> d = dict((x*(2**64 - 1), hash(x*(2**64 - 1))) for x in xrange(1, 1000))
Traceback (most recent call last):
  File "", line 1, in 
KeyError: 'too many hash collisions'

It also doesn't change the hashing or dict repr in existing
applications.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

The hash-attack.patch solves the problem for the integer case
I posted earlier on and doesn't cause any problems with the
test suite.

Traceback (most recent call last):
  File "", line 1, in 
KeyError: 'too many hash collisions'

It also doesn't change the hashing or dict repr in existing
applications.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Demo patch implementing the collision limit idea for Python 2.7.

--
Added file: http://bugs.python.org/file24151/hash-attack.patch

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Before continuing down the road of adding randomness to hash
functions, please have a good read of the existing dictionary
implementation:

"""
Major subtleties ahead:  Most hash schemes depend on having a "good" hash
function, in the sense of simulating randomness.  Python doesn't:  its most
important hash functions (for strings and ints) are very regular in common
cases:

>>> map(hash, (0, 1, 2, 3))
[0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
[-1658398457, -1658398460, -1658398459, -1658398462]
>>>

This isn't necessarily bad!  To the contrary, in a table of size 2**i, taking
the low-order i bits as the initial table index is extremely fast, and there
are no collisions at all for dicts indexed by a contiguous range of ints.
The same is approximately true when keys are "consecutive" strings.  So this
gives better-than-random behavior in common cases, and that's very desirable.
...
"""

There's also a file called dictnotes.txt which has more interesting
details about how the implementation is designed.

Please note that the term "collision" is used in a slightly different
way: it refers to trying to find an empty slot in the dictionary
table. Having a collision implies that the hash values of two distinct
objects are the same, but you also get collisions in case two distinct
objects with different hash values get mapped to the same table entry.

An attack can be based on trying to find many objects with the same
hash value, or trying to find many objects that, as they get inserted
into a dictionary, very often cause collisions due to the collision
resolution algorithm not finding a free slot.

In both cases, the (slow) object comparisons needed to find an
empty slot are what make the attack practical, if the application
puts too much trust into large blobs of input data - which is
the actual security issue we're trying to work around here...

Given the dictionary implementation notes, I'm even less certain
that the randomization change is a good idea. It will likely
introduce a performance hit due to both the added complexity in
calculating the hash as well as the reduced cache locality of
the data in the dict table.

I'll upload a patch that demonstrates the collisions counting
strategy to show that detecting the problem is easy. Whether
just raising an exception is a good idea, is another issue.

It may be better to change the tp_hash slot in Python 3.3
to take an argument, so that the dict implementation can
use the hash function as a universal hash family function
(see http://en.wikipedia.org/wiki/Universal_hash).

The dict implementation could then alter the hash parameter
and recreate the dict table in case the number of collisions
exceeds a certain limit, thereby actively taking action
instead of just relying on randomness solving the issue in
most cases.
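
A toy model of that two-step strategy - count collisions per
insertion and re-parametrize the table once a limit is hit - could
look like this (the names, the limit and the probing are all
simplified for illustration; the demo patch uses a limit of 1000
inside the C lookup function):

class GuardedDict(object):
    LIMIT = 20                        # deliberately small for the demo

    def __init__(self, nslots=64):
        self.seed = 1
        self.slots = [None] * nslots  # fixed size; a real table grows

    def _uhash(self, key):
        # stand-in for a per-type hash family (string keys only)
        h = 0
        for ch in str(key):
            h = (h * 1000003 * self.seed + ord(ch)) % (2**61 - 1)
        return h

    def __setitem__(self, key, value):
        mask = len(self.slots) - 1
        i = self._uhash(key) & mask
        collisions = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            collisions += 1
            if collisions > self.LIMIT:   # attack or broken hash
                self._rebuild()           # new parameter, new layout
                self[key] = value
                return
            i = (i + 1) & mask            # linear probing
        self.slots[i] = (key, value)

    def _rebuild(self):
        self.seed += 1
        items = [s for s in self.slots if s is not None]
        self.slots = [None] * len(self.slots)
        for k, v in items:
            self[k] = v

d = GuardedDict()
for i in range(30):
    d['key-%d' % i] = i                   # stays fast under collisions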

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-05 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Paul McMillan wrote:
> 
> This is not something that can be fixed by limiting the size of POST/GET. 
> 
> Parsing documents (even offline) can generate these problems. I can create 
> books that calibre (a Python-based ebook format shifting tool) can't convert, 
> but are otherwise perfectly valid for non-python devices. If I'm allowed to 
> insert usernames into a database and you ever retrieve those in a dict, 
> you're vulnerable. If I can post things one at a time that eventually get 
> parsed into a dict (like the tag example), you're vulnerable. I can generate 
> web traffic that creates log files that are unparsable (even offline) in 
> Python if dicts are used anywhere. Any application that accepts data from 
> users needs to be considered.
> 
> Even if the web framework has a dictionary implementation that randomizes the 
> hashes so it's not vulnerable, the entire python standard library uses dicts 
> all over the place. If this is a problem which must be fixed by the 
> framework, they must reinvent every standard library function they hope to 
> use.
> 
> Any non-trivial python application which parses data needs the fix. The 
> entire standard library needs the fix if is to be relied upon by applications 
> which accept data. It makes sense to fix Python.

Agreed: Limiting the size of POST requests only applies to *web* applications.
Other applications will need other fixes.

Trying to fix the problem in general by tweaking the hash function to
(apparently) make it hard for an attacker to guess a good set of
colliding strings/integers/etc. is not really a good solution. You'd
only be making it harder for script kiddies, but as soon as someone
cryptanalyzes the hash algorithm used, you're lost again.

You'd need to use crypto hash functions or universal hash functions
if you want to achieve good security, but that's not an option for
Python objects, since the hash functions need to be as fast as possible
(which rules out crypto hash functions) and cannot easily drop the invariant
"a=b => hash(a)=hash(b)" (which rules out universal hash functions, AFAICT).

IMO, the strategy to simply cap the number of allowed collisions is
a better way to achieve protection against this particular resource
attack. The probability of having valid data reach such a limit is
low and, with the limit made configurable, can be reduced to zero.

> Of course we must fix all the basic hashing functions in python, not just the 
> string hash. There aren't that many. 

... not in Python itself, but if you consider all the types in Python
extensions and classes implementing __hash__ in user code, the number
of hash functions to fix quickly becomes unmanageable.

> Marc-Andre:
> If you look at my proposed code, you'll notice that we do more than simply 
> shift the period of the hash. It's not trivial for an attacker to create 
> colliding hash functions without knowing the key.

Could you post it on the ticket ?

BTW: I wonder how long it's going to take before someone figures out
that our merge sort based list.sort() is vulnerable as well... its
worst-case performance is O(n log n), making attacks somewhat harder.
The popular quicksort which Python used for a long time has O(n²),
making it much easier to attack, but fortunately, we replaced it
with merge sort in Python 2.3, before anyone noticed ;-)

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


