Re: [Python-Dev] os.walk() with followlinks=False

2012-01-12 Thread Nick Coghlan
On Thu, Jan 12, 2012 at 12:54 AM, Antoine Pitrou solip...@pitrou.net wrote:
 On Wed, 11 Jan 2012 12:25:46 +1000
 Nick Coghlan ncogh...@gmail.com wrote:
 If it's the latter... could we change it for 3.3, or is that too
 significant a breach of backwards compatibility?

 I think we could change it.

For the benefit of those not following the tracker issue,
Charles-François pointed out that putting the symlinks-to-directories
into the files list instead of the subdirectory list isn't really any
better (it just moves the problem to different use cases, such as
those that actually want to read the file contents).

With that being the case, I've changed my mind and figure we may as
well leave the current behaviour alone. I'll think about adding a
filter to walkdir that makes it easy to control the way they're
handled [1].

[1] 
https://bitbucket.org/ncoghlan/walkdir/issue/9/better-handling-of-dir-symlinks

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-12 Thread Victor Stinner
Many people proposed their own idea to fix the vulnerability, but only
3 wrote a patch:

- Glenn Linderman proposes to fix the vulnerability by adding a new
safe dict type (only accepting string keys). His proof-of-concept
(SafeDict.py) uses a secret of 64 random bits and uses it to compute
the hash of a key.
- Marc Andre Lemburg proposes to fix the vulnerability directly in
dict (for any key type). The patch raises an exception if a lookup
causes more than 1000 collisions.
- I propose to fix the vulnerability only in the Unicode hash (not for
other types). My patch adds a random secret initialized at startup (it
can be disabled or fixed using an environment variable).

--

I consider that Glenn's proposition is not applicable in practice
because all applications and all libraries have to be patched to use
the new safe dict type.

Some people are concerned by possible regression introduced by Marc's
proposition: his patch may raise an exception for legitimate data.

My proposition tries to be just enough secure with a low (runtime
performance) overhead. My patch becomes huge (and so backporting is
more complex), whereas Marc's patch is very simple and so trivial to
backport.

--

It is still unclear to me if the fix should be enabled by default for
Python  3.3. Because the overhead (of my patch) is low, I would
prefer to enable the fix by default, to protect everyone with a simple
Python upgrade.

I prefer to explain how to disable explicitly the randomized hash
(PYTHONHASHSEED=0) (or how to fix application bugs) to people having
troubles with randomized hash, instead of leaving the hole open by
default.

--

We might change hash() for types other than str, but it looks like web
servers are only concerned by dict with string keys.

We may use Paul's hash function if mine is not enough secure.

My patch doesn't fix the DoS, it just make the attack more complex.
The attacker cannot pregenerate data for an attack: (s)he has first to
compute the hash secret, and then compute hash collisions using the
secret. The hash secret is a least 64 bits long (128 bits on a 64 bit
system). So I hope that computing collisions requires a lot of CPU
time (is slow) to make the attack ineffective with today computers.

--

I plan to write a nice patch for Python 3.3, then write a simpler
patch for 3.1 and 3.2 (duplicate os.urandom code to keep it unchanged,
maybe don't create a new random.c file, maybe don't touch the test
suite while the patch breaks many tests), and finally write patches
for Python 2.6 and 2.7.

Details about my patch:

- I tested it on Linux (32 and 64 bits) and Windows (Seven 64 bits)
- a new PYTHONSEED environment variable allow to control the
randomized hash: PYTHONSEED=0 disables completly the randomized hash
(restore the previous behaviour), PYTHONSEED=value uses a fixed seed
for processes sharing data and needind same hash values
(multiprocessing users?)
- no overhead on hash(str)
- no startup overhead on Linux
- startup overhead is 10% on Windows (see the issue, I propose another
solution with a startup overhead of 1%)

The patch is not done, some tests are still failing because of the
randomized hash.

--

FYI, PHP released a version 5.3.9 adding max_input_vars directive to
prevent attacks based on hash collisions (CVE-2011-4885).

Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-12 Thread Guido van Rossum
Hm... I started out as a big fan of the randomized hash, but thinking more
about it, I actually believe that the chances of some legitimate app having
1000 collisions are way smaller than the chances that somebody's code will
break due to the variable hashing. In fact we know for a fact that the
latter will break code, since it changes the order of items in a dict. This
affects many tests written without this in mind, and I assume there will be
some poor sap out there who uses Python's hash() function to address some
external persistent hash table or some other external datastructure. How
pathological the data needs to be before the collision counter triggers?
I'd expect *very* pathological.

This is depending on how the counting is done (I didn't look at MAL's
patch), and assuming that increasing the hash table size will generally
reduce collisions if items collide but their hashes are different.

That said, even with collision counting I'd like a way to disable it
without changing the code, e.g. a flag or environment variable.

--Guido

On Thu, Jan 12, 2012 at 5:24 PM, Victor Stinner 
victor.stin...@haypocalc.com wrote:

 Many people proposed their own idea to fix the vulnerability, but only
 3 wrote a patch:

 - Glenn Linderman proposes to fix the vulnerability by adding a new
 safe dict type (only accepting string keys). His proof-of-concept
 (SafeDict.py) uses a secret of 64 random bits and uses it to compute
 the hash of a key.
 - Marc Andre Lemburg proposes to fix the vulnerability directly in
 dict (for any key type). The patch raises an exception if a lookup
 causes more than 1000 collisions.
 - I propose to fix the vulnerability only in the Unicode hash (not for
 other types). My patch adds a random secret initialized at startup (it
 can be disabled or fixed using an environment variable).

 --

 I consider that Glenn's proposition is not applicable in practice
 because all applications and all libraries have to be patched to use
 the new safe dict type.

 Some people are concerned by possible regression introduced by Marc's
 proposition: his patch may raise an exception for legitimate data.

 My proposition tries to be just enough secure with a low (runtime
 performance) overhead. My patch becomes huge (and so backporting is
 more complex), whereas Marc's patch is very simple and so trivial to
 backport.

 --

 It is still unclear to me if the fix should be enabled by default for
 Python  3.3. Because the overhead (of my patch) is low, I would
 prefer to enable the fix by default, to protect everyone with a simple
 Python upgrade.

 I prefer to explain how to disable explicitly the randomized hash
 (PYTHONHASHSEED=0) (or how to fix application bugs) to people having
 troubles with randomized hash, instead of leaving the hole open by
 default.

 --

 We might change hash() for types other than str, but it looks like web
 servers are only concerned by dict with string keys.

 We may use Paul's hash function if mine is not enough secure.

 My patch doesn't fix the DoS, it just make the attack more complex.
 The attacker cannot pregenerate data for an attack: (s)he has first to
 compute the hash secret, and then compute hash collisions using the
 secret. The hash secret is a least 64 bits long (128 bits on a 64 bit
 system). So I hope that computing collisions requires a lot of CPU
 time (is slow) to make the attack ineffective with today computers.

 --

 I plan to write a nice patch for Python 3.3, then write a simpler
 patch for 3.1 and 3.2 (duplicate os.urandom code to keep it unchanged,
 maybe don't create a new random.c file, maybe don't touch the test
 suite while the patch breaks many tests), and finally write patches
 for Python 2.6 and 2.7.

 Details about my patch:

 - I tested it on Linux (32 and 64 bits) and Windows (Seven 64 bits)
 - a new PYTHONSEED environment variable allow to control the
 randomized hash: PYTHONSEED=0 disables completly the randomized hash
 (restore the previous behaviour), PYTHONSEED=value uses a fixed seed
 for processes sharing data and needind same hash values
 (multiprocessing users?)
 - no overhead on hash(str)
 - no startup overhead on Linux
 - startup overhead is 10% on Windows (see the issue, I propose another
 solution with a startup overhead of 1%)

 The patch is not done, some tests are still failing because of the
 randomized hash.

 --

 FYI, PHP released a version 5.3.9 adding max_input_vars directive to
 prevent attacks based on hash collisions (CVE-2011-4885).

 Victor
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 http://mail.python.org/mailman/options/python-dev/guido%40python.org




-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 

Re: [Python-Dev] Proposed PEP on concurrent programming support

2012-01-12 Thread PJ Eby
On Wed, Jan 11, 2012 at 7:01 PM, Mike Meyer m...@mired.org wrote:

 On Wed, 4 Jan 2012 00:07:27 -0500
 PJ Eby p...@telecommunity.com wrote:
   On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer m...@mired.org wrote:
   For
   instance, combining STM with explicit locking would allow explicit
   locking when IO was required,
  I don't think this idea makes any sense, since STM's don't really
  lock, and to control I/O in an STM system you just STM-ize the
  queues. (Generally speaking.)

 I thought about that. I couldn't convince myself that STM by itself
 sufficient. If you need to make irreversible changes to the state of
 an object, you can't use STM, so what do you use? Can every such
 situation be handled by creating safe values then using an STM to
 update them?


If you need to do something irreversible, you just need to use an
STM-controlled queue, with something that reads from it to do the
irreversible things.  The catch is that your queue design has to support
guaranteed-successful item removal, since if the dequeue transaction fails,
it's too late.  Alternately, the queue reader can commit removal first,
then perform the irreversible operation...  but leave open a short window
for failure.  It depends on the precise semantics you're looking for.

In either case, though, the STM is pretty much sufficient, given a good
enough queue data structure.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com