[Python-Dev] hello, new dict addition for new eve ?

2011-12-30 Thread julien tayon
Hello,
Sorry to annoy the very busy core devs :) out of the blue

I quite noticed people were
1) wanting to have a new dict for Xmas
2) strongly resenting dict addition.

Even though I am not a good developper, I have come to a definition of
addition that would follow algebraic rules, and not something of a
dutch logic. (it is a jest, not a troll)

I propose the following code to validate my point of view regarding
the dictionnatry addition as a proof of concept :
https://github.com/jul/ADictAdd_iction/blob/master/test.py

It follows all my dusty math books regarding addition + it has the
amability to have rules of conservation.

I pretty much see a real advantage in this behaviour in functional
programming (map/reduce). (see the demonstrate.py), and it has a sense
(if dict can be seen has vectors).

I have been told to be a troll, but I am pretty serious.

Since, I coded with luck, no internet, intuition, and a complete
ignorance of the real meaning of the magic methods most of the time,
thus the actual implementation of the addition surely needs a complete
refactoring.

Sheers,
Bonne fêtes
Julien
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Your email to the mailing list

2011-12-30 Thread lahwran
...oops, I did not intend to send this to the mailing list. I
apologize for the accidental off topic.

On Fri, Dec 30, 2011 at 7:40 AM, lahwran  wrote:
> I don't want to post to the mailing list about this; But I must say, I
> found your email very entertaining. You have a good sense of humor.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Your email to the mailing list

2011-12-30 Thread lahwran
I don't want to post to the mailing list about this; But I must say, I
found your email very entertaining. You have a good sense of humor.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] hello, new dict addition for new eve ?

2011-12-30 Thread Guido van Rossum
Hi Julien,

Don't despair! I have tried to get people to warm up to dict addition too
-- in fact it was my counter-proposal at the time when we were considering
adding sets to the language. I will look at your proposal, but I have a
point of order first: this should be discussed on python-ideas, not on
python-dev. I have added python-ideas to the thread and moved python-dev to
Bcc, so followups will hopefully all go to python-ideas.

--Guido

On Fri, Dec 30, 2011 at 7:26 AM, julien tayon  wrote:

> Hello,
> Sorry to annoy the very busy core devs :) out of the blue
>
> I quite noticed people were
> 1) wanting to have a new dict for Xmas
> 2) strongly resenting dict addition.
>
> Even though I am not a good developper, I have come to a definition of
> addition that would follow algebraic rules, and not something of a
> dutch logic. (it is a jest, not a troll)
>
> I propose the following code to validate my point of view regarding
> the dictionnatry addition as a proof of concept :
> https://github.com/jul/ADictAdd_iction/blob/master/test.py
>
> It follows all my dusty math books regarding addition + it has the
> amability to have rules of conservation.
>
> I pretty much see a real advantage in this behaviour in functional
> programming (map/reduce). (see the demonstrate.py), and it has a sense
> (if dict can be seen has vectors).
>
> I have been told to be a troll, but I am pretty serious.
>
> Since, I coded with luck, no internet, intuition, and a complete
> ignorance of the real meaning of the magic methods most of the time,
> thus the actual implementation of the addition surely needs a complete
> refactoring.
>
> Sheers,
> Bonne fêtes
> Julien
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Summary of Python tracker Issues

2011-12-30 Thread Python tracker

ACTIVITY SUMMARY (2011-12-23 - 2011-12-30)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open3178 (+10)
  closed 22288 (+16)
  total  25466 (+26)

Open issues with patches: 1365 


Issues opened (21)
==

#12760: Add create mode to open()
http://bugs.python.org/issue12760  reopened by pitrou

#13294: http.server: minor code style changes.
http://bugs.python.org/issue13294  reopened by ezio.melotti

#13659: Add a help() viewer for IDLE's Shell.
http://bugs.python.org/issue13659  opened by ramchandra.apte

#13663: pootle.python.org is outdated.
http://bugs.python.org/issue13663  opened by naoki

#13664: UnicodeEncodeError in gzip when filename contains non-ascii
http://bugs.python.org/issue13664  opened by jason.coombs

#13665: TypeError: string or integer address expected instead of str i
http://bugs.python.org/issue13665  opened by jason.coombs

#13666: datetime documentation typos
http://bugs.python.org/issue13666  opened by steveire

#13668: mute ImportError in __del__ of _threading_local module
http://bugs.python.org/issue13668  opened by Zhiping.Deng

#13669: XATTR_SIZE_MAX and XATTR_LIST_MAX undefined on kfreebsd/debian
http://bugs.python.org/issue13669  opened by zbysz

#13670: Increase test coverage for pstats.py
http://bugs.python.org/issue13670  opened by andrea.crotti

#13672: Add co_qualname attribute in code objects
http://bugs.python.org/issue13672  opened by Arfrever

#13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi
http://bugs.python.org/issue13673  opened by sbt

#13674: crash in datetime.strftime
http://bugs.python.org/issue13674  opened by patrick.vrijlandt

#13676: sqlite3: Zero byte truncates string contents
http://bugs.python.org/issue13676  opened by petri.lehtinen

#13677: correct docstring for builtin compile
http://bugs.python.org/issue13677  opened by Jim.Jewett

#13679: Multiprocessing system crash
http://bugs.python.org/issue13679  opened by Rock.Achu

#13680: Aifc comptype write fix
http://bugs.python.org/issue13680  opened by Oleg.Plakhotnyuk

#13681: Aifc read compressed frames fix
http://bugs.python.org/issue13681  opened by Oleg.Plakhotnyuk

#13682: Documentation of os.fdopen() refers to non-existing bufsize ar
http://bugs.python.org/issue13682  opened by petri.lehtinen

#13683: Docs in Python 3:raise statement mistake
http://bugs.python.org/issue13683  opened by ramchandra.apte

#13684: httplib tunnel infinite loop
http://bugs.python.org/issue13684  opened by luzakiru



Most recent 15 issues with no replies (15)
==

#13684: httplib tunnel infinite loop
http://bugs.python.org/issue13684

#13683: Docs in Python 3:raise statement mistake
http://bugs.python.org/issue13683

#13682: Documentation of os.fdopen() refers to non-existing bufsize ar
http://bugs.python.org/issue13682

#13680: Aifc comptype write fix
http://bugs.python.org/issue13680

#13677: correct docstring for builtin compile
http://bugs.python.org/issue13677

#13668: mute ImportError in __del__ of _threading_local module
http://bugs.python.org/issue13668

#13666: datetime documentation typos
http://bugs.python.org/issue13666

#13665: TypeError: string or integer address expected instead of str i
http://bugs.python.org/issue13665

#13664: UnicodeEncodeError in gzip when filename contains non-ascii
http://bugs.python.org/issue13664

#13649: termios.ICANON is not documented
http://bugs.python.org/issue13649

#13641: decoding functions in the base64 module could accept unicode s
http://bugs.python.org/issue13641

#13640: add mimetype for application/vnd.apple.mpegurl
http://bugs.python.org/issue13640

#13638: PyErr_SetFromErrnoWithFilenameObject is undocumented
http://bugs.python.org/issue13638

#13633: Handling of hex character references in HTMLParser.handle_char
http://bugs.python.org/issue13633

#13631: readline fails to parse some forms of .editrc under editline (
http://bugs.python.org/issue13631



Most recent 15 issues waiting for review (15)
=

#13684: httplib tunnel infinite loop
http://bugs.python.org/issue13684

#13681: Aifc read compressed frames fix
http://bugs.python.org/issue13681

#13680: Aifc comptype write fix
http://bugs.python.org/issue13680

#13677: correct docstring for builtin compile
http://bugs.python.org/issue13677

#13676: sqlite3: Zero byte truncates string contents
http://bugs.python.org/issue13676

#13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi
http://bugs.python.org/issue13673

#13670: Increase test coverage for pstats.py
http://bugs.python.org/issue13670

#13668: mute ImportError in __del__ of _threading_local module
http://bugs.python.org/issue13668

#13651: Improve redirection in urllib
http://bugs.python.org/issue13651

#13645: import machinery vulnerable to timestamp collisions
http://bugs.python.org/iss

Re: [Python-Dev] [Python-checkins] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile,

2011-12-30 Thread Brian Curtin
On Thu, Dec 29, 2011 at 11:55, antoine.pitrou
 wrote:
> http://hg.python.org/cpython/rev/cf57ef65bcd0
> changeset:   74194:cf57ef65bcd0
> user:        Antoine Pitrou 
> date:        Thu Dec 29 18:54:15 2011 +0100
> summary:
>  Issue #12715: Add an optional symlinks argument to shutil functions 
> (copyfile, copymode, copystat, copy, copy2).
> When that parameter is true, symlinks aren't dereferenced and the operation
> instead acts on the symlink itself (or creates one, if relevant).
>
> Patch by Hynek Schlawack.
>
> files:
>  Doc/library/shutil.rst  |   46 -
>  Lib/shutil.py           |  101 +---
>  Lib/test/test_shutil.py |  219 
>  Misc/NEWS               |    5 +
>  4 files changed, 333 insertions(+), 38 deletions(-)
>
>
> diff --git a/Doc/library/shutil.rst b/Doc/library/shutil.rst
> --- a/Doc/library/shutil.rst
> +++ b/Doc/library/shutil.rst
> @@ -45,7 +45,7 @@
>    be copied.
>
>
> -.. function:: copyfile(src, dst)
> +.. function:: copyfile(src, dst[, symlinks=False])
>
>    Copy the contents (no metadata) of the file named *src* to a file named 
> *dst*.
>    *dst* must be the complete target file name; look at :func:`copy` for a 
> copy that
> @@ -56,37 +56,56 @@
>    such as character or block devices and pipes cannot be copied with this
>    function.  *src* and *dst* are path names given as strings.
>
> +   If *symlinks* is true and *src* is a symbolic link, a new symbolic link 
> will
> +   be created instead of copying the file *src* points to.
> +
>    .. versionchanged:: 3.3
>       :exc:`IOError` used to be raised instead of :exc:`OSError`.
> +      Added *symlinks* argument.

Can we expect that readers on Windows know how os.symlink works, or
should the stipulations of os.symlink usage also be laid out or at
least linked to from there?

Basically, almost everyone is going to get an OSError if they call
this on Windows. You have to be on Windows Vista or beyond *and* the
calling process has to have the proper privileges (typically gained
through elevation - "Run as Administrator").
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile,

2011-12-30 Thread Antoine Pitrou
On Fri, 30 Dec 2011 13:29:36 -0600
Brian Curtin  wrote:
> 
> Can we expect that readers on Windows know how os.symlink works, or
> should the stipulations of os.symlink usage also be laid out or at
> least linked to from there?

I assume it won't make a difference in real life, since symlinks are
quite rare under Windows.

> Basically, almost everyone is going to get an OSError if they call
> this on Windows. You have to be on Windows Vista or beyond *and* the
> calling process has to have the proper privileges (typically gained
> through elevation - "Run as Administrator").

I still haven't managed to use symlinks under Windows 7, myself.
The recipes I've tried didn't work.

Regards

Antoine.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile,

2011-12-30 Thread Brian Curtin
On Fri, Dec 30, 2011 at 13:39, Antoine Pitrou  wrote:
> On Fri, 30 Dec 2011 13:29:36 -0600
> Brian Curtin  wrote:
>>
>> Can we expect that readers on Windows know how os.symlink works, or
>> should the stipulations of os.symlink usage also be laid out or at
>> least linked to from there?
>
> I assume it won't make a difference in real life, since symlinks are
> quite rare under Windows.
>
>> Basically, almost everyone is going to get an OSError if they call
>> this on Windows. You have to be on Windows Vista or beyond *and* the
>> calling process has to have the proper privileges (typically gained
>> through elevation - "Run as Administrator").
>
> I still haven't managed to use symlinks under Windows 7, myself.
> The recipes I've tried didn't work.

This might be a place where an image in the documentation would be
helpful. I don't think we do that anywhere else, but maybe I could add
it to the (sorely out of date and in need of a rebuild) Windows FAQ?

What you need to do on Win7 is go to Start > All Programs >
Accessories > Command Prompt, but right click on it instead of left
click. Choose "Run as Administrator", then it'll make you choose yes
or no to elevate privileges. At that point, deep in the heart of
everyone's favorite operating system, it should acquire the
SeCreateSymbolicLink user privilege. After that, os.symlink should
work fine.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Jim Jewett
In http://mail.python.org/pipermail/python-dev/2011-December/115138.html,
Christian Heimes
pointed out that

> ... we don't have to alter the outcome of hash ... We just need to reduce the 
> chance that
> an attacker can produce collisions in the dict (and set?)

I'll state it more strongly.  hash probably should not change (at
least for this), but we may
want to consider a different conflict resolution strategy when the
first slot is already filled.

Remember that there was a fair amount of thought and timing effort put
into selecting the
current strategy; it is deliberately sub-optimal for random input, in
order to do better with
typical input.  <
http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt >


If there is a change, it would currently be needed in three places for
each of set and dict
(the lookdict functions and insertdict_clean).  It may be worth adding
some macros just to
keep those six in sync. Once those macros are in place, that allows a
compile-time switch.

My personal opinion is that accepting *and parsing* enough data for
this to be a problem
is enough of an edge case that I don't want normal dicts slowed down
at all for this; I would
therefore prefer that the change be restricted to such a compile-time
switch, with current
behavior the default.


http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictobject.c#l571

   583for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) {
   584 i = (i << 2) + i + perturb + 1;

PERTURB_SHIFT is already a private #define to 5; per dictnotes, 4 and 6 perform
almost as well.  Someone worried can easily make that change today,
and be protected
from "generic" anti-python attacks.

I believe the salt suggestions have equivalent to replacing
perturb = hash;
with something likeperturb = hash + salt;

Changing i = (i << 2) + i + perturb + 1;would allow
effectively replacing the initial hash,
but risks spoiling performance in the non-adversary case.

Would there be objections to replacing those two lines with something like:

for (perturb = FIRST_PERTURB(hash, key);
 ep->me_key != NULL;
 perturb = NEXT_PERTURB(hash, key, perturb)) {
i = NEXT_SLOT(i, perturb);


The default macro definitions should keep things as they are

#define FIRST_PERTURB(hash, key)hash
#define NEXT_PERTURB(hash, key, perturb)perturb >> PERTURB_SHIFT
#define NEXT_SLOT(i, perturb)(i << 2) + i + perturb + 1

while allowing #ifdefs for (slower but) safer things like adding a
salt, or even using
alternative hashes.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Victor Stinner

Le 29/12/2011 02:28, Michael Foord a écrit :

A paper (well, presentation) has been published highlighting security problems 
with the hashing algorithm (exploiting collisions) in many programming 
languages Python included:

 
http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf


This PDF doesn't explain exactly the problem and how it can be solved. 
Let's try to summarize this "vulnerability".



The creation of a Python dictionary has a complexity of O(n) in most 
cases, but O(n^2) in the *worst* case. The attack tries to go into the 
worst case. It requires to compute a set of N keys having the same hash 
value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to 
compute these keys once. It looks like it is now cheap enough in 
practice to compute this dataset for Python (and other languages).


A countermeasure would be to check that we don't have more than X keys 
with the same hash value. But in practice, we don't know in advance how 
data are processed, and there are too many input vectors in various formats.


If we want to fix something, it should be done in the implementation of 
the dict type or in the hash algorithm. We can implement dict 
differently to avoid this issue, using a binary tree for example. 
Because dict is a fundamental type in Python, I don't think that we can 
change its implementation (without breaking backward compatibility and 
so applications in production). A possibility would be to add a *new* 
type, but all libraries and applications would need to be changed to fix 
the vulnerability.


The last choice is to change the hash algorithm. The *idea* is the same 
than adding salt to hashed password (in practice it will be a little bit 
different): if a pseudo-random salt is added, the attacker cannot 
prepare a single dataset, he/she will have to regenerate a new dataset 
for each possible salt value. If the salt is big enough (size in bits), 
the attacker will need too much CPU to generate the dataset (compute N 
keys with the same hash value). Basically, it slows down the attack by 
2^(size of the salt).


--

Another possibility would be to replace our fast hash function by a 
better hash function like MD5 or SHA1 (so the creation of the dataset 
would be too slow in practice = too expensive), but cryptographic hash 
functions are much slower (and so would slow down Python too much).


Limiting the size of the POST data doesn't solve the problem because 
there are many other input vectors and data formats. It may block the 
most simple attacks because the attacker cannot inject enough keys to 
slow down your CPU.


Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Steven D'Aprano

Jim Jewett wrote:


My personal opinion is that accepting *and parsing* enough data for
this to be a problem
is enough of an edge case that I don't want normal dicts slowed down
at all for this; I would
therefore prefer that the change be restricted to such a compile-time
switch, with current behavior the default.


By compile-time, do you mean when the byte-code is compilated, i.e. just 
before runtime, rather than a switch when compiling the Python executable from 
source? I will assume so.


I'm not a big fan of compile-time (runtime) switches. It makes it too hard to 
compare before-and-after behaviour within a single session, and impossible to 
have fine control over which objects have which behaviour. I don't like 
all-or-nothing settings. (E.g. I'd love to be able to turn -O optimization on 
and off on a per-function basis, but can't.)


How about using a similar strategy to the current dict behaviour with 
__missing__ and defaultdict? Here's my suggestion:



- If a dict subclass defines __salt__, then it is called to salt the hash
  value before lookups. If __salt__ is undefined or None, the current
  behaviour remains unchanged.

- Add a dict subclass (saltdict, for lack of a better name) that defines
  __salt__ appropriately to the collections module. In this case, I don't
  know enough to suggest what is an appropriate salt. I leave that to the
  security experts to argue about.

- Update the relevant standard library modules to use saltdict where needed.


This allows a single application or framework to use saltdict where necessary, 
without slowing down all dict accesses. Dicts which never see user-generated 
input (e.g. globals) can remain full-speed.


If there is no consensus about the best salting strategy, then apps can choose 
their own by subclassing dict.


Responsibility for doing the right thing falls onto the library author, rather 
than Python itself. Some people may consider that a minus.





--
Steven

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Victor Stinner

Le 29/12/2011 14:19, Christian Heimes a écrit :

Perhaps the dict code is a better place for randomization.


The problem is the creation of a dict with keys all having the same hash 
value. The current implementation of dict uses a linked-list. Adding a 
new item requires to compare the new key to all existing keys (compare 
the value, not the hash, which is much slower).


We had to change completly how dict is implemented to be able to fix 
this issue. I don't think that we can change the dict implementation 
without breaking backward compatibility or breaking applications. Change 
the implementation would change dict properties, and applications rely 
on the properties of the current implementation.


Tell me if I am wrong.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Victor Stinner

In case the watchdog is not a viable solution as I had assumed it was, I
think it's more reasonable to indeed consider adding a flag to Python
that allows randomization of hashes optionally before startup.


A flag will only be needed if the overhead of the fix is too high.


However as it was said earlier, the attack is a lot more complex to
carry out on a 64bit environment that it's probably (as it stands right
now!) safe to ignore.


I suppose that there are still servers running 32 bits Python.


The main problem there however is not that it's a new attack but that
some dickheads could now make prebaked attacks against websites to
disrupt them that might cause some negative publicity.  In general
though there are so many more ways to DDOS a website than this that I
would rate the whole issue very low.


There are countermeasures for low level DDOS (ICMP ping flood, TCP syn 
flood, etc.). An application (or a firewall) cannot implement a 
countermeasure for this high level issue. It can only be fixed in Python 
directly (by changing the implementation of the dict type or of the hash 
function).


Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Christian Heimes
Am 31.12.2011 03:31, schrieb Victor Stinner:
> Le 29/12/2011 14:19, Christian Heimes a écrit :
>> Perhaps the dict code is a better place for randomization.
> 
> The problem is the creation of a dict with keys all having the same hash 
> value. The current implementation of dict uses a linked-list. Adding a 
> new item requires to compare the new key to all existing keys (compare 
> the value, not the hash, which is much slower).
> 
> We had to change completly how dict is implemented to be able to fix 
> this issue. I don't think that we can change the dict implementation 
> without breaking backward compatibility or breaking applications. Change 
> the implementation would change dict properties, and applications rely 
> on the properties of the current implementation.
> 
> Tell me if I am wrong.

You are right and I was wrong. We can't do the randomization inside the
dict code. The randomization factor must used as initialization factor
(IV) for the hashing algorithm. At first I thought the attack used the
unique property of Python's dict implementation (perturbed hash instead
of buckets for equal hashes) but I was wrong. It took me several hours
of reading and digging into the math until I figured out my mistake.
Sorry! :)

This means we can't implement a salted dict unless the salted dict
re-implemention the hash algorithm for unicode, bytes and memoryview. I
doubt that a wise idea.

I've checked my first draft of a possible solution:
http://hg.python.org/features/randomhash/ . The pseudo RNG has to be
replaced with something better and it's missing an option to feed a
seed, too.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Christian Heimes
Am 31.12.2011 03:19, schrieb Steven D'Aprano:
> How about using a similar strategy to the current dict behaviour with 
> __missing__ and defaultdict? Here's my suggestion:
> 
> 
> - If a dict subclass defines __salt__, then it is called to salt the hash
>value before lookups. If __salt__ is undefined or None, the current
>behaviour remains unchanged.

This was my initial proposal, too. It took me a while to figure out that
it won't work. Post-salting won't fix the issue. The random seed must be
used as IV inside hashing algorithm. My brain was still in holiday mode
and it took me a while to figure out the math. Sorry for any confusion!

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Christian Heimes
Am 31.12.2011 03:22, schrieb Victor Stinner:
> The creation of a Python dictionary has a complexity of O(n) in most 
> cases, but O(n^2) in the *worst* case. The attack tries to go into the 
> worst case. It requires to compute a set of N keys having the same hash 
> value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to 
> compute these keys once. It looks like it is now cheap enough in 
> practice to compute this dataset for Python (and other languages).

Correct. The meet-in-the-middle attack and the unique properties of
algorithms that are similar to DJBX33A and DJBX33A make the attack easy
on platforms with 32bit hash.

> A countermeasure would be to check that we don't have more than X keys 
> with the same hash value. But in practice, we don't know in advance how 
> data are processed, and there are too many input vectors in various formats.
> 
> If we want to fix something, it should be done in the implementation of 
> the dict type or in the hash algorithm. We can implement dict 
> differently to avoid this issue, using a binary tree for example. 
> Because dict is a fundamental type in Python, I don't think that we can 
> change its implementation (without breaking backward compatibility and 
> so applications in production). A possibility would be to add a *new* 
> type, but all libraries and applications would need to be changed to fix 
> the vulnerability.

A BTree is too slow for common operations, it's O(log n) instead of O(1)
in average. We can't replace our dict with a btree type. A new btree
type is a lot of work, too.

The unique structure of CPython's dict implementation makes it harder to
get the number of values with equal hash. The academic hash map (the one
I learnt about at university) uses a bucket to store all elements with
equal hash (more precise hash: mod mask). However Python's dict however
perturbs the hash until it finds a free slot its array. The second,
third ... collision can be caused by a legit and completely different
(!) hash.

> The last choice is to change the hash algorithm. The *idea* is the same 
> than adding salt to hashed password (in practice it will be a little bit 
> different): if a pseudo-random salt is added, the attacker cannot 
> prepare a single dataset, he/she will have to regenerate a new dataset 
> for each possible salt value. If the salt is big enough (size in bits), 
> the attacker will need too much CPU to generate the dataset (compute N 
> keys with the same hash value). Basically, it slows down the attack by 
> 2^(size of the salt).

That's the idea of randomized hashing functions as implemented by Ruby
1.8, Perl and others. The random seed is used as IV. Multiple rounds of
multiply, XOR and MOD (integer overflows) cause a deviation. In your
other posting you were worried about the performance implication. A
randomized hash function just adds a single ADD operation, that's all.

Downside: With randomization all hashes are unpredictable and change
after every restart of the interpreter. This has some subtle side
effects like a different outcome of {a:1, b:1, c:1}.keys() after a
restart of the interpreter.

> Another possibility would be to replace our fast hash function by a 
> better hash function like MD5 or SHA1 (so the creation of the dataset 
> would be too slow in practice = too expensive), but cryptographic hash 
> functions are much slower (and so would slow down Python too much).

I agree with your analysis. Cryptographic hash functions are far too
slow for our use case. During my research I found another hash function
that claims to be fast and that may not be vulnerable to this kind of
attack: http://isthe.com/chongo/tech/comp/fnv/

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Terry Reedy

On 12/30/2011 8:04 PM, Jim Jewett wrote:


I'll state it more strongly.  hash probably should not change (at
least for this),


I agree, especially since the vulnerability can be avoided by using 64 
bit servers and will generally abate as more switch anyway.


> but we may

want to consider a different conflict resolution strategy when the
first slot is already filled.

Remember that there was a fair amount of thought and timing effort put
into selecting the
current strategy; it is deliberately sub-optimal for random input, in
order to do better with
typical input.<
http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt>


It would be good to have a set of attack strings to see how vulernerable 
Py dicts actually are (Python may not have been actually tested with 
data) and the affect of any change. I gave the project email of the 2 
presenters in my first post. They apparently want to work with language 
developers to improve defenses against attack.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com