Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Armin Rigo
Hi Guido,

On Thu, Oct 3, 2013 at 10:47 PM, Guido van Rossum gu...@python.org wrote:
 Sounds a bit like some security researchers drumming up business. If you can
 run the binary, presumably you can also recover the seed by looking in
 /proc, right? Or use ctypes or something. This demonstration seems of
 academic interest only.

I'll not try to defend the opposite point of view very actively, but
let me just say that, in my opinion, your objection is not valid.  It
is broken the same way as a different objection, which would claim
that Python can be made sandbox-safe without caring about the numerous
segfault cases.  They are all very obscure for sure; I tried at some
point to list them in Lib/test/crashers.  I gave up when people
started deleting the files because they no longer crashed on newer
versions, just because details changed --- but not because the general
crash they explained was in any way fixed...  Anyway, my point is that
most segfaults can, given enough effort, be transformed into a single,
well-documented tool to conduct a large class of attacks.

The hash issue is similar.  It should, IMHO, either be ignored (which is
fine for a huge fraction of users) or seriously fixed by people with
the correctly pessimistic approach.  The current hash randomization is
simply not preventing anything; someone posted long ago a way to
recover bit-by-bit the hash randomization seed used by a remote web
program in Python running on a server.  The only benefit of this hash
randomization option (-R) was to be able to tell the press that Python
fixed the problem very quickly when it got media attention :-/

This kind of security issue should never be classified as "of academic
interest only".  Instead it can be classified as "it will take weeks /
months / years before someone manages to put together a general attack
script, but most likely someone eventually will".

From this point of view I salute Christian's effort, even if I
prefer to stay far away from this kind of issue myself :-)


A bientôt,

Armin.


Re: [Python-Dev] project culture: take responsibility for your commits

2013-10-04 Thread Stefan Behnel
Stephen,

thank you for your very thoughtful answer.

Stephen J. Turnbull, 03.10.2013 04:23:
 Stefan Behnel writes:
 
   Hi, I'm looking back on a rather unpleasant experience that I
   recently had in this developer community. Actually, twice by
   now. Here's what I take from it: You should take responsibility for
   your commits.
 
 I have no clue who you're addressing this advice to.  If it's not
 yourself (from the following, I gather it's not), I think the
 implication of what you are saying is mistaken.  Core devs (by which I
 mean the high-profile developers who are candidates for PEP delegate)
 regularly do take responsibility for their commits, just like any
 other committer, by changing or reverting them.  That's visible on
 this list as well as in the commit logs.

I'm aware of that, and I apologise to those who felt offended by my post. I
really didn't mean for it to read that way, but I can see in retrospect
that my phrasing and its implications were bound to be read as an offence,
both to individuals and to the audience as a whole.


   Let's assume these complaints [about the code] are reasonable
 
 That's not sufficient.  They must also be presented reasonably, by the
 standards of the community.  Not everybody is good at doing that, and
 those who aren't suffer, as does the project for losing a useful
 contribution.  Unfortunate, but digging out what matters from unclear
 or high-handed presentations requires an enormous amount of effort,
 like psychotherapy.  Good psychotherapists bill hundreds of dollars an
 hour.  The very best pythotherapists bill nothing, at least not to
 this community.

I'm also aware of that. In one of the OSS projects that I lead, bad bug
reports are actually quite frequent due to a broad distribution of user
experience (simplicity has its drawbacks, it seems). It can sometimes take
way more time than I'd want to invest to decipher them and/or ask
follow-up questions until things become clearer.


 Regarding the specific core dev behavior that offended you, I can
 speak from my experience in another project.
 
   What do you do in that case? Do you tell them that what's in is in?
 
 I've done that and later reversed my position.  In retrospect, I
 believe I was correct at the time of first approach in the majority of
 cases, though, on the grounds of the lesser of two evils as I
 understood the issues (or occasionally that the contributor had
 completely misunderstood the issues).
 
 In most cases the original requester never did come up with a coherent
 argument, just that something unclear to me didn't work for them.
 Reversal in such cases was due to a third party who was able to
 explain the requester's requirements, and often contribute (most of) a
 specification of a complete fix or a good compromise.

Here, you are mostly saying that it's ok to respond that way to illegitimate
or unclear complaints. Even in that case, I'd personally be very careful with
that phrase. But I guess that seconds Brett's remarks on subjectivity.


   Do you tell them that you are a core developer and they are not?
 
 I've done that.  I don't know if it applies to the cases you have in
 mind, but invariably that was a last resort when I just wanted to shut
 down a conversation that had already come back to the same place
 twice, and polite guidance seemed to be a complete failure.  Childish,
 I guess, but it's been effective.  That's not sufficient reason to use
 it in Python, which has higher standards for courtesy than my other
 project does.

I also agree here. Personally, I can't recall a situation where I ever said
that in my OSS projects (and I apologise to everyone I forget here ;)


 Caveat: as with the next item, I have to wonder if you mistook an
 explanation that
 
 in such disputes, the Python default is to go with the core dev's
 gut feeling unless there's good reason to do otherwise, and you
 haven't explained well enough yet
 
 for a snotty "I am and you're not, so go away!"
 
   That they can try to do better, and if they are lucky, find someone
   else who applies their patch?
 
 Definitely, and I would advise any core developer to use exactly that
 response as soon as they feel the discussion is becoming unprofitable.

The problem is that these two can go hand in hand. As a non-committer, you
are always at the mercy of core developers, and it feels bad to be made
aware of it. If the situation (however it was phrased) is essentially "it's
committed, now go find someone else to listen", then reverting is no
longer really an option. It's very hard to convince one core developer to
revert a commit of another (and in fact, it should be hard).

So, basically, by simply turning away, you are forcing the beggar into
fixing it themselves, i.e. into writing the patch, into cleaning up the
mess you left, without even knowing if there will ever be someone else to
apply it. That's a very awkward situation for them. Not uncommonly, writing
that patch is way more work than the original core developer invested …

Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Victor Stinner
2013/10/4 Armin Rigo ar...@tunes.org:
 The current hash randomization is
 simply not preventing anything; someone posted long ago a way to
 recover bit-by-bit the hash randomization seed used by a remote web
 program in Python running on a server.

Oh interesting, is it public? If yes, could we please find the URL
of the exploit? I'm more motivated to fix an issue if it is proved to
be exploitable.

I still fail to understand the real impact of a hash DoS compared to
other kinds of DoS. It's like the XML bomb: the vulnerability has also
been known for many years, but Christian only fixed the issue recently
(and the fix was implemented in a package on the Cheeseshop, not in
the stdlib! Is that correct?).

 The only benefit of this hash
 randomization option (-R) was to be able to tell the press that Python
 fixed the problem very quickly when it got media attention :-/

The real benefit is to warn users that they should not rely on the
dictionary or set order/representation (in their unit tests), and that
the hash function is not deterministic :-)
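
A quick way to see that in practice (a minimal sketch, assuming CPython 3.3+
where hash randomization is on by default, and Python 3.7+ for
subprocess.run's capture_output flag): each interpreter process gets its own
seed, so the same string usually hashes differently across runs.

    import subprocess
    import sys

    # Ask two separate interpreter processes for the hash of the same string.
    code = "print(hash('abc'))"
    runs = [
        subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True).stdout.strip()
        for _ in range(2)
    ]
    print(runs, "-> differs across processes:", runs[0] != runs[1])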

(So now it is much easier to replace the hash function with SipHash or
anything else, without breaking new applications.)

Victor


Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Christian Heimes
On 04.10.2013 11:15, Victor Stinner wrote:
 2013/10/4 Armin Rigo ar...@tunes.org:
 The current hash randomization is simply not preventing anything;
 someone posted long ago a way to recover bit-by-bit the hash
 randomization seed used by a remote web program in Python running
 on a server.
 
 Oh interesting, is it public? If yes, could we please find the
 URL of the exploit? I'm more motivated to fix an issue if it is
 proved to be exploitable.

I'm intrigued, too!

 I still fail to understand the real impact of a hash DoS compared
 to other kinds of DoS. It's like the XML bomb: the vulnerability
 has also been known for many years, but Christian only fixed the
 issue recently (and the fix was implemented in a package on the
 Cheeseshop, not in the stdlib! Is that correct?).

About the XML bomb and other issues ... I kinda lost my motivation to
push the fixes into the stdlib. :( The code is ready. It just needs a
proper configuration interface / API.

The hash DoS and XML DoS vulnerabilities have one thing in common.
Both multiply the effectiveness of an attack by several orders of
magnitude. You don't need 100 GBit/sec to kick a service out of
existence. A simple DSL line or mobile phone with 3G/HSDPA does the
same job (if done right). Nowadays Python is important: for example,
major parts of the Brazilian government run on Python, Zope and Plone,
and there are Dropbox, Google App Engine ...

 The real benefit is to warn users that they should not rely on the 
 dictionary or set order/representation (in their unit tests), and
 that the hash function is not deterministic :-)
 
 (So now it is much easier to replace the hash function with SipHash
 or anything else, without breaking new applications.)

Thanks for your groundwork and groundbreaking work, Victor! :)

Christian



Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Antoine Pitrou
On Fri, 4 Oct 2013 11:15:17 +0200,
Victor Stinner victor.stin...@gmail.com wrote:

 2013/10/4 Armin Rigo ar...@tunes.org:
  The current hash randomization is
  simply not preventing anything; someone posted long ago a way to
  recover bit-by-bit the hash randomization seed used by a remote web
  program in Python running on a server.
 
 Oh interesting, is it public? If yes, could we please find the URL
 of the exploit? I'm more motivated to fix an issue if it is proved to
 be exploitable.
 
 I still fail to understand the real impact of a hash DoS compared to
 other kinds of DoS. It's like the XML bomb: the vulnerability has also
 been known for many years, but Christian only fixed the issue recently
 (and the fix was implemented in a package on the Cheeseshop, not in
 the stdlib! Is that correct?).
 
  The only benefit of this hash
  randomization option (-R) was to be able to tell the press that Python
  fixed the problem very quickly when it got media attention :-/
 
 The real benefit is to warn users that they should not rely on the
 dictionary or set order/representation (in their unit tests), and that
 the hash function is not deterministic :-)

I agree it probably had educational value.

Regards

Antoine.




Re: [Python-Dev] project culture: take responsibility for your commits

2013-10-04 Thread Ethan Furman

On 10/02/2013 11:58 AM, Stefan Behnel wrote:


I'm looking back on a rather unpleasant experience that I recently had in
this developer community. Actually, twice by now. Here's what I take from it:

You should take responsibility for your commits.


It doesn't sound like you learned anything, then, as you apparently 
already knew this (judging from your later post).  I find it disturbing 
that nowhere in your two posts to this thread do you take responsibility 
for your part in what happened.  (Disclaimer: I'm only aware of one of 
the incidents.)


Here is what I hope you learn, as it will benefit you, the
developers you work with, and hopefully Python as well:


  - Be respectful

  - Realize that people don't always agree on the best solution

  - Ask for clarification on responses if you don't think your point is
    being understood

The second and third points follow from the first, which is the one that
you seemed to have the most trouble with:  starting a trouble ticket
with accusations that something was snuck in and done behind people's
backs is offensive, as are continual accusations that those you are
working with simply don't understand.


Add to that the constant complaints about having to write patches yourself...
well, to be brief, I am not surprised you didn't have a good experience -- I
don't think anybody involved with that ticket had a good experience,
including myself, and I was just a bystander.


--
~Ethan~


Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread martin


Quoting Victor Stinner victor.stin...@gmail.com:


I still fail to understand the real impact of a hash DoS compared to
other kinds of DoS.


I think the key question is: how many attacking nodes do you need to
control to effectively make some system deny service. A threat is bigger
if you can do it with 10 requests/s from a single host, instead of needing
10,000 hosts, each making 1000 requests/s.

With the hash DoS, the threat is that if you manage to fill some dictionary
with colliding keys, then each lookup will take a very long time, and you
might arrange to put many lookups into a single HTTP request. So a single
HTTP request might get very costly CPU-wise.
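
A toy illustration of that quadratic blow-up (a sketch only; the Colliding
class below stands in for attacker-crafted strings that collide under the
str hash, which are the hard part of a real attack):

    import time

    class Colliding:
        # Every instance hashes to the same bucket, like crafted keys would.
        def __init__(self, n):
            self.n = n
        def __hash__(self):
            return 0
        def __eq__(self, other):
            return isinstance(other, Colliding) and self.n == other.n

    for size in (1000, 2000, 4000):
        keys = [Colliding(i) for i in range(size)]
        start = time.perf_counter()
        {key: None for key in keys}  # each insert walks the whole collision chain
        print(size, "keys:", round(time.perf_counter() - start, 3), "s")
    # Doubling the key count roughly quadruples the build time: O(n**2) overall.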

Whether this is a serious threat or not depends on what other threats
the system being attacked is vulnerable to. Maybe there is something even
simpler, or maybe the hash attack is the only hope of bringing the system
to its knees.

IMO, the hash attack is particularly tricky since it is very easy to
argue and very difficult to demonstrate. So it can result in fear
and uncertainty very easily, causing people to overreact just so that
they won't be accused of inactivity.

Regards,
Martin




Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Antoine Pitrou
On Fri, 04 Oct 2013 17:13:32 +0200,
mar...@v.loewis.de wrote:
 
 Whether this is a serious threat or not depends on what other threats
 the system being attacked is vulnerable to. Maybe there is something
 even simpler, or maybe the hash attack is the only hope of bringing
 the system to its knees.
 
 IMO, the hash attack is particularly tricky since it is very easy to
 argue and very difficult to demonstrate.

If you know how to generate colliding hashes, it's actually relatively
easy to demonstrate, assuming you know how a particular Web application
processes its incoming requests (which you do if it's a standard Web
application such as hgweb).

Regards

Antoine.




[Python-Dev] Summary of Python tracker Issues

2013-10-04 Thread Python tracker

ACTIVITY SUMMARY (2013-09-27 - 2013-10-04)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4256 ( +0)
  closed 26682 (+58)
  total  30938 (+58)

Open issues with patches: 1960 


Issues opened (30)
==

#19066: os.execv fails with spaced names on Windows
http://bugs.python.org/issue19066  reopened by techtonik

#19111: 2to3 should remove from future_builtins import *
http://bugs.python.org/issue19111  opened by maubp

#19113: duplicate test names in Lib/ctypes/test/test_functions.py
http://bugs.python.org/issue19113  opened by xdegaye

#19119: duplicate test name in Lib/test/test_heapq.py
http://bugs.python.org/issue19119  opened by xdegaye

#19120: shlex.shlex.lineno reports a different number depending on the
http://bugs.python.org/issue19120  opened by daniel-s

#19121: Documentation guidelines enhancements
http://bugs.python.org/issue19121  opened by techtonik

#19124: os.execv executes in background on Windows
http://bugs.python.org/issue19124  opened by techtonik

#19129: 6.2.1. Regular Expression Syntax flags
http://bugs.python.org/issue19129  opened by endoalir

#19131: Broken support of compressed AIFC files
http://bugs.python.org/issue19131  opened by serhiy.storchaka

#19133: Transient test failure: test_with_statement (test_ftplib)
http://bugs.python.org/issue19133  opened by koobs

#19138: doctest.IGNORE_EXCEPTION_DETAIL doesn't match when no detail e
http://bugs.python.org/issue19138  opened by jamur2

#19140: inspect.Signature.bind() inaccuracies
http://bugs.python.org/issue19140  opened by epsy

#19141: Windows Launcher fails to respect PATH
http://bugs.python.org/issue19141  opened by gwideman

#19142: Cross-compile fails trying to execute foreign pgen on build ho
http://bugs.python.org/issue19142  opened by Trevor.Bowen

#19143: Finding the Windows version getting messier
http://bugs.python.org/issue19143  opened by tim.peters

#19145: Inconsistent behaviour in itertools.repeat when using negative
http://bugs.python.org/issue19145  opened by vajrasky

#19146: Improvements to traceback module
http://bugs.python.org/issue19146  opened by gvanrossum

#19148: Minor issues with Enum docs
http://bugs.python.org/issue19148  opened by Esa.Peuha

#19150: IDLE shell fails: ModifiedInterpreter instance has no attribu
http://bugs.python.org/issue19150  opened by Grupobetatesting

#19152: ExtensionFileLoader missing get_filename()
http://bugs.python.org/issue19152  opened by eric.snow

#19153: Embedding into a shared library fails again
http://bugs.python.org/issue19153  opened by rinatous

#19154: AttributeError: 'NoneType' in http/client.py when using select
http://bugs.python.org/issue19154  opened by fviard

#19156: Enum helper functions test-coverage
http://bugs.python.org/issue19156  opened by CliffM

#19157: ipaddress.IPv6Network.hosts function omits network and broadca
http://bugs.python.org/issue19157  opened by m01

#19158: BoundedSemaphore.release() subject to races
http://bugs.python.org/issue19158  opened by tim.peters

#19159: 2to3 incorrectly converts two parameter unicode() constructor 
http://bugs.python.org/issue19159  opened by gregory.p.smith

#19161: collections Counter handles nan strangely
http://bugs.python.org/issue19161  opened by Adam.Davison

#19164: Update uuid.UUID TypeError exception: integer should not be an
http://bugs.python.org/issue19164  opened by makronized

#19165: Change formatter warning to DeprecationWarning in 3.5
http://bugs.python.org/issue19165  opened by brett.cannon

#19166: Unusued variable in test_keys in Lib/test/test_dict.py
http://bugs.python.org/issue19166  opened by vajrasky



Most recent 15 issues with no replies (15)
==

#19166: Unusued variable in test_keys in Lib/test/test_dict.py
http://bugs.python.org/issue19166

#19165: Change formatter warning to DeprecationWarning in 3.5
http://bugs.python.org/issue19165

#19157: ipaddress.IPv6Network.hosts function omits network and broadca
http://bugs.python.org/issue19157

#19156: Enum helper functions test-coverage
http://bugs.python.org/issue19156

#19154: AttributeError: 'NoneType' in http/client.py when using select
http://bugs.python.org/issue19154

#19140: inspect.Signature.bind() inaccuracies
http://bugs.python.org/issue19140

#19138: doctest.IGNORE_EXCEPTION_DETAIL doesn't match when no detail e
http://bugs.python.org/issue19138

#19133: Transient test failure: test_with_statement (test_ftplib)
http://bugs.python.org/issue19133

#19131: Broken support of compressed AIFC files
http://bugs.python.org/issue19131

#19129: 6.2.1. Regular Expression Syntax flags
http://bugs.python.org/issue19129

#19121: Documentation guidelines enhancements
http://bugs.python.org/issue19121

#19113: duplicate test names in Lib/ctypes/test/test_functions.py
http://bugs.python.org/issue19113

#19102: Add tests for CLI of the 

Re: [Python-Dev] PEP 455: TransformDict

2013-10-04 Thread Raymond Hettinger

On Sep 22, 2013, at 6:16 PM, Ethan Furman et...@stoneleaf.us wrote:

 Are we close to asking for pronouncement? 

When you're ready, let me know.

In the meantime, I am conducting usability tests on students in Python classes
and researching how well it substitutes for existing solutions for
case-insensitive dictionaries (the primary use case) and for other
existing cases such as dictionaries with Unicode-normalized keys.

If you want to participate in the research, I could also use help looking
at what other languages do.  Python is not the first language with
mappings or to encounter use cases for transforming keys prior
to insertion and lookup.   I would like to find out what work has
already been done on this problem.

Another consideration is whether the problem is more general
than just dictionaries.  Would you want similar functionality in
all mapping-like objects (e.g. persistent dictionaries, os.environ, etc.)?
Would you want similar functionality for other services
(e.g. case-insensitive filenames or other homomorphisms)?

You can also add to the discussion by trying out your own usability
tests on people who haven't been exposed to this thread or the pep.

My early results indicate that the API still needs work.

* When shown code that uses a TransformDict, students don't seem 
to be able to deduce what the code does just from the context 
(this contrasts with something like OrderedDict and Counter where
the name says what it does).   

* When given a description of the mechanics of a TransformDict,
they don't seem to be able to figure out what you would do with it
without being given an example.

* When given an example of using a TransformDict, they understand
the example but don't seem to be able to come up with examples
other than the one they were just shown.  And when shown multiple
examples, they can't think of other use cases where they've ever
needed this in their own code.

* This contrasts with the results when I show something less general
like a CaseInsensitiveDict.  People seem to get that right away.
As you might expect, the generalized solution is harder to wrap
your head around than a specific solution with a clear name.

* One student asked, why give regular dicts a key-function like 
sorted(), min() and max()?  I didn't have a good answer, but I 
haven't yet had time to read this whole thread.

* Another issue is that we're accumulating too many dictionary
variants and that is making it difficult to differentiate and choose
between them.  I haven't found anyone (even in advanced classes
with very experienced pythonistas) who knew about
all the variations:  dict, defaultdict, Mapping, MutableMapping,
mapping views, OrderedDict, Counter, ChainMap, and TransformDict.

David Beazley recently proposed on Twitter that we add a
MinDict and MaxDict.  There seems to be no shortage of ideas
for things that can be done with dictionaries.

Besides choosing among the dict variants, there is also confusion
about other mapping topics such as 1) when to subclass from dict
rather than inherit from MutableMapping, 2) the difference
between defaultdict(int) and Counter's use of __missing__ to return zero,
and 3) it seems that many experienced users can't even name all the
existing methods on dictionaries (they forget clear(), copy(), pop(), popitem(),
setdefault(), update() and the fromkeys() classmethod).
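
Point 2, at least, fits in a couple of lines (a small illustrative sketch):

    from collections import Counter, defaultdict

    c = Counter()
    d = defaultdict(int)
    print(c['spam'], 'spam' in c)   # 0 False -- Counter.__missing__ returns 0, key not stored
    print(d['spam'], 'spam' in d)   # 0 True  -- defaultdict(int) inserts the key on first access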

Overall, my impression at this point is that key transformations
are useful, but I'm not sure how to incorporate them without
taking Python further away from being a language that just fits
in your head.


Raymond


Re: [Python-Dev] PEP 455: TransformDict

2013-10-04 Thread Victor Stinner
2013/10/4 Raymond Hettinger raymond.hettin...@gmail.com:
 * Another issue is that we're accumulating too many dictionary
 variants and that is making it difficult to differentiate and choose
 between them.  I haven't found anyone (even in advanced classes
 with very experienced pythonistas) who knew about
 all the variations:  dict, defaultdict, Mapping, MutableMapping,
 mapping views, OrderedDict, Counter, ChainMap, and TransformDict.

Ok, but none of these classes addresses the use cases described in PEP 455.

If it has become hard to choose the best container for a use case, that's
maybe a documentation issue.

PEP 455 contains a long list of existing implementations, which means
that these use cases are common (even if, according to the PEP, they are
not yet covered by the Python stdlib). It would be a good thing for Python
to propose a standard implementation (efficient, well tested, documented,
etc.) to address these use cases.

I'm not convinced by your usability test. The problem is maybe the
name, TransformDict. We may find a more explicit name, like
TransformKeyDict or NormalizedKeyMapping. Or we can use names from the
Transformers movies: OptimusPrimeDict, BumblebeeMapping,
JazzDictionary, etc.

(If we cannot find a better name, we may add more specialized classes:
CaseInsensitiveDict and IdentityDict. But I like the idea of using my
own transform function.)

Victor


Re: [Python-Dev] PEP 455: TransformDict

2013-10-04 Thread Antoine Pitrou

Good evening,

On Fri, 4 Oct 2013 13:38:05 -0700
Raymond Hettinger raymond.hettin...@gmail.com wrote:
 
 You can also add to the discussion by trying out your own usability
 tests on people who haven't been exposed to this thread or the pep.

I think usability tests should be conducted on people who actually
have a need for the API. Otherwise they simply don't make sense: if you
don't need an API, then you don't have to learn / understand it either.

As an example, if you conduct random usability tests about "yield
from" (PEP 380, accepted) or single-dispatch generic functions (PEP 443,
accepted), you'll probably get a negative outcome, especially on
students.

Or if you conduct usability tests about the ssl module on someone who's
never done any network programming, you'll get a similar kind of
negative result.

 * When given a description of the mechanics of a TransformDict,
 they don't seem to be able to figure out what you would do with it
 without being given an example.

Well, the documentation is the place where we give examples.

 * When given an example of using a TransformDict, they understand
 the example but don't seem to be able to come up with examples
 other than the one they were just shown.

Is it any different for e.g. defaultdict? Because the mechanics are
exactly the same: a generic construct which you can specialize for
various use cases.

 * This contrasts with the results when I show something less general
 like a CaseInsensitiveDict.  People seem to get that right away.
 As you might expect, the generalized solution is harder to wrap
 your head around than a specific solution with a clear name.

Yet the generic solution is applicable to far more cases than the
specialized one.
I'm not against adding a CaseInsensitiveDict, but that would be a
rather bizarre thing to do given we can add a generic construct that's
far more powerful, and not significantly more difficult.
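
For concreteness, a hand-rolled sketch of such a generic construct (this is
not the PEP 455 API -- notably it does not retain the original keys, which
TransformDict does -- just an illustration of how the specialized dicts fall
out of one parametrizable class):

    from collections.abc import MutableMapping

    class TransformingDict(MutableMapping):
        # Applies a user-supplied transform to every key on the way in and out.
        def __init__(self, transform):
            self._transform = transform
            self._data = {}
        def __setitem__(self, key, value):
            self._data[self._transform(key)] = value
        def __getitem__(self, key):
            return self._data[self._transform(key)]
        def __delitem__(self, key):
            del self._data[self._transform(key)]
        def __iter__(self):
            return iter(self._data)
        def __len__(self):
            return len(self._data)

    # The specialized CaseInsensitiveDict is just one parametrization ...
    headers = TransformingDict(str.lower)
    headers['Content-Type'] = 'text/plain'
    print(headers['CONTENT-TYPE'])        # text/plain

    # ... and an identity-keyed mapping is another.
    by_identity = TransformingDict(id)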

 * One student asked, why give regular dicts a key-function like 
 sorted(), min() and max()?  I didn't have a good answer, but I 
 haven't yet had time to read this whole thread.

:-)
The key answer is: when you want to retain the original key.

 * Another issue is that we're accumulating too many dictionary
 variants and that is making it difficult to differentiate and choose
 between them.

It shouldn't be difficult, actually, because it doesn't make sense to
choose at all. The use cases for OrderedDict, Counter, TransformDict
and defaultdict are completely different.

 I haven't found anyone (even in advanced classes
 with very experienced pythonistas) who knew about
 all the variations:  dict, defaultdict, Mapping, MutableMapping,
 mapping views, OrderedDict, Counter, ChainMap, and TransformDict.

Is that actually a problem?

 Overall, my impression at this point is that key transformations
 are useful, but I'm not sure how to incorporate them without
 taking Python further away from being a language that just fits
 in your head.

The language fits in your head, but the stdlib doesn't. I don't think
it has done so for ages :-)

I'm not proposing TransformDict as a builtin, though.

Regards

Antoine.




Re: [Python-Dev] PEP 455: TransformDict

2013-10-04 Thread Raymond Hettinger

On Oct 4, 2013, at 2:06 PM, Victor Stinner victor.stin...@gmail.com wrote:

 I'm not convinced by your usability test.

You're not the one who needs to be convinced ;-)

Please do conduct your own API tests and report back.  
This is necessary for a new class like TransformDict 
that was constructed from scratch and proposed for 
direct admission to the standard library.

This contrasts with other tools like OrderedDict, ChainMap,
and namedtuple, which started their lives outside the standard
library, where we were able to observe their fitness for real problems
being solved by real users.

None of my consulting clients have anything like a general-purpose
transforming dict in their utility modules, so we lack
the real world experience that informed the design of the other
tools in the collections module.  To make up for that lack of
information, we need to put it in front of users as well as 
do research into how other languages have tackled the use cases.

In short, we need to know whether the API will make sense to people,
whether their code will be more readable with a TransformDict,
and whether the zoo of dict variants should continue to grow.

Right now, I don't know those things.  All I have to go on is that
I personally think the TransformDict is a cool idea.  However, that
alone isn't sufficient for accepting the PEP.



Raymond


“… in order to get things merged you need to solve not only just your own 
problem but also realize that the world is bigger than your company and try to 
solve things in a way where it makes sense for other people, even if primarily 
it is for your own situation.” -- Linus Torvalds 
http://www.extremeta.com/2013/09/linus-torvalds-said-linuxcon-kernel-developer-panel/390




Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Larry Hastings

On 10/04/2013 11:15 AM, Victor Stinner wrote:

2013/10/4 Armin Rigo ar...@tunes.org:

The current hash randomization is
simply not preventing anything; someone posted long ago a way to
recover bit-by-bit the hash randomization seed used by a remote web
program in Python running on a server.

Oh interesting, is it public?


http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html

Quoting the synopsis:

   We also describe a vulnerability of Python's new randomized hash,
   allowing an attacker to easily recover the 128-bit secret seed.


I found all that while reading this interesting, yet moribund, bug report:

   http://bugs.python.org/issue14621

I guess there was enough bike shedding that people ran out of steam, or 
something.  It happens.



/arry


Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Victor Stinner
2013/10/5 Larry Hastings la...@hastings.org:
 On 10/04/2013 11:15 AM, Victor Stinner wrote:

 2013/10/4 Armin Rigo ar...@tunes.org:

 The current hash randomization is
 simply not preventing anything; someone posted long ago a way to
 recover bit-by-bit the hash randomization seed used by a remote web
 program in Python running on a server.

 Oh interesting, is it public?


 http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html

 Quoting the synopsis:

 We also describe a vulnerability of Python's new randomized hash, allowing
 an attacker to easily recover the 128-bit secret seed.

The SipHash homepage contains a proof of concept for computing the secret:
https://131002.net/siphash/poc.py

But the script is not an exploit against a web server; it is a script that
runs locally. It requires, for example, knowing the hashes of the strings
"\0" and "\0\0". I would like to know whether it's possible to retrieve
such information in practice.

And how do you retrieve a whole hash value from an HTTP page? You may
recover some bits using carefully crafted HTTP requests, but not the
whole hash value directly. I don't know of any web page that directly
displays the hash value of a string coming from the user's request!?

I'm not saying that the hash DoS does not exist; I'm just trying to
estimate the risk (compared to other DoS attacks). Changing the
default hash function is also risky and has a (well, minor) impact on
performance.

Victor


Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Victor Stinner
2013/10/4  mar...@v.loewis.de:

 Quoting Victor Stinner victor.stin...@gmail.com:

 I still fail to understand the real impact of a hash DoS compared to
 other kinds of DoS.


 I think the key question is: how many attacking nodes do you need to
 control to effectively make some system deny service. A threat is bigger
 if you can do it with 10 requests/s from a single host, instead of needing
 10,000 hosts, each making 1000 requests/s.

Correct. I know that there are some other cheap attacks that work directly
at the network layer. For example, the Spamhaus/CloudFlare attack which
made a lot of noise (300 Gbit/sec) used a DNS trick:

"The traffic is being generated primarily from DNS amplification
attacks. Small requests are sent to DNS servers, generating responses
from those servers that are about 50-100 times larger."
http://arstechnica.com/security/2013/03/spamhaus-ddos-grows-to-internet-threatening-size/

In this case, you still need many computers to DoS a server (= DDoS).

 With the hash DoS, the threat is that if you manage to fill some dictionary
 with colliding keys, then each lookup will take a very long time, and you
 might arrange to put many lookups into a single HTTP request. So a single
 HTTP request might get very costly CPU-wise.

Ok, but why should we invest time in fixing this specific DoS whereas
there are other DoS vectors like the XML bomb? Why not set a limit on the
CPU time in your favorite web framework instead? I don't know the
complexity of adding sandbox-like features to a web framework. (It's
probably complex, because we are discussing how to fix the issue
directly in Python :-))
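
For reference, the kind of CPU cap hinted at above can be sketched with the
stdlib resource module (Unix-only; a real deployment would apply it per
worker process, and the numbers here are made up):

    import resource

    def limit_cpu_seconds(soft, hard):
        # The kernel sends SIGXCPU at the soft limit and kills the process
        # at the hard limit, bounding how long any single worker can spin.
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    limit_cpu_seconds(10, 15)   # e.g. call this in each worker after fork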

 Whether this is a serious threat or not depends on what other threats
 the system being attacked is vulnerable to. Maybe there is something even
 simpler, or maybe the hash attack is the only hope of bringing the system
 to its knees.

Popular DDoS attacks are usually the simplest, like flooding the server
with ping requests, flooding the DNS server, flooding with HTTP
requests which take a lot of time to process, etc. Using a botnet, you
don't care about using an inefficient DoS attack, because your power is
the number of zombies.

I have no idea of the price of renting a botnet, it's probably
expensive (and illegal as well).

 IMO, the hash attack is particularly tricky since it is very easy to
 argue and very difficult to demonstrate. So it can result in fear
 and uncertainty very easily, causing people to overreact just so that
 they won't be accused of inactivity.

It would be easy to evaluate the risk with a public exploit on a real
world application :-)

Victor


Re: [Python-Dev] PEP 455: TransformDict

2013-10-04 Thread Raymond Hettinger

On Oct 4, 2013, at 2:14 PM, Antoine Pitrou solip...@pitrou.net wrote:

 I think usability tests should be conducted on people who actually
 have a need for the API. Otherwise they simply don't make sense: if you
 don't need an API, then you don't have to learn / understand it either.

You're right.  Students don't make the best test subjects.
It might be nice to present this at a Python meet-up or somesuch.
Or some people on this list can present it at work to see how
their colleagues do with it.

Also, it might be nice to get feedback from existing users of
IdentityDicts or CaseInsensitiveDicts to see if they are bothered
by the implementation having two underlying dictionaries.


Raymond


Re: [Python-Dev] Make str/bytes hash algorithm pluggable?

2013-10-04 Thread Steven D'Aprano
On Sat, Oct 05, 2013 at 01:27:37AM +0200, Victor Stinner wrote:

 I have no idea of the price of renting a botnet, it's probably
 expensive (and illegal as well).

Twelve cents per machine. Cheaper in bulk, and cheaper still for 
machines outside of the US. For those on a budget, you can get ten 
thousand zombie machines scattered all over the world for two cents 
each.

http://threatpost.com/how-much-does-botnet-cost-022813/77573


I believe you can also rent a botnet for $2 an hour.


-- 
Steven


[Python-Dev] PEP 451: ModuleSpec

2013-10-04 Thread Eric Snow
After a few rounds on import-sig, PEP 451 is ready for general
consumption.  I also have a patch up now.

HTML: http://www.python.org/dev/peps/pep-0451/
implementation: http://bugs.python.org/issue18864

Your comments would be appreciated.

-eric

=

PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow ericsnowcurren...@gmail.com
Discussions-To: import-...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013
Resolution:


Abstract
========

This PEP proposes to add a new class to importlib.machinery called
ModuleSpec.  It will provide all the import-related information used
to load a module and will be available without needing to load the
module first.  Finders will directly provide a module's spec instead of
a loader (which they will continue to provide indirectly).  The import
machinery will be adjusted to take advantage of module specs, including
using them to load modules.


Terms and Concepts
==================

The changes in this proposal are an opportunity to make several
existing terms and concepts more clear, whereas currently they are
(unfortunately) ambiguous.  New concepts are also introduced in this
proposal.  Finally, it's worth explaining a few other existing terms
with which people may not be so familiar.  For the sake of context, here
is a brief summary of all three groups of terms and concepts.  A more
detailed explanation of the import system is found at
[import_system_docs]_.

finder
------

A finder is an object that identifies the loader that the import
system should use to load a module.  Currently this is accomplished by
calling the finder's find_module() method, which returns the loader.

Finders are strictly responsible for providing the loader, which they do
through their find_module() method. The import system then uses that
loader to load the module.

loader
------

A loader is an object that is used to load a module during import.
Currently this is done by calling the loader's load_module() method.  A
loader may also provide APIs for getting information about the modules
it can load, as well as about data from sources associated with such a
module.

Right now loaders (via load_module()) are responsible for certain
boilerplate, import-related operations.  These are:

1. perform some (module-related) validation;
2. create the module object;
3. set import-related attributes on the module;
4. register the module to sys.modules;
5. exec the module;
6. clean up in the event of failure while loading the module.

This all takes place during the import system's call to
Loader.load_module().
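
To make the division of labour concrete, here is a minimal sketch of that
legacy protocol (the names ``DemoFinder``, ``DemoLoader`` and ``demo_mod``
are invented for illustration; the sketch matches the Python 3.3/3.4
machinery described here, while later interpreters replace
find_module()/load_module() with find_spec()/exec_module())::

    import sys
    import types

    class DemoLoader:
        def load_module(self, fullname):
            if fullname != "demo_mod":                 # 1. validation
                raise ImportError(fullname)
            module = types.ModuleType(fullname)        # 2. create the module object
            module.__file__ = "<demo>"                 # 3. set import-related attributes
            module.__loader__ = self
            sys.modules[fullname] = module             # 4. register in sys.modules
            try:
                exec("ANSWER = 42", module.__dict__)   # 5. exec the module body
            except BaseException:
                del sys.modules[fullname]              # 6. clean up on failure
                raise
            return module

    class DemoFinder:
        def find_module(self, fullname, path=None):
            # The finder only identifies the loader; it does no loading itself.
            return DemoLoader() if fullname == "demo_mod" else None

    sys.meta_path.append(DemoFinder())
    import demo_mod
    print(demo_mod.ANSWER)   # 42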

origin
------

This is a new term and concept.  The idea of it exists subtly in the
import system already, but this proposal makes the concept explicit.

origin in an import context means the system (or resource within a
system) from which a module originates.  For the purposes of this
proposal, origin is also a string which identifies such a resource or
system.  origin is applicable to all modules.

For example, the origin for built-in and frozen modules is the
interpreter itself.  The import system already identifies this origin as
"built-in" and "frozen", respectively.  This is demonstrated in the
following module repr: <module 'sys' (built-in)>.

In fact, the module repr is already a relatively reliable, though
implicit, indicator of a module's origin.  Other modules also indicate
their origin through other means, as described in the entry for
location.

It is up to the loader to decide on how to interpret and use a module's
origin, if at all.

location
--------

This is a new term.  However the concept already exists clearly in the
import system, as associated with the ``__file__`` and ``__path__``
attributes of modules, as well as the name/term path elsewhere.

A location is a resource or place, rather than a system at large,
from which a module is loaded.  It qualifies as an origin.  Examples
of locations include filesystem paths and URLs.  A location is
identified by the name of the resource, but may not necessarily identify
the system to which the resource pertains.  In such cases the loader
would have to identify the system itself.

In contrast to other kinds of module origin, a location cannot be
inferred by the loader just by the module name.  Instead, the loader
must be provided with a string to identify the location, usually by the
finder that generates the loader.  The loader then uses this information
to locate the resource from which it will load the module.  In theory
you could load the module at a given location under various names.

The most common example of locations in the import system are the
files from which source and extension modules are loaded.  For these
modules the location is identified by the string in the ``__file__``