Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
Hi Guido,

On Thu, Oct 3, 2013 at 10:47 PM, Guido van Rossum gu...@python.org wrote:
> Sounds a bit like some security researchers drumming up business. If you
> can run the binary, presumably you can also recover the seed by looking
> in /proc, right? Or use ctypes or something. This demonstration seems of
> academic interest only.

I'll not try to defend the opposite point of view very actively, but let me just say that, in my opinion, your objection is not valid. It is broken in the same way as a different objection, which would claim that Python can be made sandbox-safe without caring about the numerous segfault cases. They are all very obscure, for sure; I tried at some point to list them in Lib/test/crashers. I gave up when people started deleting the files because they no longer crashed on newer versions, just because details changed --- not because the general crash they demonstrated was in any way fixed...

Anyway, my point is that most segfaults can, given enough effort, be turned into a single, well-documented tool for conducting a large class of attacks. The hash issue is similar. IMHO it should either be ignored (which is fine for a huge fraction of users) or seriously fixed by people with the correctly pessimistic approach. The current hash randomization is simply not preventing anything; someone posted, long ago, a way to recover bit-by-bit the randomized hash seed used by a remote web program in Python running on a server. The only benefit of the hash randomization option (-R) was being able to tell the press that Python fixed the problem very quickly once it was publicized :-/

This kind of security issue should never be classified as "of academic interest only". Instead it can be classified as "it will take weeks / months / years before some crazy man manages to put together a general attack script, but likely, someone eventually will". From this point of view I salute Christian's effort, even if I prefer to stay far away from this kind of issue myself :-)

A bientôt,

Armin.
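Guido's aside about ctypes is easy to check: with code already running inside the interpreter, the randomization seed is an ordinary C global (_Py_HashSecret) that can simply be read out of process memory. A minimal sketch, assuming a CPython build that exports that symbol (the usual case for dynamically linked CPython on Linux):

```python
# Sketch: reading CPython's hash randomization seed from inside the
# process via ctypes, illustrating Guido's point that local code
# execution makes the seed trivially recoverable. Assumes the build
# exports the _Py_HashSecret symbol.
import ctypes

# Interpret the first 16 bytes of _Py_HashSecret as two 64-bit words.
secret = (ctypes.c_uint64 * 2).in_dll(ctypes.pythonapi, "_Py_HashSecret")
print("hash secret words:", hex(secret[0]), hex(secret[1]))
```

With PYTHONHASHSEED=0 (randomization disabled) both words come out zero; under -R they differ on every run.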
___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] project culture: take responsibility for your commits
Stephen, thank you for your very thoughtful answer.

Stephen J. Turnbull, 03.10.2013 04:23:
> Stefan Behnel writes:
>> Hi, I'm looking back on a rather unpleasant experience that I recently
>> had in this developer community. Actually, twice by now. Here's what I
>> take from it: you should take responsibility for your commits.
>
> I have no clue who you're addressing this advice to. If it's not
> yourself (from the following, I gather it's not), I think the
> implication of what you are saying is mistaken. Core devs (by which I
> mean the high-profile developers who are candidates for PEP delegate)
> regularly do take responsibility for their commits, just like any other
> committer, by changing or reverting them. That's visible on this list
> as well as in the commit logs.

I'm aware of that, and I apologise to those who felt offended by my post. I really didn't mean for it to read that way, but I can see in retrospect that my phrasing and its implications were bound to be read as an offence, both personally and to the audience.

>> Let's assume these complaints [about the code] are reasonable
>
> That's not sufficient. They must also be presented reasonably, by the
> standards of the community. Not everybody is good at doing that, and
> those who aren't suffer, as does the project for losing a useful
> contribution. Unfortunate, but digging out what matters from unclear or
> high-handed presentations requires an enormous amount of effort, like
> psychotherapy. Good psychotherapists bill hundreds of dollars an hour.
> The very best pythotherapists bill nothing, at least not to this
> community.

I'm also aware of that. In one of the OSS projects that I lead, bad bug reports are actually quite frequent due to a broad distribution of user experience (simplicity has its drawbacks, it seems). It can sometimes take far more time than I'd have wanted to invest to decipher them and/or ask back until things are clearer.

> Regarding the specific core dev behavior that offended you, I can speak
> from my experience in another project.
>> What do you do in that case? Do you tell them that what's in is in?
>
> I've done that and later reversed my position. In retrospect, I believe
> I was correct at the time of first approach in the majority of cases,
> though, on the grounds of the lesser of two evils as I understood the
> issues (or occasionally that the contributor had completely
> misunderstood the issues). In most cases the original requester never
> did come up with a coherent argument, just that something unclear to me
> didn't work for them. Reversal in such cases was due to a third party
> who was able to explain the requester's requirements, and often
> contribute (most of) a specification of a complete fix or a good
> compromise.

Here, you are mostly saying that it's ok to say that for illegitimate and unclear complaints. Even in that case, I'd personally be very careful with that phrase. But I guess that seconds Brett's remarks on subjectivity.

>> Do you tell them that you are a core developer and they are not?
>
> I've done that. I don't know if it applies to the cases you have in
> mind, but invariably that was a last resort when I just wanted to shut
> down a conversation that had already come back to the same place twice,
> and polite guidance seemed to be a complete failure. Childish, I guess,
> but it's been effective. That's not sufficient reason to use it in
> Python, which has higher standards for courtesy than my other project
> does.

I also agree here. Personally, I can't recall a situation where I ever said that in my OSS projects (and I apologise to everyone I forget here ;)

> Caveat: as with the next item, I have to wonder if you mistook an
> explanation that in such disputes, the Python default is to go with the
> core dev's gut feeling unless there's good reason to do otherwise, and
> "you haven't explained well enough yet", for a snotty "I am and you're
> not, so go away!"

>> That they can try to do better, and if they are lucky, find someone
>> else who applies their patch?
> Definitely, and I would advise any core developer to use exactly that
> response as soon as they feel the discussion is becoming unprofitable.

The problem is that these two can go hand in hand. As a non-committer, you are always at the mercy of core developers, and it feels bad to be made aware of it. If the situation (however it was phrased) is essentially "it's committed, now find someone else to listen", then reverting is no longer really an option. It's very hard to convince one core developer to revert a commit of another (and in fact, it should be). So, basically, by simply turning away, you are forcing the beggar into fixing it themselves, i.e. into writing the patch, into cleaning up the mess you left, not even knowing if there will ever be someone else to then apply it. That's a very awkward situation for them. Not uncommonly, writing that patch is far more work than the original core developer invested.
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
2013/10/4 Armin Rigo ar...@tunes.org:
> The current hash randomization is simply not preventing anything;
> someone posted long ago a way to recover bit-by-bit the randomized hash
> seed used by a remote web program in Python running on a server.

Oh interesting, is it public? If yes, could we please get the URL of the exploit? I'm more motivated to fix an issue if it is proven to be exploitable.

I still fail to understand the real impact of a hash DoS compared to other kinds of DoS. It's like the XML bomb: the vulnerability had also been known for many years, but Christian only fixed the issue recently (and the fix was implemented in a package on the Cheeseshop, not in the stdlib! Is that correct?).

> The only benefit of this hash randomization option (-R) was to say to
> the press that Python fixed very quickly the problem when it was
> publicized :-/

The real benefit is to warn users that they should not rely on dictionary or set order/representation (in their unit tests), and that the hash function is not deterministic :-) (So now it is much easier to replace the hash function with SipHash or anything else, without breaking new applications.)

Victor
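The nondeterminism Victor mentions is easy to observe by pinning PYTHONHASHSEED in child interpreters. A small sketch (the hash_of helper is just for illustration):

```python
# Sketch: str hashes depend on the per-process randomization seed.
# Fixing PYTHONHASHSEED makes them reproducible; different seeds
# (almost certainly) give different hashes.
import os
import subprocess
import sys

def hash_of(s, seed):
    """Hash a string in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", s],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

print(hash_of("abc", 1))
print(hash_of("abc", 2))                        # almost certainly different
print(hash_of("abc", 1) == hash_of("abc", 1))   # True: same seed, same hash
```

This is exactly why unit tests must not rely on dict/set iteration order.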
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
Am 04.10.2013 11:15, schrieb Victor Stinner:
> Oh interesting, is it public? If yes, could we please get the URL of
> the exploit? I'm more motivated to fix an issue if it is proven to be
> exploitable.

I'm intrigued, too!

> I still fail to understand the real impact of a hash DoS compared to
> other kinds of DoS. It's like the XML bomb: the vulnerability had also
> been known for many years, but Christian only fixed the issue recently
> (and the fix was implemented in a package on the Cheeseshop, not in the
> stdlib! Is that correct?).

About the XML bomb and other issues ... I have kind of lost my motivation to push the fixes into the stdlib. :( The code is ready. It just needs a proper configuration interface / API.

The hash DoS and XML DoS vulnerabilities have one thing in common: both multiply the effectiveness of an attack by several orders of magnitude. You don't need 100 GBit/sec to kick a service out of existence. A simple DSL line or a mobile phone with 3G/HSDPA does the same job (if done right). Nowadays Python is important; for example, major parts of the Brazilian government run on Python, Zope and Plone. There are Dropbox, Google App Engine ...

> The real benefit is to warn users that they should not rely on
> dictionary or set order/representation (in their unit tests), and that
> the hash function is not deterministic :-) (So now it is much easier to
> replace the hash function with SipHash or anything else, without
> breaking new applications.)

Thanks for your groundwork and groundbreaking work, Victor! :)

Christian
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
Le Fri, 4 Oct 2013 11:15:17 +0200, Victor Stinner victor.stin...@gmail.com a écrit :
> The real benefit is to warn users that they should not rely on
> dictionary or set order/representation (in their unit tests), and that
> the hash function is not deterministic :-)

I agree it probably had educational value.

Regards

Antoine.
Re: [Python-Dev] project culture: take responsibility for your commits
On 10/02/2013 11:58 AM, Stefan Behnel wrote:
> I'm looking back on a rather unpleasant experience that I recently had
> in this developer community. Actually, twice by now. Here's what I take
> from it: you should take responsibility for your commits.

It doesn't sound like you learned anything, then, as you apparently already knew this (judging from your later post). I find it disturbing that nowhere in your two posts to this thread do you take responsibility for your part in what happened. (Disclaimer: I'm only aware of one of the incidents.)

Here is what I hope you learn, as it will benefit you, the developers you work with, and hopefully Python as well:

- Be respectful
- Realize that people don't always agree on the best solution
- Ask for clarification on responses if you don't think your point is being understood

The second and third points follow from the first, which is the one that you seemed to have the most trouble with: starting a trouble ticket with accusations that something was "snuck in" and "done behind peoples' backs" is offensive, as are continual accusations that those you are working with simply don't understand. Add to that constant complaints about writing patches yourself... well, to be brief, I am not surprised you didn't have a good experience -- I don't think anybody involved with that ticket had a good experience, including myself, and I was just a bystander.

--
~Ethan~
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
Quoting Victor Stinner victor.stin...@gmail.com:
> I still fail to understand the real impact of a hash DoS compared to
> other kinds of DoS.

I think the key question is: how many attacking nodes do you need to control to effectively make some system deny service. A threat is bigger if you can do it with 10 requests/s from a single host, instead of needing 10,000 hosts, each making 1,000 requests/s.

With the hash DoS, the threat is that if you manage to fill some dictionary with colliding keys, then each lookup will take a very long time, and you might arrange to put many lookups into a single HTTP request. So a single HTTP request might get very costly CPU-wise.

Whether this is a serious threat or not depends on what other threats the system being attacked is vulnerable to. Maybe there is something even simpler, or maybe the hash attack is the only hope of bringing the system to its knees.

IMO, the hash attack is particularly tricky since it is very easy to argue about and very difficult to demonstrate. So it can result in fear and uncertainty very easily, causing people to overreact just so that they won't be accused of inactivity.

Regards,
Martin
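The mechanics Martin describes can be sketched directly: keys that share a hash degrade CPython's dict from amortized O(1) to O(n) per operation, O(n^2) for the whole fill. The example below uses integer keys rather than attacker-crafted strings, since CPython hashes an int modulo sys.hash_info.modulus, so multiples of that modulus all collide:

```python
# Sketch of the hash-collision slowdown: every key in `colliding`
# hashes to 0, so each insertion probes an ever-longer chain.
import sys
import time

M = sys.hash_info.modulus   # 2**61 - 1 on 64-bit CPython builds
N = 5000

def build(keys):
    """Time filling a dict with the given keys."""
    t0 = time.perf_counter()
    d = {}
    for k in keys:
        d[k] = None
    return time.perf_counter() - t0

colliding = [i * M for i in range(N)]   # hash(i * M) == 0 for every i
normal = list(range(N))

t_normal = build(normal)
t_colliding = build(colliding)
print(f"normal keys:    {t_normal:.4f}s")
print(f"colliding keys: {t_colliding:.4f}s")   # orders of magnitude slower
```

The attack scenario replaces these integers with request parameters (form field names, JSON keys) whose string hashes collide, which is what randomization was meant to make hard to construct.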
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
Le Fri, 04 Oct 2013 17:13:32 +0200, mar...@v.loewis.de a écrit :
> Whether this is a serious threat or not depends on what other threats
> the system being attacked is vulnerable to. Maybe there is something
> even simpler, or maybe the hash attack is the only hope of bringing the
> system to its knees.
>
> IMO, the hash attack is particularly tricky since it is very easy to
> argue about and very difficult to demonstrate.

If you know how to generate colliding hashes, it's actually relatively easy to demonstrate, assuming you know how a particular Web application processes its incoming requests (which you do if it's a standard Web application such as hgweb).

Regards

Antoine.
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2013-09-27 - 2013-10-04)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message.

Issue counts and deltas:
  open    4256 ( +0)
  closed 26682 (+58)
  total  30938 (+58)

Open issues with patches: 1960

Issues opened (30)
==================

#19066: os.execv fails with spaced names on Windows http://bugs.python.org/issue19066 reopened by techtonik
#19111: 2to3 should remove from future_builtins import * http://bugs.python.org/issue19111 opened by maubp
#19113: duplicate test names in Lib/ctypes/test/test_functions.py http://bugs.python.org/issue19113 opened by xdegaye
#19119: duplicate test name in Lib/test/test_heapq.py http://bugs.python.org/issue19119 opened by xdegaye
#19120: shlex.shlex.lineno reports a different number depending on the http://bugs.python.org/issue19120 opened by daniel-s
#19121: Documentation guidelines enhancements http://bugs.python.org/issue19121 opened by techtonik
#19124: os.execv executes in background on Windows http://bugs.python.org/issue19124 opened by techtonik
#19129: 6.2.1. Regular Expression Syntax flags http://bugs.python.org/issue19129 opened by endoalir
#19131: Broken support of compressed AIFC files http://bugs.python.org/issue19131 opened by serhiy.storchaka
#19133: Transient test failure: test_with_statement (test_ftplib) http://bugs.python.org/issue19133 opened by koobs
#19138: doctest.IGNORE_EXCEPTION_DETAIL doesn't match when no detail e http://bugs.python.org/issue19138 opened by jamur2
#19140: inspect.Signature.bind() inaccuracies http://bugs.python.org/issue19140 opened by epsy
#19141: Windows Launcher fails to respect PATH http://bugs.python.org/issue19141 opened by gwideman
#19142: Cross-compile fails trying to execute foreign pgen on build ho http://bugs.python.org/issue19142 opened by Trevor.Bowen
#19143: Finding the Windows version getting messier http://bugs.python.org/issue19143 opened by tim.peters
#19145: Inconsistent behaviour in itertools.repeat when using negative http://bugs.python.org/issue19145 opened by vajrasky
#19146: Improvements to traceback module http://bugs.python.org/issue19146 opened by gvanrossum
#19148: Minor issues with Enum docs http://bugs.python.org/issue19148 opened by Esa.Peuha
#19150: IDLE shell fails: ModifiedInterpreter instance has no attribu http://bugs.python.org/issue19150 opened by Grupobetatesting
#19152: ExtensionFileLoader missing get_filename() http://bugs.python.org/issue19152 opened by eric.snow
#19153: Embedding into a shared library fails again http://bugs.python.org/issue19153 opened by rinatous
#19154: AttributeError: 'NoneType' in http/client.py when using select http://bugs.python.org/issue19154 opened by fviard
#19156: Enum helper functions test-coverage http://bugs.python.org/issue19156 opened by CliffM
#19157: ipaddress.IPv6Network.hosts function omits network and broadca http://bugs.python.org/issue19157 opened by m01
#19158: BoundedSemaphore.release() subject to races http://bugs.python.org/issue19158 opened by tim.peters
#19159: 2to3 incorrectly converts two parameter unicode() constructor http://bugs.python.org/issue19159 opened by gregory.p.smith
#19161: collections Counter handles nan strangely http://bugs.python.org/issue19161 opened by Adam.Davison
#19164: Update uuid.UUID TypeError exception: integer should not be an http://bugs.python.org/issue19164 opened by makronized
#19165: Change formatter warning to DeprecationWarning in 3.5 http://bugs.python.org/issue19165 opened by brett.cannon
#19166: Unusued variable in test_keys in Lib/test/test_dict.py http://bugs.python.org/issue19166 opened by vajrasky

Most recent 15 issues with no replies (15)
==========================================

#19166: Unusued variable in test_keys in Lib/test/test_dict.py http://bugs.python.org/issue19166
#19165: Change formatter warning to DeprecationWarning in 3.5 http://bugs.python.org/issue19165
#19157: ipaddress.IPv6Network.hosts function omits network and broadca http://bugs.python.org/issue19157
#19156: Enum helper functions test-coverage http://bugs.python.org/issue19156
#19154: AttributeError: 'NoneType' in http/client.py when using select http://bugs.python.org/issue19154
#19140: inspect.Signature.bind() inaccuracies http://bugs.python.org/issue19140
#19138: doctest.IGNORE_EXCEPTION_DETAIL doesn't match when no detail e http://bugs.python.org/issue19138
#19133: Transient test failure: test_with_statement (test_ftplib) http://bugs.python.org/issue19133
#19131: Broken support of compressed AIFC files http://bugs.python.org/issue19131
#19129: 6.2.1. Regular Expression Syntax flags http://bugs.python.org/issue19129
#19121: Documentation guidelines enhancements http://bugs.python.org/issue19121
#19113: duplicate test names in Lib/ctypes/test/test_functions.py http://bugs.python.org/issue19113
#19102: Add tests for CLI of the
Re: [Python-Dev] PEP 455: TransformDict
On Sep 22, 2013, at 6:16 PM, Ethan Furman et...@stoneleaf.us wrote:
> Are we close to asking for pronouncement?

When you're ready, let me know. In the meantime, I am conducting usability tests on students in Python classes and researching how well it substitutes for existing solutions for case-insensitive dictionaries (the primary use case) and for other existing cases such as dictionaries with unicode-normalized keys.

If you want to participate in the research, I could also use help looking at what other languages do. Python is not the first language with mappings or to encounter use cases for transforming keys prior to insertion and lookup. I would like to find out what work has already been done on this problem.

Another consideration is whether the problem is more general than just dictionaries. Would you want similar functionality in all mapping-like objects (i.e. persistent dictionaries, os.environ, etc.)? Would you want similar functionality for other services (i.e. case-insensitive filenames or other homomorphisms)?

You can also add to the discussion by trying out your own usability tests on people who haven't been exposed to this thread or the PEP. My early results indicate that the API still needs work.

* When shown code that uses a TransformDict, students don't seem to be able to deduce what the code does just from the context (this contrasts with something like OrderedDict and Counter, where the name says what it does).

* When given a description of the mechanics of a TransformDict, they don't seem to be able to figure out what you would do with it without being given an example.

* When given an example of using a TransformDict, they understand the example but don't seem to be able to come up with examples other than the one they were just shown. And when shown multiple examples, they can't think of other use cases where they've ever needed this in their own code.
* This contrasts with the results when I show something less general like a CaseInsensitiveDict. People seem to get that right away. As you might expect, the generalized solution is harder to wrap your head around than a specific solution with a clear name.

* One student asked, "Why give regular dicts a key-function like sorted(), min() and max()?" I didn't have a good answer, but I haven't yet had time to read this whole thread.

* Another issue is that we're accumulating too many dictionary variants, and that is making it difficult to differentiate and choose between them. I haven't found anyone (even in advanced classes with very experienced Pythonistas) who knew about all the variations: dict, defaultdict, Mapping, MutableMapping, mapping views, OrderedDict, Counter, ChainMap, and TransformDict. David Beazley on Twitter recently proposed that we add a MinDict and MaxDict. There seems to be no shortage of ideas of things that can be done with dictionaries.

Besides choosing among the dict variants, there is also confusion about other mapping topics, such as 1) when to subclass from dict rather than inherit from MutableMapping, 2) the difference between defaultdict(int) and Counter's use of __missing__ to return zero, and 3) it seems that many experienced users can't even name all the existing methods on dictionaries (they forget clear(), copy(), pop(), popitem(), setdefault(), update() and the fromkeys() classmethod).

Overall, my impression at this point is that key transformations are useful, but I'm not sure how to incorporate them without taking Python further away from being a language that "just fits in your head".

Raymond
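The defaultdict(int) vs. Counter point Raymond lists is a real trap; the two differ in whether a failed lookup mutates the mapping:

```python
# defaultdict.__missing__ calls the factory AND stores the result;
# Counter.__missing__ just returns 0 without inserting anything.
from collections import defaultdict, Counter

dd = defaultdict(int)
c = Counter()

dd["missing"]   # creates an entry dd["missing"] == 0
c["missing"]    # returns 0, leaves the Counter untouched

print(len(dd))  # 1 -- the lookup inserted a key
print(len(c))   # 0 -- still empty
```

So iterating a defaultdict after probing it can expose keys you never explicitly set, while a Counter stays clean.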
Re: [Python-Dev] PEP 455: TransformDict
2013/10/4 Raymond Hettinger raymond.hettin...@gmail.com:
> * Another issue is that we're accumulating too many dictionary
> variants, and that is making it difficult to differentiate and choose
> between them. I haven't found anyone (even in advanced classes with
> very experienced Pythonistas) who knew about all the variations: dict,
> defaultdict, Mapping, MutableMapping, mapping views, OrderedDict,
> Counter, ChainMap, and TransformDict.

Ok, but none of these classes address the use cases described in PEP 455. If it has become hard to choose the best container for a use case, that is maybe a documentation issue. PEP 455 contains a long list of existing implementations, which means that these use cases are common (even if none of them is in the Python stdlib, according to the PEP). It's a good thing for Python to propose a standard implementation (efficient, well tested, documented, etc.) to address these use cases.

I'm not convinced by your usability test. The problem is maybe the name, TransformDict. We may find a more explicit name, like TransformKeyDict or NormalizedKeyMapping. Or we can use names of the Transformers movies: OptimusPrimeDict, BumblebeeMapping, JazzDictionary, etc.

(If we cannot find a better name, we may add more specialized classes: KeyInsensitiveDict and IdentityDict. But I like the idea of using my own transform function.)

Victor
Re: [Python-Dev] PEP 455: TransformDict
Good evening,

On Fri, 4 Oct 2013 13:38:05 -0700, Raymond Hettinger raymond.hettin...@gmail.com wrote:
> You can also add to the discussion by trying out your own usability
> tests on people who haven't been exposed to this thread or the PEP.

I think usability tests should be conducted on people who actually have a need for the API. Otherwise they simply don't make sense: if you don't need an API, then you don't have to learn / understand it either. As an example, if you conduct random usability tests about "yield from" (PEP 380, accepted) or single-dispatch generic functions (PEP 443, accepted), you'll probably get a negative outcome, especially with students. Or if you conduct usability tests about the ssl module on someone who's never done any network programming, you'll get a similar kind of negative result.

> * When given a description of the mechanics of a TransformDict, they
> don't seem to be able to figure out what you would do with it without
> being given an example.

Well, the documentation is the place where we give examples.

> * When given an example of using a TransformDict, they understand the
> example but don't seem to be able to come up with examples other than
> the one they were just shown.

Is it any different for e.g. defaultdict? Because the mechanics are exactly the same: a generic construct which you can specialize for various use cases.

> * This contrasts with the results when I show something less general
> like a CaseInsensitiveDict. People seem to get that right away. As you
> might expect, the generalized solution is harder to wrap your head
> around than a specific solution with a clear name.

Yet the generic solution is applicable to far more cases than the specialized one. I'm not against adding a CaseInsensitiveDict, but that would be a rather bizarre thing to do given that we can add a generic construct that's far more powerful, and not significantly more difficult.

> * One student asked, "Why give regular dicts a key-function like
> sorted(), min() and max()?"
> I didn't have a good answer, but I haven't yet had time to read this
> whole thread.

:-) The key answer is: when you want to retain the original key.

> * Another issue is that we're accumulating too many dictionary
> variants, and that is making it difficult to differentiate and choose
> between them.

It shouldn't be difficult, actually, because it doesn't make sense to choose at all. The use cases for OrderedDict, Counter, TransformDict and defaultdict are completely different.

> I haven't found anyone (even in advanced classes with very experienced
> Pythonistas) who knew about all the variations: dict, defaultdict,
> Mapping, MutableMapping, mapping views, OrderedDict, Counter, ChainMap,
> and TransformDict.

Is that actually a problem?

> Overall, my impression at this point is that key transformations are
> useful, but I'm not sure how to incorporate them without taking Python
> further away from being a language that "just fits in your head".

The language fits in your head, but the stdlib doesn't. I don't think it has done so for ages :-) I'm not proposing TransformDict as a builtin, though.

Regards

Antoine.
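For readers following along, the behaviour under discussion can be sketched in a few lines. This is a simplified illustration of the PEP 455 idea (apply a key function on lookup while retaining the first key actually inserted), not the PEP's actual implementation or final API:

```python
# Minimal sketch of a transforming dict: lookups go through a key
# function, but iteration yields the originally inserted keys.
from collections.abc import MutableMapping

class TransformDict(MutableMapping):
    def __init__(self, transform, *args, **kwargs):
        self._transform = transform
        self._data = {}              # transformed key -> (original key, value)
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[self._transform(key)][1]

    def __setitem__(self, key, value):
        t = self._transform(key)
        # keep the original key from the first insertion
        orig = self._data[t][0] if t in self._data else key
        self._data[t] = (orig, value)

    def __delitem__(self, key):
        del self._data[self._transform(key)]

    def __iter__(self):
        return (orig for orig, _ in self._data.values())

    def __len__(self):
        return len(self._data)

d = TransformDict(str.casefold)
d["Foo"] = 1
print(d["FOO"])   # 1 -- case-insensitive lookup
print(list(d))    # ['Foo'] -- the original key is retained
```

The same class gives a case-insensitive dict with str.casefold, a unicode-normalizing dict with unicodedata.normalize, or an identity dict with id, which is the genericity Antoine is defending.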
Re: [Python-Dev] PEP 455: TransformDict
On Oct 4, 2013, at 2:06 PM, Victor Stinner victor.stin...@gmail.com wrote:
> I'm not convinced by your usability test.

You're not the one who needs to be convinced ;-) Please do conduct your own API tests and report back. This is necessary for a new class like TransformDict that was constructed from scratch and proposed for direct admission to the standard library. This contrasts with other tools like OrderedDict, ChainMap, and namedtuple, which started their lives outside the standard library, where we were able to observe their fitness for real problems being solved by real users.

None of my consulting clients have anything like a general-purpose transforming dict in their utility modules, so we lack the real-world experience that informed the design of the other tools in the collections module. To make up for that lack of information, we need to put it in front of users as well as do research into how other languages have tackled the use cases.

In short, we need to know whether the API will make sense to people, whether their code will be more readable with a TransformDict, and whether the zoo of dict variants should continue to grow. Right now, I don't know those things. All I have to go on is that I personally think the TransformDict is a cool idea. However, that alone isn't sufficient for accepting the PEP.

Raymond

"... in order to get things merged you need to solve not only just your own problem but also realize that the world is bigger than your company and try to solve things in a way where it makes sense for other people, even if primarily it is for your own situation." -- Linus Torvalds
http://www.extremeta.com/2013/09/linus-torvalds-said-linuxcon-kernel-developer-panel/390
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
On 10/04/2013 11:15 AM, Victor Stinner wrote:
> 2013/10/4 Armin Rigo ar...@tunes.org:
>> The current hash randomization is simply not preventing anything;
>> someone posted long ago a way to recover bit-by-bit the randomized
>> hash seed used by a remote web program in Python running on a server.
>
> Oh interesting, is it public?

http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html

Quoting the synopsis: "We also describe a vulnerability of Python's new randomized hash, allowing an attacker to easily recover the 128-bit secret seed."

I found all that while reading this interesting, yet moribund, bug report:

http://bugs.python.org/issue14621

I guess there was enough bike-shedding that people ran out of steam, or something. It happens.

//arry/
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
2013/10/5 Larry Hastings la...@hastings.org: On 10/04/2013 11:15 AM, Victor Stinner wrote: 2013/10/4 Armin Rigo ar...@tunes.org: The current hash randomization is simply not preventing anything; someone posted long ago a way to recover bit-by-bit the randomized hash used by a remote web program in Python running on a server. Oh interesting, is it public? http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html Quoting the synopsis: "We also describe a vulnerability of Python's new randomized hash, allowing an attacker to easily recover the 128-bit secret seed."

The SipHash homepage contains a proof of concept for computing the secret: https://131002.net/siphash/poc.py

However, the script is not an exploit against a web server; it runs locally. It requires, for example, knowing the hashes of the strings "\0" and "\0\0". I would like to know whether it's possible to retrieve such information in practice. And how do you retrieve the whole hash value from an HTTP page? You may retrieve some bits using specific HTTP requests, but not directly the whole hash value. I don't know of any web page that directly displays the hash value of a string coming from the user request!?

I'm not saying that the hash DoS does not exist; I'm just trying to estimate the risk (compared to other DoS attacks). Changing the default hash function is also risky and has a (well, minor) impact on performance.

Victor
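The point under discussion is that str hashes depend on a per-process secret seed, and the attacker's job is to recover that seed remotely. A minimal sketch of the seed dependence itself (the helper name `hash_in_subprocess` is mine, not from the thread): with `PYTHONHASHSEED` fixed, a fresh interpreter always computes the same hash, while different seeds yield different hashes.

```python
import os
import subprocess
import sys

def hash_in_subprocess(s, seed):
    """Return hash(s) as computed by a fresh interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", f"print(hash({s!r}))"], env=env)
    return int(out)

# The same seed always yields the same hash, even across processes:
assert hash_in_subprocess("\x00\x00", seed=42) == hash_in_subprocess("\x00\x00", seed=42)

# Different seeds almost always yield different hashes for the same
# string; this seed is the secret an attacker must recover remotely.
print(hash_in_subprocess("\x00\x00", seed=1))
print(hash_in_subprocess("\x00\x00", seed=2))
```

The PoC Victor mentions works from known hash values like these; the open question in the thread is whether such values leak through a real web application at all.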
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
2013/10/4 mar...@v.loewis.de: Quoting Victor Stinner victor.stin...@gmail.com: I still fail to understand the real impact of a hash DoS compared to other kinds of DoS. I think the key question is: how many attacking nodes do you need to control to effectively make some system deny service. A threat is bigger if you can do it in 10 requests/s from a single host, instead of needing 10,000 hosts, each making 1000 requests/s.

Correct. I know that there are some other cheap attacks directly at the network layer. For example, the Spamhaus/CloudFlare attack which made a lot of noise (300 Gbit/s) used a DNS trick: "The traffic is being generated primarily from DNS amplification attacks. Small requests are sent to DNS servers, generating responses from those servers that are about 50-100 times larger." http://arstechnica.com/security/2013/03/spamhaus-ddos-grows-to-internet-threatening-size/ In this case, you still need many computers to DoS a server (= DDoS).

With the hash DoS, the threat is that if you manage to fill some dictionary with colliding keys, then each lookup will take a very long time, and you might arrange to put many lookups into a single HTTP request. So a single HTTP request might get very costly CPU-wise.

Ok, but why should we invest time in fixing this specific DoS whereas there are other DoS attacks like the XML bomb? Why not set a limit on the CPU time in your favorite web framework instead? I don't know the complexity of adding sandbox-like features to a web framework. (It's probably complex, because we are discussing how to fix the issue directly in Python :-))

Whether this is a serious threat or not depends on what other threats the system being attacked is vulnerable to. Maybe there is something even simpler, or maybe the hash attack is the only hope of bringing the system to its knees.
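The "dictionary filled with colliding keys" effect is easy to demonstrate without breaking the string seed at all, because CPython's integer hashes are not randomized: hash(n) is n modulo a fixed prime, so multiples of that prime all collide. A minimal sketch (not an attack, just the quadratic-behavior mechanism Martin describes):

```python
import sys
import timeit

# In CPython, hash(n) for an int is n modulo sys.hash_info.modulus
# (2**61 - 1 on 64-bit builds), so every multiple of it hashes to 0.
M = sys.hash_info.modulus

colliding = [i * M for i in range(2000)]   # every key hashes to 0
normal = list(range(2000))                 # distinct hashes

assert all(hash(k) == 0 for k in colliding)

# Inserting n colliding keys takes O(n**2) probes instead of O(n):
t_bad = timeit.timeit(lambda: dict.fromkeys(colliding), number=5)
t_good = timeit.timeit(lambda: dict.fromkeys(normal), number=5)
print(f"colliding: {t_bad:.4f}s  normal: {t_good:.4f}s")
```

For str keys the same effect requires keys that collide under the randomized hash, which is exactly why recovering the seed matters to an attacker.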
Popular DDoS attacks are usually the simplest ones, like flooding the server with ping requests, flooding the DNS server, or flooding with HTTP requests which take a lot of time to process, etc. Using a botnet, you don't care about using an inefficient DoS attack, because your power comes from the number of zombies. I have no idea of the price of renting a botnet; it's probably expensive (and illegal as well).

IMO, the hash attack is particularly tricky since it is very easy to argue about and very difficult to demonstrate. So it can result in fear and uncertainty very easily, causing people to overreact just so that they won't be accused of inactivity. It would be easy to evaluate the risk with a public exploit on a real-world application :-)

Victor
Re: [Python-Dev] PEP 455: TransformDict
On Oct 4, 2013, at 2:14 PM, Antoine Pitrou solip...@pitrou.net wrote: I think usability tests should be conducted on people who actually have a need for the API. Otherwise they simply don't make sense: if you don't need an API, then you don't have to learn / understand it either.

You're right. Students don't make the best test subjects. It might be nice to present this at a Python meet-up or some such. Or some people on this list can present it at work to see how their colleagues do with it. Also, it might be nice to get feedback from existing users of IdentityDicts or CaseInsensitiveDicts to see if they are bothered by the implementation having two underlying dictionaries.

Raymond
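For readers unfamiliar with why two underlying dictionaries come up at all: a transforming dict stores values under a transformed key but must also remember the original key so it can be returned later. A minimal sketch of the idea (this is an illustration, not the PEP 455 implementation; the class and method names here are mine):

```python
class TransformingDict:
    """Sketch of a mapping that looks up keys via transform(key),
    while remembering the first original key -- hence the two dicts."""

    def __init__(self, transform):
        self._transform = transform
        self._data = {}      # transformed key -> value
        self._original = {}  # transformed key -> first original key seen

    def __setitem__(self, key, value):
        t = self._transform(key)
        self._original.setdefault(t, key)
        self._data[t] = value

    def __getitem__(self, key):
        return self._data[self._transform(key)]

    def getitem(self, key):
        """Return (original_key, value) for the matching entry."""
        t = self._transform(key)
        return self._original[t], self._data[t]

# Case-insensitive usage, the classic motivating example:
d = TransformingDict(str.casefold)
d["Content-Type"] = "text/html"
print(d["content-type"])          # → text/html
print(d.getitem("CONTENT-TYPE"))  # → ('Content-Type', 'text/html')
```

An IdentityDict falls out of the same shape by passing `id` as the transform.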
Re: [Python-Dev] Make str/bytes hash algorithm pluggable?
On Sat, Oct 05, 2013 at 01:27:37AM +0200, Victor Stinner wrote: I have no idea of the price of renting a botnet, it's probably expensive (and illegal as well).

Twelve cents per machine. Cheaper in bulk, and cheaper still for machines outside of the US. For those on a budget, you can get ten thousand zombie machines scattered all over the world for two cents each. http://threatpost.com/how-much-does-botnet-cost-022813/77573 I believe you can also rent a botnet for $2 an hour.

-- Steven
[Python-Dev] PEP 451: ModuleSpec
After a few rounds on import-sig, PEP 451 is ready for general consumption. I also have a patch up now.

HTML: http://www.python.org/dev/peps/pep-0451/
implementation: http://bugs.python.org/issue18864

Your comments would be appreciated.

-eric

PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow ericsnowcurren...@gmail.com
Discussions-To: import-...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013
Resolution:

Abstract
========

This PEP proposes to add a new class to importlib.machinery called ModuleSpec. It will provide all the import-related information used to load a module and will be available without needing to load the module first. Finders will directly provide a module's spec instead of a loader (which they will continue to provide indirectly). The import machinery will be adjusted to take advantage of module specs, including using them to load modules.

Terms and Concepts
==================

The changes in this proposal are an opportunity to make several existing terms and concepts more clear, whereas currently they are (unfortunately) ambiguous. New concepts are also introduced in this proposal. Finally, it's worth explaining a few other existing terms with which people may not be so familiar. For the sake of context, here is a brief summary of all three groups of terms and concepts. A more detailed explanation of the import system is found at [import_system_docs]_.

finder -- A finder is an object that identifies the loader that the import system should use to load a module. Currently this is accomplished by calling the finder's find_module() method, which returns the loader. Finders are strictly responsible for providing the loader, which they do through their find_module() method. The import system then uses that loader to load the module.
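The pre-PEP-451 finder/loader protocol described above can be sketched concretely. The classes below are hypothetical (not from the PEP or importlib): a finder whose find_module() hands back a loader, and a loader whose load_module() performs the legacy boilerplate the PEP goes on to enumerate (validate, create the module, set import attributes, register in sys.modules, exec, clean up on failure).

```python
import sys
import types

class StringLoader:
    """Hypothetical loader that holds a module's source as a string."""

    def __init__(self, fullname, source):
        self.fullname = fullname
        self.source = source

    def load_module(self, fullname):
        # 1. validation
        if fullname != self.fullname:
            raise ImportError(f"loader cannot handle {fullname!r}")
        # 2. create (or reuse) the module object
        module = sys.modules.get(fullname) or types.ModuleType(fullname)
        # 3. set import-related attributes
        module.__loader__ = self
        # 4. register in sys.modules *before* executing the body
        sys.modules[fullname] = module
        try:
            # 5. exec the module's code in its namespace
            exec(self.source, module.__dict__)
        except BaseException:
            # 6. clean up in the event of failure
            sys.modules.pop(fullname, None)
            raise
        return module

class StringFinder:
    """Hypothetical finder: maps module names to source strings."""

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        if fullname in self.sources:
            return StringLoader(fullname, self.sources[fullname])
        return None  # let the next finder on sys.meta_path try

finder = StringFinder({"demo_mod": "ANSWER = 42"})
loader = finder.find_module("demo_mod")
module = loader.load_module("demo_mod")
print(module.ANSWER)  # → 42
```

PEP 451's point is that steps 1-4 and 6 are the same for nearly every loader; a ModuleSpec lets the import machinery do that boilerplate itself.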
loader -- A loader is an object that is used to load a module during import. Currently this is done by calling the loader's load_module() method. A loader may also provide APIs for getting information about the modules it can load, as well as about data from sources associated with such a module.

Right now loaders (via load_module()) are responsible for certain boilerplate, import-related operations. These are:

1. perform some (module-related) validation;
2. create the module object;
3. set import-related attributes on the module;
4. register the module to sys.modules;
5. exec the module;
6. clean up in the event of failure while loading the module.

This all takes place during the import system's call to Loader.load_module().

origin -- This is a new term and concept. The idea of it exists subtly in the import system already, but this proposal makes the concept explicit. origin in an import context means the system (or resource within a system) from which a module originates. For the purposes of this proposal, origin is also a string which identifies such a resource or system. origin is applicable to all modules.

For example, the origin for built-in and frozen modules is the interpreter itself. The import system already identifies this origin as "built-in" and "frozen", respectively. This is demonstrated in the following module repr: <module 'sys' (built-in)>. In fact, the module repr is already a relatively reliable, though implicit, indicator of a module's origin. Other modules also indicate their origin through other means, as described in the entry for location.

It is up to the loader to decide on how to interpret and use a module's origin, if at all.

location -- This is a new term. However the concept already exists clearly in the import system, as associated with the ``__file__`` and ``__path__`` attributes of modules, as well as the name/term "path" elsewhere. A location is a resource or place, rather than a system at large, from which a module is loaded.
It qualifies as an origin. Examples of locations include filesystem paths and URLs. A location is identified by the name of the resource, but may not necessarily identify the system to which the resource pertains. In such cases the loader would have to identify the system itself.

In contrast to other kinds of module origin, a location cannot be inferred by the loader just from the module name. Instead, the loader must be provided with a string to identify the location, usually by the finder that generates the loader. The loader then uses this information to locate the resource from which it will load the module. In theory you could load the module at a given location under various names.

The most common example of locations in the import system are the files from which source and extension modules are loaded. For these modules the location is identified by the string in the ``__file__``