[issue12226] use HTTPS by default for uploading packages to pypi
Giovanni Bajo added the comment: Please note that a redesign of PyPI and package security is ongoing in catalog-sig. -- nosy: +Giovanni.Bajo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12226 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment:

On 02 Jan 2013, at 00:20, Domen Kožar wrote:
> According to the talk at 29c3: http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html
> Quote: "We also describe a vulnerability of Python's new randomized hash, allowing an attacker to easily recover the 128-bit secret seed. As a reliable fix to hash-flooding, we introduce SipHash, a family of cryptographically strong keyed hash functions competitive in performance with the weak hashes, and already adopted in OpenDNS, Perl 5, Ruby, and in the Rust language."

That is exactly the vulnerability that was previously mentioned in the context of this bug. SipHash is currently the only solution for a collision-resistant, fast-enough hash.

-- Giovanni Bajo
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment:

On 02 Jan 2013, at 06:52, Christian Heimes wrote:
> Thanks for the information! I'm working on a PEP for the issue at hand.

Since you're collecting ideas on this: I would like to stress that, in the Python 3 transition, it was deemed acceptable to switch all objects to use unicode strings for attribute names, making the hash computation of such attributes (in the context of the instance dictionary) at least twice as slow as it used to be (the "at least" refers to the fact that longer strings might have even worse effects because of a higher number of cache misses). SipHash isn't twice as slow as the current hash function, not even for short strings. So there is a precedent for slowing down the hash computation time in a very important use case, and it doesn't look like hell froze over.

-- Giovanni Bajo
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment:

On 02 Jan 2013, at 19:51, Christian Heimes wrote:
> Giovanni, why do you think that hashing of unicode strings is slower than byte strings? First of all, ASCII-only unicode strings are packed and use just one byte per char. CPython's FNV implementation processes one element in each cycle, that is one byte for bytes and ASCII unicode, two bytes for UCS-2 and four bytes for UCS-4. Bytes and UCS-4 strings require the same amount of CPU instructions.

Ah sorry, I stand corrected (though packing wasn't there in 3.0, was it? I was specifically referring to the 2.x → 3.0 transition).

-- Giovanni Bajo
My Blog: http://giovanni.bajo.it
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment:

On 11 Nov 2012, at 05:56, Chris Rebert wrote:
> What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port: http://code.google.com/p/cityhash-c/) It's good enough for Google...

It's good enough for Google in a context that does not require protection against collision attacks. If you have a look at SipHash's page, you will find a program that generates collisions for CityHash.

-- Giovanni Bajo
My Blog: http://giovanni.bajo.it
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment: Until it's broken with a yet-unknown attack, SipHash is a pseudo-random function: as such, it distributes values uniformly across the output space and never leaks any information about the key (the randomized seed). Being designed by cryptographers, it is unlikely to turn out to be a failure like the solution that was just released (no offense intended, but it has been a large-scale PR failure). As long as we don't introduce bias while reducing SipHash's output to fit the hash table size (so, for instance, use of a modulus is not appropriate), the hash function should behave very well. Any data type can be supplied to SipHash, including numbers; you just need to take their (platform-dependent) memory representation and feed it to SipHash. Obviously it will be much, much slower than the current function, which used to be hash(x) = x (before randomization), but that's the price to pay to avoid security issues.
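The bias point above can be shown in miniature (a sketch with illustrative names, not CPython's actual reduction code): masking the low bits works when the table size is a power of two, while a modulus by a size that does not evenly divide the hash space favors some buckets.

```python
from collections import Counter

def index_mask(h, table_size):
    # table_size must be a power of two; masking the low bits of a
    # uniform hash keeps the bucket distribution uniform
    return h & (table_size - 1)

# Reducing a uniform 3-bit value (0..7) into 3 buckets via modulus
# shows the bias: buckets 0 and 1 receive 3 values each, bucket 2 only 2.
counts = Counter(h % 3 for h in range(8))
print(counts)
```

For power-of-two sizes, masking and modulus agree, which is why hash tables sized as powers of two can use the cheap mask.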
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment:

On 07 Nov 2012, at 08:40, Serhiy Storchaka wrote:
> I tested different kinds of strings.
>
> $ ./python -m timeit -n 1 -s "t = b'a' * 10**8" "hash(t)"
> $ ./python -m timeit -n 1 -s "t = 'a' * 10**8" "hash(t)"
> $ ./python -m timeit -n 1 -s "t = '\u0100' * 10**8" "hash(t)"
> $ ./python -m timeit -n 1 -s "t = '\U00010000' * 10**8" "hash(t)"
>
>        current    SipHash
> bytes  181 msec   453 msec   2.5x
> UCS1   429 msec   453 msec   1.06x
> UCS2   179 msec   897 msec   5x
> UCS4   183 msec   1.79 sec   9.8x

Hi Serhiy, can you please attach the generated assembly code for the siphash function with your compiler and your optimization flags (that is, the one that produces the above results)? Thanks!

-- Giovanni Bajo
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment:

On 07 Nov 2012, at 12:59, Marc-Andre Lemburg wrote:
> On 07.11.2012 12:55, Mark Dickinson wrote:
>> [MAL]
>>> I don't understand why we are only trying to fix the string problem and completely ignore other key types.
>> [Armin]
>>> estimating the risks of giving up on a valid query for a truly random hash, at an overestimated one billion queries per second ...
>> That's fine in principle, but if this gets extended to integers, note that our current integer hash is about as far from 'truly random' as you can get:
>>
>> Python 3.4.0a0 (default:f02555353544, Nov 4 2012, 11:50:12)
>> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> [hash(i) for i in range(20)]
>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>
>> Moreover, it's going to be *very* hard to change the int hash while preserving the `x == y implies hash(x) == hash(y)` invariant across all the numeric types (int, float, complex, Decimal, Fraction, 3rd-party types that need to remain compatible).
>
> Exactly. And that's why trying to find secure hash functions isn't going to solve the problem. Together with randomization they may make things better for strings, but they are no solution for numeric types, and they also don't allow detecting possible attacks on your systems. But yeah, I'm repeating myself :-)

I don't see how that follows. Python has several hash functions in its core, one of which is the string hash function; it is currently severely broken from a security standpoint; it also happens to be probably the most common case for dictionaries in Python, and the one that is most easily exploited in web frameworks. If we can manage to fix the string hash function (e.g. through SipHash) we will be one step further in mitigating the possible attacks.

Solving collisions and mitigating attacks on numeric types is a totally different problem because it is a totally different function. I suggest we keep separate discussions and separate bugs for it. For instance, I'm only personally interested in mitigating attacks on the string hash function.

-- Giovanni Bajo
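The cross-type invariant Mark mentions can be checked directly in current Python (a minimal demonstration, not part of the original message):

```python
from fractions import Fraction
from decimal import Decimal

# x == y must imply hash(x) == hash(y) across all numeric types;
# this is the constraint that makes changing the int hash so hard.
values = [3, 3.0, Fraction(3), Decimal(3), complex(3, 0)]
assert len({hash(v) for v in values}) == 1
```

Any new int hash would have to be mirrored in float, complex, Decimal, Fraction, and every compatible third-party numeric type at once.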
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment: Christian, there are good semi-cryptographic hash functions that don't leak as badly as Python's own modified FNV hash, without going all the way to HMAC. SipHash has very good collision resistance and doesn't leak anything: https://www.131002.net/siphash/ (notice: they distribute a Python program to recover Python's seed). It's obviously slower than Python's FNV, but it's hard to beat a sum + multiplication per character. -- nosy: +Giovanni.Bajo
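For context, the pre-randomization string hash had roughly this shape (a simplified Python sketch of the FNV-style loop; constants and details varied across CPython versions):

```python
def fnv_like(s):
    # One multiply and one XOR per character: very hard to beat on
    # speed, but trivially analyzable and easy to attack with
    # deliberately colliding inputs.
    if not s:
        return 0
    h = ord(s[0]) << 7
    for ch in s:
        h = (1000003 * h) ^ ord(ch)
        h &= (1 << 64) - 1  # simulate fixed-width wraparound
    return h ^ len(s)
```

This is the speed baseline SipHash is being compared against in this thread.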
[issue14621] Hash function is not randomized properly
Giovanni Bajo added the comment: For short strings, you might want to have a look at the way you fetch the final partial word from memory. If the string is >= 8 bytes, you can fetch the last partial word with an unaligned memory fetch followed by a shift, instead of using a switch like in the reference code.
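The idea can be sketched in Python terms (an illustration of the trick, not the C code itself; assumes little-endian layout and input of at least 8 bytes):

```python
def final_word_switch(data):
    # Reference-code style: assemble the trailing len % 8 bytes
    # one at a time (what the switch statement does in C).
    tail = data[len(data) - (len(data) % 8):]
    w = 0
    for i, b in enumerate(tail):
        w |= b << (8 * i)
    return w

def final_word_unaligned(data):
    # Unaligned-fetch style: read the *last* 8 bytes as one word,
    # then shift off the bytes that belong to the previous full word.
    n = len(data) % 8
    if n == 0:
        return 0
    w = int.from_bytes(data[-8:], "little")
    return w >> (8 * (8 - n))
```

Both produce the same word for any input of 8 bytes or more; in C the second form replaces a data-dependent branch with one load and one shift.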
[issue6721] Locks in python standard library should be sanitized on fork
Giovanni Bajo giovannib...@gmail.com added the comment: If there's agreement that the general problem is unsolvable (so fork and threads just don't get along with each other), what we could attempt is to limit the side effects in the standard library, so that as few users as possible are affected by this problem. For instance, deadlocking just because of print statements sounds like a bad QoI that we could attempt to improve. Is there a reason why BufferedIO needs to hold its internal data-structure lock (used to make it thread-safe) while it's doing I/O and releasing the GIL? I would think it's feasible to patch it so that its internal lock is only used to synchronize accesses to the internal data structures, but is never held while I/O is performed (and thus while the GIL is released -- at which point, if another thread forks, the problem appears). -- nosy: +Giovanni.Bajo
[issue7213] subprocess leaks open file descriptors between Popen instances causing hangs
Giovanni Bajo giovannib...@gmail.com added the comment: Hi Gregory, will you backport Mirko's patches to subprocess32? The last thing left in this bug is my proposal to change the default of close_fds to True on Windows too, but at the same time detect whether this is possible or not (depending on the pipe redirections). So basically close_fds=True would be changed to mean "close the FDs if it is possible, otherwise never mind". This is not a break in compatibility on Linux/UNIX (where it is always possible), nor on Windows (where currently it just raises a ValueError if you ask it to close the file descriptors while doing redirections). The rationale for this change is, again, cross-platform compatibility. I don't like it when my code breaks because of a limitation of an OS that has a clear workaround. subprocess is a high-level library after all; it's not like os.fork() or similar low-level APIs, which expose the underlying platform differences.
[issue7213] Popen.subprocess change close_fds default to True
Giovanni Bajo giovannib...@gmail.com added the comment: Hi Gregory, I saw your commit here: http://code.activestate.com/lists/python-checkins/91914/ This basically means that in 3.2 it is mandatory to specify close_fds to avoid a DeprecationWarning. *BUT* there is no single value that works on both Windows and Linux if you redirect stdout/stderr, as shown in this bug. So basically, in 3.2, to avoid a warning, each and every usage of Popen() with a redirection would have to be guarded by an if that checks the platform. I don't think this is acceptable. Have I misunderstood something? Also: can you please explain how the behaviour is going to change in 3.3? I assume that you are planning to change the default to True; but would that also cover Windows' singularity in redirection cases? -- nosy: +Giovanni.Bajo
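Under the 3.2-era constraint being complained about here, the only portable pattern was exactly the per-platform guard the message describes (a sketch; on the Windows of that era, close_fds=True combined with redirection raised ValueError):

```python
import subprocess
import sys

# Pick the only close_fds value each platform accepted when
# redirecting stdout/stderr: True on POSIX, False on Windows.
close_fds = sys.platform != "win32"

proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    close_fds=close_fds,
)
out, _ = proc.communicate()
print(out)  # b'hello\n'
```

Having to write this guard around every redirecting Popen() call is the portability cost the message objects to.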
[issue7213] Popen.subprocess change close_fds default to True
Giovanni Bajo giovannib...@gmail.com added the comment: Setting CLOEXEC on the pipes seems like a very good fix for this bug. I'm +1 on it, but I think it should be the default; instead, your proposed patch adds a new argument to the public API. Why do you think that's necessary? At the same time, we need a solution for close_fds, because the current status of 3.2 -- the DeprecationWarning (triggered by 90% of subprocess uses in the world, if it ever gets backported to 2.7) with no multi-platform way to fix it -- is really sad. I don't think a new DISREGARD_FDS constant is necessary. I think we can just use None as an intermediate default (just like the current 3.2 does), and later switch it to True. The only further required action is to make True always work on Windows and never error out (just make it do nothing if there are redirections), which is an obviously good thing to do to increase the portability of subprocess. Otherwise, if this can't make 3.2, I think the DeprecationWarning should be reverted until we agree on a different solution.
[issue7213] Popen.subprocess change close_fds default to True
Giovanni Bajo giovannib...@gmail.com added the comment: Would you mind elaborating on where the race condition is?
[issue2128] sys.argv is wrong for unicode strings
Giovanni Bajo added the comment: mbstowcs uses LC_CTYPE. Is that correct and consistent with the way the default encoding under UNIX is handled by Py3k? Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just asking; I don't have a definite idea.
[issue2128] sys.argv is wrong for unicode strings
Giovanni Bajo added the comment: I'm attaching a simple patch that seems to work under Py3k. The trick is that Py3k already attempts (not sure how or why) to decode argv using UTF-8, so it's sufficient to set up argv as UTF-8-encoded strings. Notice that this changes the output of "python à" from:

Fatal Python error: no mem for sys.argv
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

to:

TypeError: zipimporter() argument 1 must be string without null bytes, not str

which is expected, since zipimporter_init() doesn't even know to ignore unicode strings (let alone handle them correctly...).

Added file: http://bugs.python.org/file9449/argv_unicode.patch
[issue2128] sys.argv is wrong for unicode strings
New submission from Giovanni Bajo: Under Windows, sys.argv is created through the Windows ANSI API. When you have a file/directory which can't be represented in the system encoding (e.g. a Japanese-named file or directory on a Western Windows), Windows will encode the filename to the system encoding using what we call the "replace" policy, and thus sys.argv[] will contain an entry like "c:\\foo\\??.dat". My suggestion is that:

* At the Python level, we still expose a single sys.argv[], which will contain unicode strings. I think this exactly matches what Py3k does now.
* At the C level, I believe it involves using GetCommandLineW() and CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be changed to also accept wchar_t** arguments? Or is it better to allow NULL to be passed (under Windows at least), so that the Windows code path can use GetCommandLineW()/CommandLineToArgvW() to get the current process' arguments?

-- components: Interpreter Core messages: 62458 nosy: giovannibajo severity: normal status: open title: sys.argv is wrong for unicode strings type: behavior versions: Python 3.0
[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs
Giovanni Bajo added the comment: Making the standard Windows Python DLL larger is not only a problem of disk size: it will make all packages produced by PyInstaller or py2exe larger, and that means lots of wasted bandwidth. I see that MvL is still -1 on simply splitting the CJK codecs out, and vetoes it by asking for generalization work of insane proportions (a hard-to-define PEP, an entirely new build system for Windows, etc.). I understand (and *agree*) that having a general rule would be a much superior solution, but CJK is already almost 50% of python.dll, so it *is* already a special case by any measure. And special cases like these could be handled with special-case decisions. Thus, I still strongly disagree with MvL and would like CJK to be split out of python.dll as soon as possible. I would not ask this for any other module but CJK, and I understand that further actions would really require a PEP and a new build system for Windows. So, I ask MvL again to soften his position and reconsider the CJK split in all its singularity. Please! (In case it's not clear, I would prepare a patch to split CJK out any day if there were hope that it gets accepted.) -- nosy: +giovannibajo
[issue1342] Crash on Windows if Python runs from a directory with umlauts
Changes by Giovanni Bajo: -- nosy: +rasky