[issue12226] use HTTPS by default for uploading packages to pypi

2013-02-24 Thread Giovanni Bajo

Giovanni Bajo added the comment:

Please note that a redesign of PyPI and package security is under way on 
catalog-sig.

--
nosy: +Giovanni.Bajo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12226
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo

Giovanni Bajo added the comment:

On 02 Jan 2013, at 00:20, Domen Kožar rep...@bugs.python.org wrote:

 
> Domen Kožar added the comment:
>
> According to talk at 29c3:
> http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html
>
> Quote: We also describe a vulnerability of Python's new randomized hash,
> allowing an attacker to easily recover the 128-bit secret seed. As a reliable
> fix to hash-flooding, we introduce SipHash, a family of cryptographically
> strong keyed hash function competitive in performance with the weak hashes,
> and already adopted in OpenDNS, Perl 5, Ruby, and in the Rust language.

That is exactly the vulnerability that was previously mentioned in the context 
of this bug. SipHash is currently the only available option for a hash that is 
both collision-resistant and fast enough.
-- 
Giovanni Bajo

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo

Giovanni Bajo added the comment:

On 02 Jan 2013, at 06:52, Christian Heimes 
rep...@bugs.python.org wrote:

 
> Christian Heimes added the comment:
>
> Thanks for the information! I'm working on a PEP for the issue at hand.

Since you're collecting ideas on this, I would like to stress that, in the 
Python 3 transition, it was deemed acceptable to switch all objects to use 
unicode strings for attribute names, making the hash computation of such 
attributes (in the context of the instance dictionary) at least twice as slow 
as it used to be (the 'at least' refers to the fact that longer strings might 
have even worse effects because of a higher number of cache misses). SipHash 
isn't twice as slow as the current hash function, not even for short strings.

So there is a precedent for slowing down the hash computation time in a very 
important use case, and it doesn't look like hell froze over.
-- 
Giovanni Bajo

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo

Giovanni Bajo added the comment:

On 02 Jan 2013, at 19:51, Christian Heimes 
rep...@bugs.python.org wrote:

 
> Christian Heimes added the comment:
>
> Giovanni, why do you think that hashing of unicode strings is slower than
> byte strings?
>
> First of all ASCII only unicode strings are packed and use just one byte per
> char. CPython's FNV implementation processes one element in each cycle, that
> is one byte for bytes and ASCII unicode, two bytes for UCS-2 and four bytes
> for UCS-4. Bytes and UCS-4 strings require the same amount of CPU
> instructions.

Ah sorry, I stand corrected (though packing wasn't there in 3.0, was it? I was 
specifically referring to the 2.x to 3.0 transition).
-- 
Giovanni Bajo

My Blog: http://giovanni.bajo.it

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2012-11-11 Thread Giovanni Bajo

Giovanni Bajo added the comment:

On 11 Nov 2012, at 05:56, Chris Rebert rep...@bugs.python.org wrote:

 
> Chris Rebert added the comment:
>
> What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port:
> http://code.google.com/p/cityhash-c/ )
> It's good enough for Google...

It's good enough for Google in a context that does not require protection 
against collision attacks. If you have a look at SipHash's page, you will find 
a program that generates collisions for CityHash.
-- 
Giovanni Bajo

My Blog: http://giovanni.bajo.it

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo

Giovanni Bajo added the comment:

Until it's broken by a yet-unknown attack, SipHash is a pseudo-random 
function, and as such it distributes values uniformly across the output 
space and never leaks any information about the key (the randomized seed). 
Being designed by cryptographers, it is unlikely to turn out to be a failure 
like the solution that was just released (no offense intended, but it's 
been a large-scale PR failure).

As long as we don't introduce bias while reducing SipHash's output to fit the 
hash table size (so for instance, usage of modulus is not appropriate), then 
the hash function should behave very well.
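
As a small illustration of the bias concern, here is a sketch that assumes a toy 3-bit hash whose eight outputs are all equally likely (it is not SipHash). Reducing with a modulus that does not divide the output range fills the buckets unevenly, while masking down to a power-of-two table size does not:

```python
from collections import Counter

# Toy assumption: a hash with 3-bit output, every value 0..7 equally likely.
outputs = range(8)

# Reduce to 3 buckets with a modulus: 8 is not a multiple of 3.
by_modulus = Counter(h % 3 for h in outputs)

# Reduce to 4 buckets by masking the low 2 bits: 8 is a multiple of 4.
by_mask = Counter(h & 0b11 for h in outputs)

print(sorted(by_modulus.items()))  # buckets 0 and 1 get 3 values, bucket 2 gets 2
print(sorted(by_mask.items()))     # every bucket gets exactly 2 values
```

With a 64-bit output the imbalance is far smaller for realistic table sizes, so this only shows the direction of the effect, not its magnitude.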

Any data type can be supplied to SipHash, including numbers; you just need to 
take their (platform-dependent) memory representation and feed it to SipHash. 
Obviously it will be much much slower than the current function which used to 
be hash(x) = x (before randomization), but that's the price to pay to avoid 
security issues.
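
A minimal sketch of "feed the memory representation to a keyed hash" might look like the following. SipHash is not exposed by the Python standard library, so keyed BLAKE2b from hashlib is used here purely as a stand-in; the names SECRET_KEY, keyed_hash64, hash_int64 and hash_double are hypothetical, and the fixed key is only there to keep the example deterministic.

```python
import hashlib
import struct

# Hypothetical per-process secret; a real implementation would draw it at startup.
SECRET_KEY = bytes(range(16))

def keyed_hash64(data: bytes, key: bytes = SECRET_KEY) -> int:
    """Keyed 64-bit hash of raw bytes (BLAKE2b used as a stand-in for SipHash)."""
    digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little")

def hash_int64(n: int) -> int:
    # "=q": native byte order, standard 8-byte signed integer.
    return keyed_hash64(struct.pack("=q", n))

def hash_double(x: float) -> int:
    # "=d": native byte order, 8-byte IEEE 754 double.
    return keyed_hash64(struct.pack("=d", x))

print(hex(hash_int64(42)))
# The integer 1 and the double 1.0 have different byte patterns, so hashing
# raw memory treats them as unrelated inputs.
print(struct.pack("=q", 1) != struct.pack("=d", 1.0))
```

Note that hashing raw memory this way says nothing about keeping equal values of different numeric types on the same hash; that would have to be handled separately.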

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo

Giovanni Bajo added the comment:

On 07 Nov 2012, at 08:40, Serhiy Storchaka 
rep...@bugs.python.org wrote:

> Serhiy Storchaka added the comment:
>
> I tested different kinds of strings.
>
> $ ./python -m timeit -n 1 -s "t = b'a' * 10**8" "hash(t)"
> $ ./python -m timeit -n 1 -s "t = 'a' * 10**8" "hash(t)"
> $ ./python -m timeit -n 1 -s "t = '\u0100' * 10**8" "hash(t)"
> $ ./python -m timeit -n 1 -s "t = '\U0001' * 10**8" "hash(t)"
>
>          current    SipHash    ratio
>
> bytes    181 msec   453 msec   2.5x
> UCS1     429 msec   453 msec   1.06x
> UCS2     179 msec   897 msec   5x
> UCS4     183 msec   1.79 sec   9.8x

Hi Serhiy,

can you please attach the generated assembly code for the siphash function with 
your compiler and your optimization flags (that is, the one that produces the 
above results)?

Thanks!
-- 
Giovanni Bajo

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo

Giovanni Bajo added the comment:

On 07 Nov 2012, at 12:59, Marc-Andre Lemburg 
rep...@bugs.python.org wrote:

 
> Marc-Andre Lemburg added the comment:
>
> On 07.11.2012 12:55, Mark Dickinson wrote:
>>
>> Mark Dickinson added the comment:
>>
>> [MAL]
>> I don't understand why we are only trying to fix the string problem
>> and completely ignore other key types.
>>
>> [Armin]
>> estimating the risks of giving up on a valid query for a truly random
>> hash, at an overestimated one billion queries per second ...
>>
>> That's fine in principle, but if this gets extended to integers, note that
>> our current integer hash is about as far from 'truly random' as you can get:
>>
>>     Python 3.4.0a0 (default:f02555353544, Nov  4 2012, 11:50:12)
>>     [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
>>     Type "help", "copyright", "credits" or "license" for more information.
>>     >>> [hash(i) for i in range(20)]
>>     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>
>> Moreover, it's going to be *very* hard to change the int hash while
>> preserving the `x == y implies hash(x) == hash(y)` invariant across all the
>> numeric types (int, float, complex, Decimal, Fraction, 3rd-party types that
>> need to remain compatible).
>
> Exactly. And that's why trying to find secure hash functions isn't
> going to solve the problem. Together with randomization they may
> make things better for strings, but they are no solution for numeric
> types, and they also don't allow detecting possible attacks on your
> systems.
>
> But yeah, I'm repeating myself :-)
 

I don't see how it follows. Python has several hash functions in its core, one 
of which is the string hash function; it is currently severely broken from a 
security standpoint; it also happens to be probably the most common case for 
dictionaries in Python, and the one that is most easily exploited in web 
frameworks. 

If we can manage to fix the string hash function (eg: through SipHash) we will 
be one step further in mitigating the possible attacks.

Solving collisions and mitigating attacks on numeric types is a totally 
different problem because it is a totally different function. I suggest we keep 
separate discussions and separate bugs for it. For instance, I'm personally 
only interested in mitigating attacks on the string hash function.
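
To make the numeric invariant quoted above concrete, here is a short check using only standard-library numeric types. It shows why the numeric hash is constrained in a way the string hash is not: equal values of different types have to land on the same dictionary key.

```python
from decimal import Decimal
from fractions import Fraction

# The same mathematical value expressed in several numeric types.
values = [1, 1.0, complex(1, 0), Decimal(1), Fraction(1, 1)]

# All of them compare equal to the int 1 ...
all_equal = all(v == values[0] for v in values)
print(all_equal)                      # True

# ... so they must share one hash value to act as the same dictionary key.
print({hash(v) for v in values})      # {1}

d = {1: "int"}
d[1.0] = "float"                      # same key as 1, so the value is replaced
print(d)                              # {1: 'float'}
```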
-- 
Giovanni Bajo

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2012-11-06 Thread Giovanni Bajo

Giovanni Bajo added the comment:

Christian, there are good semi-crypto hash functions that don't leak as badly 
as Python's own modified FNV hash, without going all the way to HMAC.

SipHash has very good collision resistance and doesn't leak anything:
https://www.131002.net/siphash/
(note: they distribute a Python program to recover Python's seed)

It's obviously slower than Python's FNV, but it's hard to beat a 
sum+multiplication per character.
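
For a sense of how little work such a hash does per character, here is a simplified sketch of a multiply-and-combine string hash. It is an illustration only: the multiplier 1000003, the xor used as the combine step, and the prefix/suffix seed handling are assumptions of this sketch, not a copy of CPython's code, and toy_string_hash is a hypothetical name.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound

def toy_string_hash(data: bytes, prefix: int = 0, suffix: int = 0) -> int:
    """Simplified multiply-and-xor hash; prefix/suffix stand in for a random seed."""
    if not data:
        return 0
    x = (prefix ^ (data[0] << 7)) & MASK64
    for byte in data:
        # One multiplication and one xor per input byte.
        x = ((1000003 * x) ^ byte) & MASK64
    x ^= len(data)
    return (x ^ suffix) & MASK64

print(hex(toy_string_hash(b"abc")))
print(hex(toy_string_hash(b"abc", prefix=0x1234, suffix=0x5678)))
```

The loop body is a single multiply and a single xor, which is why a keyed function with several mixing rounds per block can hardly match it on raw speed.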

--
nosy: +Giovanni.Bajo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue14621] Hash function is not randomized properly

2012-11-06 Thread Giovanni Bajo

Giovanni Bajo added the comment:

For short strings, you might want to have a look at the way you fetch the final 
partial word from memory.

If the string is >= 8 bytes, you can fetch the last partial word as an 
unaligned memory fetch followed by a shift, instead of using a switch like in 
the reference code.
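
The idea can be modelled with Python integers, assuming little-endian words and an input of at least 8 bytes. This is only a sketch of the arithmetic: real C code would do an unaligned 64-bit load, the function names are hypothetical, and the length byte that SipHash mixes into the last word is left out.

```python
def tail_word_bytewise(data: bytes) -> int:
    """Assemble the trailing partial word one byte at a time (switch-style)."""
    left = len(data) % 8
    tail = data[len(data) - left:]
    word = 0
    for i, byte in enumerate(tail):
        word |= byte << (8 * i)
    return word

def tail_word_shifted(data: bytes) -> int:
    """Load the last 8 bytes as one little-endian word, then shift."""
    if len(data) < 8:
        raise ValueError("needs at least 8 bytes of input")
    left = len(data) % 8
    word = int.from_bytes(data[-8:], "little")
    # Drop the (8 - left) low-order bytes that belong to the previous full word.
    return word >> (8 * (8 - left))

sample = bytes(range(1, 12))  # 11 bytes: one full word plus 3 trailing bytes
print(hex(tail_word_bytewise(sample)), hex(tail_word_shifted(sample)))
```

Both functions return the same value for any input of 8 bytes or more, including lengths that are exact multiples of 8, where the partial word is zero.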

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14621
___



[issue6721] Locks in python standard library should be sanitized on fork

2011-06-25 Thread Giovanni Bajo

Giovanni Bajo giovannib...@gmail.com added the comment:

If there's agreement that the general problem is unsolvable (so fork and 
threads just don't get along with each other), what we could attempt is trying 
to limit the side effects in the standard library, so that as few users as 
possible are affected by this problem.

For instance, having deadlocks just because of print statements sounds like a 
bad QoI that we could attempt to improve. Is there a reason why BufferedIO 
needs to hold its internal data-structure lock (used to make it thread-safe) 
while it's doing I/O and releasing the GIL? I would think that it's feasible to 
patch it so that its internal lock is only used to synchronize accesses to the 
internal data structures, but it is never held while I/O is performed (and thus 
the GIL is released -- at which point, if another thread forks, the problem 
appears).
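
A rough sketch of that locking discipline, written as a hypothetical pure-Python class rather than the actual BufferedIO implementation: the lock only guards the in-memory buffer, and the raw write happens after the lock has been released.

```python
import io
import threading

class SketchBufferedWriter:
    """Hypothetical writer: the lock guards the buffer, never the raw write."""

    def __init__(self, raw):
        self._raw = raw
        self._lock = threading.Lock()
        self._buffer = bytearray()

    def write(self, data: bytes) -> int:
        with self._lock:              # short critical section, no I/O inside
            self._buffer += data
        return len(data)

    def flush(self) -> None:
        with self._lock:              # detach the pending bytes under the lock
            pending, self._buffer = self._buffer, bytearray()
        if pending:
            self._raw.write(pending)  # I/O happens with the lock released

raw = io.BytesIO()
writer = SketchBufferedWriter(raw)
writer.write(b"hello ")
writer.write(b"world\n")
writer.flush()
print(raw.getvalue())                 # b'hello world\n'
```

This sketch ignores ordering between concurrent flushes, which a real implementation would have to address; it only shows that the lock need not be held across the I/O call.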

--
nosy: +Giovanni.Bajo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6721
___



[issue7213] subprocess leaks open file descriptors between Popen instances causing hangs

2010-12-13 Thread Giovanni Bajo

Giovanni Bajo giovannib...@gmail.com added the comment:

Hi Gregory,

will you backport Mirko's patches to subprocess32?

The last thing left in this bug is my proposal to change the default of 
close_fds to True on Windows too, but at the same time detect whether this is 
possible or not (depending on the pipe redirections). 

So basically close_fds=True would be changed to mean "close the FDs if it is 
possible, otherwise never mind". This is not a break in compatibility on 
Linux/UNIX (where it is always possible), nor on Windows (where currently it 
just raises a ValueError if you ask it to close the file descriptors 
while doing redirections).

The rationale for this change is again cross-platform compatibility. I don't 
like it when my code breaks because of a limitation of an OS that has a clear 
workaround. Subprocess is a high-level library after all, it's not like 
os.fork() or similar low-level libraries which expose the underlying platform 
differences.
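
The proposed meaning can be sketched on the caller's side as follows. The helper names resolve_close_fds and run_captured are hypothetical, and the Windows restriction encoded here is the one described in this thread for the subprocess module of that time, not a statement about every Python version.

```python
import subprocess
import sys

def resolve_close_fds(platform: str, redirecting: bool) -> bool:
    """Hypothetical helper: 'close the FDs if possible, otherwise never mind'."""
    if platform == "win32" and redirecting:
        # Per the thread, Windows could not combine close_fds=True with
        # redirected standard handles at the time, so fall back silently.
        return False
    return True

def run_captured(args):
    """Run a command with stdout redirected, using the resolved setting."""
    close = resolve_close_fds(sys.platform, redirecting=True)
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, close_fds=close)
    out, _ = proc.communicate()
    return proc.returncode, out

code, out = run_captured([sys.executable, "-c", "print('ok')"])
print(code, out)
```

With the proposal, this platform check would live inside subprocess itself, so callers could pass close_fds=True everywhere without a guard of their own.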

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7213
___



[issue7213] Popen.subprocess change close_fds default to True

2010-12-10 Thread Giovanni Bajo

Giovanni Bajo giovannib...@gmail.com added the comment:

Hi Gregory, I saw your commit here:
http://code.activestate.com/lists/python-checkins/91914/

This basically means that in 3.2 it is mandatory to specify close_fds to avoid 
a DeprecationWarning. *BUT* there is no good value that works both on Windows 
and Linux if you redirect stdout/stderr, as shown in this bug.

So basically in 3.2 to avoid a warning, each and every usage of Popen() with a 
redirection should be guarded by an if that checks the platform. I don't think 
this is acceptable.

Have I misunderstood something? Also: can you please explain how the behaviour 
is going to change in 3.3? I assume that you are planning to change the default 
to True; but would that also cover Windows' singularity in redirection cases?

--
nosy: +Giovanni.Bajo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7213
___



[issue7213] Popen.subprocess change close_fds default to True

2010-12-10 Thread Giovanni Bajo

Giovanni Bajo giovannib...@gmail.com added the comment:

Setting CLOEXEC on the pipes seems like a very good fix for this bug. I'm +1 on 
it, but I think it should be the default; instead, your proposed patch adds a 
new argument to the public API. Why do you think it's necessary to do so?
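
For reference, setting close-on-exec on a pipe from Python can be sketched like this on a POSIX system (the fcntl module is not available on Windows). The helper name set_cloexec is hypothetical, and this is not the patch under discussion, only the underlying mechanism.

```python
import fcntl
import os

def set_cloexec(fd: int) -> None:
    """Mark a descriptor close-on-exec so exec'ed children do not inherit it."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

read_end, write_end = os.pipe()
for fd in (read_end, write_end):
    set_cloexec(fd)

# Confirm the flag is present on the read end.
print(bool(fcntl.fcntl(read_end, fcntl.F_GETFD) & fcntl.FD_CLOEXEC))  # True

os.close(read_end)
os.close(write_end)
```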

At the same time, we need a solution to handle close_fds, because the current 
state of 3.2, with the DeprecationWarning (on 90% of subprocess uses in the 
world, if it ever gets backported to 2.7) and no way to fix it in a 
multi-platform way, is really sad.

I don't think a new constant DISREGARD_FDS is necessary. I think we can just 
use None as intermediate default (just like the current 3.2 does), and later 
switch it to True. The only further required action is to make True always 
work on Windows and never error out (just make it do nothing if there are some 
redirections), which is an obviously good thing to do to increase portability 
of subprocess.

Otherwise, if this can't make 3.2, I think the DeprecationWarning should be 
reverted until we agree on a different solution.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7213
___



[issue7213] Popen.subprocess change close_fds default to True

2010-12-10 Thread Giovanni Bajo

Giovanni Bajo giovannib...@gmail.com added the comment:

Would you mind elaborating on where the race condition is?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7213
___



[issue2128] sys.argv is wrong for unicode strings

2008-02-21 Thread Giovanni Bajo

Giovanni Bajo added the comment:

mbstowcs uses LC_CTYPE. Is that correct and consistent with the way
default encoding under UNIX is handled by Py3k?

Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just
asking, I don't have a definite idea.

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2128
__



[issue2128] sys.argv is wrong for unicode strings

2008-02-17 Thread Giovanni Bajo

Giovanni Bajo added the comment:

I'm attaching a simple patch that seems to work under Py3k. The trick is
that Py3k already attempts (not sure how or why) to decode argv using
utf-8. So it's sufficient to set up argv as UTF8-encoded strings.

Note that this changes the output of "python à" from this:

Fatal Python error: no mem for sys.argv
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
invalid data

to this:

TypeError: zipimporter() argument 1 must be string without null bytes,
not str

which is expected since zipimporter_init() doesn't even know to ignore
unicode strings (let alone handle them correctly...).
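
The decoding side of this can be reproduced in isolation. The sketch below assumes, for illustration, that the non-UTF-8 argument arrives as a single Latin-1 byte; the helper name decodes_as_utf8 is hypothetical and the exact bytes the interpreter received in the report are not known.

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the raw argument bytes are valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

latin1_arg = "à".encode("latin-1")   # b'\xe0'
utf8_arg = "à".encode("utf-8")       # b'\xc3\xa0'

print(decodes_as_utf8(latin1_arg))   # False: not a valid UTF-8 sequence
print(decodes_as_utf8(utf8_arg))     # True: round-trips cleanly
```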

Added file: http://bugs.python.org/file9449/argv_unicode.patch

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2128
__



[issue2128] sys.argv is wrong for unicode strings

2008-02-16 Thread Giovanni Bajo

New submission from Giovanni Bajo:

Under Windows, sys.argv is created through the Windows ANSI API.

When you have a file/directory which can't be represented in the 
system encoding (eg: a Japanese-named file or directory on a Western 
Windows), Windows will encode the filename to the system encoding using
what we call the "replace" policy, and thus sys.argv[] will contain an
entry like "c:\\foo\\??.dat".
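
The lossy conversion can be imitated with Python's own codecs. In this sketch cp1252 stands in for a Western ANSI code page and the Japanese file name is a made-up example; it models the effect, not the Windows API call itself.

```python
name = "c:\\foo\\日本.dat"

# cp1252 has no Japanese characters, so each one is replaced by "?".
ansi_bytes = name.encode("cp1252", errors="replace")

print(ansi_bytes)                    # b'c:\\foo\\??.dat'
print(ansi_bytes.decode("cp1252"))   # c:\foo\??.dat
```

Once the name has been through this conversion the original characters cannot be recovered, which is why the argument list has to be obtained as wide characters in the first place.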

My suggestion is that:

* At the Python level, we still expose a single sys.argv[], which will 
contain unicode strings. I think this exactly matches what Py3k does now. 

* At the C level, I believe it involves using GetCommandLineW() and 
CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be 
changed to also accept wchar_t** arguments? Or is it better to allow for 
NULL to be passed (under Windows at least), so that the Windows
code-path in there can use GetCommandLineW()/CommandLineToArgvW() to get
the current process' arguments?

--
components: Interpreter Core
messages: 62458
nosy: giovannibajo
severity: normal
status: open
title: sys.argv is wrong for unicode strings
type: behavior
versions: Python 3.0

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2128
__



[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2008-02-16 Thread Giovanni Bajo

Giovanni Bajo added the comment:

Making the standard Windows Python DLL larger is not only a problem of
disk size: it will make all packages produced by PyInstaller or py2exe
larger, and that means lots of wasted bandwidth.

I see that MvL is still -1 on simply splitting CJK codecs out, and vetoes
it by asking for a generalization work of insane proportion (a
hard-to-define PEP, an entirely new build system for Windows, etc.).

I understand (and *agree*) that having a general rule would be a much
superior solution, but CJK is already almost 50% of the python.dll, so
it *is* already a special case by any means. And special cases like
these could be handled with special-case decisions.

Thus, I still strongly disagree with MvL and would like CJK to be split
out of python.dll as soon as possible. I would not really ask this for any
other modules but CJK, and understand that further actions would really
require a PEP and a new build system for Windows.

So, I ask MvL again to soften his position and reconsider the CJK
splitting in all its singularity. Please!

(in case it's not clear, I would prepare a patch to split CJK out any day
if there were hopes that it gets accepted)

--
nosy: +giovannibajo

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2066
__



[issue1342] Crash on Windows if Python runs from a directory with umlauts

2008-01-26 Thread Giovanni Bajo

Changes by Giovanni Bajo:


--
nosy: +rasky

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1342
__