Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Peter Otten
Guido van Rossum wrote:

> I wonder if it wouldn't make sense to change urlencode() to generate
> URLs that don't depend on the hash order, for all versions of Python
> that support PYTHONHASHSEED? It seems a one-line fix:
> 
> query = query.items()
> 
> with this:
> 
> query = sorted(query.items())
> 
> This would not prevent breakage of unit tests, but it would make a
> much simpler fix possible: simply sort the parameters in the URL.
> 
> Thoughts?

There may be people who mix bytes and str or pass other non-str keys:

>>> query = {b"a":b"b", "c":"d", 5:6}
>>> urlencode(query)
'a=b&c=d&5=6'
>>> sorted(query.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < bytes()

Not pretty, but a bugfix should not break such constructs.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Antoine Pitrou
On Sat, 18 Aug 2012 14:23:13 +0900
"Stephen J. Turnbull"  wrote:
> Joao S. O. Bueno writes:
> 
>  > I don't think this behavior is only desirable to unit tests: having
>  > URLs formed in a predictable way is a good thing in any way one
>  > thinks about it.
> 
> Especially if you're a hacker.  One more thing you may be able to use
> against careless sites that don't expect the unexpected to occur in
> URLs.

That's unsubstantiated. Give an example of how sorted URLs compromise
security.

Regards

Antoine.


-- 
Software development and contracting: http://pro.pitrou.net




Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Joao S. O. Bueno
On 18 August 2012 02:23, Stephen J. Turnbull  wrote:
> Joao S. O. Bueno writes:
>
>  > I don't think this behavior is only desirable to unit tests: having
>  > URLs formed in a predictable way is a good thing in any way one
>  > thinks about it.
>
> Especially if you're a hacker.  One more thing you may be able to use
> against careless sites that don't expect the unexpected to occur in
> URLs.
>
> I'm not saying this is a bad thing, but we should remember that the
> whole point of PYTHONHASHSEED is that regularities can be exploited
> for devious and malicious purposes, and reducing regularity makes many
> attacks more difficult.  "*Any* way one thinks about it" is far too
> strong a claim.

Agreed that "any way one thinks about it" is far too strong a claim,
but I still hold to the point. Maybe "most ways one thinks about it"
:-)  .


>
> Steve
>
>
>
>


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Christian Heimes
On 17.08.2012 21:27, Guido van Rossum wrote:
> I wonder if it wouldn't make sense to change urlencode() to generate
> URLs that don't depend on the hash order, for all versions of Python
> that support PYTHONHASHSEED? It seems a one-line fix:
> 
> query = query.items()
> 
> with this:
> 
> query = sorted(query.items())
> 
> This would not prevent breakage of unit tests, but it would make a
> much simpler fix possible: simply sort the parameters in the URL.

I vote -0. The issue can also be addressed with a small and simple
helper function that wraps urlparse and compares the query parameters. Or
you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
application.
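
A minimal sketch of both options, written against Python 3's
urllib.parse. The helper name `same_url` and the sample dict are
assumptions made for illustration; nothing like it exists in the
standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def same_url(url_a, url_b):
    """Return True if two URLs differ at most in query parameter order."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # Everything except the query string must match verbatim.
    if (a.scheme, a.netloc, a.path, a.fragment) != (
            b.scheme, b.netloc, b.path, b.fragment):
        return False
    # Sort the decoded (key, value) pairs so hash order no longer matters.
    return sorted(parse_qsl(a.query, keep_blank_values=True)) == sorted(
        parse_qsl(b.query, keep_blank_values=True))


qs = {"b": "2", "a": "1"}
# Sorting in the application yields the same string on every run.
stable = urlencode(sorted(qs.items()))
```

A unit test can then compare URLs with `same_url()` instead of
comparing the raw strings.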

The order of query string parameters is actually important for some
applications; for example, Zope, colander+deform and other form
frameworks use the parameter order to group parameters.

Therefore I propose that the query string is only sorted when the query
is exactly a dict and not some subclass or class that has an items() method.

if type(query) is dict:
    query = sorted(query.items())
else:
    query = query.items()
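
To illustrate the effect of the exact-type check, here is a small
sketch. `query_pairs` is a hypothetical stand-in for the relevant
lines, not the actual urlencode() code.

```python
from collections import OrderedDict


def query_pairs(query):
    # Hypothetical helper mirroring the proposed check.
    if type(query) is dict:
        # Plain dict: no meaningful order, so sorting is harmless.
        return sorted(query.items())
    # Subclasses such as OrderedDict keep the order the caller chose.
    return list(query.items())


plain = {"b": 2, "a": 1}
ordered = OrderedDict([("b", 2), ("a", 1)])
```

With this check `query_pairs(plain)` is sorted by key, while
`query_pairs(ordered)` keeps "b" before "a".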

Christian



Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Guido van Rossum
On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes  wrote:
> On 17.08.2012 21:27, Guido van Rossum wrote:
>> I wonder if it wouldn't make sense to change urlencode() to generate
>> URLs that don't depend on the hash order, for all versions of Python
>> that support PYTHONHASHSEED? It seems a one-line fix:
>>
>> query = query.items()
>>
>> with this:
>>
>> query = sorted(query.items())
>>
>> This would not prevent breakage of unit tests, but it would make a
>> much simpler fix possible: simply sort the parameters in the URL.
>
> I vote -0. The issue can also be addressed with a small and simple
> helper function that wraps urlparse and compares the query parameters. Or
> you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
> application.

Hm. That's actually a good point.

> The order of query string parameter is actually important for some
> applications, for example Zope, colander+deform and other form
> frameworks use the parameter order to group parameters.
>
> Therefore I propose that the query string is only sorted when the query
> is exactly a dict and not some subclass or class that has an items() method.
>
> if type(query) is dict:
>     query = sorted(query.items())
> else:
>     query = query.items()

That's already in the bug I filed. :-) I also added that the sort may
fail if the keys mix e.g. bytes and str (or int and str, for that
matter).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread MRAB

On 18/08/2012 18:34, Guido van Rossum wrote:
> On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes wrote:
>> On 17.08.2012 21:27, Guido van Rossum wrote:
>>> I wonder if it wouldn't make sense to change urlencode() to generate
>>> URLs that don't depend on the hash order, for all versions of Python
>>> that support PYTHONHASHSEED? It seems a one-line fix:
>>>
>>> query = query.items()
>>>
>>> with this:
>>>
>>> query = sorted(query.items())
>>>
>>> This would not prevent breakage of unit tests, but it would make a
>>> much simpler fix possible: simply sort the parameters in the URL.
>>
>> I vote -0. The issue can also be addressed with a small and simple
>> helper function that wraps urlparse and compares the query parameters. Or
>> you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
>> application.
>
> Hm. That's actually a good point.
>
>> The order of query string parameter is actually important for some
>> applications, for example Zope, colander+deform and other form
>> frameworks use the parameter order to group parameters.
>>
>> Therefore I propose that the query string is only sorted when the query
>> is exactly a dict and not some subclass or class that has an items() method.
>>
>> if type(query) is dict:
>>     query = sorted(query.items())
>> else:
>>     query = query.items()
>
> That's already in the bug I filed. :-) I also added that the sort may
> fail if the keys mix e.g. bytes and str (or int and str, for that
> matter).


One possible way around that is to add the class names, perhaps only if
sorting raises an exception:

def make_key(pair):
    return type(pair[0]).__name__, type(pair[1]).__name__, pair

if type(query) is dict:
    try:
        query = sorted(query.items())
    except TypeError:
        query = sorted(query.items(), key=make_key)
else:
    query = query.items()
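
As a rough, self-contained illustration of that fallback under
Python 3: the wrapper `sorted_items` and the sample dict (taken from
the mixed-key example earlier in the thread) are assumptions added for
the demonstration.

```python
from urllib.parse import urlencode


def make_key(pair):
    # Compare type names first, so keys of different types are never
    # compared with each other directly.
    return type(pair[0]).__name__, type(pair[1]).__name__, pair


def sorted_items(query):
    # Hypothetical wrapper around the try/except shown above.
    try:
        return sorted(query.items())
    except TypeError:
        return sorted(query.items(), key=make_key)


mixed = {b"a": b"b", "c": "d", 5: 6}
pairs = sorted_items(mixed)
encoded = urlencode(pairs)
```

The fallback groups the pairs by the type name of the key ("bytes",
then "int", then "str"), so the resulting order is deterministic but
not alphabetical across types.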



Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Guido van Rossum
On Saturday, August 18, 2012, MRAB wrote:

> On 18/08/2012 18:34, Guido van Rossum wrote:
>
>> On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes wrote:
>>
>>> On 17.08.2012 21:27, Guido van Rossum wrote:
>>>
>>>> I wonder if it wouldn't make sense to change urlencode() to generate
>>>> URLs that don't depend on the hash order, for all versions of Python
>>>> that support PYTHONHASHSEED? It seems a one-line fix:
>>>>
>>>> query = query.items()
>>>>
>>>> with this:
>>>>
>>>> query = sorted(query.items())
>>>>
>>>> This would not prevent breakage of unit tests, but it would make a
>>>> much simpler fix possible: simply sort the parameters in the URL.
>>>
>>> I vote -0. The issue can also be addressed with a small and simple
>>> helper function that wraps urlparse and compares the query parameters. Or
>>> you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
>>> application.
>>
>> Hm. That's actually a good point.
>>
>>> The order of query string parameter is actually important for some
>>> applications, for example Zope, colander+deform and other form
>>> frameworks use the parameter order to group parameters.
>>>
>>> Therefore I propose that the query string is only sorted when the query
>>> is exactly a dict and not some subclass or class that has an items()
>>> method.
>>>
>>> if type(query) is dict:
>>>     query = sorted(query.items())
>>> else:
>>>     query = query.items()
>>
>> That's already in the bug I filed. :-) I also added that the sort may
>> fail if the keys mix e.g. bytes and str (or int and str, for that
>> matter).
>
> One possible way around that is to add the class names, perhaps only if
> sorting raises an exception:
>
> def make_key(pair):
>     return type(pair[0]).__name__, type(pair[1]).__name__, pair
>
> if type(query) is dict:
>     try:
>         query = sorted(query.items())
>     except TypeError:
>         query = sorted(query.items(), key=make_key)
> else:
>     query = query.items()


Doesn't strike me as necessary.



-- 
Sent from Gmail Mobile


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Glenn Linderman

On 8/18/2012 11:47 AM, MRAB wrote:
>>> I vote -0. The issue can also be addressed with a small and simple
>>> helper function that wraps urlparse and compares the query parameters. Or
>>> you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
>>> application.
>>
>> Hm. That's actually a good point.


Seems adequate to me. Most programs wouldn't care about the order, 
because most web frameworks grab whatever is there in whatever order, 
and present it to the web app in their own order.


Programs that care, or which talk to web apps that care, are unlikely to 
want the order from a non-randomized dict, and so have already taken 
care of ordering issues, so undoing the randomization seems like a 
solution in search of a problem (other than for poorly written test cases).


[Python-Dev] 3.3 str timings

2012-08-18 Thread Terry Reedy
The issue came up in python-list about string operations being slower in 
3.3. (The categorical claim is false as some things are actually 
faster.) Some things I understand, this one I do not.


Win7-64, 3.3.0b2 versus 3.2.3
print(timeit("c in a", "c  = '…'; a = 'a'*1000+c")) # ord(c) = 8230
# .6 in 3.2, 1.2 in 3.3

Why is searching for a two-byte char in a two-bytes per char string so 
much faster in 3.2? Is this worth a tracker issue (I searched and could 
not find one) or is there a known and un-fixable cause?


print(timeit("a.encode()", "a = 'a'*1000"))
# 1.5 in 3.2, .26 in 3.3

print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
# 1.7 in 3.2, .51 in 3.3

This is one of the 3.3 improvements. But since the results are equal:
('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
and 3.3 should know that for an all-ascii string, I do not see why 
adding the parameter should double the time. Another issue or known 
and un-fixable?


--
Terry Jan Reedy




Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread Antoine Pitrou
On Sat, 18 Aug 2012 17:17:14 -0400
Terry Reedy  wrote:
> The issue came up in python-list about string operations being slower in 
> 3.3. (The categorical claim is false as some things are actually 
> faster.) Some things I understand, this one I do not.
> 
> Win7-64, 3.3.0b2 versus 3.2.3
> print(timeit("c in a", "c  = '…'; a = 'a'*1000+c")) # ord(c) = 8230
> # .6 in 3.2, 1.2 in 3.3

I get opposite numbers:

$ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
100 loops, best of 3: 0.599 usec per loop
$ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
1000 loops, best of 3: 0.119 usec per loop

However, in both cases the operation is blindingly fast (less than
1µs), which should make it pretty much a non-issue.
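
For comparing the two sets of figures: timeit.timeit() returns the
total time in seconds for its default of 1,000,000 executions, so the
".6" quoted above corresponds to about 0.6 usec per loop, the unit
that `python -m timeit` prints. A minimal sketch of that conversion
follows; the helper name `usec_per_loop` and the reduced `number` are
assumptions chosen to keep the run short.

```python
import timeit

NUMBER = 1_000_000  # default number of executions for timeit.timeit()


def usec_per_loop(stmt, setup, repeat=3, number=NUMBER):
    # Best total time over `repeat` runs, converted to microseconds
    # per single execution of `stmt`.
    best = min(timeit.repeat(stmt, setup, repeat=repeat, number=number))
    return best / number * 1e6


result = usec_per_loop("c in a", "c = '…'; a = 'a'*1000 + c",
                       number=100_000)
print(f"{result:.3f} usec per loop")
```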

> Why is searching for a two-byte char in a two-bytes per char string so 
> much faster in 3.2? Is this worth a tracker issue (I searched and could 
> not find one) or is there a known and un-fixable cause?

I don't think it's worth a tracker issue. First, because as said above
it's practically a non-issue. Second, given the nature and depth of
changes brought by the switch to the PEP 393 implementation, an
individual micro-benchmark like this is not very useful; you'd need to
make a more extensive analysis of string performance (as a hint, we
have the stringbench benchmark in the Tools directory).

> This is one of the 3.3 improvements. But since the results are equal:
> ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
> and 3.3 should know that for an all-ascii string, I do not see why 
> adding the parameter should double the time. Another issue or known 
> and un-fixable?

When observing performance differences, you should ask yourself whether
they matter at all or not.

Regards

Antoine.



-- 
Software development and contracting: http://pro.pitrou.net




Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread martin


Quoting Terry Reedy:

> Is this worth a tracker issue (I searched and could not find one) or
> is there a known and un-fixable cause?


There is a third option: it's not known, but it's also unimportant.
I'd say posting it to python-dev is enough: either somebody with
sufficient time and interest researches it and provides you with an
explanation (or a fix), or, if nobody picks it up right away, it's
IMO fine to wait for somebody to report it who has a real problem
with this change in runtime.

Regards,
Martin




Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread R. David Murray
On Sat, 18 Aug 2012 17:17:14 -0400, Terry Reedy  wrote:
> print(timeit("a.encode()", "a = 'a'*1000"))
> # 1.5 in 3.2, .26 in 3.3
> 
> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
> # 1.7 in 3.2, .51 in 3.3
> 
> This is one of the 3.3 improvements. But since the results are equal:
> ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
> and 3.3 should know that for an all-ascii string, I do not see why 
> adding the parameter should double the time. Another issue or known 
> and un-fixable?

At one point there was an issue with certain spellings taking a fast path
(avoiding a codec lookup?) and other spellings not.  I thought we'd fixed
that, but perhaps we didn't?
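
One way to probe this, as a rough sketch: check that several spellings
resolve to the same codec, then time str.encode() with each. The list
of spellings and the iteration count are arbitrary choices, and
whether any spelling takes a fast path depends on the interpreter
version.

```python
import codecs
import timeit

text = "a" * 1000
spellings = ["utf-8", "utf8", "UTF-8", "utf_8"]

# All spellings resolve to the same registered codec ...
names = {codecs.lookup(s).name for s in spellings}

# ... but the time str.encode() needs may differ between them.
timings = {
    s: timeit.timeit(lambda s=s: text.encode(s), number=100_000)
    for s in spellings
}
for spelling, seconds in timings.items():
    print(f"{spelling!r}: {seconds:.3f} s")
```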

--David


Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread Terry Reedy

On 8/18/2012 5:27 PM, Antoine Pitrou wrote:
> On Sat, 18 Aug 2012 17:17:14 -0400
> Terry Reedy wrote:
>> The issue came up in python-list about string operations being slower in
>> 3.3. (The categorical claim is false as some things are actually
>> faster.) Some things I understand, this one I do not.
>>
>> Win7-64, 3.3.0b2 versus 3.2.3
>> print(timeit("c in a", "c  = '…'; a = 'a'*1000+c")) # ord(c) = 8230
>> # .6 in 3.2, 1.2 in 3.3
>
> I get opposite numbers:

Just curious, what system?

> $ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
> 100 loops, best of 3: 0.599 usec per loop
> $ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
> 1000 loops, best of 3: 0.119 usec per loop
>
> However, in both cases the operation is blindingly fast (less than
> 1µs), which should make it pretty much a non-issue.

The current default 'number' of 100 is higher than I remember. Good
to know.

>> Why is searching for a two-byte char in a two-bytes per char string so
>> much faster in 3.2? Is this worth a tracker issue (I searched and could
>> not find one) or is there a known and un-fixable cause?
>
> I don't think it's worth a tracker issue. First, because as said above
> it's practically a non-issue. Second, given the nature and depth of
> changes brought by the switch to the PEP 393 implementation, an
> individual micro-benchmark like this is not very useful; you'd need to
> make a more extensive analysis of string performance (as a hint, we
> have the stringbench benchmark in the Tools directory).


It is not in my 3.3.0b2 Windows install, but I have heard of it. Another 
good reminder. My main interest was in refuting '3.3 string ops are 
always slower'. Both points above are also good 'ammo'. I am sure this 
discussion will recur after the release.


--
Terry Jan Reedy

