[issue30483] urllib.parse.parse_qsl does not handle unicode data properly

2020-04-16 Thread Dmitry Tsirkov

Dmitry Tsirkov  added the comment:

I have recently stumbled upon this bug, and I can present the example and a 
solution I've used.
The issue happens when we try to parse x-www-form-urlencoded of type bytes:
```
>>> from urllib.parse import urlencode, parse_qs
>>> urlencode([('v', 'ö')])
'v=%C3%B6'
>>> parse_qs('v=%C3%B6')
{'v': ['ö']}
>>> parse_qs(b'v=%C3%B6')
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib64/python3.6/urllib/parse.py", line 669, in parse_qs
encoding=encoding, errors=errors)
  File "/usr/lib64/python3.6/urllib/parse.py", line 722, in parse_qsl
value = _coerce_result(value)
  File "/usr/lib64/python3.6/urllib/parse.py", line 103, in _encode_result
return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 0: 
ordinal not in range(128)
```
This happens in the parse_qsl function because _coerce_result is a synonym of 
_encode_result and is called with default parameter encoding='ascii'. As far as 
I understand, it should be called with the encoding parameter of the parse_qsl 
function:
```
742c742
< name = _coerce_result(name)
---
> name = _coerce_result(name, encoding=encoding, errors=errors)
745c745
< value = _coerce_result(value)
---
> value = _coerce_result(value, encoding=encoding, errors=errors)
```
I am not sure whether I should commit this to the repo and create a pull 
request, as described in the devguide.

--
nosy: +cyrkov

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30483] urllib.parse.parse_qsl does not handle unicode data properly

2018-05-29 Thread Cheryl Sabella


Cheryl Sabella  added the comment:

Would you be able to include an example for recreating this?

Looking at the code, it uses the ascii encoding for bytes (which can only 
contain ASCII literal characters) and should not be using that encoding for 
strings.

Thanks!

--
nosy: +cheryl.sabella
stage:  -> test needed
type:  -> behavior
versions: +Python 3.7, Python 3.8 -Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30483] urllib.parse.parse_qsl does not handle unicode data properly

2017-05-26 Thread Abhilash Raj

New submission from Abhilash Raj:

After decoding percentage encoded `name` and `values` in the query string, it 
tries to `_coerce_result` or encode the result to ascii (which is the value of 
_implicit_encoding).

```
  File "/usr/lib/python3.6/urllib/parse.py", line 691, in parse_qsl
value = _coerce_result(value)
  File "/usr/lib/python3.6/urllib/parse.py", line 95, in _encode_result
return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: 
ordinal not in range(128)
```

As seen in the partial traceback above, it breaks things when trying to parse 
unicode encode query string values.

--
messages: 294541
nosy: maxking
priority: normal
severity: normal
status: open
title: urllib.parse.parse_qsl does not handle unicode data properly
versions: Python 3.5, Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com