[issue42453] utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows

2020-11-24 Thread

赵豪杰 <1292756...@qq.com> added the comment:

Got it.

--

___
Python tracker 
<https://bugs.python.org/issue42453>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42453] utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows

2020-11-24 Thread

New submission from 赵豪杰 <1292756...@qq.com>:

When a package is installed with `pip install package_name`, pip generates an
`installed-files.txt` file that records the files the package contains.

When the package is updated or uninstalled, pip needs to read
`installed-files.txt` and then delete the old files.

The problem occurs if the installed package contains files whose names include
non-ASCII characters, such as `文件` (Chinese for "file").

In China (I don't know about other places), for historical reasons the default
Windows system codec is `gbk`, so `installed-files.txt` is also written with the
`gbk` codec when a package is installed.

When updating or uninstalling, however, pip reads `installed-files.txt` with the
`utf-8` codec. Because the file contains non-ASCII characters encoded as `gbk`,
decoding fails with this error:

```
  File "d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1424, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 343: invalid start byte in installed-files.txt file at path: d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\Markdown_Toolbox-0.0.8-py3.9.egg-info\installed-files.txt
```
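A minimal reproduction of the decoding mismatch described above, as a sketch: it writes a hypothetical `installed-files.txt` containing the filename `文件.txt` with the `gbk` codec (as happens when the system default codec is `gbk`), then decodes the raw bytes as UTF-8, the way `value.decode('utf-8')` in the traceback does. The temporary directory and the file contents are assumptions for illustration, not pip's actual layout.

```python
import os
import tempfile

# Hypothetical record line; real installed-files.txt content may differ.
record = "文件.txt\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "installed-files.txt")

    # Simulate installation on a system whose default codec is gbk.
    # newline="" keeps "\n" untranslated so the bytes are the same on every OS.
    with open(path, "w", encoding="gbk", newline="") as f:
        f.write(record)

    # Simulate the uninstall step: read the raw bytes back.
    with open(path, "rb") as f:
        raw = f.read()

# Decoding the gbk bytes as UTF-8 fails, as in the traceback.
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(decode_error).__name__)  # UnicodeDecodeError
print(raw.decode("gbk") == record)  # True: the same bytes decode fine as gbk
```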

I hate the default `gbk` system codec, but this setting is fixed on Windows.

So my suggestion is to wrap the failing call in a `try`/`except`: if the `utf-8`
codec fails to read `installed-files.txt`, let the `gbk` codec have a go.
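The suggested fallback could look roughly like the sketch below. `decode_with_fallback` is a hypothetical helper, not pip or `pkg_resources` API: it tries UTF-8 first and retries with `gbk` only when UTF-8 decoding fails. One trade-off to keep in mind is that bytes which happen to be valid in both codecs are always read as UTF-8, and `gbk` is only one of several possible Windows ANSI code pages.

```python
def decode_with_fallback(value: bytes, fallbacks=("gbk",)) -> str:
    """Decode metadata bytes as UTF-8, retrying with fallback codecs.

    Hypothetical helper for illustration; not part of pip or pkg_resources.
    """
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError as first_error:
        for codec in fallbacks:
            try:
                return value.decode(codec)
            except UnicodeDecodeError:
                continue  # try the next fallback codec
        # No codec worked: surface the original UTF-8 error.
        raise first_error


# A UTF-8 file decodes on the first attempt...
print(decode_with_fallback("文件.txt".encode("utf-8")) == "文件.txt")  # True
# ...and a gbk-encoded file is recovered by the fallback.
print(decode_with_fallback("文件.txt".encode("gbk")) == "文件.txt")  # True
```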

Or, a more fundamental solution: pip should always write text files with the
`utf-8` codec instead of the default system codec.
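The more fundamental fix amounts to passing an explicit `encoding` whenever the record file is written and read, instead of relying on the locale default that `open()` uses when no encoding is given (what `locale.getpreferredencoding(False)` reports). A minimal sketch, assuming a hypothetical list of installed file names and a temporary location for the file:

```python
import locale
import os
import tempfile

# Locale-dependent codec that open() falls back to without an explicit
# encoding; printed only for comparison.
print("locale default:", locale.getpreferredencoding(False))

installed = ["文件.txt", "README.md"]  # hypothetical installed file names

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "installed-files.txt")

    # Write with an explicit codec so the bytes do not depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(installed) + "\n")

    # Read back with the same explicit codec.
    with open(path, encoding="utf-8") as f:
        read_back = f.read().splitlines()

print(read_back == installed)  # True
```

Because both sides name the codec, the file round-trips the same way regardless of the system locale.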

--
components: Windows
messages: 381753
nosy: HaujetZhao, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows
type: crash
versions: Python 3.9




[issue42448] re.findall have different match result against re.search or re.sub

2020-11-23 Thread

赵豪杰 <1292756...@qq.com> added the comment:

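Ah, got it. I misunderstood the usage: when the pattern contains groups,
findall returns the groups the expression defines (a tuple per match if there
is more than one group) rather than the whole match. Thanks
@serhiy.storchaka @rhettinger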
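A short sketch of this behaviour using only the standard `re` module: when a pattern contains a capturing group, `re.findall` returns the group's contents rather than the whole match, and a repeated group keeps only its last repetition. Making the group non-capturing with `(?:...)`, or using `re.finditer` with `group(0)`, yields the full matches that `re.search` and `re.sub` work with.

```python
import re

text = "121212 and 121212"

# With a capturing group, findall reports the group's last repetition.
print(re.findall(r"(12)+", text))    # ['12', '12']

# A non-capturing group makes findall report whole matches.
print(re.findall(r"(?:12)+", text))  # ['121212', '121212']

# finditer yields full Match objects, like re.search does.
print([m.group(0) for m in re.finditer(r"(12)+", text)])  # ['121212', '121212']

# re.search finds the same whole match; group(1) is the last repetition.
m = re.search(r"(12)+", text)
print(m.group(0), m.group(1))  # 121212 12
```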

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue42448>
___



[issue42448] re.findall have different match result against re.search or re.sub

2020-11-23 Thread

New submission from 赵豪杰 <1292756...@qq.com>:

```
>>> import re
>>> text = '121212 and 121212'
>>> pattern = '(12)+'
>>> print(re.findall(pattern, text))
['12', '12']
>>> print(re.search(pattern, text))
<re.Match object; span=(0, 6), match='121212'>
>>> print(re.sub(pattern, '', text))
 and 
# re.findall gives a different match result from re.search or re.sub
```

With the same pattern and string, re.findall is supposed to give the same
matches as re.search, but it doesn't.

This result is from Python 3.8.5.

--
components: Regular Expressions
messages: 381689
nosy: HaujetZhao, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.findall have different match result against re.search or re.sub
type: behavior
versions: Python 3.8
