[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +25350 stage: -> patch review pull_request: https://github.com/python/cpython/pull/26764 ___ Python tracker <https://bugs.python.org/issu

[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Thread Ma Lin
Ma Lin added the comment: Ok, I'm working on a PR. -- ___ Python tracker <https://bugs.python.org/issue44439> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue44439] PickleBuffer doesn't have __len__ method

2021-06-16 Thread Ma Lin
New submission from Ma Lin : Running this code raises an exception: import pickle import lzma import pandas as pd with lzma.open("test.xz", "wb") as file: pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5) The exception: Tr
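
The size problem in this report can be illustrated without pandas. The following is a minimal sketch, assuming Python 3.8+ (where `pickle.PickleBuffer` exists); the helper name `payload_size` is hypothetical and is not taken from the issue or its PR. It measures a buffer through `memoryview` instead of relying on `len()`:

```python
import pickle

def payload_size(data):
    """Return the byte length of any object that exports the buffer protocol."""
    # memoryview() accepts bytes, bytearray, PickleBuffer and similar objects;
    # nbytes does not depend on the object defining __len__.
    with memoryview(data) as view:
        return view.nbytes

buf = pickle.PickleBuffer(b"abc")
print(payload_size(buf))       # size obtained via the buffer protocol
print(payload_size(b"hello"))  # plain bytes work the same way
```

A file-like `write()` method that receives out-of-band buffers could count bytes this way, which is one possible approach rather than a statement of what the linked PR does.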

[issue44134] lzma: stream padding in xz files

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue44134> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue43650] MemoryError on zip.read in shutil._unpack_zipfile

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43650> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin
Ma Lin added the comment: Sorry, for the (init_size > UINT32_MAX) problem, I have a better solution. Please imagine this scenario: - before the patch - in 64-bit build - use zlib.decompress() function - the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB) If set the `b

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +24779 pull_request: https://github.com/python/cpython/pull/26143 ___ Python tracker <https://bugs.python.org/issue41

[issue44114] Incorrect function signatures in dictobject.c

2021-05-12 Thread Ma Lin
Change by Ma Lin : -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue44114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-05-10 Thread Ma Lin
Ma Lin added the comment: Erlend, please take a look at this bug. -- ___ Python tracker <https://bugs.python.org/issue33376> ___ ___ Python-bugs-list mailin

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin
Ma Lin added the comment: Found a backward incompatible behavior. Before the patch, in 64-bit build, zlib module allows the initial size > UINT32_MAX. It creates a bytes object, and uses a sliding window to deal with the UINT32_MAX limit: https://github.com/python/cpython/blob/v3.

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +24429 pull_request: https://github.com/python/cpython/pull/25738 ___ Python tracker <https://bugs.python.org/issue41

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-28 Thread Ma Lin
Ma Lin added the comment: Thanks for reviewing this big patch. Your review makes the code better. -- ___ Python tracker <https://bugs.python.org/issue41

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-27 Thread Ma Lin
Ma Lin added the comment: The above changes were made in this commit: split core code and wrappers 55705f6dc28ff4dc6183e0eb57312c885d19090a After that commit, there is a new commit, it resolves the code conflicts introduced by PR 22126 one hour ago. Merge branch 'master

[issue41735] Thread locks in zlib module may go wrong in rare case

2021-04-27 Thread Ma Lin
Ma Lin added the comment: Thanks for review. -- ___ Python tracker <https://bugs.python.org/issue41735> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-26 Thread Ma Lin
Ma Lin added the comment: Very sorry for the update at the last moment. But after the update, we should not need to touch it in the future, so I think it's worthwhile. Please review the last commit in PR 21740; the previous commits have not been changed. IMO if you use a Git client such as TortoiseGit

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-25 Thread Ma Lin
Ma Lin added the comment: > The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put > the core code together, these defines can be put in a thin wrapper in > _bz2module.c/_lzmamodule.c/zlibmodule.c files. I tried it, and it looks good. I will update the PR within o

[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin
Ma Lin added the comment: I think this change is safe. The behaviors should be exactly the same, except the iterators are different objects (obj vs obj._buffer). -- ___ Python tracker <https://bugs.python.org/issue43

[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43787> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-11 Thread Ma Lin
Ma Lin added the comment: > I don't really _like_ that this is a .h file acting as a C template to inject > effectively the same static code into each module that wants to use it... > Which I think is the concern Victor is expressing in a comment above. I think so too. Th

[issue43785] Remove RLock from BZ2File

2021-04-09 Thread Ma Lin
Ma Lin added the comment: This change is backwards incompatible, it may break some code silently. If someone really needs better performance, they can write a BZ2File class without RLock by themselves, it should be easy. FYI, zlib module was added in 1997, bz2 module was added in 2002, lzma

[issue43785] bz2 performance issue.

2021-04-09 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43785> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-05 Thread Ma Lin
Ma Lin added the comment: ping -- ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-26 Thread Ma Lin
Ma Lin added the comment: Closing as invalid. They have the same effect: PyErr_GivenExceptionMatches(v, PyExc_BlockingIOError)) PyErr_GivenExceptionMatches(t, PyExc_BlockingIOError)) -- resolution: -> wont fix stage: -> resolved status: open ->

[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-26 Thread Ma Lin
Ma Lin added the comment: I am trying to write a test-case. -- ___ Python tracker <https://bugs.python.org/issue43305> ___ ___ Python-bugs-list mailin

[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-23 Thread Ma Lin
New submission from Ma Lin : 654 PyErr_Fetch(&t, &v, &tb); 655 if (v == NULL || !PyErr_GivenExceptionMatches(v, PyExc_BlockingIOError)) { ↑ the v passed to PyErr_GivenExceptionMatches() should be t https://github.com/python/cpython/blob/v3.10.0a5/Modules/_io/bufferedio.c#L654-L655

[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-02-23 Thread Ma Lin
Change by Ma Lin : -- nosy: +erlendaasland ___ Python tracker <https://bugs.python.org/issue33376> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue43027] Calling _PyBytes_Resize() on 1-byte bytes may raise error

2021-01-25 Thread Ma Lin
New submission from Ma Lin : PyBytes_FromStringAndSize() uses a global cache for 1-byte bytes: https://github.com/python/cpython/blob/v3.10.0a4/Objects/bytesobject.c#L147 if (size == 1 && str != NULL) { struct _Py_bytes_state *state = get_bytes_state(); op

[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin
Ma Lin added the comment: Found a new issue, can be combined with this issue. -- stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/i

[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +23149 stage: -> patch review pull_request: https://github.com/python/cpython/pull/24330 ___ Python tracker <https://bugs.python.org/issu

[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin
New submission from Ma Lin : Above code already cover this check: if (Py_SIZE(v) == newsize) { /* return early if newsize equals to v->ob_size */ return 0; } if (Py_SIZE(v) == 0) { - if (newsize == 0) { - return 0; - }

[issue42550] re library matching problem

2020-12-02 Thread Ma Lin
Ma Lin added the comment: This issue can be closed. '0x' 2 'd26935a5ee4cd542e8a3a7e74fb7a99855975b59' 40 '\n' 1 2+40+1 = 43 -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue42

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-17 Thread Ma Lin
Ma Lin added the comment: The last benchmark was wrong: the /Ob3 option was not enabled. With `pgo_ob3.diff` applied, it is slower, so I close this issue. | Benchmark | py39_pgo_a | py39_pgo_b

[issue42369] Reading ZipFile not thread-safe

2020-11-16 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue42369> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin
Ma Lin added the comment: In a PGO build, the improvement is small. (3.9 branch, with PGO, build.bat -p X64 --pgo) | Benchmark | baseline-pgo | ob3-pgo

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin
Ma Lin added the comment: > Could you please try again with PGO? Please wait. BTW, this option was suggested in another project. In that project, even with `/Ob3` enabled, it is still slower than the GCC 9 build. If you are interested, see: https://github.com/facebook/zstd/issues/2

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin
New submission from Ma Lin : MSVC2019 has a new option `/Ob3`, which specifies more aggressive inlining than /Ob2: https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-160 If this option is used with MSVC2017, the compiler emits a warning: cl : Command line warning

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-10 Thread Ma Lin
Ma Lin added the comment: > I do not think that this is suitable for newcomers because you need to have > deep understanding why it was written in such form at first place and what > will be changed if you change it. I agree contributors need to understand code, rather than simpl

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-10 Thread Ma Lin
Ma Lin added the comment: > What is the problem exactly? There are several different problems, such as: https://github.com/python/cpython/blob/v3.10.0a2/Modules/mathmodule.c#L2033 In addition, `utf16_decode` also has this problem, I forgot this: https://github.com/python/cpython/b

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-09 Thread Ma Lin
New submission from Ma Lin : The C type `long` is a 4-byte integer in 64-bit Windows builds (MSVC behavior). [1] With other compilers, `long` is an 8-byte integer in 64-bit builds. This leads to some unnecessary performance waste; issue38252 fixed this problem in one case. Search `SIZEOF_LONG
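
The platform difference described here can be checked from Python itself. A small sketch, assuming only the standard `ctypes` and `struct` modules; the variable names are illustrative:

```python
import ctypes
import struct

# Size of the C `long` type as seen by the compiler that built this interpreter.
long_size = ctypes.sizeof(ctypes.c_long)
# Size of a pointer, which tells a 32-bit build from a 64-bit build.
pointer_size = struct.calcsize("P")

print(f"sizeof(long) = {long_size}, sizeof(void *) = {pointer_size}")
if pointer_size == 8 and long_size == 4:
    print("64-bit build with a 4-byte long (the case described in the report)")
```

Running it on a 64-bit Windows build and on a 64-bit Linux build should show the two cases the report contrasts.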

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-10-28 Thread Ma Lin
Ma Lin added the comment: I modify lzma module to use different growth factors, see attached picture different_factors.png 1.5x should be the growth factor of _PyBytesWriter under Windows. So if change _PyBytesWriter to use memory blocks, maybe there will be no performance improvement

[issue38252] Use 8-byte step to detect ASCII sequence in 64bit Windows builds

2020-10-16 Thread Ma Lin
Ma Lin added the comment: Although the improvement is not great, it's a very hot code path. Could you review the PR? -- components: +Windows nosy: +paul.moore, tim.golden ___ Python tracker <https://bugs.python.org/issue38

[issue41735] Thread locks in zlib module may go wrong in rare case

2020-09-07 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +21213 pull_request: https://github.com/python/cpython/pull/22132 ___ Python tracker <https://bugs.python.org/issue41

[issue41735] Thread locks in zlib module may go wrong in rare case

2020-09-07 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +21211 pull_request: https://github.com/python/cpython/pull/22130 ___ Python tracker <https://bugs.python.org/issue41

[issue41735] Thread locks in zlib module may go wrong in rare case

2020-09-06 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +21208 stage: -> patch review pull_request: https://github.com/python/cpython/pull/22126 ___ Python tracker <https://bugs.python.org/issu

[issue41735] Thread locks in zlib module may go wrong in rare case

2020-09-06 Thread Ma Lin
New submission from Ma Lin : The code in zlib module: self->zst.next_in = data->buf; // set next_in ... ENTER_ZLIB(self); // acquire thread lock `self->zst` is a `z_stream` struct defined in zlib, used to record states of a compress/decompress stream: typed

[issue37095] [Feature Request]: Add zstd support in tarfile

2020-08-29 Thread Ma Lin
Ma Lin added the comment: I have spent two weeks, almost complete the code, a preview: https://github.com/animalize/cpython/pull/8/files Write directly for stdlib, since there are already zstd modules on pypi. In addition, the API of zstd is simple, not as complicated as lzma. Can also use

[issue35228] Index search in CHM help crashes viewer

2020-08-28 Thread Ma Lin
Ma Lin added the comment: > when I delete the file %APPDATA%\Microsoft\HTML Help\hh.dat, > the problem seems to go away. It doesn't work for me. Moreover, `Binary Index=Yes` no longer works on my PC. A few days ago, I installed a clean Windows 10 2004, then CHM's index cannot be

[issue35228] Index search in CHM help crashes viewer

2020-08-27 Thread Ma Lin
Ma Lin added the comment: > More realistically, including the docs as unbundled HTML files > and relying on the default browser is probably an all-around better idea. CHM's index function is very convenient, I almost always use this feature when I use CHM. How about use tkinter to

[issue37095] [Feature Request]: Add zstd support in tarfile

2020-08-15 Thread Ma Lin
Ma Lin added the comment: There are two zstd modules on pypi: https://pypi.org/project/zstd/ https://pypi.org/project/zstandard/ The first one is too simple. The second one is powerful, but has too many APIs: ZstdCompressorIterator ZstdDecompressorIterator

[issue41555] re.sub replaces twice

2020-08-15 Thread Ma Lin
Ma Lin added the comment: There can be at most one empty match at a position. IIRC, Perl's regex engine has very similar behavior. If you don't want empty matches, using + is fine. -- ___ Python tracker <https://bugs.python.org/issue41

[issue41555] re.sub replaces twice

2020-08-14 Thread Ma Lin
Ma Lin added the comment: The re.sub() doc said: Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match. IMO 3.7+ behavior is more reasonable, and it fixed a bug, see issue25054. -- nosy: +malin
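
The documented 3.7+ behavior can be seen with the example from the `re.sub()` documentation. This is a short sketch, assuming Python 3.7 or later; the variable names are illustrative:

```python
import re

# `x*` can match the empty string, so an empty match adjacent to the
# non-empty match "x" is also replaced (Python 3.7+ behavior).
star_result = re.sub('x*', '-', 'abxd')

# `x+` never matches the empty string, so only the real "x" is replaced.
plus_result = re.sub('x+', '-', 'abxd')

print(star_result)
print(plus_result)
```

Switching from `*` to `+` is the usual way to avoid the extra replacements when empty matches are not wanted.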

[issue41265] lzma/bz2 module: inefficient buffer growth algorithm

2020-08-05 Thread Ma Lin
Ma Lin added the comment: A more thorough solution was used, see issue41486. So I close this issue. -- stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/i

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +20886 stage: -> patch review pull_request: https://github.com/python/cpython/pull/21740 ___ Python tracker <https://bugs.python.org/issu

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49368/benchmark_real.py ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailin

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49367/benchmark.py ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailin

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49365/0to200MB_step2MB.png ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list m

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49366/0to20MB_step64KB.png ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list m

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
Change by Ma Lin : Added file: https://bugs.python.org/file49364/0to2GB_step30MB.png ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list m

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-08-05 Thread Ma Lin
New submission from Ma Lin :  bz2/lzma module's current growth algorithm bz2/lzma module's initial output buffer size is 8KB [1][2], and they are using this output buffer growth algorithm [3][4]: newsize = size + (size >> 3) + 6 [1] https://github.com/python/cpython/blob/v3

[issue41330] Inefficient error-handle for CJK encodings

2020-08-03 Thread Ma Lin
Ma Lin added the comment: I'm working on issue41265. If nothing happens, I also would like to write a zstd module for stdlib before the end of the year, but I dare not promise this. If anyone wants to work on this issue, very grateful

[issue41452] Inefficient BufferedReader.read(-1)

2020-08-01 Thread Ma Lin
Ma Lin added the comment: Some underlying stream has fast-path for .readall(). So close this issue. -- stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/i

[issue41330] Inefficient error-handle for CJK encodings

2020-07-31 Thread Ma Lin
Ma Lin added the comment: At least fix this bug: the error-handler object is not cached, it needs to be looked up from a dict every time, which is very inefficient. The code: https://github.com/python/cpython/blob/v3.9.0b4/Modules/cjkcodecs/multibytecodec.c#L81-L98 I will submit

[issue41452] Inefficient BufferedReader.read(-1)

2020-07-31 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +20842 stage: -> patch review pull_request: https://github.com/python/cpython/pull/21698 ___ Python tracker <https://bugs.python.org/issu

[issue41452] Inefficient BufferedReader.read(-1)

2020-07-31 Thread Ma Lin
New submission from Ma Lin : BufferedReader's constructor has a `buffer_size` parameter, it's the size of this buffer: When reading data from BufferedReader object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer

[issue41265] lzma/bz2 module: inefficient buffer growth algorithm

2020-07-22 Thread Ma Lin
Ma Lin added the comment: I'm working on a patch. lzma decompressing speed increases: baseline: 0.275722 sec patched: 0.140405 sec (Uncompressed data size 52.57 MB) The new algorithm looks like this: #define INITIAL_BUFFER_SIZE (16*1024) static inline Py_ssize_t get_newsize

[issue41330] Inefficient error-handle for CJK encodings

2020-07-18 Thread Ma Lin
Ma Lin added the comment: > But how many new Python web application use CJK codec instead of UTF-8? A CJK character usually takes 2-bytes in CJK encodings, but takes 3-bytes in UTF-8. I tested a Chinese book: in GBK: 853,025 bytes in UTF-8: 1,267,523 bytes For CJK content, UT
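
The per-character size difference is easy to reproduce. A minimal sketch using a two-character sample string chosen for illustration (it is not the book measured above):

```python
text = "中文"  # two common CJK characters

gbk_len = len(text.encode("gbk"))      # 2 bytes per character in GBK
utf8_len = len(text.encode("utf-8"))   # 3 bytes per character in UTF-8

print(gbk_len, utf8_len)
```

Real documents mix CJK text with ASCII markup, so the overall ratio depends on the content.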

[issue41330] Inefficient error-handle for CJK encodings

2020-07-18 Thread Ma Lin
Ma Lin added the comment: IMO "xmlcharrefreplace" is useful for Web application. For example, the page's charset is "gbk", then this statement can generate the bytes content easily & safely: s.encode('gbk', 'xmlcharrefreplace') Maybe some HTML-related framework
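
A short sketch of what the `xmlcharrefreplace` handler does for a GBK page, using a sample string chosen for illustration: characters that GBK cannot represent are emitted as numeric character references instead of raising `UnicodeEncodeError`.

```python
page_text = "中😀"  # the emoji (U+1F600) has no GBK encoding

encoded = page_text.encode("gbk", "xmlcharrefreplace")
print(encoded)

# A browser that decodes the page as GBK renders the reference as the emoji.
print(encoded.decode("gbk"))
```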

[issue41330] Inefficient error-handle for CJK encodings

2020-07-17 Thread Ma Lin
New submission from Ma Lin : CJK encode/decode functions only have three error-handler fast-paths: replace, ignore, strict. See the code: [1][2] With other built-in error handlers, the error-handler object has to be looked up and called with a Unicode exception argument. See the code

[issue37095] [Feature Request]: Add zstd support in tarfile

2020-07-14 Thread Ma Lin
Ma Lin added the comment: > Add zstd support in tarfile This requires the stdlib to contain a Zstandard module. You can ask in the Idea forum: https://discuss.python.org/c/ideas -- nosy: +malin ___ Python tracker <https://bugs.pyth

[issue41210] Docs: More description of reason about LZMA1 data handling with FORMAT_ALONE

2020-07-13 Thread Ma Lin
Ma Lin added the comment: It is better to raise a warning when using problematic combination. But IMO either "raising a warning" or "adding more description to doc" is too dependent on the implementation detail of liblzma. -- __

[issue41265] lzma/bz2 module: inefficient buffer growth algorithm

2020-07-10 Thread Ma Lin
Ma Lin added the comment: Maybe the zlib module can also use the same algorithm. zlib module's initial buffer size is 16KB [1], each time the size doubles [2]. [1] zlib module's initial buffer size: https://github.com/python/cpython/blob/v3.9.0b4/Modules/zlibmodule.c#L32 [2] zlib module

[issue41265] lzma/bz2 module: inefficient buffer growth algorithm

2020-07-10 Thread Ma Lin
New submission from Ma Lin : lzma/bz2 modules are using the same buffer growth algorithm: [1][2] newsize = size + (size >> 3) + 6; lzma/bz2 modules' default output buffer is 8192 bytes [3][4], so the growth step is below. For many cases, maybe the buffer is resized too many
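
To get a feel for how often the buffer is resized, the growth formula quoted above can be simulated. This is a sketch only: `count_resizes` is a hypothetical helper, and the 8192-byte default is the initial size stated in the report.

```python
def count_resizes(target, initial=8192):
    """Count how many growth steps are needed until the buffer holds `target` bytes."""
    size = initial
    resizes = 0
    while size < target:
        # Growth formula quoted in the report: roughly 12.5% per step.
        size = size + (size >> 3) + 6
        resizes += 1
    return resizes

for megabytes in (1, 10, 100):
    print(megabytes, "MiB:", count_resizes(megabytes * 1024 * 1024), "resizes")
```

Because each step adds only about one eighth of the current size, the number of reallocations grows with the logarithm of the output size, but with a small base.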

[issue41210] LZMADecompressor.decompress(FORMAT_RAW) truncate output when input is paticular LZMA+BCJ data

2020-07-07 Thread Ma Lin
Ma Lin added the comment: There was a similar issue (issue21872). When decompressing lzma.FORMAT_ALONE data that doesn't have the end marker (but has the correct "Uncompressed Size" in the .lzma header), sometimes the last one to several dozen bytes are not output. issue2

[issue41210] LZMADecompressor.decompress(FORMAT_RAW) truncate output when input is paticular LZMA+BCJ data

2020-07-06 Thread Ma Lin
Ma Lin added the comment: The docs[1] said: Compression filters: FILTER_LZMA1 (for use with FORMAT_ALONE) FILTER_LZMA2 (for use with FORMAT_XZ and FORMAT_RAW) But your code uses a combination of `FILTER_LZMA1` and `FORMAT_RAW`, is this ok? [1] https
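
For comparison, here is a sketch of the combination the documentation lists, `FILTER_LZMA2` with `FORMAT_RAW`. The payload and the preset value are arbitrary choices for illustration:

```python
import lzma

# A raw stream carries no header, so the same filter chain must be
# supplied for both compression and decompression.
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
data = b"example payload " * 1000

raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=filters)

print(len(data), len(raw), restored == data)
```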

[issue41210] LZMADecompressor.decompress(FORMAT_RAW) truncate output when input is paticular LZMA+BCJ data

2020-07-05 Thread Ma Lin
Change by Ma Lin : -- components: +Library (Lib) -Extension Modules nosy: +malin ___ Python tracker <https://bugs.python.org/issue41210> ___ ___ Python-bug

[issue35859] Capture behavior depends on the order of an alternation

2020-06-29 Thread Ma Lin
Ma Lin added the comment: Do I need to write a detailed review guide? I suppose that after reading it from beginning to end, it will be easy to understand PR 12427, no need to read anything else. Or plan to replace the sre module with the regex module in a future version

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: Why do you always want to use a "utf-8" encoded identifier as the group name in a `bytes` pattern? The direction is: a group name written in a `bytes` pattern is converted to `str`. Not this direction: `str` group name -(utf8)-> `bytes` pattern -> `

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: Please look at these: >>> orig_name = "Ř" >>> orig_ch = orig_name.encode("cp1250") # Because why not? >>> orig_ch b'\xd8' >>> name = list(re.match(b"(?P<" + orig_ch + b">

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: > this limitation to the latin-1 subset is not compatible with the > documentation, which says that valid Python identifiers are valid group names. Not all latin-1 characters are valid identifier, for example: >>> '\x94'.encode('lati

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: It seems you don't know some knowledge of encoding yet. Naturally, `bytes` cannot contain character which Unicode code point is greater than \u00ff. So you can only use "latin1" encoding, which map from character to byte (or reverse) directly. "

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: In this case, you can only use 'latin1', which directly maps one character (\u0000-\u00FF) to/from one byte. If you use 'utf-8', it may map one character to multiple bytes, such as 'Δ' -> b'\xce\x94'. '\x94' is an invalid identifier, it will raise an er
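
The one-to-one mapping of latin1 and the multi-byte mapping of UTF-8 can be verified directly. A small sketch with illustrative variable names:

```python
# latin-1 maps each byte value 0..255 to the code point with the same number.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")

# UTF-8 may need several bytes for one character.
delta_utf8 = "Δ".encode("utf-8")

print([ord(c) for c in as_text[:4]], round_trip == all_bytes)
print(delta_utf8)
print("\x94".isidentifier(), "é".isidentifier())
```

The last line shows why a stray UTF-8 continuation byte such as 0x94 fails the identifier check once it is interpreted as a latin1 character.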

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: `latin1` is the character set covering Unicode code points from \u0000 to \u00ff, and the characters are mapped directly from/to bytes. So b'\xe9' is mapped to \u00e9, which is `é`. Of course, characters with a Unicode code point greater than 0xff cannot appear

[issue40980] group names of bytes regexes are strings

2020-06-16 Thread Ma Lin
Ma Lin added the comment: > a non-ascii group name will raise an error in bytes, even if encoded Looks like this is a language limitation: >>> b'é' File "<stdin>", line 1 SyntaxError: bytes can only contain ASCII literal characters. No problem if you

[issue40980] group names of bytes regexes are strings

2020-06-15 Thread Ma Lin
Ma Lin added the comment: Group name is `str` is very reasonable. Essentially it is just a name, it has nothing to do with `bytes`. Other names in Python are also `str` type, such as codec names, hashlib names. -- nosy: +Ma Lin ___ Python tracker
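
A brief sketch of this point, with an arbitrary pattern chosen for illustration: the group name of a `bytes` pattern is a `str`, while the matched value is `bytes`.

```python
import re

pattern = re.compile(rb"(?P<word>\w+)")
match = pattern.match(b"hello")

print(pattern.groupindex)   # group names are str keys
print(match.groupdict())    # matched values are bytes
print(match.group("word"))  # lookup by str name
```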

[issue29242] Crash on GC when compiling PyPy

2020-06-09 Thread Ma Lin
Ma Lin added the comment: I suggest not to close this issue, this is an opportunity to investigate whether Python3 has this problem as well. -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue29

[issue40861] On Windows, liblzma is always built without optimization

2020-06-06 Thread Ma Lin
Ma Lin added the comment: Good catch. You can submit a PR to fix this. If you start from zero and do it slowly, it will take about a week or two. -- components: +Windows -Build nosy: +Ma Lin, paul.moore, steve.dower, tim.golden, zach.ware

[issue40859] Update Windows build to use xz-5.2.5

2020-06-03 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +19847 stage: -> patch review pull_request: https://github.com/python/cpython/pull/20622 ___ Python tracker <https://bugs.python.org/issu

[issue40859] Update Windows build to use xz-5.2.5

2020-06-03 Thread Ma Lin
New submission from Ma Lin : The Windows build is using xz-5.2.2, it was released on 2015-09-29. xz-5.2.5 was released recently, maybe we can update this library. When preparing cpython-source-deps, don't forget to copy `xz-5.2.5\windows\vs2019\config.h` to `xz-5.2.5\windows\` folder

[issue35859] Capture behavior depends on the order of an alternation

2020-05-31 Thread Ma Lin
Ma Lin added the comment: Is there hope to merge to 3.9 branch? -- ___ Python tracker <https://bugs.python.org/issue35859> ___ ___ Python-bugs-list mailin

[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-02 Thread Ma Lin
Ma Lin added the comment: I did a git bisect, this commit fixed the bug: https://github.com/python/cpython/commit/ac22f6aa989f18c33c12615af1c66c73cf75d5e7 -- ___ Python tracker <https://bugs.python.org/issue40

[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-02 Thread Ma Lin
Ma Lin added the comment: On Windows 10 with Python 3.7, I get the same message as in the reply above. With Python 3.8, it works well. -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue40

[issue40060] socket.TCP_NOTSENT_LOWAT is missing in official macOS builds

2020-04-07 Thread Ma Lin
Ma Lin added the comment: It seems that people usually use the socket module like this, I think it's safe to respect this habit: if hasattr(socket, "FLAG_NAME"): do_something If PR19402 is used, your program will have problems on older system versions, not only "
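
The `hasattr` habit mentioned here might look like the following in practice. This is a sketch under assumptions: `set_notsent_lowat` is a hypothetical helper, and the threshold value is arbitrary.

```python
import socket

def set_notsent_lowat(sock, value):
    """Set TCP_NOTSENT_LOWAT if this build exposes it; return True on success."""
    # The constant is absent when the build or platform does not provide it.
    if not hasattr(socket, "TCP_NOTSENT_LOWAT"):
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NOTSENT_LOWAT, value)
    except OSError:
        # The constant exists, but the running system rejects the option.
        return False
    return True

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(set_notsent_lowat(s, 16384))
```

Catching `OSError` as well covers the case where the constant is defined at build time but the running system is older than the build system.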

[issue40060] socket.TCP_NOTSENT_LOWAT is missing in official macOS builds

2020-04-07 Thread Ma Lin
Ma Lin added the comment: Windows build encountered a similar problem, see issue32394. The solution is to check the runtime system version when importing socket module, if it is an older system, delete the constants. [1] issue32394 has a small script (winsdk_watchdog.py) to help find

[issue39974] A race condition with GIL releasing exists in stringlib_bytes_join

2020-03-16 Thread Ma Lin
Ma Lin added the comment: I also planned to review this commit at some moment, I feel a bit unsteady about it. If an optimization needs to be fine-tuned, and may introduces some pitfalls for future code maintenance, IMHO it is best to avoid doing this kind of optimization. -- nosy

[issue39033] zipimport raises NameError: name '_boostrap_external' is not defined

2019-12-12 Thread Ma Lin
Ma Lin added the comment: Is it possible to scan stdlib to find similar bugs? -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue39033> ___ ___

[issue37527] Timestamp conversion on windows fails with timestamps close to EPOCH

2019-11-01 Thread Ma Lin
Ma Lin added the comment: issue29097 fixed bug in `datetime.fromtimestamp()`. But this issue is about `datetime.timestamp()`, not fixed yet. -- ___ Python tracker <https://bugs.python.org/issue37

[issue23692] Undocumented feature prevents re module from finding certain matches

2019-10-27 Thread Ma Lin
Change by Ma Lin : -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue23692> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue38582] re: backreference number in replace string can't >= 100

2019-10-25 Thread Ma Lin
Ma Lin added the comment: > I'd still retain \0 as a special case, since it really is useful. Yes, maybe \0 is used widely, I didn't think of it. Changing is troublesome, let's keep it as is. -- ___ Python tracker <https://bugs.pyth

[issue38582] re: backreference number in replace string can't >= 100

2019-10-25 Thread Ma Lin
Ma Lin added the comment: Octal escape: \ooo Character with octal value ooo. As in Standard C, up to three octal digits are accepted. It only accepts UCS1 characters (ooo <= 0o377): >>> ord('\377') 255 >>> len('\378') 2 >>>
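
The two interactive results quoted above can be unpacked a little further. A small sketch with illustrative variable names:

```python
# Three octal digits form one character: 0o377 == 255.
single = '\377'

# '8' is not an octal digit, so the escape stops after '\37' (0o37 == 31)
# and '8' becomes a separate character.
split = '\378'

print(ord(single))
print([ord(c) for c in split])
```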

[issue38582] re: backreference number in replace string can't >= 100

2019-10-24 Thread Ma Lin
Ma Lin added the comment: @veaba Post only in English is fine. > Is this actually needed? Maybe very very few people dynamically generate some large patterns. > However, \g<...> is not accepted in a pattern. > in the "regex" module I added support for it in a patter
