[issue47248] Possible slowdown of regex searching in 3.11

2022-04-08 Thread Ma Lin


Ma Lin  added the comment:

> Possibly related to the new atomic grouping support from GH-31982?

It seems unlikely.
I will run some benchmarks for this issue; more information (version/platform) 
is welcome.

--

___
Python tracker 
<https://bugs.python.org/issue47248>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30437
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32411

___
Python tracker 
<https://bugs.python.org/issue47256>
___



[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin

New submission from Ma Lin :

These changes reduce sizeof(match_context):
- 32-bit build: 36 bytes, no change.
- 64-bit build: 72 bytes -> 56 bytes.

sre uses a stack and the `match_context` struct to simulate recursive calls, so 
a smaller struct brings:
- deeper recursion
- less memory consumption
- fewer memory reallocations

Here is a test. If the stack size is limited to 1 GiB, the maximum usable value 
of n is:

re.match(r'(ab)*', n * 'ab')   # need to save MARKs
72 bytes: n = 11,184,808
64 bytes: n = 12,201,609
56 bytes: n = 13,421,770

re.match(r'(?:ab)*', n * 'ab') # no need to save MARKs
72 bytes: n = 13,421,770
64 bytes: n = 14,913,078
56 bytes: n = 16,777,213

1,073,741,823 capturing groups should be enough for almost all users.
If the limit were 16,383 (2-byte integer), the context size could be reduced 
further. But some program-generated patterns may have more capturing groups 
than that.

1️⃣Performance:

Before
regex_dna: Mean +- std dev: 149 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.22 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark[1]: 13.9 sec +- 0.0 sec

Commit 1. limit the maximum capture group to 1,073,741,823
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.0 sec

Commit 2. further reduce sizeof(SRE(match_context))
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.2 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.1 sec

If the types of toplevel/jump are further changed from int to char, 
sizeof(match_context) in a 32-bit build is reduced from 36 to 32 (in a 64-bit 
build it is still 56). But it's slower on a 64-bit build, so I didn't adopt it:
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.18 ms +- 0.01 ms
regex_v8: Mean +- std dev: 22.4 ms +- 0.1 ms
my benchmark: 14.1 sec +- 0.0 sec

2️⃣ The type of match_context.count is Py_ssize_t
- If it is changed to a 4-byte integer, some engine code needs to be modified.
- If it is kept as Py_ssize_t, SRE_MAXREPEAT may be >= 4 GiB in future versions.  
  Currently SRE_MAXREPEAT can't be >= 4 GiB.
So the type of match_context.count is unchanged.

[1] My re benchmark, which uses 16 patterns to process 100 MiB of text data:
https://github.com/animalize/re_benchmarks

--
components: Library (Lib)
messages: 416960
nosy: ezio.melotti, malin, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
status: open
title: re: limit the maximum capturing group to 1,073,741,823, reduce 
sizeof(match_context).
type: resource usage
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue47256>
___



[issue47248] Possible slowdown of regex searching in 3.11

2022-04-07 Thread Ma Lin


Ma Lin  added the comment:

Could you give the two versions? I will do a git bisect.

I tested 356997c~1 and 356997c [1], msvc2022 non-pgo release build:

### regex_dna ###
Mean +- std dev: 151 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x slower
Not significant

### regex_effbot ###
Mean +- std dev: 2.47 ms +- 0.01 ms -> 2.46 ms +- 0.02 ms: 1.00x faster
Not significant

### regex_v8 ###
Mean +- std dev: 21.7 ms +- 0.1 ms -> 22.4 ms +- 0.1 ms: 1.03x slower
Significant (t=-30.82)

https://github.com/python/cpython/commit/35699721a3391175d20e9ef03d434675b496

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue47248>
___



[issue47211] Remove re.template() and re.TEMPLATE

2022-04-06 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue47211>
___



[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin


Ma Lin  added the comment:

> cryptic name

In very early versions, "mark" was called register/region.
https://github.com/python/cpython/blob/v1.0.1/Modules/regexpr.h#L48-L52

If the span is accessed repeatedly, it's faster than Match.span().
Maybe consider renaming it and making it a public attribute.

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin


Ma Lin  added the comment:

Match.regs is an undocumented attribute that seems to have existed since 1991. 
Can it be removed?

https://github.com/python/cpython/blob/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb/Modules/_sre/sre.c#L2871
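
For readers unfamiliar with the attribute, here is a small sketch of what Match.regs holds compared with the documented Match.span(). The pattern and input are made up for illustration. Because the attribute is undocumented, its presence in any given version should not be relied on:

```python
import re

m = re.match(r"(a)(b)", "ab")

# Match.regs is undocumented; it holds the (start, end) span of every group,
# with group 0 (the whole match) first.
regs = m.regs

# The same information, built from the documented API.
spans = tuple(m.span(i) for i in range(m.re.groups + 1))

print(regs)
print(spans)
```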

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-04-03 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30344
pull_request: https://github.com/python/cpython/pull/32283

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47152] Reorganize the re module sources

2022-04-02 Thread Ma Lin


Ma Lin  added the comment:

In the `Modules` folder, there are _sre.c/sre.h/sre_constants.h/sre_lib.h files. 
Will they be put into a folder?

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method

2022-04-02 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30318
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32247

___
Python tracker 
<https://bugs.python.org/issue47199>
___



[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method

2022-04-02 Thread Ma Lin


New submission from Ma Lin :

`bytes(m)` can be replaced by memoryview.cast('B'), so the data does not need 
to be copied.

m = memoryview(buf)
# HACK for byte-indexing of non-bytewise buffers (e.g. array.array)
if m.itemsize > 1:
    m = memoryview(bytes(m))
n = len(m)

https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L190-L194
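
A minimal sketch of the proposed replacement, using an array.array as the non-bytewise buffer. This only illustrates the idea; it is not the actual patch, and it assumes the buffer is C-contiguous, which memoryview.cast() requires:

```python
import array

buf = array.array("i", [1, 2, 3])

m = memoryview(buf)
if m.itemsize > 1:
    # Reinterpret the same buffer as unsigned bytes; no data is copied.
    m = m.cast("B")
n = len(m)

# The view shares memory with buf, so a later change to buf is visible
# through the view. A copy made with bytes(m) would not see it.
buf[0] = 42
shares_memory = bytes(m) == buf.tobytes()

print(n, shares_memory)
```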

--
components: Library (Lib)
messages: 416538
nosy: malin
priority: normal
severity: normal
status: open
title: multiprocessing: micro-optimize Connection.send_bytes() method
type: resource usage
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue47199>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-03-31 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30298
pull_request: https://github.com/python/cpython/pull/32223

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47152] Reorganize the re module sources

2022-03-30 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30266
pull_request: https://github.com/python/cpython/pull/32188

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-03-30 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30265
pull_request: https://github.com/python/cpython/pull/32188

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47152] Reorganize the re module sources

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

Please don't merge too close to the 3.11 beta1 release date; I'll submit PRs 
after this is merged.

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

My PR methods are suboptimal, so I closed them.

The number of REPEATs can be counted when compiling a pattern, and a 
`SRE_REPEAT` array with that many items can be allocated in `SRE_STATE`.

It seems that at any time a REPEAT has only one active instance, so a 
`SRE_REPEAT` array is fine.
The regex module does it like this:
https://github.com/mrabarnett/mrab-regex/blob/hg/regex_3/_regex.c#L18287-L18288

Can the number of REPEATs be placed in `SRE_OP_INFO`?
And can a field be added to `SRE_OP_REPEAT` to indicate the index of this REPEAT?

--

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue35859] Capture behavior depends on the order of an alternation

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

Thanks for your review.

3.11 has a more powerful re module. Thank you also for rebasing the atomic 
grouping code.

--

___
Python tracker 
<https://bugs.python.org/issue35859>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-24 Thread Ma Lin

Ma Lin  added the comment:

> I posted remove-bytes-hash.patch in this issue. Would you measure how this 
> affects whole application performance rather than micro benchmarks?

I guess there is not much difference in benchmarks.
But if a bytes object is put into multiple dicts/sets and len(bytes_key) is 
large, hashing will take a long time. (1 GiB takes 0.40 seconds on an i5-11500 
with DDR4-3200.)
The length of a bytes object can be arbitrary, so the computing time may vary 
widely.

Is it possible to let code objects use other types? In addition to ob_shash, 
maybe the extra \x00 byte at the end could be saved.

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-23 Thread Ma Lin


Ma Lin  added the comment:

If a bytes object is put into multiple dicts/sets, the hash needs to be 
computed multiple times. This seems to be a common usage.

bytes is a very basic type, and users may use it in various ways. Unskilled 
users may check the same bytes object against dicts/sets many times.

FYI, 1 GiB data:

function seconds
hash()   0.40
binascii.crc32() 1.66   (Gregory P. Smith is trying to improve this)
zlib.crc32() 0.65
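
A measurement of this kind can be sketched as below. This is not the script used for the table above: the buffer is only 16 MiB of zeros to keep the run short, the helper name `timed` is made up, and the timings will differ by machine and Python build:

```python
import binascii
import time
import zlib

data = bytes(16 * 1024 * 1024)  # 16 MiB of zeros; the table above used 1 GiB


def timed(func, payload):
    """Return the wall-clock seconds taken by a single call of func(payload)."""
    start = time.perf_counter()
    func(payload)
    return time.perf_counter() - start


timings = {
    "hash()": timed(hash, data),
    "binascii.crc32()": timed(binascii.crc32, data),
    "zlib.crc32()": timed(zlib.crc32, data),
}

for name, seconds in timings.items():
    print(f"{name:18} {seconds:.4f}")
```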

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin


Ma Lin  added the comment:

RAM is now relatively cheaper than CPU.
1 million bytes objects additionally use 7.629 MiB of RAM for ob_shash 
(1_000_000*8/1024/1024).
This causes a hash() performance regression anyway.

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin


Ma Lin  added the comment:

Since hash() is a public function, some users may use the hash value to manage 
bytes objects in their own way, and then there may be a performance regression.

As a rough example, dispatching data to 16 servers:

h = hash(b)
sendto(server_number=h & 0xF, data=b)

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin


Ma Lin  added the comment:

If this code is run, would it be slower?

bytes_hash = hash(bytes_data)
bytes_hash = hash(bytes_data)  # get hash twice
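
One way to check this on a given build is to time the two calls separately, as in the sketch below. The 32 MiB buffer size is arbitrary. On a build that caches the hash in the bytes object, the second call would be expected to return almost immediately; without the cache, both calls would take similar time:

```python
import time

bytes_data = bytes(32 * 1024 * 1024)  # 32 MiB; size chosen arbitrarily

start = time.perf_counter()
first = hash(bytes_data)
first_seconds = time.perf_counter() - start

start = time.perf_counter()
second = hash(bytes_data)  # get hash twice
second_seconds = time.perf_counter() - start

print(f"first: {first_seconds:.6f} s, second: {second_seconds:.6f} s")
```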

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue47040] Fix confusing versionchanged note in crc32 and adler32

2022-03-19 Thread Ma Lin


Ma Lin  added the comment:

PR 32002 is for 3.10/3.9 branches.

--

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue47040] Fix confusing versionchanged note in crc32 and adler32

2022-03-19 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30090
pull_request: https://github.com/python/cpython/pull/32002

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2022-03-19 Thread Ma Lin


Ma Lin  added the comment:

`_Stream.write` method in tarfile.py also has this code:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434

But this bug will not be triggered, because bytes data is always passed when 
calling this method.

`_ConnectionBase.send_bytes` method in multiprocessing\connection.py can be 
micro-optimized:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L193
This can be done in another issue.

So I think this issue can be closed.

--
stage: patch review -> resolved
status: pending -> closed

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue47040] Remove invalid versionchanged in doc

2022-03-17 Thread Ma Lin


Ma Lin  added the comment:

The `binascii.crc32` doc also has this invalid note:
doc: https://docs.python.org/3/library/binascii.html#binascii.crc32
3.0.0 code: https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035

In addition, `binascii.crc32` has a `USE_ZLIB_CRC32` code path, but it's buggy.
The length parameter of the zlib `crc32()` function is `unsigned int`, so if the 
`USE_ZLIB_CRC32` code path is used and the data is > 4 GiB, the result is wrong.
Should we remove the `USE_ZLIB_CRC32` code path in `binascii.c`, or fix it?

`USE_ZLIB_CRC32` code path in binascii.c (buggy code): 
https://github.com/python/cpython/blob/v3.11.0a6/Modules/binascii.c#L756-L767
crc32 in zlibmodule.c, which uses a UINT_MAX sliding window (correct code):
 https://github.com/python/cpython/blob/v3.11.0a6/Modules/zlibmodule.c#L1436
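
The sliding-window idea can be sketched in Python: feed the data to crc32 in windows no larger than a fixed size and carry the running checksum between calls. This is only an illustration of the approach, not the C code in zlibmodule.c. The function name `crc32_chunked` is made up, and the tiny chunk size stands in for UINT_MAX:

```python
import zlib


def crc32_chunked(data, chunk_size):
    """Compute CRC-32 over data in windows of at most chunk_size bytes."""
    crc = 0
    view = memoryview(data)  # slicing a memoryview does not copy the data
    for offset in range(0, len(view), chunk_size):
        # Pass the running value so each window continues the checksum.
        crc = zlib.crc32(view[offset:offset + chunk_size], crc)
    return crc


data = bytes(range(256)) * 1000
print(crc32_chunked(data, 4096) == zlib.crc32(data))
```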

--
title: Remove an invalid versionchanged in doc -> Remove invalid versionchanged 
in doc

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue47040] Remove an invalid versionchanged in doc

2022-03-16 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30046
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31955

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue47040] Remove an invalid versionchanged in doc

2022-03-16 Thread Ma Lin


New submission from Ma Lin :

Since CPython 3.0.0, the checksums are always truncated to `unsigned int`:
https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930
https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950
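
The behaviour can be observed from Python with a quick check that both checksums stay within the unsigned 32-bit range. The sample inputs are arbitrary, and the check only demonstrates the range on the interpreter it runs on:

```python
import zlib

samples = [b"", b"a", b"hello world", bytes(range(256)) * 4]

in_range = []
for data in samples:
    for func in (zlib.crc32, zlib.adler32):
        value = func(data)
        # Record whether the result fits in an unsigned 32-bit integer.
        in_range.append(0 <= value <= 0xFFFFFFFF)
        print(func.__name__, len(data), value)
```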

--
assignee: docs@python
components: Documentation, Library (Lib)
messages: 415386
nosy: docs@python, gregory.p.smith, malin
priority: normal
severity: normal
status: open
title: Remove an invalid versionchanged in doc
versions: Python 3.10, Python 3.11, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-19 Thread Ma Lin


Change by Ma Lin :


--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46255>
___



[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-04 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +28606
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30397

___
Python tracker 
<https://bugs.python.org/issue46255>
___



[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-04 Thread Ma Lin


New submission from Ma Lin :

These methods are METH_NOARGS, so in all cases the second parameter will be NULL.

{"_checkClosed",   _PyIOBase_check_closed, METH_NOARGS},
{"_checkSeekable", _PyIOBase_check_seekable, METH_NOARGS},
{"_checkReadable", _PyIOBase_check_readable, METH_NOARGS},
{"_checkWritable", _PyIOBase_check_writable, METH_NOARGS},

--
components: IO
messages: 409672
nosy: malin
priority: normal
severity: normal
status: open
title: Remove unnecessary check in _IOBase._check*() methods
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46255>
___



[issue23224] bz2/lzma: Compressor/Decompressor objects are only initialized in __init__

2021-12-19 Thread Ma Lin


Ma Lin  added the comment:

These can be done in .__new__() method:
- create thread lock
- create (de)?compression context
- initialize (de)?compressor states

In the .__init__() method, only set (de)?compression parameters, and prevent 
.__init__() from being called multiple times. 

This approach works fine in my pyzstd module (a Python binding to the zstd library).
But I think very few people will encounter this problem, we can leave it.
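
The split described above can be sketched in pure Python. The `Compressor` class below is a hypothetical stand-in, not the bz2/lzma or pyzstd implementation (those are written in C). It only shows the shape: always-needed state is created in __new__, and __init__ sets parameters once:

```python
import threading


class Compressor:
    """Hypothetical pure-Python stand-in for a C compressor type."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # State that must always exist is created here, so the object is
        # usable even if __init__ is skipped.
        self._lock = threading.Lock()
        self._chunks = []  # stands in for the compression context
        self._initialized = False
        return self

    def __init__(self, level=6):
        # Only parameters are set here, and only once.
        if self._initialized:
            raise RuntimeError("__init__ may only be called once")
        self._level = level
        self._initialized = True

    def compress(self, data):
        with self._lock:
            self._chunks.append(bytes(data))
            return len(data)


c = Compressor(level=3)
print(c.compress(b"ab"))
```

With this layout, an object created through `Compressor.__new__(Compressor)` alone still has a lock and a context, so calling a method on it cannot touch uninitialized state.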

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue23224>
___



[issue44092] [sqlite3] Remove special rollback handling

2021-12-06 Thread Ma Lin


Ma Lin  added the comment:

If the special rollback handling is removed, the behavior of 
Connection.rollback() and 'ON CONFLICT ROLLBACK' clause will be consistent.
See attached file on_conflict_rollback.py.

--
Added file: https://bugs.python.org/file50481/on_conflict_rollback.py

___
Python tracker 
<https://bugs.python.org/issue44092>
___



[issue44092] [sqlite3] Remove special rollback handling

2021-12-06 Thread Ma Lin


Ma Lin  added the comment:

Imagine a person writes code with Python 3.11 and SQLite 3.8.7.2+, and then 
deploys it to Python 3.11 and SQLite 3.8.7.1-; an error may occur. However, 
this situation is unlikely to happen.

> Can you provide a reproducer? We've run this change through the ref.
> leak bots, and they are all green, so if there's a ref. leak, the
> test suite needs improvements.

The statement in the cache will never be reused. If you don't mind, it's not a 
big problem.

--

___
Python tracker 
<https://bugs.python.org/issue44092>
___



[issue44092] [sqlite3] Remove special rollback handling

2021-12-06 Thread Ma Lin


Ma Lin  added the comment:

> How realistic is this scenario? If you compile with, for example 3.14.0 or
> newer, you'd link with sqlite3_trace_v2, not sqlite3_trace, so the loader
> would prevent you from running with anything pre 3.14. AFAIK, we've never
> had such problems.

I mean, after this change, different versions of SQLite will behave 
differently. A message could be given for SQLITE_ABORT_ROLLBACK to explain this 
problem.

> It is a change of behaviour of the internal machinery. Does the change lead
> to wrong results (duplicate rows, wrong rows returned, no rows returned)?
> Corrupted/garbage data? Non-deterministic behaviour? Does any of the API's
> provided by sqlite3 not behave according to the documentation anymore?

It just leaks a resource; apart from this, there seems to be no problem.

--

___
Python tracker 
<https://bugs.python.org/issue44092>
___



[issue44092] [sqlite3] Remove special rollback handling

2021-12-05 Thread Ma Lin


Ma Lin  added the comment:

I think this change is fine.
Erlend E. Aasland's explanation is very clear. 

There is only one situation in which a problem may occur: writing code with 
SQLite 3.8.7.2+ (2014-11-18) and running it on 3.7.15 (2012-12-12) ~ 3.8.7.1-. 
This situation is unlikely to happen, and we can note it in the doc.

To be safer, if running on SQLite 3.8.7.1- and the SQLITE_ABORT_ROLLBACK error 
code is encountered, a message can be given to explain the reason.

Also note that the current main branch is buggy. If this change is not adopted, 
or is reverted later, don't forget to fix the bug from msg407185 (the 
`pysqlite_Statement.in_use` flag is not reset).

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44092>
___



[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-11-27 Thread Ma Lin


Ma Lin  added the comment:

This issue is not resolved, but was masked by a problematic behavior.
Maybe this issue will be solved in issue44092; I'll study that issue later.

--

___
Python tracker 
<https://bugs.python.org/issue33376>
___



[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-11-27 Thread Ma Lin


Ma Lin  added the comment:

Since 243b6c3b8fd3144450c477d99f01e31e7c3ebc0f (21-08-19), this bug can't be 
reproduced.

In `pysqlite_do_all_statements()`, 243b6c3 resets statements like this:

sqlite3_stmt *stmt = NULL;
while ((stmt = sqlite3_next_stmt(self->db, stmt))) {
    if (sqlite3_stmt_busy(stmt)) {
        (void)sqlite3_reset(stmt);
    }
}

But the `pysqlite_Statement.in_use` flag is not reset.
In `_pysqlite_query_execute()` function, if `pysqlite_Statement.in_use` flag is 
1, it creates a new `pysqlite_Statement` instance. So this line will use a new 
statement:

gen = conn.execute("SELECT c FROM t WHERE ?", (1,))

The duplicate row is from `pysqlite_Cursor.next_row` before 
3df0fc89bc2714f5ef03e36a926bc795dcd5e05a (21-08-25).

A tangential suggestion: could it be changed like this, with a check added when 
resetting a statement? Then statements could not be reset by other Cursors, 
which may improve code robustness:

typedef struct
{
    ...
-   int in_use;
+   pysqlite_Cursor *in_use; // points to the attached cursor
    ...
} pysqlite_Statement;

--

___
Python tracker 
<https://bugs.python.org/issue33376>
___



[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-26 Thread Ma Lin


Ma Lin  added the comment:

Thanks for review!

--

___
Python tracker 
<https://bugs.python.org/issue41735>
___



[issue45816] Python does not support standalone MSVC v143 (VS 2022) Build Tools

2021-11-17 Thread Ma Lin


Ma Lin  added the comment:

They are LNK1268 error:

LINK : fatal error LNK1268: inconsistent option 'pdbthreads:5' specified with 
/USEPROFILE but not with /GENPROFILE [e:\dev\cpython\PCbuild\_queue.vcxproj]

LINK : fatal error LNK1268: inconsistent option 'pdbthreads:1' specified with 
/USEPROFILE but not with /GENPROFILE [e:\dev\cpython\PCbuild\_asyncio.vcxproj]

LINK : fatal error LNK1268: inconsistent option 'pdbthreads:5' specified with 
/USEPROFILE but not with /GENPROFILE [e:\dev\cpython\PCbuild\_elementtree.vcxproj]

LINK : fatal error LNK1268: inconsistent option 'cgthreads:8' specified with 
/USEPROFILE but not with /GENPROFILE [e:\dev\cpython\PCbuild\_hashlib.vcxproj]

...

--

___
Python tracker 
<https://bugs.python.org/issue45816>
___



[issue45816] Python does not support standalone MSVC v143 (VS 2022) Build Tools

2021-11-17 Thread Ma Lin


Ma Lin  added the comment:

There are 5 link errors when building the PGO build.
Command: build --pgo

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue45816>
___



[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-16 Thread Ma Lin


Ma Lin  added the comment:

Sorry, I found an omission.

The previous PRs fixed the bug in these methods:

zlib.Compress.compress()
zlib.Decompress.decompress()

This method also has this bug, fixed in PR29587 (main/3.10) and PR29588 (3.9-):

zlib.Decompress.flush()

Attached file `test_flush.py` can reliably reproduce the bug.

This time I carefully checked the bz2/lzma/zlib modules; there should be no 
more problems.

Gregory P. Smith should understand this code, so I'm adding him to the nosy list.

--
nosy: +gregory.p.smith
resolution: fixed -> later
status: closed -> open
Added file: https://bugs.python.org/file50445/test_flush.py

___
Python tracker 
<https://bugs.python.org/issue41735>
___



[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-16 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +27831
pull_request: https://github.com/python/cpython/pull/29588

___
Python tracker 
<https://bugs.python.org/issue41735>
___



[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-16 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +27830
pull_request: https://github.com/python/cpython/pull/29587

___
Python tracker 
<https://bugs.python.org/issue41735>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2021-11-08 Thread Ma Lin


Ma Lin  added the comment:

Serhiy Storchaka:

Sorry, I found that the `zipfile` module also has this bug; it is fixed in 
PR29468.

This bug was first reported and fixed by GitHub user `marcoffee`, so I list him 
as a co-author. His work:
https://github.com/animalize/pyzstd/issues/4

The second commit fixes an omission of issue41735, a very simple fix, I fix it 
in PR29468 by the way.
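
The class of bug named in the issue title can be illustrated with a bytes-like object whose items are wider than one byte. For such an object, len() counts items, while the size in bytes is given by memoryview.nbytes. This sketch is illustrative only and is not taken from the patched stdlib code:

```python
import array

data = array.array("I", [1, 2, 3])
view = memoryview(data)

item_count = len(view)    # number of items, not bytes
byte_count = view.nbytes  # size of the underlying data in bytes

print(item_count, byte_count)
```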

--
resolution: fixed -> later
status: closed -> open

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2021-11-08 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +27721
pull_request: https://github.com/python/cpython/pull/29468

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-08 Thread Ma Lin


Ma Lin  added the comment:

Today I tested with msvc2022-preview; the `__forceinline` attribute does not 
hang the build.

64-bit PGO builds:

28d28e0~1,vc2022    : baseline
28d28e0~1+F,vc2022  : 1.02x slower  <1>
28d28e0,vc2022      : 1.03x slower  <2>
28d28e0+F,vc2022    : 1.03x slower
3.10 final,vc2022   : 1.03x slower
3.10 final+F,vc2022 : 1.03x slower
28d28e0~1,vc2019    : 1.00x slower  <3>

28d28e0~1 is the last fast commit, 28d28e0 is the first slow commit.
`+F` means adding the `__forceinline` attribute to all inline functions in object.h.
vc2019 and vc2022 are the latest versions.

<1> Forcing inline is slower.
<2> 28d28e0 is still slow, but not that much.
<3> Normally, msvc2019 and msvc2022 have the same performance.

Is it possible to write a PGO profile for 28d28e0? 
https://github.com/python/cpython/commit/28d28e053db6b69d91c2dfd579207cd8ccbc39e7

msvc2022 will be released in November this year, and maybe subsequent versions 
can be built with msvc2022.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Ma Lin


Ma Lin  added the comment:

I think this is a bug in MSVC2019, not really a regression in CPython. So 
changing the CPython code is just a workaround; maybe the right direction is 
to prompt the MSVC team to fix the bug, otherwise there will be more trouble 
when 3.11 is released a year later.

Judging from MSVC's reply, it seems they didn't realize that it was a bug, but 
suggested adjusting the training samples and using `__forceinline`. They don't 
know that `__forceinline` hangs the build process since 28d28e0.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread Ma Lin


Ma Lin  added the comment:

PR28475:
64-bit build is 1.03x slower than 28d28e0~1
32-bit build is 1.04x slower than 28d28e0~1

28d28e0~1 is the last good commit.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-19 Thread Ma Lin


Ma Lin  added the comment:

As in OP's benchmark, if the inline functions in object.h are converted to 
macros, the 3.10 branch is 1.03x faster, but still 1.07x slower than 28d28e0~1.
@vstinner could you prepare such a PR as a candidate fix?

There seem to be two ways to solve it in the short term:
1, Split the giant function.
2, Contact the MSVC team to see if there is a quick solution, such as 
undocumented options.

But the release date is too close. The worst outcome is to release with the 
performance regression and note on the download page that there is a 
performance regression, and that users who care about performance should use 3.9.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-18 Thread Ma Lin


Ma Lin  added the comment:

> In my case, pgo got stuck on linking with the object.h.

Me too. Since commit 28d28e0 (the first commit to slow down the PGO build), 
adding the `__forceinline` attribute to the _Py_DECREF() function in object.h 
makes the PGO build hang (>50 minutes).

So PR 28427 may not be a short-term solution.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread Ma Lin


Ma Lin  added the comment:

MSVC 2019 has a /Ob3 option:
https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion

From the experience of another project, I conjecture that /Ob3 increases the 
"global budget" mentioned in the blog.
I used /Ob3 for the 3.10 branch, and there seems to be no significant 
performance change. neonene, if you have time, you are welcome to verify this.

--

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-08 Thread Ma Lin


Ma Lin  added the comment:

This article briefly introduces the inlining decisions in MSVC. 
https://devblogs.microsoft.com/cppblog/inlining-decisions-in-visual-studio/

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue45116>
___



[issue44912] doc: macOS supports os.fsync(fd)

2021-08-14 Thread Ma Lin


Ma Lin  added the comment:

Unix includes macOS.

Very sorry, close as invalid.

--
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue44912>
___



[issue44912] doc: macOS supports os.fsync(fd)

2021-08-13 Thread Ma Lin


New submission from Ma Lin :

The documentation of os.fsync() says:
Availability: Unix, Windows.
https://docs.python.org/3.11/library/os.html#os.fsync

But it seems that macOS supports fsync.
(I'm not a macOS user)

--
assignee: docs@python
components: Documentation, macOS
messages: 399583
nosy: docs@python, malin, ned.deily, ronaldoussoren
priority: normal
severity: normal
status: open
title: doc: macOS supports os.fsync(fd)

___
Python tracker 
<https://bugs.python.org/issue44912>
___



[issue44711] Optimize type check in pipes.py

2021-07-22 Thread Ma Lin


Ma Lin  added the comment:

> I suppose it is a very old code

I also found some old code that may have a performance cost.

The memoryview.cast() method was added in Python 3.3.
This code doesn't use memoryview.cast(), which brings extra memory overhead 
when the amount of data is very large.
https://github.com/python/cpython/blob/v3.10.0b4/Lib/multiprocessing/connection.py#L190-L194
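
As a rough illustration of the point above (not the actual patch), the sketch 
below gets a byte-oriented view of a non-bytewise buffer with 
memoryview.cast('B') instead of copying it through bytes(). The helper name 
byte_view is hypothetical, and it assumes the buffer is C-contiguous.

```python
import array


def byte_view(buf):
    """Return a 1-D unsigned-byte memoryview of buf without copying.

    Hypothetical helper for illustration; buf must be C-contiguous.
    """
    # cast('B') reinterprets the same memory as bytes; no copy is made.
    return memoryview(buf).cast('B')


a = array.array('i', range(1000))
view = byte_view(a)
# The view is measured in bytes, and it still refers to the original array.
print(len(view), view.obj is a)
```

With such a view, len() gives the size in bytes, so large data does not have 
to be duplicated just to measure or slice it by byte.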

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44711>
___



[issue44549] BZip 1.0.6 Critical Vulnerability

2021-07-04 Thread Ma Lin

Ma Lin  added the comment:

If you update python/cpython-source-deps, I can submit a simple PR to 
python/cpython.

I want to submit a PR to python/cpython-source-deps, but I think it’s better 
for a trusted person to do this.

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44549>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2021-06-22 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +25427
pull_request: https://github.com/python/cpython/pull/26846

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44458] Duplicate symbol _BUFFER_BLOCK_SIZE when statically linking multiple modules

2021-06-22 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44458>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2021-06-21 Thread Ma Lin


Ma Lin  added the comment:

I am checking all the .py files in the `Lib` folder.
hmac.py has two len() bugs:
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214

I think PR 26764 is ready; it fixes the len() bugs in the bz2.py/lzma.py files.

--
nosy: +christian.heimes
title: PickleBuffer doesn't have __len__ method -> stdlib wrongly uses len() 
for bytes-like object

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +25350
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/26764

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Thread Ma Lin


Ma Lin  added the comment:

Ok, I'm working on a PR.

--

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] PickleBuffer doesn't have __len__ method

2021-06-16 Thread Ma Lin


New submission from Ma Lin :

Running this code raises an exception:

import pickle
import lzma
import pandas as pd
with lzma.open("test.xz", "wb") as file:
    pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)

The exception:

Traceback (most recent call last):
  File "E:\testlen.py", line 7, in <module>
    pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
  File "D:\Python39\lib\lzma.py", line 234, in write
    self._pos += len(data)
TypeError: object of type 'pickle.PickleBuffer' has no len()

The exception is raised in lzma.LZMAFile.write() method:
https://github.com/python/cpython/blob/v3.10.0b2/Lib/lzma.py#L238

PickleBuffer doesn't have a .__len__ method; is this intended?
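
A minimal sketch of the problem without pandas, assuming only the standard 
library: it wraps an array in a PickleBuffer, tries len(), and then measures 
the size through memoryview(...).nbytes instead. The helper name byte_length 
is hypothetical and is not necessarily how the eventual fix is written.

```python
import array
import pickle


def byte_length(data):
    """Size in bytes of any object that supports the buffer protocol."""
    with memoryview(data) as m:
        return m.nbytes


arr = array.array('i', [1, 2, 3])
buf = pickle.PickleBuffer(arr)

try:
    len(buf)
    has_len = True
except TypeError:
    # This is the failure reported above: no __len__ on PickleBuffer.
    has_len = False

# len(arr) counts items, while byte_length() counts bytes.
print(has_len, byte_length(buf), len(arr))
```

Note that even when len() works on a bytes-like object, it counts items rather 
than bytes, so a position counter such as `self._pos` needs the byte size.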

--
messages: 395971
nosy: malin, pitrou
priority: normal
severity: normal
status: open
title: PickleBuffer doesn't have __len__ method

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44134] lzma: stream padding in xz files

2021-05-15 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44134>
___



[issue43650] MemoryError on zip.read in shutil._unpack_zipfile

2021-05-15 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue43650>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin


Ma Lin  added the comment:

Sorry, for the (init_size > UINT32_MAX) problem, I have a better solution.

Please imagine this scenario:
- before the patch
- in 64-bit build
- use zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)

If the `bufsize` argument was set to the decompressed size, there used to be a 
fast path:

zlib.decompress(data, bufsize=10*1024*1024*1024)

Fast path when (the initial size == the actual size):
https://github.com/python/cpython/blob/v3.9.5/Modules/zlibmodule.c#L424-L426

https://github.com/python/cpython/blob/v3.9.5/Objects/bytesobject.c#L3008-L3011

But in the current code, the initial size is clamped to UINT32_MAX, so there 
are two regressions:

1. Double the RAM is allocated (~20 GiB: the blocks plus the final bytes).
2. The data must be memcpy'd from the blocks to the final bytes.

PR 26143 uses a UINT32_MAX sliding window for the first block, so the initial 
buffer size can now be greater than UINT32_MAX.

_BlocksOutputBuffer_Finish() already has a fast path for a single block. 
Benchmark this code:

zlib.decompress(data, bufsize=10*1024*1024*1024)

         time      RAM
before:  7.92 sec  ~20 GiB
after:   6.61 sec   10 GiB
(AMD 3600X, DDR4-3200, decompressed data is 10_GiB * b'a')

Maybe some user code relies on this corner case.
This should be the last revision; after it there is no regression in any case.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +24779
pull_request: https://github.com/python/cpython/pull/26143

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue44114] Incorrect function signatures in dictobject.c

2021-05-12 Thread Ma Lin


Change by Ma Lin :


--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue44114>
___



[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-05-10 Thread Ma Lin


Ma Lin  added the comment:

Erlend, please take a look at this bug.

--

___
Python tracker 
<https://bugs.python.org/issue33376>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin


Ma Lin  added the comment:

I found a backward-incompatible behavior. 

Before the patch, in a 64-bit build, the zlib module allowed an initial size > 
UINT32_MAX.
It created a bytes object, and used a sliding window to deal with the 
UINT32_MAX limit:
https://github.com/python/cpython/blob/v3.9.4/Modules/zlibmodule.c#L183
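
The sliding-window idea can be sketched in Python as follows. This is only a 
model of the arithmetic, not the C code: the function name window_sizes and 
its parameters are hypothetical. A buffer larger than the 32-bit limit is 
exposed to the decompressor one window at a time, each window at most 
UINT32_MAX bytes.

```python
UINT32_MAX = 0xFFFFFFFF


def window_sizes(buffer_size, window_max=UINT32_MAX):
    """Yield (offset, size) windows that together cover buffer_size bytes.

    Hypothetical model: each window is what a 32-bit "available output"
    field could describe at one time.
    """
    offset = 0
    while offset < buffer_size:
        size = min(buffer_size - offset, window_max)
        yield offset, size
        offset += size


# A 10 GiB output buffer needs three windows under the 32-bit limit.
for offset, size in window_sizes(10 * 1024 ** 3):
    print(offset, size)
```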

After the patch, when init_size > UINT32_MAX, it raises a ValueError.

PR 25738 fixes this backward incompatibility.
If the initial size > UINT32_MAX, it clamps to UINT32_MAX, rather than raising 
an exception.

Moreover, if you don't mind, I would like to take this opportunity to rename 
the wrapper functions from Buffer_* to OutputBuffer_*, so that the readers can 
easily distinguish between input buffer and output buffer.
If you don't think it's necessary, you may merge PR 25738 as is.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +24429
pull_request: https://github.com/python/cpython/pull/25738

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-28 Thread Ma Lin


Ma Lin  added the comment:

Thanks for reviewing this big patch.
Your review makes the code better.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-27 Thread Ma Lin

Ma Lin  added the comment:

The above changes were made in this commit:

split core code and wrappers
55705f6dc28ff4dc6183e0eb57312c885d19090a

After that commit, there is a new commit that resolves the code conflicts 
introduced by PR 22126 one hour ago.

Merge branch 'master' into blocks_output_buffer
45d752649925765b1b3cf39e9045270e92082164

Sorry to complicate the review again.
I should have asked Łukasz Langa to merge PR 22126 after this issue was 
resolved, since resolving code conflicts in PR 22126 is easier.

For the change from 55705f6 to 45d7526, see the uploaded file (45d7526.diff); 
it can also be easily seen with a Git client.

--
Added file: https://bugs.python.org/file49993/45d7526.diff

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41735] Thread locks in zlib module may go wrong in rare case

2021-04-27 Thread Ma Lin


Ma Lin  added the comment:

Thanks for the review.

--

___
Python tracker 
<https://bugs.python.org/issue41735>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-26 Thread Ma Lin


Ma Lin  added the comment:

Very sorry for updating at the last moment.
But after this update we should not need to touch the code in the future, so I 
think it's worthwhile. 

Please review the last commit in PR 21740; the previous commits have not been 
changed.
IMO, reviewing may be more convenient with a Git client such as TortoiseGit. 

The changes:

1, Move `Modules/blocks_output_buffer.h` to 
`Include/internal/pycore_blocks_output_buffer.h`.
This keeps the `Modules` folder clean.

2, Ask the user to initialize the struct instance like this, and use assertions 
to check it:
_BlocksOutputBuffer buffer = {.list = NULL};

Then error handling no longer has to worry about whether buffer.list is 
uninitialized.
There is an extra assignment, but it benefits long-term code maintenance.

3, Change the type of BUFFER_BLOCK_SIZE from `int` to `Py_ssize_t`.
The core code can then drop a few type casts.

4, These functions return the allocated size on success and -1 on failure:
_BlocksOutputBuffer_Init()
_BlocksOutputBuffer_InitAndGrow()
_BlocksOutputBuffer_InitWithSize()
_BlocksOutputBuffer_Grow()
If the code is used in other places, this API is simpler.

5, All functions are declared `inline`.
If the compiler is smart enough, it can eliminate some code when 
`max_length` is constant and < 0.
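
For readers unfamiliar with the design, here is a toy Python model of a 
block-based output buffer. It is a sketch under stated assumptions, not the C 
implementation: the class and method names are hypothetical, the block size is 
fixed (the real code may choose block sizes differently), and it only mirrors 
the convention from item 4 of returning the allocated size or -1.

```python
class BlocksOutputBuffer:
    """Toy model of a block-based output buffer (not the C API)."""

    def __init__(self, block_size=32 * 1024, max_length=-1):
        self.blocks = []              # plays the role of buffer.list
        self.allocated = 0
        self.block_size = block_size
        self.max_length = max_length  # < 0 means "no limit"

    def grow(self):
        """Append a block; return its size, or -1 if the limit is reached."""
        size = self.block_size
        if self.max_length >= 0:
            size = min(size, self.max_length - self.allocated)
            if size <= 0:
                return -1
        self.blocks.append(bytearray(size))
        self.allocated += size
        return size

    def finish(self, avail_out):
        """Join all blocks, dropping avail_out unused bytes of the last one."""
        if not self.blocks:
            return b''
        last = self.blocks[-1]
        used = len(last) - avail_out
        return b''.join(self.blocks[:-1] + [last[:used]])


buf = BlocksOutputBuffer(block_size=4)
buf.grow()
buf.blocks[-1][:] = b'abcd'      # first block completely filled
buf.grow()
buf.blocks[-1][:2] = b'ef'       # second block half filled
print(buf.finish(avail_out=2))
```

The point of the design is that growing never reallocates or copies existing 
output; data is copied once, when the blocks are joined at the end.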

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-25 Thread Ma Lin


Ma Lin  added the comment:

> The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put 
> the core code together, these defines can be put in a thin wrapper in 
> _bz2module.c/_lzmamodule.c/zlibmodule.c files.

I tried it, and it looks good.
I will update the PR within one or two days.
The code is more concise, and the review burden is not big.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin


Ma Lin  added the comment:

I think this change is safe.

The behaviors should be exactly the same, except the iterators are different 
objects (obj vs obj._buffer).

--

___
Python tracker 
<https://bugs.python.org/issue43787>
___



[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue43787>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-11 Thread Ma Lin


Ma Lin  added the comment:

> I don't really _like_ that this is a .h file acting as a C template to inject
> effectively the same static code into each module that wants to use it...
> Which I think is the concern Victor is expressing in a comment above.

I think so too.

The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If the 
core code is put together, these defines can go in a thin wrapper in the 
_bz2module.c/_lzmamodule.c/zlibmodule.c files. This can be done now, but 
ideally it would be improved more thoroughly in 3.11.

_PyBytesWriter behaves differently: the user may access existing data as plain 
data, which is impossible with _BlocksOutputBuffer. If an API and its code are 
carefully designed to be efficient, flexible and elegant, the code may be used 
in other places in CPython.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue43785] Remove RLock from BZ2File

2021-04-09 Thread Ma Lin


Ma Lin  added the comment:

This change is backwards incompatible; it may break some code silently.

If someone really needs better performance, they can write a BZ2File class 
without RLock by themselves; it should be easy.
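
One possible reading of this suggestion, as a sketch only: a small read-only 
helper built on bz2.BZ2Decompressor with no Python-level RLock. The name 
iter_decompressed is hypothetical; it handles a single bz2 stream, is far less 
complete than BZ2File, and must not be shared between threads.

```python
import bz2
import io


def iter_decompressed(fileobj, chunk_size=64 * 1024):
    """Yield decompressed chunks from a binary file object.

    Hypothetical helper: single bz2 stream only, no locking.
    """
    decomp = bz2.BZ2Decompressor()
    while not decomp.eof:
        raw = fileobj.read(chunk_size)
        if not raw:
            break                     # input exhausted (possibly truncated)
        data = decomp.decompress(raw)
        if data:
            yield data


payload = b'hello world\n' * 10000
source = io.BytesIO(bz2.compress(payload))
restored = b''.join(iter_decompressed(source))
print(len(restored))
```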

FYI, the zlib module was added in 1997, the bz2 module in 2002, and the lzma 
module in 2011. (I was just curious about these years.)

--

___
Python tracker 
<https://bugs.python.org/issue43785>
___



[issue43785] bz2 performance issue.

2021-04-09 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue43785>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-05 Thread Ma Lin


Ma Lin  added the comment:

ping

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-26 Thread Ma Lin


Ma Lin  added the comment:

Closing as invalid.

They have the same effect:

PyErr_GivenExceptionMatches(v, PyExc_BlockingIOError))
PyErr_GivenExceptionMatches(t, PyExc_BlockingIOError))

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue43305>
___



[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-26 Thread Ma Lin


Ma Lin  added the comment:

I am trying to write a test-case.

--

___
Python tracker 
<https://bugs.python.org/issue43305>
___



[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-23 Thread Ma Lin

New submission from Ma Lin :

654    PyErr_Fetch(&t, &v, &tb);
655    if (v == NULL || !PyErr_GivenExceptionMatches(v, PyExc_BlockingIOError)) {
                                                     ↑ this should be t
https://github.com/python/cpython/blob/v3.10.0a5/Modules/_io/bufferedio.c#L654-L655

Does this need a test case?

--
components: IO
messages: 387570
nosy: malin
priority: normal
severity: normal
status: open
title: A typo in /Modules/_io/bufferedio.c
type: behavior
versions: Python 3.10, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43305>
___



[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-02-23 Thread Ma Lin


Change by Ma Lin :


--
nosy: +erlendaasland

___
Python tracker 
<https://bugs.python.org/issue33376>
___



[issue43027] Calling _PyBytes_Resize() on 1-byte bytes may raise error

2021-01-25 Thread Ma Lin

New submission from Ma Lin :

PyBytes_FromStringAndSize() uses a global cache for 1-byte bytes:
https://github.com/python/cpython/blob/v3.10.0a4/Objects/bytesobject.c#L147

if (size == 1 && str != NULL) {
    struct _Py_bytes_state *state = get_bytes_state();
    op = state->characters[*str & UCHAR_MAX];
    if (op != NULL) {
        Py_INCREF(op);
        return (PyObject *)op;
    }
}

_PyBytes_Resize() will raise an error when (refcount != 1):
https://github.com/python/cpython/blob/v3.10.0a4/Objects/bytesobject.c#L3029

Then this code will raise an exception:

obj1 = PyBytes_FromStringAndSize("a", 1);
obj2 = PyBytes_FromStringAndSize("a", 1);
ret = _PyBytes_Resize(&obj1, 2);  // ret is -1

BTW, 0-byte bytes objects come from a global singleton, but _PyBytes_Resize() 
handles them before checking the refcount:
https://github.com/python/cpython/blob/v3.10.0a4/Objects/bytesobject.c#L3021

if (Py_SIZE(v) == 0) {
    if (newsize == 0) {
        return 0;
    }
    *pv = _PyBytes_FromSize(newsize, 0);
    Py_DECREF(v);
    return (*pv == NULL) ? -1 : 0;
}
if (Py_REFCNT(v) != 1) {
    goto error;
}

_PyBytes_Resize() doc:

Only use this to build up a brand new bytes object; don’t use this if the
bytes may already be known in other parts of the code. It is an error to
call this function if the refcount on the input bytes object is not one.
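
The sharing behind the 1-byte cache described above can be observed from 
Python. This is only an observation aid with a hypothetical helper name: the 
identity result is a CPython implementation detail, not a language guarantee, 
and it does not call _PyBytes_Resize() itself.

```python
def one_byte_slices_shared(data):
    """Return True if two 1-byte slices of data are the same object."""
    first = data[0:1]
    second = data[0:1]
    # Equal values are guaranteed; identical objects are not.
    return first is second


# On CPython builds that cache 1-byte bytes objects, this is expected to
# report that both slices are one shared object, so its refcount is > 1.
print(one_byte_slices_shared(b'abc'))
```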

--
components: Interpreter Core
messages: 385689
nosy: malin
priority: normal
severity: normal
status: open
title: Calling _PyBytes_Resize() on 1-byte bytes may raise error
type: behavior
versions: Python 3.10, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43027>
___



[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin


Ma Lin  added the comment:

I found a new issue that can be combined with this one.

--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue43023>
___



[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +23149
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24330

___
Python tracker 
<https://bugs.python.org/issue43023>
___



[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin


New submission from Ma Lin :

The code above it already covers this check:

    if (Py_SIZE(v) == newsize) {
        /* return early if newsize equals to v->ob_size */
        return 0;
    }
    if (Py_SIZE(v) == 0) {
-       if (newsize == 0) {
-           return 0;
-       }
        *pv = _PyBytes_FromSize(newsize, 0);
        Py_DECREF(v);
        return (*pv == NULL) ? -1 : 0;
    }

--
messages: 385626
nosy: malin
priority: normal
severity: normal
status: open
title: Remove a redundant check in _PyBytes_Resize()
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue43023>
___



[issue42550] re library matching problem

2020-12-02 Thread Ma Lin


Ma Lin  added the comment:

This issue can be closed.

'0x'  2
'd26935a5ee4cd542e8a3a7e74fb7a99855975b59'  40
'\n'  1

2+40+1 = 43
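
The count above can be checked directly. A small sketch using the same three 
pieces of the matched text:

```python
prefix = '0x'
digest = 'd26935a5ee4cd542e8a3a7e74fb7a99855975b59'
newline = '\n'

text = prefix + digest + newline
parts = [len(prefix), len(digest), len(newline)]

# The trailing newline is a real character and counts toward the length.
print(parts, sum(parts), len(text))
```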

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue42550>
___



[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-17 Thread Ma Lin


Ma Lin  added the comment:

The last benchmark was wrong: the /Ob3 option was not enabled.

With `pgo_ob3.diff` applied, the build is slower, so I am closing this issue.

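+------------------------+------------+-----------------------------+
| Benchmark              | py39_pgo_a | py39_pgo_b                  |
+========================+============+=============================+
| 2to3                   | 461 ms     | 465 ms: 1.01x slower (+1%)  |
+------------------------+------------+-----------------------------+
| chameleon              | 13.4 ms    | 13.7 ms: 1.03x slower (+3%) |
+------------------------+------------+-----------------------------+
| chaos                  | 138 ms     | 141 ms: 1.02x slower (+2%)  |
+------------------------+------------+-----------------------------+
| crypto_pyaes           | 141 ms     | 143 ms: 1.01x slower (+1%)  |
+------------------------+------------+-----------------------------+
| deltablue              | 9.01 ms    | 9.20 ms: 1.02x slower (+2%) |
+------------------------+------------+-----------------------------+
| django_template        | 64.7 ms    | 65.4 ms: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| dulwich_log            | 78.2 ms    | 78.8 ms: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| fannkuch               | 640 ms     | 668 ms: 1.04x slower (+4%)  |
+------------------------+------------+-----------------------------+
| float                  | 165 ms     | 163 ms: 1.01x faster (-1%)  |
+------------------------+------------+-----------------------------+
| genshi_text            | 40.7 ms    | 41.5 ms: 1.02x slower (+2%) |
+------------------------+------------+-----------------------------+
| genshi_xml             | 87.2 ms    | 88.4 ms: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| go                     | 309 ms     | 314 ms: 1.01x slower (+1%)  |
+------------------------+------------+-----------------------------+
| hexiom                 | 12.3 ms    | 12.7 ms: 1.03x slower (+3%) |
+------------------------+------------+-----------------------------+
| json_dumps             | 16.7 ms    | 16.8 ms: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| json_loads             | 32.1 us    | 32.5 us: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| logging_format         | 14.6 us    | 15.0 us: 1.03x slower (+3%) |
+------------------------+------------+-----------------------------+
| logging_silent         | 247 ns     | 257 ns: 1.04x slower (+4%)  |
+------------------------+------------+-----------------------------+
| logging_simple         | 13.2 us    | 13.6 us: 1.03x slower (+3%) |
+------------------------+------------+-----------------------------+
| mako                   | 22.1 ms    | 22.8 ms: 1.03x slower (+3%) |
+------------------------+------------+-----------------------------+
| meteor_contest         | 135 ms     | 137 ms: 1.01x slower (+1%)  |
+------------------------+------------+-----------------------------+
| nbody                  | 184 ms     | 191 ms: 1.04x slower (+4%)  |
+------------------------+------------+-----------------------------+
| nqueens                | 132 ms     | 137 ms: 1.04x slower (+4%)  |
+------------------------+------------+-----------------------------+
| pathlib                | 156 ms     | 162 ms: 1.04x slower (+4%)  |
+------------------------+------------+-----------------------------+
| pickle                 | 16.3 us    | 15.4 us: 1.05x faster (-5%) |
+------------------------+------------+-----------------------------+
| pickle_dict            | 39.7 us    | 40.0 us: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| pickle_list            | 5.93 us    | 6.15 us: 1.04x slower (+4%) |
+------------------------+------------+-----------------------------+
| pickle_pure_python     | 581 us     | 587 us: 1.01x slower (+1%)  |
+------------------------+------------+-----------------------------+
| pidigits               | 243 ms     | 242 ms: 1.00x faster (-0%)  |
+------------------------+------------+-----------------------------+
| pyflate                | 885 ms     | 908 ms: 1.03x slower (+3%)  |
+------------------------+------------+-----------------------------+
| python_startup         | 27.8 ms    | 28.0 ms: 1.01x slower (+1%) |
+------------------------+------------+-----------------------------+
| python_startup_no_site | 22.0 ms    | 22.1 ms: 1.00x slower (+0%) |
+------------------------+------------+-----------------------------+
| raytrace               | 630 ms     | 632 ms: 1.00x slower (+0%)  |
+------------------------+------------+-----------------------------+
| regex_compile          | 215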

[issue42369] Reading ZipFile not thread-safe

2020-11-16 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue42369>
___



[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin


Ma Lin  added the comment:

In a PGO build, the improvement is small.

(3.9 branch, with PGO, build.bat -p X64 --pgo)

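+---------------------+--------------+-----------------------------+
| Benchmark           | baseline-pgo | ob3-pgo                     |
+=====================+==============+=============================+
| 2to3                | 464 ms       | 462 ms: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| chameleon           | 14.0 ms      | 13.5 ms: 1.03x faster (-3%) |
+---------------------+--------------+-----------------------------+
| crypto_pyaes        | 142 ms       | 143 ms: 1.00x slower (+0%)  |
+---------------------+--------------+-----------------------------+
| django_template     | 65.0 ms      | 65.4 ms: 1.01x slower (+1%) |
+---------------------+--------------+-----------------------------+
| fannkuch            | 665 ms       | 650 ms: 1.02x faster (-2%)  |
+---------------------+--------------+-----------------------------+
| float               | 166 ms       | 164 ms: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| genshi_text         | 41.4 ms      | 41.0 ms: 1.01x faster (-1%) |
+---------------------+--------------+-----------------------------+
| genshi_xml          | 88.1 ms      | 87.0 ms: 1.01x faster (-1%) |
+---------------------+--------------+-----------------------------+
| go                  | 315 ms       | 311 ms: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| hexiom              | 12.7 ms      | 12.6 ms: 1.01x faster (-1%) |
+---------------------+--------------+-----------------------------+
| json_dumps          | 16.7 ms      | 16.6 ms: 1.01x faster (-1%) |
+---------------------+--------------+-----------------------------+
| json_loads          | 33.5 us      | 32.1 us: 1.04x faster (-4%) |
+---------------------+--------------+-----------------------------+
| logging_simple      | 13.6 us      | 13.3 us: 1.02x faster (-2%) |
+---------------------+--------------+-----------------------------+
| mako                | 22.7 ms      | 22.8 ms: 1.01x slower (+1%) |
+---------------------+--------------+-----------------------------+
| meteor_contest      | 136 ms       | 138 ms: 1.01x slower (+1%)  |
+---------------------+--------------+-----------------------------+
| nbody               | 189 ms       | 186 ms: 1.02x faster (-2%)  |
+---------------------+--------------+-----------------------------+
| nqueens             | 135 ms       | 135 ms: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| pathlib             | 157 ms       | 154 ms: 1.02x faster (-2%)  |
+---------------------+--------------+-----------------------------+
| pickle              | 16.8 us      | 16.4 us: 1.02x faster (-2%) |
+---------------------+--------------+-----------------------------+
| pickle_dict         | 41.3 us      | 40.4 us: 1.02x faster (-2%) |
+---------------------+--------------+-----------------------------+
| pickle_list         | 6.34 us      | 6.42 us: 1.01x slower (+1%) |
+---------------------+--------------+-----------------------------+
| pickle_pure_python  | 588 us       | 584 us: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| pidigits            | 242 ms       | 242 ms: 1.00x faster (-0%)  |
+---------------------+--------------+-----------------------------+
| pyflate             | 905 ms       | 898 ms: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| python_startup      | 28.0 ms      | 27.9 ms: 1.00x faster (-0%) |
+---------------------+--------------+-----------------------------+
| regex_compile       | 220 ms       | 218 ms: 1.01x faster (-1%)  |
+---------------------+--------------+-----------------------------+
| regex_v8            | 33.1 ms      | 32.9 ms: 1.01x faster (-1%) |
+---------------------+--------------+-----------------------------+
| richards            | 88.9 ms      | 88.3 ms: 1.01x faster (-1%) |
+---------------------+--------------+-----------------------------+
| scimark_fft         | 494 ms       | 486 ms: 1.02x faster (-2%)  |
+---------------------+--------------+-----------------------------+
| scimark_lu          | 210 ms       | 207 ms: 1.02x faster (-2%)  |
+---------------------+--------------+-----------------------------+
| scimark_monte_carlo | 141 ms       | 137 ms: 1.03x faster (-3%)  |
+---------------------+--------------+-----------------------------+
| scimark_sor         | 263 ms       | 255 ms: 1.03x faster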

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin


Ma Lin  added the comment:

> Could you please try again with PGO?

Please wait.

BTW, this option was recommended in another project.
In that project, even with `/Ob3` enabled, the MSVC build is still slower than the GCC 9 build.
If you are interested, see: https://github.com/facebook/zstd/issues/2314

--

___
Python tracker 
<https://bugs.python.org/issue42366>
___



[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin


New submission from Ma Lin :

MSVC2019 has a new option, `/Ob3`, which specifies more aggressive inlining than 
`/Ob2`:
https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-160

If this option is used with MSVC2017, the compiler emits a warning:
cl : Command line warning D9002 : ignoring unknown option '/Ob3'

Simply applying `Ob3.diff` gives this improvement:
(Python 3.9 branch, No PGO, build.bat -p X64)

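+---------------------+----------+------------------------------+
| Benchmark           | baseline | ob3                          |
+=====================+==========+==============================+
| 2to3                | 563 ms   | 552 ms: 1.02x faster (-2%)   |
+---------------------+----------+------------------------------+
| chameleon           | 16.5 ms  | 16.1 ms: 1.03x faster (-3%)  |
+---------------------+----------+------------------------------+
| chaos               | 200 ms   | 197 ms: 1.02x faster (-2%)   |
+---------------------+----------+------------------------------+
| crypto_pyaes        | 186 ms   | 184 ms: 1.01x faster (-1%)   |
+---------------------+----------+------------------------------+
| deltablue           | 13.0 ms  | 12.6 ms: 1.03x faster (-3%)  |
+---------------------+----------+------------------------------+
| dulwich_log         | 94.5 ms  | 93.9 ms: 1.01x faster (-1%)  |
+---------------------+----------+------------------------------+
| fannkuch            | 806 ms   | 761 ms: 1.06x faster (-6%)   |
+---------------------+----------+------------------------------+
| float               | 211 ms   | 199 ms: 1.06x faster (-6%)   |
+---------------------+----------+------------------------------+
| genshi_text         | 48.3 ms  | 47.7 ms: 1.01x faster (-1%)  |
+---------------------+----------+------------------------------+
| go                  | 446 ms   | 437 ms: 1.02x faster (-2%)   |
+---------------------+----------+------------------------------+
| hexiom              | 16.6 ms  | 15.9 ms: 1.04x faster (-4%)  |
+---------------------+----------+------------------------------+
| json_dumps          | 19.9 ms  | 19.3 ms: 1.03x faster (-3%)  |
+---------------------+----------+------------------------------+
| json_loads          | 45.5 us  | 43.9 us: 1.04x faster (-3%)  |
+---------------------+----------+------------------------------+
| logging_format      | 21.4 us  | 20.7 us: 1.03x faster (-3%)  |
+---------------------+----------+------------------------------+
| logging_silent      | 343 ns   | 319 ns: 1.07x faster (-7%)   |
+---------------------+----------+------------------------------+
| mako                | 29.0 ms  | 27.6 ms: 1.05x faster (-5%)  |
+---------------------+----------+------------------------------+
| meteor_contest      | 168 ms   | 162 ms: 1.04x faster (-3%)   |
+---------------------+----------+------------------------------+
| nbody               | 256 ms   | 244 ms: 1.05x faster (-5%)   |
+---------------------+----------+------------------------------+
| nqueens             | 168 ms   | 162 ms: 1.04x faster (-4%)   |
+---------------------+----------+------------------------------+
| pathlib             | 175 ms   | 168 ms: 1.04x faster (-4%)   |
+---------------------+----------+------------------------------+
| pickle              | 17.9 us  | 17.3 us: 1.04x faster (-4%)  |
+---------------------+----------+------------------------------+
| pickle_dict         | 41.0 us  | 33.2 us: 1.24x faster (-19%) |
+---------------------+----------+------------------------------+
| pickle_list         | 6.73 us  | 5.89 us: 1.14x faster (-12%) |
+---------------------+----------+------------------------------+
| pickle_pure_python  | 829 us   | 793 us: 1.05x faster (-4%)   |
+---------------------+----------+------------------------------+
| pidigits            | 243 ms   | 243 ms: 1.00x faster (-0%)   |
+---------------------+----------+------------------------------+
| pyflate             | 1.21 sec | 1.18 sec: 1.03x faster (-2%) |
+---------------------+----------+------------------------------+
| raytrace            | 947 ms   | 915 ms: 1.03x faster (-3%)   |
+---------------------+----------+------------------------------+
| regex_compile       | 291 ms   | 284 ms: 1.03x faster (-2%)   |
+---------------------+----------+------------------------------+
| regex_dna           | 217 ms   | 222 ms: 1.02x slower (+2%)   |
+---------------------+----------+------------------------------+
| regex_effbot        | 3.97 ms  | 4.13 ms: 1.04x slower (+4%)  |
+---------------------+----------+------------------------------+
| regex_v8            | 35.2 ms  | 34.6 ms: 1.02x faster (-2%)  |
+---------------------+----------+------------------------------+
| richards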
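
For readers unfamiliar with the table's notation, the following is a minimal sketch of how a cell such as "1.06x faster (-6%)" can be derived from the two timings. The helper name `speedup` is hypothetical, and the rounding used by the benchmark tool may differ slightly from this sketch, since the tool works from unrounded means rather than the displayed values.

```python
def speedup(baseline, changed):
    """Return (ratio, percent change in run time) for two timings in the same unit."""
    ratio = baseline / changed                        # > 1 means the changed build is faster
    percent = (changed - baseline) / baseline * 100   # negative means less time was spent
    return ratio, percent


# fannkuch row: 806 ms baseline, 761 ms with /Ob3
ratio, percent = speedup(806, 761)
print(f"{ratio:.2f}x faster ({percent:+.0f}%)")       # 1.06x faster (-6%)
```

Note that the two numbers are not interchangeable: the ratio divides by the new time, while the percentage divides by the baseline.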

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-10 Thread Ma Lin


Ma Lin  added the comment:

> I do not think that this is suitable for newcomers because you need to have 
> deep understanding why it was written in such form at first place and what 
> will be changed if you change it.

I agree that contributors need to understand the code rather than simply replace 
the type. Two weeks may be enough to understand it.

> And it could negatively affect performance, especially on 32-bit platforms.

`long` can be replaced by `ssize_t`.
`unsigned long` can be replaced by `size_t`.
If `PyLong_FromSsize_t`/`PyLong_FromSize_t` are used for the conversions, there is 
no negative impact.

> I don't think that it's worth it to optimize this one.

Although the speedup is small, it's free.
I don't see it as an optimization, just as removing waste.

> I suggest to fix in it bpo-38252.

I missed it in that issue: I only searched the code for "0x80808080", so this 
case was overlooked.
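
The "0x80808080" constant is a mask with the high bit set in every byte, used to test several bytes for non-ASCII values at once. The sketch below only illustrates the idea in Python and is not CPython's C code; `is_ascii` and `word_checks` are hypothetical names. It shows why a 4-byte `long` is wasteful on a 64-bit build: the same input needs twice as many word tests as with an 8-byte word.

```python
def is_ascii(data: bytes, step: int) -> bool:
    """Check `data` for non-ASCII bytes, testing `step` bytes per iteration."""
    # One 0x80 bit per byte lane: 0x80808080 for step=4,
    # 0x8080808080808080 for step=8.
    mask = int.from_bytes(b"\x80" * step, "little")
    aligned = len(data) - len(data) % step
    for i in range(0, aligned, step):
        word = int.from_bytes(data[i:i + step], "little")
        if word & mask:  # some byte in this word is >= 0x80
            return False
    # Leftover bytes that do not fill a whole word are checked one by one.
    return all(byte < 0x80 for byte in data[aligned:])


def word_checks(length: int, step: int) -> int:
    """Number of whole-word mask tests needed for `length` bytes."""
    return length // step


text = b"a" * 4096
print(is_ascii(text, 4), word_checks(len(text), 4))   # True 1024
print(is_ascii(text, 8), word_checks(len(text), 8))   # True 512
print(is_ascii("caf\u00e9".encode("utf-8"), 8))       # False
```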

--

___
Python tracker 
<https://bugs.python.org/issue42304>
___



[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-10 Thread Ma Lin


Ma Lin  added the comment:

> What is the problem exactly?

There are several different problems, such as:
https://github.com/python/cpython/blob/v3.10.0a2/Modules/mathmodule.c#L2033

In addition, `utf16_decode` has the same problem, which I forgot to mention:
https://github.com/python/cpython/blob/v3.10.0a2/Objects/stringlib/codecs.h#L465

Maybe these small problems are suitable for newcomers to get familiar with the 
contribution process.

--

___
Python tracker 
<https://bugs.python.org/issue42304>
___



[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-09 Thread Ma Lin


New submission from Ma Lin :

The C type `long` is a 4-byte integer in 64-bit Windows builds (MSVC behavior). [1]
With other compilers, `long` is an 8-byte integer in 64-bit builds.

This causes a small, unnecessary performance loss; issue38252 fixed the 
problem in one place.

Searching for `SIZEOF_LONG` in the CPython code shows there are still a few 
places where the `long` type is wasteful.

Novices are welcome to try contributing.

[1] https://stackoverflow.com/questions/384502
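
The size difference can be checked from Python itself. This is a small sketch using the standard `ctypes` module; the printed values depend on the platform and compiler that built the interpreter, so none are shown here. On a 64-bit Windows build `long` is expected to be 4 bytes while `ssize_t` and pointers are 8.

```python
import ctypes

# Sizes, in bytes, of the C types as seen by the running interpreter's ABI.
sizes = {
    "long": ctypes.sizeof(ctypes.c_long),
    "unsigned long": ctypes.sizeof(ctypes.c_ulong),
    "ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
    "size_t": ctypes.sizeof(ctypes.c_size_t),
    "void *": ctypes.sizeof(ctypes.c_void_p),
}

for name, size in sizes.items():
    print(f"{name:14} {size} bytes")

# True where `long` is narrower than a pointer, e.g. 64-bit Windows.
print("long narrower than pointer:", sizes["long"] < sizes["void *"])
```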

--
components: Windows
messages: 380638
nosy: malin, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: [easy C] long type performance waste in 64-bit Windows build
type: performance
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue42304>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-10-28 Thread Ma Lin


Ma Lin  added the comment:

I modified the lzma module to use different growth factors; see the attached picture 
different_factors.png.

1.5x should be the growth factor of _PyBytesWriter under Windows.

So if _PyBytesWriter is changed to use memory blocks, there may be no 
performance improvement.

Overallocation factor of _PyBytesWriter:

# ifdef MS_WINDOWS
# define OVERALLOCATE_FACTOR 2
# else
# define OVERALLOCATE_FACTOR 4
# endif

(I'm using Windows 10)
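
To connect the 1.5x figure with the macro above: a minimal sketch, assuming the writer grows its buffer by `size / OVERALLOCATE_FACTOR` on each resize (this growth rule is an assumption about the implementation, and `grown_size` and `resizes_needed` are hypothetical helpers). Under that assumption a factor of 2 gives 1.5x growth and a factor of 4 gives 1.25x.

```python
def grown_size(size: int, overallocate_factor: int) -> int:
    """Buffer size after one resize, assuming size += size / factor."""
    return size + size // overallocate_factor


def resizes_needed(start: int, target: int, overallocate_factor: int) -> int:
    """How many resizes it takes to grow from `start` to at least `target`."""
    size, count = start, 0
    while size < target:
        size = grown_size(size, overallocate_factor)
        count += 1
    return count


print(grown_size(1000, 2))   # 1500 -> 1.5x  (MS_WINDOWS)
print(grown_size(1000, 4))   # 1250 -> 1.25x (other platforms)

# A smaller growth multiplier means more reallocations for the same output.
print(resizes_needed(1024, 1024 * 1024, 2) < resizes_needed(1024, 1024 * 1024, 4))  # True
```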

--
Added file: https://bugs.python.org/file49544/different_factors.png

___
Python tracker 
<https://bugs.python.org/issue41486>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


