[issue43260] Never release buffer when MemoryError in print()

2021-02-22 Thread STINNER Victor


STINNER Victor  added the comment:

I didn't understand the bug but thanks for fixing it :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-21 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-21 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 6e2f144f53982d7103d4cfc2d9361fc445a1817e by Inada Naoki in branch 
'3.8':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/6e2f144f53982d7103d4cfc2d9361fc445a1817e


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-21 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset d51436f95bf5978f82d917e53e9a456fdaa83a9d by Inada Naoki in branch 
'3.9':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/d51436f95bf5978f82d917e53e9a456fdaa83a9d


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-21 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +23399
pull_request: https://github.com/python/cpython/pull/24618

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-21 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +23398
pull_request: https://github.com/python/cpython/pull/24617

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-21 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 01806d5beba3d208bb56adba6829097d803bf54f by Inada Naoki in branch 
'master':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/01806d5beba3d208bb56adba6829097d803bf54f


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-20 Thread Ramin Farajpour Cami


Change by Ramin Farajpour Cami :


--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Eryk Sun


Eryk Sun  added the comment:

> Isn't `PyUnicode_GET_LENGTH(text) < self->chunk_size` enough?

Yes, that's simpler, except with `<=`" instead of `<`, since the maximum count 
is chunk_size when pending_bytes is a list or ASCII string. When I wrote the 
more complex check, I did't take into account that pending_bytes would be 
flushed anyway if it causes pending_bytes_count to exceed chunk_size.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Inada Naoki


Inada Naoki  added the comment:

You are right. I misunderstood.

```
if (PyUnicode_IS_ASCII(text) &&
  (PyUnicode_GET_LENGTH(text) +
(self->pending_bytes ? self->pending_bytes_count : 0)) <=
  self->chunk_size &&
  is_asciicompat_encoding(self->encodefunc)) {
b = text;
Py_INCREF(b);
}
```

This seems too complex and defensive.  Isn't `PyUnicode_GET_LENGTH(text) < 
self->chunk_size` enough?

When `PyUnicode_GET_LENGTH(text) + self->pending_bytes_count > 
self->chunk_size`, `self->pending_bytes` is preflushed anyway.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Eryk Sun


Eryk Sun  added the comment:

> In your code, huge data passed to .write(huge) may be 
> remained in the internal buffer.

If you mean the buffered writer, then I don't see the problem. A large bytes 
object in pending_bytes gets temporarily referenced by 
_textiowrapper_writeflush(), and self->pending_bytes is cleared. If the 
buffer's write() method fails, then the bytes object is simply deallocated.

If you mean pending_bytes in the text wrapper, then I also don't see the 
problem. It always gets flushed if pending_bytes_count exceeds the chunk size. 
If pending_bytes is a list, it never exceeds the chunk size. It gets 
pre-flushed to avoid growing beyond the chunk size. If pending_bytes isn't a 
list, then it can only exceed the chunk size if it's a bytes object -- never 
for an ASCII str() object. _textiowrapper_writeflush() does not call 
PyBytes_AsStringAndSize() for that case.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Inada Naoki


Inada Naoki  added the comment:

In your code, huge data passed to .write(huge) may be remained in the internal 
buffer.

```
[NEW PRE-FLUSH]

else if ((self->pending_bytes_count + bytes_len) > self->chunk_size) {
if (_textiowrapper_writeflush(self) < 0) {
Py_DECREF(b);
return NULL;
}
self->pending_bytes = b;
}
(snip)
self->pending_bytes_count += bytes_len;
if (self->pending_bytes_count > self->chunk_size || needflush ||
text_needflush) {
if (_textiowrapper_writeflush(self) < 0)
return NULL;
}
```

In my opinion, when .write(huge) fails with MemoryError, TextIOWrapper must not 
keep the `huge` in the internal buffer.

See my PR-24592.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Eryk Sun


Eryk Sun  added the comment:

> stdout.write("small text")
> stdout.write("very large text")  # Calls writeflush, but can not allocate 
> buffer.

Without the optimization, in most cases this will likely fail in 
_io_TextIOWrapper_write_impl() at the line `b = (*self->encodefunc)((PyObject 
*) self, text)`. In some cases, it could be that the latter succeeds, but its 
size combined with the existing pending_bytes_count leads to a memory error in 
_textiowrapper_writeflush().

> * If input text is large (>1M?)

I'd change write() to only optimize ASCII writes so long as the new total size 
of pending writes would not exceed the text wrapper's chunk size. Then 
rearrange the logic to pre-flush the text wrapper if the pending bytes plus the 
write would exceed the chunk size. Thus the total size of a list of pending 
writes (aggregating small writes as a chunk), or that of a single ASCII str() 
object, would be limited to the chunk size, in which case 
PyBytes_FromStringAndSize in _textiowrapper_writeflush() shouldn't fail in any 
normal circumstances. For example:

if (self->encodefunc != NULL) {

[NEW CONDITION]

if (PyUnicode_IS_ASCII(text) &&
  (PyUnicode_GET_LENGTH(text) +
(self->pending_bytes ? self->pending_bytes_count : 0)) <=
  self->chunk_size &&
  is_asciicompat_encoding(self->encodefunc)) {
b = text;
Py_INCREF(b);
}
else {
b = (*self->encodefunc)((PyObject *) self, text);
}
self->encoding_start_of_stream = 0;
}
else {
b = PyObject_CallMethodOneArg(self->encoder, _PyIO_str_encode, text);
}

Py_DECREF(text);
if (b == NULL)
return NULL;
if (b != text && !PyBytes_Check(b)) {
PyErr_Format(PyExc_TypeError,
 "encoder should return a bytes object, not '%.200s'",
 Py_TYPE(b)->tp_name);
Py_DECREF(b);
return NULL;
}

Py_ssize_t bytes_len;
if (b == text) {
bytes_len = PyUnicode_GET_LENGTH(b);
}
else {
bytes_len = PyBytes_GET_SIZE(b);
}

if (self->pending_bytes == NULL) {
self->pending_bytes_count = 0;
self->pending_bytes = b;
}

[NEW PRE-FLUSH]

else if ((self->pending_bytes_count + bytes_len) > self->chunk_size) {
if (_textiowrapper_writeflush(self) < 0) {
Py_DECREF(b);
return NULL;
}
self->pending_bytes = b;
}
else if (!PyList_CheckExact(self->pending_bytes)) {
PyObject *list = PyList_New(2);
if (list == NULL) {
Py_DECREF(b);
return NULL;
}
PyList_SET_ITEM(list, 0, self->pending_bytes);
PyList_SET_ITEM(list, 1, b);
self->pending_bytes = list;
}
else {
if (PyList_Append(self->pending_bytes, b) < 0) {
Py_DECREF(b);
return NULL;
}
Py_DECREF(b);
}

self->pending_bytes_count += bytes_len;
if (self->pending_bytes_count > self->chunk_size || needflush ||
text_needflush) {
if (_textiowrapper_writeflush(self) < 0)
return NULL;
}

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +23372
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24592

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Inada Naoki


Inada Naoki  added the comment:

This is not the problem only in the optimization. _textiowrapper_writeflush 
need to create buffer bytes object anyway and if it cause MemoryError, the 
TextIOWrapper can not flush internal buffer forever.

  stdout.write("small text")
  stdout.write("very large text")  # Calls writeflush, but can not allocate 
buffer.

This example would stuck on same situation without the optimization.


But the optimization made the problem easy to happen. Now the problem happend 
with only one learge text.

Idea to fix this problem:

* If input text is large (>1M?)
  * Flush buffer before adding the large text to the buffer.
  * Encode the text and write it to self->buffer soon. Do not put it into 
internal buffer (self->pending_bytes).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Eryk Sun


Eryk Sun  added the comment:

Issue 36748 added the ASCII optimization in write(), so I'm adding Inada Naoki 
to the nosy list.

--
nosy: +methane

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-19 Thread Eryk Sun


Eryk Sun  added the comment:

The sys.stdout TextIOWrapper is stuck in a bad state. When the write() method 
is called with an ASCII string, it implements an optimization that stores a 
reference to the str() object in the internal pending_bytes instead of 
immediately encoding the string. Subsequently _textiowrapper_writeflush() 
attempts to create a bytes object for this ASCII string, but 
PyBytes_FromStringAndSize fails with a memory error. It's stuck in this state 
because it will never be able to flush the string. 

The first workaround I thought of was to call to detach() to rewrap 
sys.stdout.buffer with a new TextIOWrapper instance, but detach() attempts to 
flush and fails. A hacky but simple and effective workaround is to just 
re-initialize the wrapper. For example:

sys.stdout.__init__(sys.stdout.buffer, sys.stdout.encoding,
sys.stdout.errors, None, True)

This clears the internal pending_bytes.

--
components: +IO
nosy: +eryksun
priority: normal -> high
versions: +Python 3.10, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-18 Thread Ramin Farajpour Cami


Change by Ramin Farajpour Cami :


--
type:  -> crash

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-18 Thread Ramin Farajpour Cami


Ramin Farajpour Cami  added the comment:

Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 16:33:24) [MSC v.1928 32 bit 
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("a"*10)
Traceback (most recent call last):
  File "", line 1, in 
MemoryError
>>> print("1")
Traceback (most recent call last):
  File "", line 1, in 
MemoryError
>>>

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-18 Thread Ramin Farajpour Cami


Ramin Farajpour Cami  added the comment:

Version : 

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit 
(Intel)] on win32

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43260] Never release buffer when MemoryError in print()

2021-02-18 Thread Ramin Farajpour Cami


Change by Ramin Farajpour Cami :


--
title: Never release buffer when MemoryError in prtin() -> Never release buffer 
when MemoryError in print()

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com