[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Actually such flag already exists. It is unlocked_count.

There is also similar issue with seek().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
pull_requests: +3661

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Guido van Rossum

Guido van Rossum added the comment:

> Why not simply document the fact that read ahead in Python 2.7
> is not thread-safe and leave it at that ?

Program bugs should not crash the interpreter. (ctypes excepted.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Ah, didn't see Benjamin's patch: much better solution :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Why not simply document the fact that read ahead in Python 2.7
is not thread-safe and leave it at that ?

.next() and .readline() already don't work well together, so this
would just add one more case.

--
nosy: +lemburg

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Benjamin Peterson

Changes by Benjamin Peterson :


--
pull_requests: +3660
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Guido van Rossum

Guido van Rossum added the comment:

@benjamin can you post your patch as a PR so you'll get credit for it?

--
nosy: +gvanrossum

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



Re: [issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread M.-A. Lemburg
Why not simply document the fact that read ahead in Python 2.7
is not thread-safe and leave it at that ?

.next() and .readline() already don't work well together, so this
would just add one more case.

-- 
Marc-Andre Lemburg
eGenix.com

___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

STINNER Victor added the comment:

@Serhiy: Would you like to propose a PR to implement your RuntimeError. Maybe 
we can test a few popular Python projects to see if the change would break them?

Which popular applications use threads (and files)? :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy: "What if just deny reentrant reads? Set a flag while read into a 
buffer, check it before reading in other thread, and raise RuntimeError."

io.BufferedReader/io.BufferedWriter raises a RuntimeError exception for 
reentrant call, but only in the same thread. For example, it ease debug for 
signal handlers which trigger such reentrant call.

I'm not sure about 
0001-stop-crashes-when-iterating-over-a-file-on-multiple-.patch since it 
doesn't fix the consistency: two parallel readline() calls can return the same 
line, instead of being mutual exclusive and only return different lines.

I'm not sure about adding a new lock. "Lock" sounds like "dead locks". I 
dislike the risk of introducing dead locks very late in the Python 2.7 
development cycle.

I like the idea of a simple exception on concurrent operations. But I'm not 
sure how much code it will break :-/ Crazy idea: would it make sense to raise 
an exception by default, but add an opt-in option to ignore the exception? I 
wrote "crazy" since it became clear that the code is not thread-safe and so 
that parallel operations on the same file object is likely to corrupt data in 
various funny ways.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

STINNER Victor added the comment:

In Python 3, reading ahead is implemented by _io.BufferedReader. This object 
uses a lock to provide a prevent race condition: it's not only to prevent 
crashes, but also provide warranties on how the file is read.

If thread A calls read() first, it gets the next bytes. If thread B calls 
read() while thread A is filling the internal file buffer ("readahead 
buffer"?), the second read is queued. The file position is only controlled by a 
single thread at the same time.

_PyOS_URandom() uses a similar strategy than Benjamin's proposed patch for the 
cached file descriptor of /dev/urandom:

fd = _Py_open("/dev/urandom", O_RDONLY);
if (fd < 0) {
...
return -1;
}
if (urandom_cache.fd >= 0) {
/* urandom_fd was initialized by another thread while we were
   not holding the GIL, keep it. */
close(fd);
fd = urandom_cache.fd;
}
else {
...
urandom_cache.fd = fd;
}

The difference is that opening /dev/urandom multiple times in parallel is safe, 
whereas reading from the same file descriptor in parellel... using the buffered 
fread()... is not safe. readahead() can require multiple fread() calls, so 
multiple read() syscalls. Interlaced reads in parallel is likely to return 
scrambled data.

Adding a lock in Python 2.7.15 can impact performances even on single threaded 
applications.

I'm not sure what whaters more here: performance or correctness?

Note: Even the awesome Python 3 io module has same flaws! 
https://bugs.python.org/issue12215 "TextIOWrapper: issues with interlaced 
read-write"

The question is more *who* reads from the same file object in parallel? Does it 
make sense? :-) Do you expect that file.read(n) is "atomic" in term of 
parallelism?

Note 2: the io module is also available in Python 2.7, just not used by default 
by the builtin open() function ;-) io.open() must be used explicitly.

--
nosy: +pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The patch is complex. What if just deny reentrant reads? Set a flag while read 
into a buffer, check it before reading in other thread, and raise RuntimeError.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

STINNER Victor added the comment:

Python 3 is not affected by this bug.

In Python 3, the full I/O stack was rewritten from scratchn, the new io module 
has a different design. Reading ahead still exists in the io module, but it is 
now done by a dedicated object: io.BufferedReader, and this object uses a lock 
to prevent concurrent reads. A single thread controls the file position at the 
same time. (Except if a different thread uses directly the file descriptor, but 
that's a different story.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

STINNER Victor added the comment:

The bug was first reported to the private Python security mailing list. The 
PSRT decided that it's a regular bug and doesn't need to be categorized as a 
vulnerability, since the attacker has to be able to run arbitrary code in 
practice.

The PSRT considers that no Python 2.7 application currently rely on reading 
from the same file object "at the same time" from different thread, since it 
currently crashs.

So an attacker would have to run his/her own code... but if an attacker can 
already run arbitrary code, why relying on an unstable race condition and 
inject machine code (so not portable), whereas Python standard library is full 
of nice features to write your portable exploit?

For more information, see the Python security model:
https://python-security.readthedocs.io/security.html#security-model

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

STINNER Victor added the comment:

Benjamin Peterson proposed attached patch.

@Benjamin: Would you mind to convert this patch to a PR to ease review?

--
keywords: +patch
nosy: +benjamin.peterson, serhiy.storchaka
Added file: 
https://bugs.python.org/file47157/0001-stop-crashes-when-iterating-over-a-file-on-multiple-.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread STINNER Victor

New submission from STINNER Victor:

Reading from the same file object in different threads does crash Python 2.7. 
The readahead feature of Objects/fileobject.c is not thread safe.

Try attached script to reproduce the crash.

--
components: Interpreter Core
files: readahead.py
messages: 302610
nosy: haypo
priority: normal
severity: normal
status: open
title: [2.7] Python 2.7 readahead feature of file objects is not thread safe
versions: Python 2.7
Added file: https://bugs.python.org/file47156/readahead.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com