[issue16212] mmap() dumps core upon resizing the underlying file

2020-11-06 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2020-11-06 Thread Irit Katriel


Change by Irit Katriel :


--
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 3.2, Python 3.3, Python 
3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Charles-François Natali

Charles-François Natali added the comment:

 You can't use longjmp from signal handlers. Well, you can, but 99% of the 
 code that does it is broken, because you can only call async-safe functions 
 from within a signal handler, and certainly can't run the intepreter.

 I don't see the reason to run the interpreter. But a quick look at the source 
 shows that the current implementation already uses longjmps to handle SIGFPE 
 (see pyfpe.h and fpectlmodule.c). It seems to be in use on many platforms.

 The only problem I see so far is the possibility that we have to protect too 
 much. However, as I could guess from quickly browsing through the mmap() 
 implementation, the address of the mmap()ed region never leaves the module, 
 all accesses are done using exported methods. If it is really the case, we 
 can wrap them into something similar to PyFPE_START_PROTECT() and 
 PyFPE_END_PROTECT() (see comments in pyfpe.h).

SIGFPE is not handled by default (it's not generated by default, you
get NaN and friends).
I don't think there a re many uses of the fpe module, because it's
inherently unsafe.

For example, in our case, let's say you want to protect mmap_item():
what do you in case of SIGBUS?
Use longjmp to jump out of the signal handler and raise an exception?
That won't work, because you'll still be running on behalf of the
signal handler, so as soon as you call a non async safe function (or
non reentrant code), you'll deadlock, crash, or suffer from all the
consequences of undefined behavior, which is much more dangerous than
crashing on a SIGBUS.

The *only* thing you can do is exit.

Please have a look at this:
https://www.securecoding.cert.org/confluence/display/seccode/SIG35-C.+Do+not+return+from+SIGSEGV,+SIGILL,+or+SIGFPE+signal+handlers
https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not+call+longjmp%28%29+from+inside+a+signal+handler

 A pity Posix isn't smart enough to refuse truncate()ing when there's a mmap 
 open on the affected pages. Python 3's buffer API is superior in that respect 
 :-)

Yes, but Python only cares about the current process.
Here the exact same problem occurs if the file is truncated by another
process: performing such checks for all the processes would be awfuly
expensive (and probably impossible in a race-free way).
Also, it's probably impossible to do on e.g. NFS mmaped files (the
server is stateless).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread STINNER Victor

STINNER Victor added the comment:

 I've found a few examples of handling non-restartable signals
 with longjmps, but not that familiar with the codebase to estimate
 how reliably it can be done on all supported platforms.
 I don't really know how this code would behave, say, on Windows.

I proposed a generic signal handler for SIGSEGV (SIGFPE, SIGILL and SIGBUS) 
converting the fatal signal to a classic Python exception: issue issue #3999. 
The idea was rejected by most core developers because it's just impossible to 
guarantee that Python internal structures are still consistent. A signal can 
occur anywhere, the longjmp() will skip all cleanup instructions and so it's 
easy to get into a deadlock case.

So I proposed to display a Python traceback on such signal and just exit.

For your specific case, it would much better if the kernel handles the case.


 I thought that the possibility to crash the interpreter
 is something to be avoided at all costs.

It's quite easy to crash Python using extensions implemented in C. A simple 
example: ctypes.string_at(0).

For your specific case, you should develop an extension which sets a signal 
handler before reading the mmap and then restore the old signal handler after. 
It might be safe if the signal handler protects a single instruction, but it's 
very hard to develop a signal handler for such critical signals (SIGSEGV and 
friends).

I suggest to just close this issue as wont fix. Python cannot do anything 
useful in a safe manner against this OS bug.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

SIGBUS as well as SIGFPE or SIGSEGV is a synchronous signal. It is delivered to 
the thread that caused the trouble and the stack contents is well defined.

 https://www.securecoding.cert.org/confluence/display/seccode/SIG35-C.+Do+not+return+from+SIGSEGV,+SIGILL,+or+SIGFPE+signal+handlers

Does not apply here: longjmp is not a return.

 https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not+call+longjmp%28%29+from+inside+a+signal+handler

The grounds are weak. They refer to SIG30-C, which is based on ISO C standard.

POSIX explicitly declares the fact that longjmp is allowed to exit signal 
handlers as an extension to ISO C standard.

(See http://pubs.opengroup.org/onlinepubs/9699919799/)

Besides, POSIX explicitly states that only async-signal-safe functions can be 
called from the handlers, which are invoked asynchronously with process 
execution. That's not out case as mentioned above, we are not in asynchronous 
context as it could be with, say, SIGINT.

(See 
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html)

Glibc doesn't mind as well:

http://www.gnu.org/software/libc/manual/html_node/Longjmp-in-Handler.html

I can doubt about non-POSIX platforms, but with POSIX everything seems clean to 
me. Especially since the method is already in use with SIGFPE.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

 For your specific case, you should...

There's nothing I should. As I said above, the bug doesn't trouble me. I just 
posted it as a generic contribution and don't really care whether it's going to 
be fixed or not. If decided so, I could help. Otherwise I'll just mind my 
business. :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

 ...it's just impossible to guarantee that Python internal structures are 
 still consistent.

In generic case, I completely agree. Here the things are much simpler (unless I 
missed something). We know perfectly well the code we need to protect (mmap 
accessors). We can make the protected sections small enough to ensure that no 
internal state gets corrupted. I just feel, I can do it simply and safely.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

Update: The correct link to the POSIX definition of longjmp exiting a signal 
handler as an ISO C extension is

http://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Charles-François Natali

Charles-François Natali added the comment:

 SIGBUS as well as SIGFPE or SIGSEGV is a synchronous signal. It is delivered 
 to the thread that caused the trouble and the stack contents is well defined.

Except that signal handlers are per-process, not per thread.
So if another thread (which could be running without the GIL, e.g.
doing some lengthy computation) triggers a SIGBUS while the handler is
setup, it's going to try to longjmp to the jmp_buf set by the other
thread. And this will get really nasty.


 https://www.securecoding.cert.org/confluence/display/seccode/SIG35-C.+Do+not+return+from+SIGSEGV,+SIGILL,+or+SIGFPE+signal+handlers

 Does not apply here: longjmp is not a return.

I still don't understand how you plan to handle the SIGBUS/SIGSEGV.
Where will you longjmp to? If the idea is to dump a traceback and
exit, then there's faulthandler for that.

As for the stack content, variables with automatic storage class (i.e.
local variables) modified should be declared volatile (which may or
may not be an issue here).

 https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not+call+longjmp%28%29+from+inside+a+signal+handler

 The grounds are weak. They refer to SIG30-C, which is based on ISO C standard.

 POSIX explicitly declares the fact that longjmp is allowed to exit signal 
 handlers as an extension to ISO C standard.

It is allowed, but as said earlier, 99% of use cases are just wrong.

 Besides, POSIX explicitly states that only async-signal-safe functions can be 
 called from the handlers, which are invoked asynchronously with process 
 execution. That's not out case as mentioned above, we are not in 
 asynchronous context as it could be with, say, SIGINT.

Well, nothing prevents kill -SEGV PID.

Besides, setjmp() is expensive, and here we would be setting up a
jmp_buf even for a single byte memory access.

But I think we've talked enough :-)
Honestly, if you manage to find a way to handle this, I'll be happy to
commit it, so don't hesitate if you think you've found an approach
that will work.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-14 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

 But I think we've talked enough...

So do I. Let's go for it. I'll make a patch (which apparently takes some time) 
and post it here, we'll discuss it then. Thanks for your comments. Now I 
believe it's a bit harder than I thought initially, but still doable.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-13 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

I know how to avoid the problem in my case, the bug does not really affect me. 
I posted it because I thought that the possibility to crash the interpreter is 
something to be avoided at all costs.

I've found a few examples of handling non-restartable signals with longjmps, 
but not that familiar with the codebase to estimate how reliably it can be done 
on all supported platforms. I don't really know how this code would behave, 
say, on Windows.

I'm gonna do some research on the topic when I have a little more time.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-13 Thread Charles-François Natali

Charles-François Natali added the comment:

 I know how to avoid the problem in my case, the bug does not really affect 
 me. I posted it because I thought that the possibility to crash the 
 interpreter is something to be avoided at all costs.

I fully agree with you.
However, that's a problem inherent to mmap(), and I don't think
there's a way to avoid this, but I could be wrong.

 I've found a few examples of handling non-restartable signals with longjmps, 
 but not that familiar with the codebase to estimate how reliably it can be 
 done on all supported platforms. I don't really know how this code would 
 behave, say, on Windows.

You can't use longjmp from signal handlers. Well, you can, but 99% of
the code that does it is broken, because you can only call async-safe
functions from within a signal handler, and certainly can't run the
intepreter.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

A pity Posix isn't smart enough to refuse truncate()ing when there's a mmap 
open on the affected pages. Python 3's buffer API is superior in that respect 
:-)

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-13 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

 You can't use longjmp from signal handlers. Well, you can, but 99% of the 
 code that does it is broken, because you can only call async-safe functions 
 from within a signal handler, and certainly can't run the intepreter.

I don't see the reason to run the interpreter. But a quick look at the source 
shows that the current implementation already uses longjmps to handle SIGFPE 
(see pyfpe.h and fpectlmodule.c). It seems to be in use on many platforms.

The only problem I see so far is the possibility that we have to protect too 
much. However, as I could guess from quickly browsing through the mmap() 
implementation, the address of the mmap()ed region never leaves the module, all 
accesses are done using exported methods. If it is really the case, we can wrap 
them into something similar to PyFPE_START_PROTECT() and PyFPE_END_PROTECT() 
(see comments in pyfpe.h).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Vladimir Ushakov

New submission from Vladimir Ushakov:

The following code crashes the interpreter on Linux:

#!/usr/bin/python3

import mmap

with open('test', 'wb') as f:
f.write(bytes(1))

with open('test', 'r+b') as f:
m = mmap.mmap(f.fileno(), 0)
f.truncate()
a = m[:]

---

It's not specific to the zero size truncation, it's enough if the file size 
decreases beyond a page border.

--
components: IO
messages: 172745
nosy: Vladimir.Ushakov
priority: normal
severity: normal
status: open
title: mmap() dumps core upon resizing the underlying file
type: crash
versions: Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Christian Heimes

Christian Heimes added the comment:

I'm able to reproduce the bug. Here is a stack trace:

#0  0x005a650d in PyBytes_FromStringAndSize (str=0x7f44c127a000 
Address 0x7f44c127a000 out of bounds, size=1)
at Objects/bytesobject.c:82
82  (op = characters[*str  UCHAR_MAX]) != NULL)
(gdb) bt
#0  0x005a650d in PyBytes_FromStringAndSize (str=0x7f44c127a000 
Address 0x7f44c127a000 out of bounds, size=1)
at Objects/bytesobject.c:82
#1  0x7f44bf7c11b2 in mmap_subscript (self=0x7f44bf9ca350, item=slice at 
remote 0x7f44c1168618)
at /home/heimes/dev/python/cpython/Modules/mmapmodule.c:799
#2  0x00590863 in PyObject_GetItem (o=mmap.mmap at remote 
0x7f44bf9ca350, key=slice at remote 0x7f44c1168618)
at Objects/abstract.c:143

--
nosy: +christian.heimes
stage:  - needs patch
versions: +Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Charles-François Natali

Charles-François Natali added the comment:

That's normal.
You're truncating the file after having it mapped, so touching the pages 
corresponding to the truncated part of the file will result in a segfault.
See mmap's man page:

   Use of a mapped region can result in these signals:

   SIGBUS Attempted access to a portion of the buffer that does not 
correspond to
  the file (for example, beyond the end of the file, including the 
case
  where another process has truncated the file).


The only way to guard against it would be to call fstat() in every 
buffer-protocol method in case of file-backed mapping to check against the 
current file size, but that would be awfully slow in case of sequential access 
(just imagine performing an md5sum of an mmap-ed file).

--
nosy: +neologix

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Christian Heimes

Christian Heimes added the comment:

The fstat() check isn't bullet proof, too. It has a race condition as another 
process could truncate the file between the fstat() check and the code lines 
that access the mmapped file.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Charles-François Natali

Charles-François Natali added the comment:

No, it's not.
That's why I think there's nothing that can be done.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Vladimir Ushakov

Vladimir Ushakov added the comment:

I think, handling the signal would do.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Charles-François Natali

Charles-François Natali added the comment:

 I think, handling the signal would do.

You can't.
Handling a signal like SIGSEGV or SIGBUS in any other way that exiting
the process will result in an infinite loop (as soon as the signal
handler returns the faulty instruction will be re-executed). Well,
there are some higly non-portable ways to try to escape this (see
http://stackoverflow.com/questions/2663456/write-a-signal-handler-to-catch-sigsegv),
but it won't fly in our case.
You may want to have a look at the faulthandler module for graceful
shutdown, but nothing can't be done.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread STINNER Victor

STINNER Victor added the comment:

Well, there are some higly non-portable ways to try to escape this (see
http://stackoverflow.com/questions/2663456/write-a-signal-handler-to-catch-sigsegv),
 but it won't fly in our case.

There is also the libsigsegv library, but it's hard and unsafe to handle such 
low level signals in Python.
http://www.gnu.org/software/libsigsegv/

It has a race condition as another process could truncate the file between the 
fstat() check and the code lines that access the mmapped file.

You can use file locks to be protected against such race condition. See for 
example:
http://pypi.python.org/pypi/lockfile

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Charles-François Natali

Charles-François Natali added the comment:

 You can use file locks to be protected against such race condition.
 See for example:

Those are advisory locks, not mandatory locks.
It's possible to do some mandatory locking on some systems, but on every file, 
and it's not portable.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16212] mmap() dumps core upon resizing the underlying file

2012-10-12 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16212
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com