[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-05 Thread Antoine Pitrou

Antoine Pitrou added the comment:

To elaborate a bit on the patch:
- it is pointless to call flush() if the buffered is in a bad state (self->ok 
== 0) or it has started finalizing already
- you need to own the reference, since flush() can release the GIL and, if the 
reference is borrowed, the refcount can fall to 0 in another thread and the 
whole object deallocated under your feet

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-05 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Just apply the following patch to the original PR and it should work fine:

diff --git a/Modules/_io/bufferedio.c b/Modules/_io/bufferedio.c
index 50c87c1..2ba98f2 100644
--- a/Modules/_io/bufferedio.c
+++ b/Modules/_io/bufferedio.c
@@ -409,12 +409,12 @@ static void
 buffered_dealloc(buffered *self)
 {
 self->finalizing = 1;
+if (self->next != NULL)
+remove_from_linked_list(self);
 if (_PyIOBase_finalize((PyObject *) self) < 0)
 return;
 _PyObject_GC_UNTRACK(self);
 self->ok = 0;
-if (self->next != NULL)
-remove_from_linked_list(self);
 if (self->weakreflist != NULL)
 PyObject_ClearWeakRefs((PyObject *)self);
 Py_CLEAR(self->raw);
@@ -1860,8 +1860,12 @@ void _PyIO_atexit_flush(void)
 while (buffer_list_end.next != _list_end) {
 buffered *buf = buffer_list_end.next;
 remove_from_linked_list(buf);
-buffered_flush(buf, NULL);
-PyErr_Clear();
+if (buf->ok && !buf->finalizing) {
+Py_INCREF(buf);
+buffered_flush(buf, NULL);
+Py_DECREF(buf);
+PyErr_Clear();
+}
 }
 }

--
nosy: +pitrou
versions: +Python 3.6, Python 3.7 -Python 3.4, Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-05 Thread Neil Schemenauer

Neil Schemenauer added the comment:

I reverted because of the crash in test_threading.  I'm pretty sure there is a 
bug with the locking of bufferedio.c, related to threads and flush.  Here is 
the stacktrace I get (my patch applied, I'm trying to write a Python test that 
triggers the SEGV without my flush patch).

Attached is the script that triggers the crash.


Thread 3 received signal SIGSEGV, Segmentation fault.
0x0001000c06d4 in PyObject_GetAttr (
v=, name='seek')
at ../Objects/object.c:882
882 PyTypeObject *tp = Py_TYPE(v);
(gdb) bt
#0  0x0001000c06d4 in PyObject_GetAttr (
v=, name='seek')
at ../Objects/object.c:882
#1  0x0001000440e9 in PyObject_CallMethodObjArgs (
callable=, name='seek')
at ../Objects/call.c:1212
#2  0x0001003005a2 in _buffered_raw_seek (self=0x1053a39b8, target=0,
whence=1) at ../Modules/_io/bufferedio.c:742
#3  0x00010030015e in buffered_flush_and_rewind_unlocked (self=0x1053a39b8)
at ../Modules/_io/bufferedio.c:851
#4  0x0001002fed12 in buffered_flush (self=0x1053a39b8, args=0x0)
at ../Modules/_io/bufferedio.c:869
#5  0x0001002feb4d in _PyIO_atexit_flush ()
at ../Modules/_io/bufferedio.c:1863
[...]

--
Added file: http://bugs.python.org/file47121/buffer_crash.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-05 Thread STINNER Victor

STINNER Victor added the comment:

> FAIL: test_4_daemon_threads (test.test_threading.ThreadJoinOnShutdown)

Oh, it also failed on:

http://buildbot.python.org/all/builders/x86%20Gentoo%20Installed%20with%20X%203.x/builds/957/steps/test/logs/stdio

and

http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%20with%20X%203.x/builds/942/steps/test/logs/stdio

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-05 Thread STINNER Victor

STINNER Victor added the comment:

> New changeset e38d12ed34870c140016bef1e0ff10c8c3d3f213 by Neil Schemenauer in 
> branch 'master':
> bpo-17852: Maintain a list of BufferedWriter objects.  Flush them on exit. 
> (#1908)

This change introduced a regression:



==
FAIL: test_4_daemon_threads (test.test_threading.ThreadJoinOnShutdown)
--
Traceback (most recent call last):
  File 
"/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_threading.py", line 
781, in test_4_daemon_threads
self.assertFalse(err)
AssertionError: b'pthread_cond_destroy: Resource busy\npthread_mutex_destroy: 
Resource busy' is not false

http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/1137

> New changeset db564238db440d4a2d8eb9d60ffb94ef291f6d30 by Neil Schemenauer in 
> branch 'master':
> Revert "bpo-17852: Maintain a list of BufferedWriter objects.  Flush them on 
> exit. (#1908)" (#3337)

This revert fixed the regression.

Did you revert the change because of the failure?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-04 Thread Neil Schemenauer

Neil Schemenauer added the comment:


New changeset db564238db440d4a2d8eb9d60ffb94ef291f6d30 by Neil Schemenauer in 
branch 'master':
Revert "bpo-17852: Maintain a list of BufferedWriter objects.  Flush them on 
exit. (#1908)" (#3337)
https://github.com/python/cpython/commit/db564238db440d4a2d8eb9d60ffb94ef291f6d30


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-04 Thread Neil Schemenauer

Changes by Neil Schemenauer :


--
pull_requests: +3352

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-09-04 Thread Neil Schemenauer

Neil Schemenauer added the comment:


New changeset e38d12ed34870c140016bef1e0ff10c8c3d3f213 by Neil Schemenauer in 
branch 'master':
bpo-17852: Maintain a list of BufferedWriter objects.  Flush them on exit. 
(#1908)
https://github.com/python/cpython/commit/e38d12ed34870c140016bef1e0ff10c8c3d3f213


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-06-01 Thread Xavier G. Domingo

Changes by Xavier G. Domingo :


--
nosy: +xgdomingo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-06-01 Thread Neil Schemenauer

Changes by Neil Schemenauer :


--
pull_requests: +1987

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-06-01 Thread Neil Schemenauer

Neil Schemenauer added the comment:

"Did you get any ResourceWarning?"

I already knew that explicitly closing the file would fix the issue.  However, 
think of the millions of lines of Python 2 that hopefully will be converted to 
Python 3.  There will be many ResourceWarning errors.  It is not reasonable to 
think that everyone is just going to fix them when they see them.  Applications 
that used to be reliable will now randomly lose data.

What I spent hours debugging is trying to figure out why when the open file is 
collected on interpreter exit, buffered data not flushed.  I knew the object 
was not part of an uncollectable garbage cycle.  I initially suspected there 
could be some bug in the _io module or maybe even a kernel bug.

It turns out that in Python 2 the buffer and the open file is a single object 
and so when the finalizer gets called, the data is always correctly written 
out.  In Python 3, the buffer and the underlying file object are separate and 
they can get finalized in different order depending on essentially random 
factors.

Fixing this particular issue is not difficult as Armin's code demonstrates.  I 
don't think blaming the application programmer is the solution.  Obviously we 
keep the warnings and still tell programmers to close their files or use 
context managers.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-06-01 Thread STINNER Victor

STINNER Victor added the comment:

Neil Schemenauer: "Well, I just spent a couple of hours debugging a problem 
caused by this issue."

Did you get any ResourceWarning? You need to run python3 with -Wd to see them.

By the way, I enhanced ResourceWarning in Python 3.6: if you enable 
tracemalloc, these warnings now log the traceback where the leakded resource 
was created ;-)
https://docs.python.org/dev/whatsnew/3.6.html#warnings

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2017-05-31 Thread Neil Schemenauer

Neil Schemenauer added the comment:

Well, I just spent a couple of hours debugging a problem caused by this issue.  
You could argue that I should be calling close() on all of my file-like objects 
but I agree with Armin that the current "most of the time it works" behaviour 
is quite poor.

In my case, the issue is exactly what Antoine Pitrou suggests: if the FileIO 
object gets finalized before the BufferedIO object wrapping it then the 
buffered data gets lost. That depends on the order that the GC calls finalizers.

Armin's suggestion of keeping a list of open buffered files and flushing them 
before exiting seems like a simple and robust solution.  Having a warning for 
unclosed files is fine but in order to make porting Python 2 code as painless 
as possible, matching the safer behavior of Python 2 would be worth the extra 
bit of flush logic.

--
nosy: +nascheme

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2015-02-13 Thread Martin Panter

Changes by Martin Panter vadmium...@gmail.com:


--
nosy: +vadmium

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

I hate to repeat myself, but if the C standard says files are flushed at 
exit; if Python 2 follows this standard; and if Python 3 follows it most of 
the time but *not always*...  then it seems to me that something is very, very 
buggy in the worst possible way (rare data loss).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Do you have working in 3.4+ example?

--
stage:  - test needed
versions: +Python 3.5 -Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

If I understood correctly, Python 3.4 tries harder to find cycles and call 
destructors at the end of the program, but that's not a full guarantee.  For 
example you can have a reference from a random C extension module.

While trying to come up with an example, I found one that I don't fully 
understand, but the point is that it shows how easy it is to defeats it:

import sys
f = open('foo.txt', 'w')
f.write('abc')
def func(*args):
return func
sys.settrace(func)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

(Ah, it's probably a reference from the trace function - func_globals - f).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is an example.

--
stage: test needed - needs patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


Added file: http://bugs.python.org/file37357/gcio.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Note the example I posted doesn't involve the shutdown sequence. It calls 
gc.collect() explicitly.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

To add to the confusion: Antoine's example produces an empty file on the 
current trunk cd282dd0cfe8.  When I first tried it on a slightly older trunk 
(157 changesets ago), it correctly emitted a file with barXXX , but only if 
gc.collect() was present.  Without gc.collect() it didn't.  This seems at 
odds with the explanation that gc.collect() should be what occurs at 
shut-down now.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The problem is that the order of tp_finalize calls is arbitrary when there is a 
reference cycle (same thing, of course, with tp_clear). So depending on the 
exact layout of the garbage list, the TextIOWrapper could be collected before 
the BufferedWriter and the FileIO (good), or after (bad).

I don't see an easy way to solve this. Either _io should provide hints to the 
GC (which kind of hints?), or the tp_finalize should be customized to somehow 
call a dependent wrapper's tp_finalize (how? by keeping a weakref?).

--
nosy: +tim.peters

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

Maybe accepting the fact that relying on finalizers is a bad idea here?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

You mean encourage people to explicitly use with or call close()? Yes, that's 
the simplest answer. It's why we added ResourceWarning.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread STINNER Victor

STINNER Victor added the comment:

 The problem is that the order of tp_finalize calls is arbitrary when there is 
 a reference cycle

If we want to guaranty that all files are properly flushed at exit, the best 
option is to maintain a list of open files.

I'm not interested to implement that, it sounds too complex. For example, 
open(text.txt, w) creates 3 objects: FileIO, BufferedWriter, TextIOWrapper. 
You have to register these 3 objects and flush them in the right order. Python 
is dynamic, you are free to modify objects, for example using the detach() 
method or by modifying the buffer attribute of the TextIOWrapper.

How do you decide which object should be flushed? In which order? We have to 
take care of signals, threads and forks, stay portable, etc.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Le 04/12/2014 12:05, STINNER Victor a écrit :
 
 The problem is that the order of tp_finalize calls is arbitrary
 when
there is a reference cycle
 
 If we want to guaranty that all files are properly flushed at exit,
the best option is to maintain a list of open files.

I am not talking about files flushing at exit.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 How do you decide which object should be flushed? In which order? We have to
 take care of signals, threads and forks, stay portable, etc.

The order doesn't matter. If you call flush() of TextIOWrapper, flushes of 
buffered writer and raw file will be called automatically. And flush() can be 
called multiple times.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

Antoine: sorry if I wasn't clear enough.  Obviously you want to encourage 
people to close their files, but I think personally that it is very bad for the 
implementation to *most of the time* work anyway and only rarely fail to flush 
the files.  So, speaking only about the implementation, it is (imho) a bad idea 
to rely on finalizers to flush the files, and something else should be done.

Victor: it does not sound complicated to me to keep the BufferedWriter objects 
in a doubly-chained list.  You're overthinking the issue: there are no 
multithread issue (we have a GIL) and you don't have to keep track of all 3 
objects created by open.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Le 04/12/2014 13:53, Armin Rigo a écrit :
 
 So, speaking only about the
implementation, it is (imho) a bad idea to rely on finalizers to flush
the files, and something else should be done.

But what else?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

Antoine: I'm trying to explain what in the three lines that follow the parts 
you quoted.  I already tried to explain it a few times above.  Now I feel that 
I'm not going anywhere, so I'll quote back myself from 2013-04-27: Feel free 
to close anyway as not-a-bug; I won't fight the Python 3 behavior, because 
Python 2 works as expected.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Armin: I don't see how linked lists would help.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

How it looks to me. Should be global collection of weak references (WeakSet?). 
Every instance of TextIOWrapper, BufferedWriter and BufferedRandom add itself 
to 
this collection on create and remove on close. A function registered with 
atexit calls flush() for all methods in the collection.

This solution doesn't require changes to finalization machinery, only to the io 
module.

I'm not particularly interested to implement that because I'm not convinced 
that Python should provide such guaranties. But the implementation shouldn't 
be too complex.

More general solution could be to introduce special method (__predel__) which 
should be called before breaking reference loops (or just before __del__ if 
there is no loop). In case of output streams it will call flush() (but not 
close()!).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy:  -pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Armin Rigo added the comment:

Here is a proof-of-concept.  It changes both _pyio.py and bufferedio.c and was 
tested with the examples here.  (See what I meant with linked lists.)  The 
change in textio.c is still missing, but should be very similar to 
bufferedio.c.  This is similar to the implementation in PyPy.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Armin Rigo

Changes by Armin Rigo ar...@users.sourceforge.net:


--
keywords: +patch
Added file: http://bugs.python.org/file37359/pyio.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread STINNER Victor

STINNER Victor added the comment:

By the way, I fixed various issues to ensure that ResourceWarning are
displayed at Python exit. There are still corner cases when the
warnings are not emited. For example, when a daemon thread is used. Or
when the warning comes very late during Python finalization.

Recent example of tricky issue related to this warning: #22898. IMO
it's a bug, it must be fixed and it should be possible to emit the
warning.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-04 Thread Nikolaus Rath

Nikolaus Rath added the comment:

This will probably be too radial, but I think it should at least be mentioned 
as a possible option.
 
We could just not attempt to implicitly flush buffers in the finalizer at all. 
This means scripts relying on this will break, but in contrast to the current 
behavior they will break consistently so it's easy to find and fix the problem.

I think this is still preferable to any solution that does not guarantee 
flushing in 100% of the cases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-03 Thread Nikolaus Rath

Changes by Nikolaus Rath nikol...@rath.org:


--
nosy: +nikratio

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-03 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Serhiy, I believe this still happens in Python 3.4, but it is harder to 
reproduce. I couldn't get Armin's script to produce the problem either, but I'm 
pretty sure that this is what causes e.g. 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771452#60.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-12-03 Thread Charles-François Natali

Charles-François Natali added the comment:

 Serhiy, I believe this still happens in Python 3.4, but it is harder to 
 reproduce. I couldn't get Armin's script to produce the problem either, but 
 I'm pretty sure that this is what causes e.g. 
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771452#60.

Maybe, as it's been said several times, it's an application bug: files
should be closed explicitly (or better with a context manager of
course).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-11-20 Thread Armin Rigo

Armin Rigo added the comment:

Victor: there is the GIL, you don't need any locking.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

In 3.4+ the example script always writes string bar to file foo. Tested by 
running it in a loop hundreds times. Cleaning up at shutdown was enhanced in 
3.4.

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-11-19 Thread STINNER Victor

STINNER Victor added the comment:

Is anyone interested to work on the maintain a list of open file objects idea?

I consider that Python 3 does its best to flush data at exit, but it's not a 
good practice to rely on the destructors to flush data. I mean, there is no 
warranty that files will be written in the correct order and that files will 
stay consistent. It's much safer to explicitly call the clode() method.

By the way, Python 3 now emits ResourceWarning warnings when files are 
destroyed without being explicitly closed. I enhanced the code handling 
warnings during Python shutdown to show these warnings in more use cases, but 
it is still not perfect. I opened the issue #21788 as a reminder that the code 
can still be enhanced.

For the maintain a list of objects idea, I expect race conditions with 
threads, signals and processes (fork). We already have known issues with locks 
+ fork. I expect even more issues if we maintain a list of open files.

I propose to close the issue as wontfix. Python 3 does its best, but please 
call explicitly the close() method!

Maybe the undefined behaviour should be documented (with a warning?) in the 
buffered writer of the io module, and maybe also in the open() function?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2014-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm +1 on closing this. Agree with Charles-François that it's never been 
guaranteed by the Python specification. Pythonic way to work with files is to 
use the with statement, or, if you need long living file stream, careful 
close files in the finally block or in __exit__ method of some class which is 
used as context manager. Non-observance of this rule can be considered as a bug 
(see issue22831).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-30 Thread Charles-François Natali

Charles-François Natali added the comment:

 Hum, POSIX (2004) is not so strict:
 Whether open streams are flushed or closed, or temporary files are
 removed is implementation-defined.
 http://pubs.opengroup.org/onlinepubs/009695399/functions/exit.html

You're looking at the wrong section:

The exit() function shall then flush all open streams with unwritten
buffered data, close all open streams

Then a couple lines below:

The _Exit() [CX]   and _exit()  functions shall not call functions
registered with atexit() nor any registered signal handlers. Whether
open streams are flushed or closed, or temporary files are removed is
implementation-defined.

It's guaranteed for exit(), not _Exit()/_exit().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-30 Thread STINNER Victor

STINNER Victor added the comment:

 It's guaranteed for exit(), not _Exit()/_exit().

Oops ok, thanks.

2013/4/30 Charles-François Natali rep...@bugs.python.org:

 Charles-François Natali added the comment:

 Hum, POSIX (2004) is not so strict:
 Whether open streams are flushed or closed, or temporary files are
 removed is implementation-defined.
 http://pubs.opengroup.org/onlinepubs/009695399/functions/exit.html

 You're looking at the wrong section:

 The exit() function shall then flush all open streams with unwritten
 buffered data, close all open streams

 Then a couple lines below:

 The _Exit() [CX]   and _exit()  functions shall not call functions
 registered with atexit() nor any registered signal handlers. Whether
 open streams are flushed or closed, or temporary files are removed is
 implementation-defined.

 It's guaranteed for exit(), not _Exit()/_exit().

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue17852
 ___

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-29 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-29 Thread STINNER Victor

STINNER Victor added the comment:

In Python 2, a buffered file opened for writing is flushed by the C library 
when the process exit.


When you say Python 2, I assume you mean CPython 2, right?
Because - AFAICT - files got flushed only by accident, not by design.

It looks to be a feature of the standard C library, at least the GNU libc. Its 
libio library installs an exit handler flushing all open files. You can see it 
if you set a breaking on write() using gdb:


Breakpoint 5, write () at ../sysdeps/unix/syscall-template.S:82
82  ../sysdeps/unix/syscall-template.S: Aucun fichier ou dossier de ce type.

(gdb) where
#0  write () at ../sysdeps/unix/syscall-template.S:82
#1  0xb7e33ba5 in _IO_new_file_write (f=0x8254ef0, data=0x81f9798, n=3) at 
fileops.c:1289
#2  0xb7e33a84 in new_do_write (fp=0x8254ef0, data=0x81f9798 
bar\267\300\207\366\267\220\227\037\b\220\227\037\b\004d, to_do=3) at 
fileops.c:543
#3  0xb7e350fe in _IO_new_do_write (fp=0x8254ef0, data=0x81f9798 
bar\267\300\207\366\267\220\227\037\b\220\227\037\b\004d, to_do=3) at 
fileops.c:516
#4  0xb7e354b5 in _IO_new_file_overflow (f=0x8254ef0, ch=-1) at fileops.c:894
#5  0xb7e36c40 in _IO_flush_all_lockp (do_lock=0) at genops.c:849
#6  0xb7e36d8e in _IO_cleanup () at genops.c:1010
#7  0xb7df5f92 in __run_exit_handlers (status=0, listp=0xb7f683e4, 
run_list_atexit=true) at exit.c:91
#8  0xb7df5fdd in __GI_exit (status=0) at exit.c:100
#9  0xb7ddc4db in __libc_start_main (main=0x8058f90 main, argc=2, 
ubp_av=0xb1b4, init=0x8156960 __libc_csu_init, fini=0x81569d0 
__libc_csu_fini, rtld_fini=0xb7fed280 _dl_fini, stack_end=0xb1ac)
at libc-start.c:258
#10 0x08058fd1 in _start ()

Source of _IO_flush_all_lockp() in the GNU libc:

http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/genops.c;h=390d8d24b5fb04f6a35b8fec27e700b9a9d641d4;hb=HEAD#l816

So the glibc maintains a list of open FILE* files: _IO_list_all, which is 
protected by list_all_lock lock.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-29 Thread Charles-François Natali

Charles-François Natali added the comment:

 When you say Python 2, I assume you mean CPython 2, right?
 Because - AFAICT - files got flushed only by accident, not by design.

 It looks to be a feature of the standard C library, at least the GNU libc. 
 Its libio library installs an exit handler flushing all open files. You can 
 see it if you set a breaking on write() using gdb:

Yes, it's guaranteed by POSIX/ANSI (see man exit).
I was refering to the fact that the automatic flushing of files upon
exit is a mere side effect of the implementation based atop stdio
stream in cpython 2. It's no guaranteed by any Python spec (and I
can't really think of any platform other than C that makes such
guarantee).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-29 Thread STINNER Victor

STINNER Victor added the comment:

 It looks to be a feature of the standard C library, at least the GNU libc.
 Yes, it's guaranteed by POSIX/ANSI (see man exit).

Hum, POSIX (2004) is not so strict:
Whether open streams are flushed or closed, or temporary files are
removed is implementation-defined.
http://pubs.opengroup.org/onlinepubs/009695399/functions/exit.html

2013/4/29 Charles-François Natali rep...@bugs.python.org:

 Charles-François Natali added the comment:

 When you say Python 2, I assume you mean CPython 2, right?
 Because - AFAICT - files got flushed only by accident, not by design.

 It looks to be a feature of the standard C library, at least the GNU libc. 
 Its libio library installs an exit handler flushing all open files. You can 
 see it if you set a breaking on write() using gdb:

 Yes, it's guaranteed by POSIX/ANSI (see man exit).
 I was refering to the fact that the automatic flushing of files upon
 exit is a mere side effect of the implementation based atop stdio
 stream in cpython 2. It's no guaranteed by any Python spec (and I
 can't really think of any platform other than C that makes such
 guarantee).

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue17852
 ___

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-28 Thread Charles-François Natali

Charles-François Natali added the comment:

 It used to be a consistently reliable behavior in Python 2 (and we made it so 
 in PyPy too), provided of course that the process exits normally; but it no 
 longer is in Python 3.  Well I can see the reasons for not flushing files, if 
 it's clearly documented somewhere as a change of behavior from Python 2.

When you say Python 2, I assume you mean CPython 2, right?
Because - AFAICT - files got flushed only by accident, not by design.
For example, I suspect that Jython doesn't flush files on exit (since
the JVM doesn't), and I guess IronPython neither.

 However I'm complaining about the current behavior: files are flushed *most 
 of the time*.

That's the problem with implementation-defined behavior ;-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-27 Thread Armin Rigo

New submission from Armin Rigo:

In Python 2, a buffered file opened for writing is flushed by the C library 
when the process exit.  In Python 3, the _pyio and _io modules don't do it 
reliably.  They rely on __del__ being called, which is not neccesarily the 
case.  The attached example ends with the file 'foo' empty (tested with Python 
3.3 and the current trunk).  The same example in Python 2 using the built-in 
open() cannot end with 'foo' empty.

--
components: IO
files: y.py
messages: 187890
nosy: arigo
priority: normal
severity: normal
status: open
title: Built-in module _io can loose data from buffered files at exit
type: behavior
versions: Python 3.3, Python 3.4
Added file: http://bugs.python.org/file30030/y.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

There are actually several issues here:

* collection of globals at shutdown is wonky: you should add an explicit del 
a,f; gc.collect() at the end of the script

* order of tp_clear calls, or another issue with TextIOWrapper: if you open the 
file in binary mode (f = open('...', 'wb'); f.write(b'bar')), the data gets 
flushed during the GC run

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-27 Thread Charles-François Natali

Charles-François Natali added the comment:

Code relying on garbage collection/shutdown hook to flush files is borked:
- it's never been guaranteed by the Python specification (neither does 
Java/C#...)
- even with an implementation based on C stdio streams, streams won't get 
flushed in case of _exit()/abort()/asynchronous signal

Furthermore, I think it's inherently unsafe: in a multi-threaded program, 
flushing requires locking, and doing this at shutdown is a recipe for 
deadlock/arbitrary delay (image you have a daemon thread currently holding the 
lock, e.g. for stdout).

In short, that's an application bug to me.

--
nosy: +neologix

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17852] Built-in module _io can loose data from buffered files at exit

2013-04-27 Thread Armin Rigo

Armin Rigo added the comment:

It used to be a consistently reliable behavior in Python 2 (and we made it so 
in PyPy too), provided of course that the process exits normally; but it no 
longer is in Python 3.  Well I can see the reasons for not flushing files, if 
it's clearly documented somewhere as a change of behavior from Python 2.

However I'm complaining about the current behavior: files are flushed *most of 
the time*.  That's a behavior that is clearly misleading, or so I would think.  
I'm rather sure that there are many small scripts and large programs out there 
relying on automatic flushing, and then one day they'll hit a case where the 
file is not flushed and get the worst kind of error: a file unexpectedly 
truncated at 99% of its length, in a way that cannot be reproduced by small 
examples.

Feel free to close anyway as not-a-bug; I won't fight the Python 3 behavior, 
because Python 2 works as expected.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17852
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com