[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> If you use directly the parser class and pass an open file object, in that 
> case, yes, my change closes the file.

Or if the high-level *function* parse() is called with an open file object.

I don't know whether this is desirable change, but this change can break 
third-party code, therefore it should be documented in Misc/NEWS.

And it is worth to add a check in test_parse_bytes that the passed file object 
is closed after error.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:

> This changes the behavior if pass a file object to the parser. When the 
> parser failed it was possible to use the passed file object (for example call 
> tell()). Now it is closed.

If the high-level *function* parse() is called with a filename, the caller 
doesn't have access to the file object nor the parser.

If you use directly the parser class and pass an open file object, in that 
case, yes, my change closes the file.

If you want to keep the old behaviour for that case, we can change the code to 
only close the source if source is not a string (not isinstance(source, str)) 
in the ExpatParser.parse() method. Since the caller owns the file object, (s)he 
is responsible to close it.

Note: on parser success, the source is always closed, even if the user pass an 
already open file object.

I'm not convince that using the file object on parser error and using directly 
the ExpatParser class is a common use case. I expect that the XML parser reads 
more data than it needs (read ahead for speed), so I don't think that 
file.tell() will you exactly the file position where the XML parser failed.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This changes the behavior if pass a file object to the parser. When the parser 
failed it was possible to use the passed file object (for example call tell()). 
Now it is closed.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:

I applied my fix to 2.7, 3.5, 3.6 and master (3.7). Thanks for the helpful 
reviews Serhiy ;-)

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:


New changeset d81f9e24ea89c0aaded1e0d3f8d8076bbd58c19a by Victor Stinner in 
branch '2.7':
bpo-30264: ExpatParser now closes the source (#1476)
https://github.com/python/cpython/commit/d81f9e24ea89c0aaded1e0d3f8d8076bbd58c19a


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:


New changeset 0c9aa6ffd318c04ce23997b4704477d4a4d82829 by Victor Stinner in 
branch '3.5':
bpo-30264: ExpatParser closes the source on error (#1451) (#1475)
https://github.com/python/cpython/commit/0c9aa6ffd318c04ce23997b4704477d4a4d82829


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:


New changeset 0fe870f3f95ba883b2b06bc0d814bdab8d53df98 by Victor Stinner in 
branch '3.6':
bpo-30264: ExpatParser closes the source on error (#1451) (#1474)
https://github.com/python/cpython/commit/0fe870f3f95ba883b2b06bc0d814bdab8d53df98


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:

Oh, the fix for Python 2.7 is different. In fact, the file object was never 
explicitly closed by the parser. It is now always closed by the parser, on 
success or error.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +1575

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +1574

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +1573

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:


New changeset ef9c0e732fc50aefbdd7c5a80e04e14b31684e66 by Victor Stinner in 
branch 'master':
bpo-30264: ExpatParser closes the source on error (#1451)
https://github.com/python/cpython/commit/ef9c0e732fc50aefbdd7c5a80e04e14b31684e66


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:

> I made one design choice: the close() method of byte and character streams 
> are called twice, I didn't call setByteStream(None) / 
> setCharacterStream(None) to avoid creating an inconsistent source. If you 
> really care, maybe we can reset the source to a new xmlreader.InputSource() 
> object instead.

Hum, I just modified my PR one more time to replace try/finally: 
self._close_source() with "try/except: self._close_source(); raise", so 
file.close() is now only called once.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-05 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy: "As you said, calling close() can have side effects (for example 
invoking self._cont_handler.endDocument()). This was the argument against PR 
1444. It seems to me that if _entity_stack is not empty (this happens in case 
of error in entity parsing) the close() method does nothing. And maybe there 
are other leaks in entity parsing."

Right, it seems that the conclusion is that there is not only a high risk of 
regression because of unexpected side effects, but also an issue with unknown 
external parsers.

So finally I moved backward and proposed my change on ExpatParser.parse():

* the change has a narrow scope: only ExpatParser.parse() is modified, external 
unknown code is not impact, low risk of regression for external parsers.

* the change is well defined: it does exactly one thing, it makes sure that the 
source is closed, especially the inner file object or urllib object. My change 
makes sure that the sure is always closed, even if the close() method does 
nothing.

According to the long list of constraints for this fix, I don't think that we 
can do better.

I made one design choice: the close() method of byte and character streams are 
called twice, I didn't call setByteStream(None) / setCharacterStream(None) to 
avoid creating an inconsistent source. If you really care, maybe we can reset 
the source to a new xmlreader.InputSource() object instead.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

As you said, calling close() can have side effects (for example invoking 
self._cont_handler.endDocument()). This was the argument against PR 1444.

It seems to me that if _entity_stack is not empty (this happens in case of 
error in entity parsing) the close() method does nothing. And maybe there are 
other leaks in entity parsing.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-04 Thread STINNER Victor

STINNER Victor added the comment:

> What if the third-party parser don't use prepare_input_source()? It can use 
> more efficient way if pass just a file name.

About "third-party parsers": I have no idea of what are these parsers. It seems 
like Jython uses provides a parser. But I'm not interested to test Jython, 
sorry!

> Wouldn't be better to move your code into the parser's method parse()? If a 
> file is opened inside the parse() method and is not exposed outside, that 
> method should close it.

I wrote a first patch putting the try/except into expatparser code, but then I 
found a second parser with a parse() method: IncrementalParser. In fact, the 
expat parser inherits from IncrementalParser. So I only modified 
IncrementalParser, and IncrementalParser calls the abstract method close(). So 
any parser implemented on top of IncrementalParser should get the fix for free.

My 3rd attempt (patch IncrementalParser.parse()) changes less code and IMHO is 
less error-prone.

Now, *in practice*, only the expat parser is used in CPython, and according to 
unit tests, my fix closes the file, so the bug is fixed!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What if the third-party parser don't use prepare_input_source()? It can use 
more efficient way if pass just a file name.

Wouldn't be better to move your code into the parser's method parse()? If a 
file is opened inside the parse() method and is not exposed outside, that 
method should close it.

--
components: +Library (Lib) -Tests
nosy: +eli.bendersky, scoder, serhiy.storchaka
stage:  -> patch review
type:  -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-04 Thread STINNER Victor

STINNER Victor added the comment:

I wrote a different change which doesn't use the parser anymore, since it seems 
like xml.sax allows to plug third-party parsers. My new PR 1451 closes directly 
the file or urllib object, instead of touching the parser (which can have more 
visible side effect).

@Serhiy: On my first PR 1444, you wrote:
"Add a Misc/NEWS entry. This change changes behavior. It can change visible 
exception."

Do you consider that a NEWS entry is needed for my new PR 1451? I plan to 
backport the change to 2.7, 3.5 and 3.6.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-04 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +1549

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-03 Thread STINNER Victor

STINNER Victor added the comment:

> The problem is that os.unlink(TESTFN) ignores all OSError: os.unlink(TESTFN) 
> fails because there is still an open file object somewhere.

I created the issue #30265: test.support.unlink() should fail or emit a warning 
on WindowsError: [Error 32] The process cannot access the file 

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-03 Thread STINNER Victor

STINNER Victor added the comment:

> The question is if all parser provide a close() method?

It seems like xml.sax supports pluggable parsers, at least using the 
PY_SAX_PARSER environment variable. So it would be safer to add a cheap "if 
hasattr(parser, 'close'):" test;

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-03 Thread STINNER Victor

STINNER Victor added the comment:

https://github.com/python/cpython/pull/1444 is written for the master branch. 
IMHO the bug must be fixed in all supported branches.

The question is if all parser provide a close() method?

--
versions: +Python 3.5, Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-03 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +1542

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30264] [Windows] test_sax: Warning -- files was modified by test_sax

2017-05-03 Thread STINNER Victor

New submission from STINNER Victor:

Running test_sax of Python 2.7 on Windows emits the following warning:

Warning -- files was modified by test_sax

The problem is that os.unlink(TESTFN) ignores all OSError: os.unlink(TESTFN) 
fails because there is still an open file object somewhere.

The bug is in the test_parse_bytes() of test_sax, on check_parse(TESTFN) which 
raises an exception as expected.

xml.sax.parse() should close the parser on exception.


On master, test_sax explicitly expects a ResourceWarning, WTF?

with support.check_warnings(('unclosed file', ResourceWarning)):
   
# XXX Failed parser leaks an opened file.   
   
with self.assertRaises(SAXException):   
   
self.check_parse(TESTFN)
   
# Collect leaked file.  
   
gc.collect()


See also issue #15388.

--
components: Tests, XML
messages: 292946
nosy: haypo
priority: normal
severity: normal
status: open
title: [Windows] test_sax: Warning -- files was modified by test_sax
versions: Python 2.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com