[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2019-04-27 Thread STINNER Victor


STINNER Victor  added the comment:

The 3.6 branch no longer accepts bugfixes.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2019-04-27 Thread Stefan Behnel


Stefan Behnel  added the comment:

This ticket looks like it's done for 3.7/8. Can it be closed?
I guess 3.6 isn't relevant anymore, right?

--




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2019-04-27 Thread Stefan Behnel


Change by Stefan Behnel :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions:  -Python 3.6




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-18 Thread miss-islington


miss-islington  added the comment:


New changeset 60c919b58bd3cf8730947a00ddc6a527d6922ff1 by Miss Islington (bot) 
in branch '3.7':
bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170)
https://github.com/python/cpython/commit/60c919b58bd3cf8730947a00ddc6a527d6922ff1


--
nosy: +miss-islington




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-18 Thread miss-islington


Change by miss-islington :


--
pull_requests: +10453




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-18 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset d2a75c67830d7c9f59e4e9b60f36974234c829ef by Serhiy Storchaka in 
branch 'master':
bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170)
https://github.com/python/cpython/commit/d2a75c67830d7c9f59e4e9b60f36974234c829ef


--




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-18 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It is not easy to avoid reference cycles if we use a generator function. And a 
generator function is much faster than an implementation as a class with a 
__next__ method. We need to access the iterator object from the code of the 
generator function, and this creates a cycle.
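
As a rough illustration of the cycle described above (a simplified sketch, not the actual iterparse code), the hypothetical PullIterator class below stores a generator whose body refers back to the wrapper object. Reference counting alone cannot free the pair, but the cyclic garbage collector can. All names here are made up for the example.

```python
import gc
import weakref


class PullIterator:
    """Hypothetical iterator whose work is done by an inner generator."""

    def __init__(self, items):
        self.seen = 0

        def generate():
            for item in items:
                # The generator updates state on the wrapper object,
                # so its closure keeps a reference to `self`.
                self.seen += 1
                yield item

        # The wrapper keeps a reference to the generator: this closes the cycle.
        self._gen = generate()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)


def demo():
    gc.disable()  # only explicit collections run inside this function
    try:
        it = PullIterator([1, 2, 3])
        next(it)
        ref = weakref.ref(it)
        del it
        alive_after_del = ref() is not None  # refcounting cannot free a cycle
        gc.collect()
        alive_after_gc = ref() is not None   # the cycle collector can
        return alive_after_del, alive_after_gc
    finally:
        gc.enable()


if __name__ == "__main__":
    print(demo())
```

The object survives `del it` and disappears only after `gc.collect()`, which is why the collector must be able to traverse every object taking part in the cycle.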

--




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-18 Thread STINNER Victor


STINNER Victor  added the comment:

Oops, my PR 11169 used the wrong issue number: bpo-35257 instead of bpo-35502. 
Anyway, I closed it because the change is too complex.

--

IMHO the root issue is the handling of the SyntaxError exception in 
XMLPullParser.feed(). I wrote a fix, but I don't have the bandwidth to write a 
unit test checking that the reference cycle is broken.

commit 9f3354d36a89d7898bdb631e5119cc37e9a74840 (fix_etree_leak)
Author: Victor Stinner 
Date:   Fri Dec 14 22:03:16 2018 +0100

    bpo-35257: Fix memory leak in XMLPullParser.feed()

    Fix memory leak in XMLPullParser.feed() of xml.etree: on syntax
    error, clear the traceback to break a reference cycle.

diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py
index c1cf483cf5..f17c52541b 100644
--- a/Lib/xml/etree/ElementTree.py
+++ b/Lib/xml/etree/ElementTree.py
@@ -1266,6 +1266,8 @@ class XMLPullParser:
             try:
                 self._parser.feed(data)
             except SyntaxError as exc:
+                # bpo-35502: Break reference cycle
+                #exc.__traceback__ = None
                 self._events_queue.append(exc)
 
     def _close_and_return_root(self):


I don't see any behavior difference in XMLPullParser.read_events(), which 
re-raises the exception:

        events = self._events_queue
        while events:
            event = events.popleft()
            if isinstance(event, Exception):
                raise event
            else:
                yield event
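
The mechanism behind this proposed fix can be sketched in isolation. The example below is an illustration under assumptions, not the XMLPullParser code. A caught exception is appended to a queue that is also a local variable of the frame recorded in the exception's traceback, so the queue, exception, traceback and frame form a cycle. The names store_error, Marker and demo are hypothetical.

```python
import gc
import weakref


class Marker:
    """Hypothetical object used only to observe when a frame's locals die."""


def store_error(queue, clear_traceback):
    marker = Marker()  # a local that lives as long as this frame does
    ref = weakref.ref(marker)
    try:
        raise SyntaxError("bad input")
    except SyntaxError as exc:
        if clear_traceback:
            # Without a traceback the exception no longer references the frame.
            exc.__traceback__ = None
        # queue -> exc -> traceback -> frame -> queue (a local of the frame)
        queue.append(exc)
    return ref


def demo(clear_traceback):
    gc.disable()  # only explicit collections run inside this function
    try:
        queue = []
        ref = store_error(queue, clear_traceback)
        del queue
        alive_before_gc = ref() is not None
        gc.collect()
        alive_after_gc = ref() is not None
        return alive_before_gc, alive_after_gc
    finally:
        gc.enable()


if __name__ == "__main__":
    print("traceback kept:   ", demo(False))
    print("traceback cleared:", demo(True))
```

With the traceback kept, the frame's locals stay alive until the cycle collector runs. With the traceback cleared, they are released as soon as the function returns, at the cost of losing the traceback on the stored exception.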

--

PR 11170 is also a nice enhancement (fix treebuilder_gc_traverse()), but maybe 
we should also prevent creating reference cycles in the first place?

--




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread Jess Johnson


Change by Jess Johnson :


--
versions: +Python 3.6




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +10408




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

The problem was with detecting a reference cycle containing a TreeBuilder.

--
nosy: +eli.bendersky, scoder
versions: +Python 3.8




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +10407
stage:  -> patch review




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
assignee:  -> serhiy.storchaka
nosy: +serhiy.storchaka




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread STINNER Victor


STINNER Victor  added the comment:

Oops, there was a typo, you should read kB:

1 calls: 15.3 kB / call (total: 15.3 kB)
100 calls: 15.3 kB / call (total: 1527.7 kB)
1000 calls: 15.3 kB / call (total: 15265.0 kB)

--
Added file: https://bugs.python.org/file47999/run2.py




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread STINNER Victor


Change by STINNER Victor :


Removed file: https://bugs.python.org/file47998/run.py




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread STINNER Victor


STINNER Victor  added the comment:

I wrote the attached run.py, which confirms a leak using tracemalloc:

$ python3 run.py 
1 calls: 15.3B / call (total: 15.3 kB)
100 calls: 15.3B / call (total: 1527.7 kB)
1000 calls: 15.3B / call (total: 15265.0 kB)
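
The attached run.py is not reproduced in this archive. A measurement along the same lines might look like the sketch below, which is an assumption about the approach and not the attached script. SAMPLE_XML is a stand-in chosen for the example (its root element is left unclosed, so a ParseError would be raised if parsing continued), and the function names are hypothetical.

```python
import gc
import tracemalloc
import xml.etree.ElementTree as etree
from io import StringIO

# Assumed sample document: the markup from the original report is not
# preserved in this archive, so this truncated XML is only a stand-in.
SAMPLE_XML = "<ROOT><LEVEL1></LEVEL1>"


def parse_and_stop_early():
    """Stop iterating before the parser reaches the malformed end of input."""
    for _, elem in etree.iterparse(StringIO(SAMPLE_XML)):
        if elem.tag == "LEVEL1":
            break


def measure(ncalls):
    """Return bytes still allocated after ncalls parses and a full collection."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(ncalls):
        parse_and_stop_early()
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    for ncalls in (1, 100, 1000):
        leaked = measure(ncalls)
        print(f"{ncalls} calls: {leaked / ncalls / 1024:.1f} kB / call "
              f"(total: {leaked / 1024:.1f} kB)")
```

A per-call figure that stays roughly constant as the number of calls grows, as in the output above, points to memory that is never released. On an interpreter that includes the fix, the figure would be expected to stay close to zero.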

--
nosy: +vstinner
Added file: https://bugs.python.org/file47998/run.py




[issue35502] Memory leak in xml.etree.ElementTree.iterparse

2018-12-14 Thread Jess Johnson


New submission from Jess Johnson :

When given XML that would raise a ParseError, but parsing is stopped 
before the ParseError is raised, xml.etree.ElementTree.iterparse leaks memory.

Example:


import gc
from io import StringIO
import xml.etree.ElementTree as etree

import objgraph


def parse_xml():
    xml = """
  
  

"""
    parser = etree.iterparse(StringIO(initial_value=xml))
    for _, elem in parser:
        if elem.tag == 'LEVEL1':
            break


def run():
    parse_xml()

    gc.collect()
    uncollected_elems = objgraph.by_type('Element')
    print(uncollected_elems)
    objgraph.show_backrefs(uncollected_elems, max_depth=15)


if __name__ == "__main__":
    run()


Output:
[]

Also see this gist which has an image showing the objects that are retained in 
memory: https://gist.github.com/grokcode/f89d5c5f1831c6bc373be6494f843de3
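
The example above depends on the third-party objgraph package. A rough variant that uses only the standard library could count the Element objects that survive a full collection, as sketched below. This is an assumption-laden sketch, not the reporter's script: SAMPLE_XML is a stand-in because the original markup is not preserved in this archive, and the helper names are hypothetical.

```python
import gc
import xml.etree.ElementTree as etree
from io import StringIO

# Assumed stand-in document with an unclosed root element.
SAMPLE_XML = "<ROOT><LEVEL1></LEVEL1>"


def live_elements():
    """Return all Element objects still tracked after a full collection."""
    gc.collect()
    return [obj for obj in gc.get_objects() if type(obj) is etree.Element]


def parse_and_stop_early():
    """Break out of iterparse before the ParseError would be raised."""
    for _, elem in etree.iterparse(StringIO(SAMPLE_XML)):
        if elem.tag == "LEVEL1":
            break


def leaked_after(ncalls):
    """Count Element objects that outlive ncalls early-terminated parses."""
    baseline = len(live_elements())
    for _ in range(ncalls):
        parse_and_stop_early()
    return len(live_elements()) - baseline


if __name__ == "__main__":
    print("surviving Element objects:", leaked_after(10))
```

A count that grows with the number of calls would indicate that the collector cannot reclaim the elements.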

--
components: XML
messages: 331861
nosy: jess.j
priority: normal
severity: normal
status: open
title: Memory leak in xml.etree.ElementTree.iterparse
type: resource usage
versions: Python 3.7
