[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-03-03 Thread Noorulamry Daud via lxml - The Python XML Toolkit
I see. Then what I'm wondering is whether, for the "clearing the elements" process, Python 2.7 did something different from what Python 3.8+ does. We run the same code but get vastly different execution times under the two versions. And we narrowed it down to

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-03-03 Thread Xavier Morel via lxml - The Python XML Toolkit
You're clearing the subelements, the attributes, the text, and the tail: https://github.com/lxml/lxml/blob/0eb4f0029497957e58a9f15280b3529bdb18d117/src/lxml/etree.pyx#L1008-L1038 By default, sys.getsizeof only measures the "intrinsic" size of an object; it does not traverse pointers unless the ob
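A small illustration of the "intrinsic size" point, using only the standard library. This is a sketch with plain Python lists rather than lxml elements (the names `small` and `big` are made up for the example); the assumption is that the same caveat applies to an element object whose tree data lives behind pointers.

```python
import sys

# Two lists with one item each; only the size of the referenced item differs.
small = ["x"]
big = ["x" * 1_000_000]

# getsizeof reports the list object itself (header plus one pointer slot),
# not the objects the list refers to.
shallow_small = sys.getsizeof(small)
shallow_big = sys.getsizeof(big)

# The referenced string has to be measured separately.
payload_size = sys.getsizeof(big[0])

print(shallow_small == shallow_big)  # the two containers report the same size
print(payload_size > 1_000_000)      # the payload is where the memory actually is
```

Comparing sys.getsizeof before and after a clear() call may therefore show little or no change even when a lot of memory was released.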

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-03-02 Thread Noorulamry Daud via lxml - The Python XML Toolkit
Hi everyone, Despite the brief respite, the issue my team is having with element.clear() persists. I honestly have no idea why lxml 2.2.3 can do it instantly while the latest version takes ages. I do wonder about something, though: I used sys.getsizeof to see the size of the elements before and

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-02-14 Thread Charlie Clark
On 14 Feb 2025, at 11:12, Stefan Behnel via lxml - The Python XML Toolkit wrote: Then you're not cleaning up enough of the XML tree. Some of it remains in memory after processing it, and thus leads to swapping and long waiting times. It's definitely a memory issue. You can write some code to
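A minimal sketch of what "not cleaning up enough" can look like. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra packages; the helper `count_leftovers`, the `record` tag, and the sample document are hypothetical and not taken from the original poster's code. The point it tries to show: clear() empties an element but leaves the empty element attached to its parent, so the tree keeps growing unless the element is also detached.

```python
import io
import xml.etree.ElementTree as ET


def count_leftovers(xml_bytes, detach):
    """Stream-parse xml_bytes and return how many children the root still holds."""
    context = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            # ... process the record here ...
            elem.clear()  # drops children, attributes, text and tail
            if detach:
                root.remove(elem)  # also drop the now-empty element itself
    return len(root)


# Hypothetical sample input: 1000 small records under one root.
SAMPLE = b"<root>" + b"<record><v>1</v></record>" * 1000 + b"</root>"

leftovers_clear_only = count_leftovers(SAMPLE, detach=False)
leftovers_detached = count_leftovers(SAMPLE, detach=True)
print(leftovers_clear_only, leftovers_detached)
```

With lxml, the equivalent cleanup is often written with getprevious() and getparent() to delete already-processed siblings; whether that is the right fix for the code discussed in this thread is an assumption, since the actual source was not posted.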

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-02-14 Thread Stefan Behnel via lxml - The Python XML Toolkit
Hi, Noorulamry Daud wrote on 14.02.25 at 09:56: Are you using the same versions of lxml (and libxml2) in both? No, and that's what makes it so frustrating. I cannot tell management that using the latest version of Python and lxml actually causes a significant performance penalty. By rights

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-02-14 Thread Noorulamry Daud via lxml - The Python XML Toolkit
Hi everyone, Thank you for your replies. > I guess this is not a look-alike example but just meant as a hint, right? Yes. My employer is very protective of the source code, so I am only allowed to sketch out a rough approximation. The start events are involved during some of the processing, an

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-02-13 Thread Charlie Clark
On 13 Feb 2025, at 15:18, Stefan Behnel via lxml - The Python XML Toolkit wrote: > Are you using the same versions of lxml (and libxml2) in both? > > There shouldn't be a difference in behaviour, except for the obvious language > differences (bytes/unicode). Based on the parsing code we use in

[lxml] Re: Performance issues when using element.clear() in Python 3.x

2025-02-13 Thread Stefan Behnel via lxml - The Python XML Toolkit
Hi, Noorulamry Daud wrote on 13.02.25 at 12:28: I've been cracking my head about this performance issue I'm having and I could use some help. At my work we have to parse extremely large XML files - 20GB and even larger. The basic algorithm is as follows: with open(file, "rb") as reader: