I see.
Then what I'm wondering is whether the "clearing the elements" process
works differently in Python 2.7 than in Python 3.8+. We have the same code
but vastly different execution times when we run it under the two
versions, and we narrowed it down to
You're clearing the subelements, the attributes, the text, and the tail
https://github.com/lxml/lxml/blob/0eb4f0029497957e58a9f15280b3529bdb18d117/src/lxml/etree.pyx#L1008-L1038
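As a rough illustration of what that clearing does, here is a minimal
sketch. It uses the standard library's xml.etree.ElementTree as a stand-in
for lxml (an assumption, made so the snippet runs without lxml installed),
and the sample document and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an element with an attribute, text, a child and a tail.
root = ET.fromstring('<root><item id="1">text<child/></item>tail</root>')
item = root[0]

# clear() drops the subelements and attributes and resets text and tail.
item.clear()

# Snapshot of what is left on the element after clearing.
state = (len(item), dict(item.attrib), item.text, item.tail, item.tag)

# clear() does not detach the element from its parent.
still_attached = len(root)
```

Note that the cleared element is still a child of its parent, so an empty
shell of it stays in the tree.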
By default, sys.getsizeof only measures the "intrinsic" size of an
object; it does not traverse pointers unless the ob
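A small sketch of the shallow measurement described here, using plain
built-in objects rather than lxml elements (the names are illustrative
only). The list holds a reference to a large string, yet its reported size
does not include that string.

```python
import sys

# A large object, and a container that only stores a pointer to it.
payload = "x" * 1_000_000
holder = [payload]

# Shallow size of the list: its header plus one pointer slot.
shallow = sys.getsizeof(holder)

# Size of the string the list points to, measured separately.
referenced = sys.getsizeof(payload)
```

Comparing sys.getsizeof before and after clearing a container-like object
can therefore show little or no change even when a lot of referenced
memory was released, or was not released at all.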
Hi everyone,
Despite the brief respite, the issue my team is having with
element.clear() persists. I honestly have no idea why lxml 2.2.3 can do it
instantly while the latest version takes ages.
I do wonder about something though; I used sys.getsizeof to see the size of
the elements before and
On 14 Feb 2025, at 11:12, Stefan Behnel via lxml - The Python XML
Toolkit wrote:
Then you're not cleaning up enough of the XML tree. Some of it remains
in memory after processing it, and thus leads to swapping and long
waiting times.
It's definitely a memory issue. You can write some code to
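To make the "not cleaning up enough" point concrete, here is a minimal
sketch, not the poster's code. It uses the standard library's
xml.etree.ElementTree in place of lxml (an assumption, so it runs without
third-party packages); parse_records, the tag name and the drop_from_root
flag are hypothetical. It assumes the records are direct children of the
root element.

```python
import io
import xml.etree.ElementTree as ET


def parse_records(source, tag, drop_from_root):
    """Stream through `source`, clear each finished `tag` element, and
    report how many were processed and how many children root still holds."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    processed = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            processed += 1
            # Empties the element, but the empty shell stays in root.
            elem.clear()
            if drop_from_root:
                # Also drop root's references to children handled so far.
                root.clear()
    return processed, len(root)


# Hypothetical sample input: five small records under one root.
sample = b"<root>" + b"<record><v>1</v></record>" * 5 + b"</root>"

kept = parse_records(io.BytesIO(sample), "record", drop_from_root=False)
dropped = parse_records(io.BytesIO(sample), "record", drop_from_root=True)
```

With only elem.clear(), every processed record remains attached to the
root as an empty element, which adds up over millions of records. Dropping
the references from the parent as well is what keeps the tree from growing.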
Hi,
Noorulamry Daud wrote on 14.02.25 at 09:56:
Are you using the same versions of lxml (and libxml2) in both?
No, and that's what makes it so frustrating. I cannot tell management that
using the latest version of Python and lxml actually causes a significant
performance penalty. By rights
Hi everyone,
Thank you for your replies.
> I guess this is not a look-alike example but just meant as a hint, right?
Yes. My employer is very protective of the source code, so I am only
allowed to sketch out a rough approximation.
The start events are involved during some of the processing, an
On 13 Feb 2025, at 15:18, Stefan Behnel via lxml - The Python XML Toolkit wrote:
> Are you using the same versions of lxml (and libxml2) in both?
>
> There shouldn't be a difference in behaviour, except for the obvious language
> differences (bytes/unicode).
Based on the parsing code we use in
Hi,
Noorulamry Daud wrote on 13.02.25 at 12:28:
I've been racking my brain over this performance issue I'm having and I
could use some help.
At my work we have to parse extremely large XML files - 20GB and even
larger. The basic algorithm is as follows:
with open(file, "rb") as reader: