[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-18 Thread Robert Haschke

Robert Haschke added the comment:

Thank you very much for further improving the code. As I understand it, the 
trick is to use temporary variables to minimize access time. Learned something 
new.

I adapted your patch to python 2.7 again. Now, in python3, the new code is even 
faster than the old one (sometimes) for front insertions:

36 time for 1 insertions at front: 0.117563
opt36 time for 1 insertions at front: 0.116014

36 time for 1 insertions at front: 0.114282
opt36 time for 1 insertions at front: 0.116710

old time for 5000 insertions at front: 0.044055
new time for 5000 insertions at front: 0.075433
opt27 time for 5000 insertions at front: 0.052135

old time for 5000 insertions at back: 15.241450
new time for 5000 insertions at back: 0.071004
opt27 time for 5000 insertions at back: 0.046850

I hope you can consider, the patch for python 2.7. There the performance gain 
is most significant.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There are different tricks that allow to minimize an overhead. Here is tuned 
patch.

But I still not sure that this special case is worth to be optimized.

--
Added file: http://bugs.python.org/file43445/minidom.insertBefore2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Robert Haschke

Robert Haschke added the comment:

I don't see how to further minimize the checks - all of them are required. I 
think, the most common uses cases to create a document are appendChild(), which 
is not affected, and insertBefore() using the same refChild for a while. In 
that case, the patch gives a tremendous speedup.

If you as maintainers don't want to share this improvement with the community, 
it's your choice and I will be fine with that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This slow down is not such small -- up to 25% for every insertBefore(). If most 
calls of insertBefore() are not with the same refChild, the benefit from the 
optimization of one special case can be dwarfed. Try to minimize the overhead 
of the optimization. If you succeed the chances of acceptance of the patch will 
increase.

I think that the availability of alternatives (upgrading to Python 3 or using 
ElementTree) plays against the acception of this optimization. Since it looks 
more as new feature to me, you have to convince the maintainer of 2.7 to take 
the patch in 2.7.

--
nosy: +benjamin.peterson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Robert Haschke

Robert Haschke added the comment:

Indeed there is a small slow down for insertion at the beginning.
However, this is simply due to the extra function _index() and thus linear in 
the number of insertion operations.

My patch essentially boosts insertions before /any fixed/ node.
If this reference node changes between insertions (as in your "before first" 
example), there is no gain anymore.

Of course, this optimization comes at the cost of an additional integer per 
node. There is no free lunch!

I know, that there are other parsers (e.g. etree) available. However
changing my existing code base from minidom to etree will be a heavy change, 
which isn't easily accepted as well.

I think, my minidom patch is a clean and simple fix to a common performance 
issue. As it mostly effects 2.7, it should primarily go there ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Added file: http://bugs.python.org/file43439/etree_example.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you for your example Robert.

If modify your example for inserting new nodes before the first one, it shows 
the slowdown with your patch.

$ ./python minidom_example2.py old new
oldtime for 5000 iterations: 0.189058
newtime for 5000 iterations: 0.254402

The question is whether your case is enough common to compensate the slowdown 
of other cases.

Yest one disadvantage of your patch is increasing memory consumption (this can 
be partly compensated by adding '_index_cache' to slots).

Have you considered the option of using Python 3? In Python 3 your example is 
much faster even without your patch (but still has quadratic complexity).

Python 2.7:
oldtime for 5000 iterations: 68.485284
newtime for 5000 iterations: 0.237943

Python 3.6:
oldtime for 5000 iterations: 0.695023
newtime for 5000 iterations: 0.212854

And the best option is using ElementTree. It accepts an index instead of a 
subelement for insertion.

$ ./python etree_example.py
Python 2.7:
time for 5000 iterations: 0.037805

Python 3.6:
time for 5000 iterations: 0.015566

--
Added file: http://bugs.python.org/file43438/minidom_example2.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Robert Haschke

Robert Haschke added the comment:

I uploaded a simple example to illustrate the tremendous performance boost. 
Obviously, the example exploits the caching improvement as much as possible: 
The code assembles a XML document by inserting new nodes before the last one...
These are the timing results:
$ python minidom_example.py old new
oldtime for 5000 iterations: 18.422152
newtime for 5000 iterations: 0.129384

$ python minidom_example.py old new
oldtime for 1 iterations: 68.305351
newtime for 1 iterations: 0.142071

You see the quadratic increase of time...
IMHO, this is clearly a (performance) bug and really many people in the ROS 
community are affected. Hence, I hope that this patch will find its way into 
some python versions currently used by standard Linux distributions.

--
Added file: http://bugs.python.org/file43436/minidom_example.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Could you please provide a simple example that benefits from this change?

Besides index() Node.insertBefore() uses insert() that has linear complexity 
too.

In any case this is a new feature that can go only in 3.6.

--
assignee:  -> serhiy.storchaka
stage:  -> patch review
versions: +Python 3.6 -Python 2.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-16 Thread Berker Peksag

Changes by Berker Peksag :


--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-16 Thread Robert Haschke

Robert Haschke added the comment:

Uploaded a "hg diff" against the recent 2.7 branch of the source repo.

--
Added file: http://bugs.python.org/file43414/minidom.insertBefore.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-16 Thread Berker Peksag

Berker Peksag added the comment:

Please attach the patch in unified diff format. See 
https://docs.python.org/devguide/patch.html for details.

--
nosy: +berker.peksag

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2016-06-16 Thread GuiHome

GuiHome added the comment:

kindly ping

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2015-11-05 Thread GuiHome

GuiHome added the comment:

We have been running this patch for several month now without any issue. 
Would be glad if a maintainer could review it and merge it upstream.

thanks

--
nosy: +guihome

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24424] xml.dom.minidom: performance issue with Node.insertBefore()

2015-06-10 Thread Robert Haschke

New submission from Robert Haschke:

Node.insertBefore() has a serious performance issue:

Using self.childNodes.index(refChild) it searches for the correct index in 
childNodes where the newChild should be inserted.
However, index() is linear in time w.r.t. the size of childNodes.
Hence, if there are many children, runtime dramatically increases.

Adding a simple caching mechanism (caching the previously used reference)
I was able to reduce runtime in my particular case from 16s to 1.6s, i.e. a 
factor of 10!

--
components: XML
files: minidom.insertBefore.patch
keywords: patch
messages: 245128
nosy: Robert Haschke
priority: normal
severity: normal
status: open
title: xml.dom.minidom: performance issue with Node.insertBefore()
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file39674/minidom.insertBefore.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24424
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com