[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-15 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 8867874a2b7d by Antoine Pitrou in branch 'default':
Issue #20826: Optimize ipaddress.collapse_addresses().
http://hg.python.org/cpython/rev/8867874a2b7d

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-15 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I've now committed this. exhuma, if you have any further observations or 
results, don't hesitate to post them!

--
resolution:  - fixed
stage: patch review - commit review
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is a much faster patch, around 30x faster than the original code.

With exhuma's data set and tester.py, the original code gives:

$ time python3.4 tester.py 
Execution time: 5.949284339199949 seconds

real0m30.152s
user0m30.104s
sys 0m0.016s

The patched code gives:

$ time ./python tester.py 
Execution time: 0.25444041779992405 seconds

real0m1.695s
user0m1.681s
sys 0m0.012s


exhuma, perhaps you want to test with other data sets?

--
Added file: http://bugs.python.org/file35228/faster_collapse.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Uh, those measurements are wrong, sorry, since tester.py doesn't consume the 
iterator. When consuming the iterator, the patch is ~ 4x faster than the 
original code, which is more reasonable :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

It seems like the speed of Network.supernet() is a bottleneck here. issue16531 
would probably allow making supernet() much faster.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Updated patch, a bit faster yet. After issue16531 is committed, it is now ~15x 
faster than 3.4.

--
Added file: http://bugs.python.org/file35233/faster_collapse2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

From a quick look, the algorithm is indeed correct, and it should perform 
better on average than the previous one.

Note: both algorithms are O(n**2) worst case, not O(n).

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-05-09 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
stage:  - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-03-23 Thread Michel Albert

Michel Albert added the comment:

Sorry for the late reply. I wanted to take some time and give a more detailed 
explanation. At least to the best of my abilities :)

I attached a zip-file with my quick-and-dirty test-rig. The zip contains:

 * gendata.py -- The script I used to generate test-data
 * testdata.lst -- My test-data set (for reproducability)
 * tester.py -- A simple script using ``timeit.timeit``.

I am not sure how sensitive the data is I am working with, so I prefer not to 
put any of the real data on a public forum. Instead, I wrote a small script 
which generates a data-set which makes the performance difference visible 
(``gendata.py``). The data which I processed actually created an even worse 
case, but it's difficult to reproduce. In my case, the data-set I used is in 
the file named ``testdata.lst``.

I then ran the operation 5 times using ``timeit`` (tester.py).

Let me also outline an explanation to what it happening:

It is possible, that through one merge operation, a second may become 
possible. For the sake of readability, let's use IPv4 addresses, and consider 
the following list:

[a_1, a_2, ..., a_n, 192.168.1.0/31, 192.168.1.2/32, 192.168.1.3/32, b_1, 
b_2, ..., b_n]

This can be reduced to

[a_1, a_2, ..., a_n, 192.168.1.0/31, 192.168.1.2/31, b_1, b_2, ..., b_n]

Which in turn can then be reduced to:

[a_1, a_2, ..., a_n, 192.168.1.0/30, b_1, b_2, ..., b_n]

The current implementation, sets a boolean (``optimized``) to ``True`` if any 
merge has been performed. If yes, it re-runs through the whole list until no 
optimisation is done. Those re-runs also include [a1..an] and [b1..bn], which 
is unnecessary. With the given data-set, this gives the following result:

Execution time: 48.27790632040014 seconds
./python tester.py  244.29s user 0.06s system 99% cpu 4:04.51 total

With the shift/reduce approach, only as many nodes are visited as necessary. If 
a reduce is made, it backtracks as much as possible, but not further. So in 
the above example, nodes [a1..an] will only be visited once, and [b1..bn] will 
only be visited once the complete optimisation of the example addresses has 
been performed. With the given data-set, this gives the following result:

Execution time: 20.298685277199912 seconds
./python tester.py  104.20s user 0.14s system 99% cpu 1:44.58 total

If my thoughts are correct, both implementations should have a similar 
best-case, but the worst-case differs significantly. I am not well-versed 
with the Big-O notation, especially the best/average/worst case difference. 
Neither am I good at math. But I believe both are strictly speaking O(n). But 
more precisely, O(k*n) where k is proportional the number of reconciliation 
steps needed (that is, if one merger makes another merger possible). But it is 
much smaller in the shift/reduce approach as only as many elements need to be 
revisited as necessary, instead of all of them.

--
Added file: http://bugs.python.org/file34583/testrig.zip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-03-11 Thread pmoody

pmoody added the comment:

Hey Exhuma, thanks for the patch.

Can you give me an example list on which this shift reduce approach works much 
better?

--
assignee:  - pmoody

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20826] Faster implementation to collapse consecutive ip-networks

2014-03-02 Thread Michel Albert

New submission from Michel Albert:

This alternative implementation runs over the ``addresses`` collection only 
once, and backtracks only if necessary. Inspired by a shift-reduce approach.

Technically both are O(n), so the best case is always the same. But the old 
implementation runs over the *complete* list multiple times until it cannot 
make any more optimisations. The new implementation only repeats the 
optimisation on elements which require reconciliation.

Tests on a local machine have shown a considerable increase in speed on large 
collections of elements (iirc about twice as fast on average).

--
components: Library (Lib)
files: faster-collapse-addresses.patch
keywords: patch
messages: 212553
nosy: exhuma, ncoghlan, pmoody
priority: normal
severity: normal
status: open
title: Faster implementation to collapse consecutive ip-networks
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file34267/faster-collapse-addresses.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20826
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com