Re: Execution speed question
On Mon, 28 Jul 2008 16:48:28 +0200, Suresh Pillai wrote: Okay, please consider this my one absolutely stupid post for the year. I'd like to pretend it never happened but unfortunately the web doesn't allow that. Having never used sets, I unfort read something that lead to it, but ... Okay, got some sleep and what I meant to ask, although equally basic, but not silly: For sets, I presume they are built on top of or like dicts, and there is nothing crazy in the low level implementation so that I can be guaranteed that if I don't alter the set, then the order, although arbitrary, will be maintained in successive iterations over the contents? -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Suresh Pillai wrote: On Mon, 28 Jul 2008 16:48:28 +0200, Suresh Pillai wrote: Okay, please consider this my one absolutely stupid post for the year. I'd like to pretend it never happened but unfortunately the web doesn't allow that. Having never used sets, I unfort read something that lead to it, but ... Okay, got some sleep and what I meant to ask, although equally basic, but not silly: For sets, I presume they are built on top of or like dicts, and there is nothing crazy in the low level implementation so that I can be guaranteed that if I don't alter the set, then the order, although arbitrary, will be maintained in successive iterations over the contents? It is currently that way, but relying on it is certainly to be considered an implementation detail that might disappear without warning. Diez -- http://mail.python.org/mailman/listinfo/python-list
RE: Execution speed question
Diez B. Roggisch wrote: For sets, I presume they are built on top of or like dicts, and there is nothing crazy in the low level implementation so that I can be guaranteed that if I don't alter the set, then the order, although arbitrary, will be maintained in successive iterations over the contents? It is currently that way, but relying on it is certainly to be considered an implementation detail that might disappear without warning. Fortunately, that's incorrect - it is a documented feature that *so long as the set is not modified in any way*, iterating over the set will return the elements in the same order each time. Tim Delaney -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Fri, 25 Jul 2008 08:08:57 -0700, Iain King wrote: On Jul 25, 3:39 pm, Suresh Pillai [EMAIL PROTECTED] wrote: That's a good comparison for the general question I posed. Thanks. Although I do believe lists are less than ideal here and a different data structure should be used. To be more specific to my case: As mentioned in my original post, I also have the specific condition that one does not know which nodes to turn ON until after all the probabilities are calculated (lets say we take the top m for example). In this case, the second and third will perform worse as the second one will require a remove from the list after the fact and the third will require another loop through the nodes to build the new list. So you need to loops through twice regardless? i.e. loop once to gather data on off nodes, do some calculation to work out what to turn on, then loop again to turn on the relevant nodes? If so, then I think the functions above remain the same, becoming the 2nd loop. Every iteration you do a first loop over the off_nodes (or them all for (1)) to gather the data on them, perform your calculation, and then perform one of the above functions (minus the setup code at the begining; basically starting at the 'for') as a second loop, with the goes_on function now returning a value based on the calculation (rather than the calculation itself as I had it). Performance should be similar. Iain If do I settle on an explicit loop to remove the nodes turned ON, then I realised this weekend that I could do this in the next iteration of the simulation (first loop above) and save some iteration overhead (the if checking will still be there of course). And thanks for pointing out that constructing a new list, for long lists, is faster than simple removal. It's obvious but I never really thought of it; good tip. -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Fri, 25 Jul 2008 05:46:56 -0700, Iain King wrote: or 3. build a new list every iteration intead of deleting from the old one: while processing: new_off_list = [] for x in off_list: if goes_on(x): on_list.append(x) else: new_off_list.append(x) off_list = new_off_list generation += 1 Iain Or 4, since the order of my nodes doesn't matter: swap the node to be deleted with the last node in the list and then remove the last node of the list. This is the fastest to date, if using native structures, for low number nodes being deleted per cycle (def if only deleting one). -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Suresh Pillai: Or 4, since the order of my nodes doesn't matter: swap the node to be deleted with the last node in the list and then remove the last node of the list. This is the fastest to date, if using native structures, for low number nodes being deleted per cycle (def if only deleting one). Using Psyco this suggestion may lead to code as fast as it gets in Python :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Mon, 28 Jul 2008 10:44:18 +0200, Suresh Pillai wrote: Since I am doing A LOT of loops over the nodes and the number of nodes is also huge, my concern using sets is that in order to iterate over the set in each step of my simulation, the set items need to be converted to a list every time. So while removal from a set is much cheaper than say from a list, what about this conversion overhead in order to iterate over the items. I could of course use the old trick of using a dictionary with 'None' values and then using iterkeys(). But I thought sets were supposed to replace this. So maybe I should be asking a more basic question: is there any way to iterate over the items in a set other than converting to a list or using the pop() method. -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Suresh Pillai [EMAIL PROTECTED] wrote: [ ... ] is there any way to iterate over the items in a set other than converting to a list or using the pop() method. Er, how about directly iterating over the set? -- \S -- [EMAIL PROTECTED] -- http://www.chaos.org.uk/~sion/ Frankly I have no feelings towards penguins one way or the other -- Arthur C. Clarke her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Mon, 28 Jul 2008 15:04:43 +0200, Suresh Pillai wrote: I could of course use the old trick of using a dictionary with 'None' values and then using iterkeys(). But I thought sets were supposed to replace this. So maybe I should be asking a more basic question: is there any way to iterate over the items in a set other than converting to a list or using the pop() method. Yes, just do it. for i in set([1,2,3]): ... print i ... 1 2 3 -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Mon, 28 Jul 2008 15:04:43 +0200, Suresh Pillai wrote: On Mon, 28 Jul 2008 10:44:18 +0200, Suresh Pillai wrote: Since I am doing A LOT of loops over the nodes and the number of nodes is also huge, my concern using sets is that in order to iterate over the set in each step of my simulation, the set items need to be converted to a list every time. So while removal from a set is much cheaper than say from a list, what about this conversion overhead in order to iterate over the items. I could of course use the old trick of using a dictionary with 'None' values and then using iterkeys(). But I thought sets were supposed to replace this. So maybe I should be asking a more basic question: is there any way to iterate over the items in a set other than converting to a list or using the pop() method. Okay, please consider this my one absolutely stupid post for the year. I'd like to pretend it never happened but unfortunately the web doesn't allow that. Having never used sets, I unfort read something that lead to it, but ... -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
The number of nodes is very large: millions for sure, maybe tens of millions. If considering (2), take note of my BOLD text above, which means I can't remove nodes as I iterate through them in the main loop. Since your use of 'node' is pretty vague and I don't have a good sense of what tests you are running and how long they would take, I'm only speculating, but a single loop might be the wrong way to go about that. If you are going to be frequently running tests and switching nodes on/off, have you considered a separate set of processes to do both? For example: A set of some number of tester threads, that loop through and test, recording thier results (somewhere). You could then have a separate loop that runs every so often, checks all the current test values, and runs through the nodes once, switching them on or off. I know it's not exactly what you asked, but depending on what your nodes are exactly, you can avoid a lot of other problems down the road. What if your single loop dies or gets hung on a test? With a separate approach, you'll have a lot more resilience too.. if there's some problem with a single tester or node, it won't keep the rest of the program from continuing to function. -- http://mail.python.org/mailman/listinfo/python-list
Execution speed question
I am performing simulations on networks (graphs). I have a question on speed of execution (assuming very ample memory for now). I simplify the details of my simulation below, as the question I ask applies more generally than my specific case. I would greatly appreciate general feedback in terms of computing and of course considerations specific to implementation in Python. The nodes in my network may be ON or OFF. The network starts off with all nodes in the OFF state. I loop through the nodes. For each node that is OFF, I consider some probability of it turning ON based on the states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING WHICH ONES TO TURN ON. So my question is whether it is faster to 1. loop through a list of ALL nodes and check for OFF nodes using ifs or to 2. loop through a container of OFF nodes and remove from this when they turn ON The second would result in looping through less nodes, especially as the simulation progresses, but how does the cost of removal compare with cheap ifs and would the extra memory usage affect performance. I an appreciate that the cost of the if check, the number of nodes, and the type of container I use will come into the answer. In my case, the ifs are cheap boolean queries (whether the node is ON or OFF). The number of nodes is very large: millions for sure, maybe tens of millions. If considering (2), take note of my BOLD text above, which means I can't remove nodes as I iterate through them in the main loop. I naturally started coding with (2), but couldn't decide on the best data structure for python. A set seemed ideal for speedy removal, but then I can't iterate through them with out popping. An ordered list? Some creative solution with numpy arrays? There is also the complication that since we are in interpreted python, what is theoretically the best data structure may not in reality be optimal unless it is a default system object or coded externally in a compiled module. Of course, I will start experimenting to see what the execution difference is, but I would appreciate some suggestions from others re which is best and also on best data structure for (2). I'm not a newbie, so you can get technical with me python-wise and algorithm wise. I realise it is a 'basic' question, but it is something that I have always wondered about (cheap ifs versus extra structure) and with the number of nodes I am considering, it actually becomes an issue. Many Thanks, Suresh -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Jul 25, 7:57 pm, Suresh Pillai [EMAIL PROTECTED] wrote: The nodes in my network may be ON or OFF. The network starts off with all nodes in the OFF state. I loop through the nodes. For each node that is OFF, I consider some probability of it turning ON based on the states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING WHICH ONES TO TURN ON. So my question is whether it is faster to 1. loop through a list of ALL nodes and check for OFF nodes using ifs I'd recommend using 'filter' and list comprehensions. import random class Node: ... def __init__(self): ... self.on = False ... def toggle(self): ... self.on = random.choice([True, False]) ... nodes = [Node() for i in range(0, 1)] for node in nodes: ... node.toggle() ... off_nodes = filter(lambda x: not x.on, nodes) len(off_nodes) 5050 -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
I'd recommend using 'filter' and list comprehensions. Look at using reduce(). You can collect information about all of the nodes without necessarily building a large, intermediate list in the process. You might get some ideas from here [http://en.wikipedia.org/wiki/ Antiobjects]. -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Jul 25, 10:57 am, Suresh Pillai [EMAIL PROTECTED] wrote: I am performing simulations on networks (graphs). I have a question on speed of execution (assuming very ample memory for now). I simplify the details of my simulation below, as the question I ask applies more generally than my specific case. I would greatly appreciate general feedback in terms of computing and of course considerations specific to implementation in Python. The nodes in my network may be ON or OFF. The network starts off with all nodes in the OFF state. I loop through the nodes. For each node that is OFF, I consider some probability of it turning ON based on the states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING WHICH ONES TO TURN ON. So my question is whether it is faster to 1. loop through a list of ALL nodes and check for OFF nodes using ifs or to 2. loop through a container of OFF nodes and remove from this when they turn ON or 3. build a new list every iteration intead of deleting from the old one: while processing: new_off_list = [] for x in off_list: if goes_on(x): on_list.append(x) else: new_off_list.append(x) off_list = new_off_list generation += 1 Iain -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Jul 25, 1:46 pm, Iain King [EMAIL PROTECTED] wrote: On Jul 25, 10:57 am, Suresh Pillai [EMAIL PROTECTED] wrote: I am performing simulations on networks (graphs). I have a question on speed of execution (assuming very ample memory for now). I simplify the details of my simulation below, as the question I ask applies more generally than my specific case. I would greatly appreciate general feedback in terms of computing and of course considerations specific to implementation in Python. The nodes in my network may be ON or OFF. The network starts off with all nodes in the OFF state. I loop through the nodes. For each node that is OFF, I consider some probability of it turning ON based on the states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING WHICH ONES TO TURN ON. So my question is whether it is faster to 1. loop through a list of ALL nodes and check for OFF nodes using ifs or to 2. loop through a container of OFF nodes and remove from this when they turn ON or 3. build a new list every iteration intead of deleting from the old one: while processing: new_off_list = [] for x in off_list: if goes_on(x): on_list.append(x) else: new_off_list.append(x) off_list = new_off_list generation += 1 Iain I was curious to what extent the different methods varied in time, so I checked it out. Here there are three procedures: test_every which matches your (1), destructive which matches your (2), and constructive which is (3) as I've outlined above. On varying the size of the dataset I get this (probability a node goes on = 50%): Length of initial list: 10 Test every: 1.16085492357 Destructive: 2.592310272 Constructive: 0.850312458886 Length of initial list: 20 Test every: 2.48013843287 Destructive: 9.20894689718 Constructive: 1.73562198439 Length of initial list: 40 Test every: 5.00652267447 Destructive: 44.9696004134 Constructive: 3.51687329373 Length of initial list: 80 Test every: 9.67657648655 Destructive: 220.57583941 Constructive: 7.06614485537 and changing the probability that a nodes goes on (dataset size = 20): Probability goes on: 1/2 Test every: 2.24765364513 Destructive: 9.28801971614 Constructive: 1.62770773816 Probability goes on: 1/4 Test every: 4.77387350904 Destructive: 13.4432467571 Constructive: 3.45467140006 Probability goes on: 1/8 Test every: 11.0514899721 Destructive: 18.4026878278 Constructive: 6.86778036177 Probability goes on: 1/16 Test every: 22.5896021593 Destructive: 25.7784044083 Constructive: 13.8631404605 Probability goes on: 1/32 Test every: 49.7667941179 Destructive: 39.3652502735 Constructive: 27.2527219598 Probability goes on: 1/64 Test every: 91.0523955153 Destructive: 65.7747103963 Constructive: 54.4087322936 Code: import random from timeit import Timer SIZE = 10 MAX = 2 def goes_on(x): global MAX return random.randint(1,MAX) == 1 def test_every(): global SIZE print Test every:, nodes = range(SIZE) is_on = [False for x in xrange(SIZE)] count = SIZE while count: for i,x in enumerate(nodes): if not is_on[i] and goes_on(x): is_on[i] = True count -= 1 def destructive(): global SIZE print Destructive:, off_list = range(SIZE) on_list = [] count = SIZE while count: for i in xrange(len(off_list)-1, -1, -1): x = off_list[i] if goes_on(x): on_list.append(x) del(off_list[i]) count -= 1 def constructive(): global SIZE print Constructive:, off_list = range(SIZE) on_list = [] count = SIZE while count: new_off_list = [] for x in off_list: if goes_on(x): on_list.append(x) count -= 1 else: new_off_list.append(x) off_list = new_off_list #SIZE = 20 while True: print Length of initial list:, SIZE #print Probability goes on: 1/%d % MAX print Timer(test_every(), from __main__ import test_every).timeit(1) print Timer(destructive(), from __main__ import destructive).timeit(1) print Timer(constructive(), from __main__ import constructive).timeit(1) print SIZE *= 2 #MAX *= 2 Conclusions: On size, (2) really doesn't like bigger datasets, taking exponentially longer as it increases, while (1) and (3) happily increase linearly. (3) is faster. On probability it's (1) who's the loser, while (2) and (3) are happy. (3) is once again faster. I think (2)'s poor performance is being amplified by how python handles lists and list deletions; the effect may be stymied in other languages, or by using other data constructs in python (like a dictionary or a user made list class). If you were short on memory then (2) would have an advantage, but as it is, (3) is the clear winner. I'm a fan of list
Re: Execution speed question
On Jul 25, 9:54 pm, Jeff [EMAIL PROTECTED] wrote: Look at using reduce(). You can collect information about all of the nodes without necessarily building a large, intermediate list in the process. From the OP's description, I assumed there'd be a list of all nodes, from which he wishes to derive a 2nd list of specific nodes. reduce() applies a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value, which doesn't seem to me to be what the OP was asking for. I could understand using map() across the filter'd list, or embedding the conditional check within the map'd function and ignoring filter(), but at no point does the OP ask to perform any kind of function based on two nodes... I may have misunderstand your point, though :) Could you provide a quick code sample to clarify? -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
That's a good comparison for the general question I posed. Thanks. Although I do believe lists are less than ideal here and a different data structure should be used. To be more specific to my case: As mentioned in my original post, I also have the specific condition that one does not know which nodes to turn ON until after all the probabilities are calculated (lets say we take the top m for example). In this case, the second and third will perform worse as the second one will require a remove from the list after the fact and the third will require another loop through the nodes to build the new list. -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Iain King wrote: I think (2)'s poor performance is being amplified by how python handles lists and list deletions; the effect may be stymied in other languages Delete is O(n) (or O(n/2) on average, if you prefer), while append is amortized O(1). Unless I'm missing something, your example keeps going until it's flagged *all* nodes as on, which, obviously, kills performance for the first version as the probability goes down. The OP's question was about a single pass (but he did mention as the simulation progresses, so I guess it's fair to test a complete simulation.) Btw, if the nodes can be enumerated, I'd probably do something like: node_list = ... get list of nodes ... random.shuffle(node_list) start = 0 end = len(node_list) step = end / MAX while start end: for i in xrange(start, start + step): ... switch on node_list[i] ... ... do whatever you want to do after a step ... # prepare for next simulation step start += step step = max((len(node_list) - start) / MAX, 1) which is near O(n) overall, and mostly constant wrt. the probability for each pass (where the probability is 1:MAX). Might need some tuning; tweak as necessary. /F -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Fri, 25 Jul 2008 16:51:42 +0200, Fredrik Lundh wrote: Unless I'm missing something, your example keeps going until it's flagged *all* nodes as on, which, obviously, kills performance for the first version as the probability goes down. The OP's question was about a single pass (but he did mention as the simulation progresses, so I guess it's fair to test a complete simulation.) I was referring to multiple passes as in Iain' test cases. Although not necessarily till all nodes are ON, let's say to to a large proportion at least. -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Jul 25, 3:39 pm, Suresh Pillai [EMAIL PROTECTED] wrote: That's a good comparison for the general question I posed. Thanks. Although I do believe lists are less than ideal here and a different data structure should be used. To be more specific to my case: As mentioned in my original post, I also have the specific condition that one does not know which nodes to turn ON until after all the probabilities are calculated (lets say we take the top m for example). In this case, the second and third will perform worse as the second one will require a remove from the list after the fact and the third will require another loop through the nodes to build the new list. So you need to loops through twice regardless? i.e. loop once to gather data on off nodes, do some calculation to work out what to turn on, then loop again to turn on the relevant nodes? If so, then I think the functions above remain the same, becoming the 2nd loop. Every iteration you do a first loop over the off_nodes (or them all for (1)) to gather the data on them, perform your calculation, and then perform one of the above functions (minus the setup code at the begining; basically starting at the 'for') as a second loop, with the goes_on function now returning a value based on the calculation (rather than the calculation itself as I had it). Performance should be similar. Iain -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Suresh Pillai wrote: That's a good comparison for the general question I posed. Thanks. Although I do believe lists are less than ideal here and a different data structure should be used. To be more specific to my case: As mentioned in my original post, I also have the specific condition that one does not know which nodes to turn ON until after all the probabilities are calculated (lets say we take the top m for example). In this case, the second and third will perform worse as the second one will require a remove from the list after the fact and the third will require another loop through the nodes to build the new list. -- http://mail.python.org/mailman/listinfo/python-list It seems like the probability calculation applies to all three equally, and can therefore be ignored for the simulations. You said that your algorithm must be a two-stage process: (A) calculate the probabilities then (B) turn on some nodes. Iain's simulations assume (A) is already done. He just addressed the performance of (B). Part (A) is invariant for all his simulations, because your requirement forces it to be. As for different data structures, it largely depends on how you need to access the data. If you don't need to index the data, just loop through it, you might try a linked list. The performance hit in (2) is coming from the list del; using a linked list would make the removal constant rather than O(n), and may even execute faster than (3) as well. -Matt -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
On Jul 25, 4:22 pm, Matthew Fitzgibbons [EMAIL PROTECTED] wrote: It seems like the probability calculation applies to all three equally, and can therefore be ignored for the simulations. The probability affects (1) more. My reasoning for this being: as probability gets lower the number of times you have to loop over the list increases. (1) always loops over the full list, but with each successive iteration (2) and (3) are looping over smaller and smaller lists. In the end this adds up, with (1) becoming slower than (2), even though it starts out quicker. Iain -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
2008/7/25 Suresh Pillai [EMAIL PROTECTED]: ... I naturally started coding with (2), but couldn't decide on the best data structure for python. A set seemed ideal for speedy removal, but then I can't iterate through them with out popping. An ordered list? Some creative solution with numpy arrays? ... Maybe I am missing something, but I'm not sure, where is the problem with non-destructive iterating over a set, e.g.: my_set = set([1,2,3,4,5]) my_set set([1, 2, 3, 4, 5]) for item in my_set: print item, ... 1 2 3 4 5 my_set set([1, 2, 3, 4, 5]) my_set -= set([2,4]) my_set set([1, 3, 5]) Of course, the suitability for your application would depend on other factors too. vbr -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Iain King wrote: On Jul 25, 4:22 pm, Matthew Fitzgibbons [EMAIL PROTECTED] wrote: It seems like the probability calculation applies to all three equally, and can therefore be ignored for the simulations. The probability affects (1) more. My reasoning for this being: as probability gets lower the number of times you have to loop over the list increases. (1) always loops over the full list, but with each successive iteration (2) and (3) are looping over smaller and smaller lists. In the end this adds up, with (1) becoming slower than (2), even though it starts out quicker. Iain -- http://mail.python.org/mailman/listinfo/python-list I meant the _calculation_ of the probability affects all three equally, not the value itself. As your simulations show, different probabilities affect the algorithms differently; I'm talking about the algorithm to arrive at the probability value. -Matt -- http://mail.python.org/mailman/listinfo/python-list
Re: Execution speed question
Suresh Pillai wrote: I am performing simulations on networks (graphs). I have a question on speed of execution (assuming very ample memory for now). I simplify the details of my simulation below, as the question I ask applies more generally than my specific case. I would greatly appreciate general feedback in terms of computing and of course considerations specific to implementation in Python. The nodes in my network may be ON or OFF. The network starts off with all nodes in the OFF state. I loop through the nodes. For each node that is OFF, I consider some probability of it turning ON based on the states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING WHICH ONES TO TURN ON. If the nodes do not have to be processed in any particular order, then you could keep them either in a dict, with the value being either On or Off (True,False)(plus connection data) or a pair of sets, one for On and one for Off. The advantage of the dict is that the items would be fixed and only their values would change, but you needlessly scan through On items. The advantage of the set pair is that you only scan through Off items but have to move some from Off to On. I will not guess which would be faster over a complete run, or how this will compare with using lists. tjr -- http://mail.python.org/mailman/listinfo/python-list