Re: random.SystemRandom().randint() inefficient
Barry writes: >> On 26 Jul 2022, at 16:07, Cecil Westerhof via Python-list >> wrote: >> >> I need to get a random integer. At first I tried it with: >>from secrets import randbelow >>index = randbelow(len(to_try)) >> >> This works perfectly, but it took some time. So I thought I try: >>from random import SystemRandom >>index = SystemRandom().randint(0, len(to_try) - 1) >> >> A first indication is that the second version would take about two >> times as much time as the first. Is there a reason for this, or should >> this not be happening? > > What is the OS that you are running on and its version? > If it’s linux what is the kernel version? > What version of python and where from? That is always good information of-course. Debian 11.3 5.10.0-13-amd64 3.9.2 What do you mean with where the python version is from? -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
"Michael F. Stemper" writes: > This is orthogonal to your question, but might be of some use to you: > > The combination of using len(to_try) as an argument to randint() and > saving the output to a variable named "index" suggests that you might > be setting up to select a random element from to_try, as in: > something = to_try[index] > > If that is the case, you might want to consider using random.choice() instead: > > >>> from random import choice > >>> to_try = [2,3,5,7,11,13,"seventeen",19] > >>> choice(to_try) > 2 > >>> choice(to_try) > 'seventeen' > >>> choice(to_try) > 13 > >>> choice(to_try) > 5 > >>> Yes, I try to select a random element, but it has also to be removed, because an element should not be used more as once. This is the code I use: # index = randbelow(len(to_try)) index = randrange(len(to_try)) found = permutation[to_try.pop(index)] -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: More efficient code, but slower program
r...@zedat.fu-berlin.de (Stefan Ram) writes: > Cecil Westerhof writes: >>values = [*range(100)] > > In many cases, any iterable is just fine and a list is not > required, just as peudo-random numbers often are just fine and > real-world entropy is not required. In this case both are. I must select (several times) a random element from the list. So I need the list. I also want the randomness to be as good as possible to make the 'simulation' as good as possible. > Usually one wants to write code for readability, and thinking > too much about runtime efficiency optimizations is in vain, > because one might get different results with a different > version of Python or on a different machine. That is why I went for the less efficient code. ;-) -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Chris Angelico writes: > On Wed, 27 Jul 2022 at 08:18, Cecil Westerhof via Python-list > wrote: >> >> Chris Angelico writes: >> >> > On Wed, 27 Jul 2022 at 06:06, Cecil Westerhof via Python-list >> > wrote: >> >> >> >> Chris Angelico writes: >> >> >> >> > On Wed, 27 Jul 2022 at 01:06, Cecil Westerhof via Python-list >> >> > wrote: >> >> >> >> >> >> I need to get a random integer. At first I tried it with: >> >> >> from secrets import randbelow >> >> >> index = randbelow(len(to_try)) >> >> >> >> >> >> This works perfectly, but it took some time. So I thought I try: >> >> >> from random import SystemRandom >> >> >> index = SystemRandom().randint(0, len(to_try) - 1) >> >> >> >> >> >> A first indication is that the second version would take about two >> >> >> times as much time as the first. Is there a reason for this, or should >> >> >> this not be happening? >> >> >> >> >> > >> >> > You're setting up a brand new SystemRandom instance just for a single >> >> > random number. For a fairer comparison, set up the instance, then >> >> > generate far more than just a single number, and see how that goes. >> >> >> >> Thanks. I thought I did something wrong and I did. >> >> I will try to implement like you said and look what the result will >> >> be. (And share it.) >> > >> > Thanks! Don't feel bad; performance testing is *hard*, getting >> > meaningful results takes a lot of of fiddling with parameters, and >> > getting interesting AND meaningful results can sometimes seem about >> > impossible. >> > >> >> (As I understand it both do more, or less the same and should have >> >> comparable performance.) >> > >> > In normal production work? Yes (the SystemRandom object doesn't have >> > any significant state - a seeded RNG could have a lot more overhead >> > here). But for performance testing? The work of instantiating the >> > class could be completely irrelevant, or it could be dominating your >> > results. It's hard to say, hence the suggestion to try it without >> > reinstantiating. >> >> It had a very big influence. Original it took about three times more >> time to run my program. (The program was still running when I posted >> the original post and the difference was higher as I anticipated.) >> Removing that did cut about 45% of the execution time of the program. >> (So the initiation is quit expensive.) >> But it still takes about 50% more time. So I am still a bit >> flabbergasted. >> >> The new code: >> from random import SystemRandom >> system_random = SystemRandom() >> index = system_random.randint(0, len(to_try) - 1) >> >> The first two statements are executed once. >> The last statement I think about 75 * 10 ** 6. >> >> So it seems that my first idea of using randbelow was the correct one. >> But if anyone could explain why SystemRandom is so much more >> expensive, I would be interested to know it. >> (Or am I still doing something wrong?) > > Hmm. There are still a lot of differences here. Are you able to make > use of randrange() instead, to make them more consistent? > > According to the source code, secrets.randbelow is calling on an > internal method _randbelow of the SystemRandom object, but randrange > (if called with only one arg) will go straight into that same method. > Here's my results: > > rosuav@sikorsky:~$ python3 -m timeit -s 'from random import randrange' > 'randrange(1)' > 100 loops, best of 5: 322 nsec per loop > rosuav@sikorsky:~$ python3 -m timeit -s 'from random import > SystemRandom; r = SystemRandom()' 'r.randint(0, 1)' > 20 loops, best of 5: 1.92 usec per loop > rosuav@sikorsky:~$ python3 -m timeit -s 'from random import > SystemRandom; r = SystemRandom()' 'r.randrange(1)' > 20 loops, best of 5: 1.87 usec per loop > rosuav@sikorsky:~$ python3 -m timeit -s 'from secrets import > randbelow' 'randbelow(1)' > 20 loops, best of 5: 1.64 usec per loop > > (The difference with the first one is that it isn't using the system > RNG, so it has the limitations of an internal PRNG.) > > When you call randint, what happens is (1) the endpoint is incremented > to transform it from inclusive-inclusive to inclusive-exclusive; (2) > randrange is called with two args; (3) fast path 1 is skipped, fast > path 2 is taken, and _randbelow gets called to get an actual random > number, which gets zero added to it before returning. > > If, instead, you use randrange(len(to_try)), what would happen is (1) > fast path 1 is used, and (2) _randbelow is called to get the random > number. > > In secrets.randbelow, it's even better: (1) _randbelow is called to > get the random number. > > I think that might be what's going on here. You're adding some tiny > amounts of extra work, but if you were to make them more similar, the > distinctions would evaporate. Here's a couple of other variants that > use SystemRandom: > > rosuav@sikorsky:~$ python3 -m timeit -s 'from random import > SystemRandom; randrange = SystemRandom().randrange' 'randrang
Re: random.SystemRandom().randint() inefficient
Chris Angelico writes: > Incidentally - if you are actually trying to select a specific item, > you may want to consider random.choice. Yes, I try to select a random element, but it has also to be removed. An element should be used at most once. This is the code I use: # index = randbelow(len(to_try)) index = randrange(len(to_try)) found = permutation[to_try.pop(index)] -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Simple TCP proxy
Hi. I'd like to share with you a recent project, which is a simple TCP proxy that can stand in front of a TCP server of some sort, queueing requests and then allowing n number of connections to pass through at a time: https://github.com/morphex/stp I'll be developing it further, but the the files committed in this tree seem to be stable: https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465 I just bombed that code with 700+ requests almost simultaneously, and STP handled it well. Regards, Morten -- I am https://leavingnorway.info Videos at https://www.youtube.com/user/TheBlogologue Twittering at http://twitter.com/blogologue Blogging at http://blogologue.com Playing music at https://soundcloud.com/morten-w-petersen Also playing music and podcasting here: http://www.mixcloud.com/morten-w-petersen/ On Google+ here https://plus.google.com/107781930037068750156 On Instagram at https://instagram.com/morphexx/ -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Dennis Lee Bieber writes: > On Tue, 26 Jul 2022 23:47:59 +0200, Cecil Westerhof > declaimed the following: > > >>The new code: >>from random import SystemRandom >>system_random = SystemRandom() >>index = system_random.randint(0, len(to_try) - 1) >> >>The first two statements are executed once. >>The last statement I think about 75 * 10 ** 6. >> >>So it seems that my first idea of using randbelow was the correct one. >>But if anyone could explain why SystemRandom is so much more >>expensive, I would be interested to know it. >>(Or am I still doing something wrong?) > > What happens with > > system_randint = SystemRandom().randint #no parens > > index = system_randint(...) > > which may remove the method lookup from the repetition. I had already switched to randrange. This went to 15 minutes from 21 minutes. By removing the method lookup I could shave off another minute. So certainly noteworthy. (Should have thought about it myself.) -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
More efficient code, but slower program
It is not very important, but I am just curious. Original I had in a program: values = [*range(100)] But because it is done quite often I expected that initialising: range_list = [*range(100)] and then use: values = range_list.copy() Would be more efficient. So I tried: timeit('values = [*range(100)]') 1.6964535564184189 and: timeit('new_values = values.copy()', 'values = [*range(100)]') 0.6457642465829849 That showed that it should make a positive difference. But when changing the program it took a little bit more time. I find the code with the copy a little bit better, so I kept it. But I am curious why the effect is the opposite of what I expected. It does not hurt to understand optimisation better, so I can do a better job when I need it. -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
On 26/07/2022 16.47, Cecil Westerhof wrote: Chris Angelico writes: On Wed, 27 Jul 2022 at 06:06, Cecil Westerhof via Python-list wrote: Chris Angelico writes: On Wed, 27 Jul 2022 at 01:06, Cecil Westerhof via Python-list wrote: I need to get a random integer. At first I tried it with: from secrets import randbelow index = randbelow(len(to_try)) This works perfectly, but it took some time. So I thought I try: from random import SystemRandom index = SystemRandom().randint(0, len(to_try) - 1) A first indication is that the second version would take about two times as much time as the first. Is there a reason for this, or should this not be happening? You're setting up a brand new SystemRandom instance just for a single random number. For a fairer comparison, set up the instance, then generate far more than just a single number, and see how that goes. Thanks. I thought I did something wrong and I did. I will try to implement like you said and look what the result will be. (And share it.) Thanks! Don't feel bad; performance testing is *hard*, getting meaningful results takes a lot of of fiddling with parameters, and getting interesting AND meaningful results can sometimes seem about impossible. (As I understand it both do more, or less the same and should have comparable performance.) In normal production work? Yes (the SystemRandom object doesn't have any significant state - a seeded RNG could have a lot more overhead here). But for performance testing? The work of instantiating the class could be completely irrelevant, or it could be dominating your results. It's hard to say, hence the suggestion to try it without reinstantiating. It had a very big influence. Original it took about three times more time to run my program. (The program was still running when I posted the original post and the difference was higher as I anticipated.) Removing that did cut about 45% of the execution time of the program. (So the initiation is quit expensive.) But it still takes about 50% more time. So I am still a bit flabbergasted. The new code: from random import SystemRandom system_random = SystemRandom() index = system_random.randint(0, len(to_try) - 1) This is orthogonal to your question, but might be of some use to you: The combination of using len(to_try) as an argument to randint() and saving the output to a variable named "index" suggests that you might be setting up to select a random element from to_try, as in: something = to_try[index] If that is the case, you might want to consider using random.choice() instead: >>> from random import choice >>> to_try = [2,3,5,7,11,13,"seventeen",19] >>> choice(to_try) 2 >>> choice(to_try) 'seventeen' >>> choice(to_try) 13 >>> choice(to_try) 5 >>> -- Michael F. Stemper This sentence no verb. -- https://mail.python.org/mailman/listinfo/python-list
Re: More efficient code, but slower program
On 2022-07-27 at 17:48:47 +0200, Regarding "Re: More efficient code, but slower program," Cecil Westerhof via Python-list wrote: > r...@zedat.fu-berlin.de (Stefan Ram) writes: > > > Cecil Westerhof writes: > >>values = [*range(100)] > > > > In many cases, any iterable is just fine and a list is not > > required, just as peudo-random numbers often are just fine and > > real-world entropy is not required. > > In this case both are. I must select (several times) a random element > from the list. So I need the list. > I also want the randomness to be as good as possible to make the > 'simulation' as good as possible. "[A]s good as possible" for simulations and tests may not require the cryptographic quality numbers from SystemRandom. Many/most pseudo random number generators are optimized for statistically normalized outputs, and are repeatable as a bonus (again, often a requirement for certain types of simulations; YMMV). Also, what if you shuffled the list first (e.g., with random.shuffle) and then iterated through it directly? Repeatedly removing arbitrary elements from the middle of a list is potentially expensive. -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
On 27/07/2022 16:43, Cecil Westerhof via Python-list wrote: "Michael F. Stemper" writes: This is orthogonal to your question, but might be of some use to you: The combination of using len(to_try) as an argument to randint() and saving the output to a variable named "index" suggests that you might be setting up to select a random element from to_try, as in: something = to_try[index] If that is the case, you might want to consider using random.choice() instead: >>> from random import choice >>> to_try = [2,3,5,7,11,13,"seventeen",19] >>> choice(to_try) 2 >>> choice(to_try) 'seventeen' >>> choice(to_try) 13 >>> choice(to_try) 5 >>> Yes, I try to select a random element, but it has also to be removed, because an element should not be used more as once. This is the code I use: # index = randbelow(len(to_try)) index = randrange(len(to_try)) found = permutation[to_try.pop(index)] When you pop an element from the last, the elements after it need to be moved down, which takes time. Try shuffling the list and then popping the now randomly-ordered elements off the end. -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
On Wed, 27 Jul 2022 10:45:47 +0200, Cecil Westerhof declaimed the following: >What do you mean with where the python version is from? Base Python.org download, ActiveState package download, Anaconda package download, native OS install/extra install via OS repository download (Debian/Ubuntu: apt install xxx, where xxx is not the native OS Python) -- Wulfraed Dennis Lee Bieber AF6VN wlfr...@ix.netcom.comhttp://wlfraed.microdiversity.freeddns.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
On 7/27/22 02:45, Cecil Westerhof via Python-list wrote: > Barry writes: >> What version of python and where from? > > That is always good information of-course. > Debian 11.3 > 5.10.0-13-amd64 > 3.9.2 > > What do you mean with where the python version is from? On Windows, the platform of a large proportion of people asking questions here, there are many possible Python builds (python.org, Microsoft Store, Conda, ActiveState, etc.) and it sometimes matters, thus that tends to be a standard question that gets asked here. In your case the implied answer is "distro package". -- https://mail.python.org/mailman/listinfo/python-list
Re: Simple TCP proxy
On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen wrote: > > Hi. > > I'd like to share with you a recent project, which is a simple TCP proxy > that can stand in front of a TCP server of some sort, queueing requests and > then allowing n number of connections to pass through at a time: How's this different from what the networking subsystem already does? When you listen, you can set a queue length. Can you elaborate? ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
> On 27 Jul 2022, at 17:09, Cecil Westerhof via Python-list > wrote: > > Barry writes: > On 26 Jul 2022, at 16:07, Cecil Westerhof via Python-list wrote: >>> >>> I need to get a random integer. At first I tried it with: >>> from secrets import randbelow >>> index = randbelow(len(to_try)) >>> >>> This works perfectly, but it took some time. So I thought I try: >>> from random import SystemRandom >>> index = SystemRandom().randint(0, len(to_try) - 1) >>> >>> A first indication is that the second version would take about two >>> times as much time as the first. Is there a reason for this, or should >>> this not be happening? >> >> What is the OS that you are running on and its version? >> If it’s linux what is the kernel version? >> What version of python and where from? > > That is always good information of-course. > Debian 11.3 > 5.10.0-13-amd64 > 3.9.2 > > What do you mean with where the python version is from? Because random number generator implementation depends on the OS since it is SystemRandom() that you are comparing with python’s Implementation. Barry > > -- > Cecil Westerhof > Senior Software Engineer > LinkedIn: http://www.linkedin.com/in/cecilwesterhof > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Dennis Lee Bieber writes: > On Wed, 27 Jul 2022 10:45:47 +0200, Cecil Westerhof > declaimed the following: > > >>What do you mean with where the python version is from? > > Base Python.org download, ActiveState package download, Anaconda > package download, native OS install/extra install via OS repository > download (Debian/Ubuntu: apt install xxx, where xxx is not the native OS > Python) Just the default Debian install. -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: More efficient code, but slower program
On Wed, 27 Jul 2022 14:58:02 +0200, Cecil Westerhof declaimed the following: >It is not very important, but I am just curious. > >Original I had in a program: >values = [*range(100)] > >But because it is done quite often I expected that initialising: >range_list = [*range(100)] > >and then use: >values = range_list.copy() > >Would be more efficient. So I tried: >timeit('values = [*range(100)]') >1.6964535564184189 > >and: >timeit('new_values = values.copy()', 'values = [*range(100)]') >0.6457642465829849 > Were these done in the same program/session? If so, the first invocation may be initializing/caching the first 100 integers (Python tends to keep some number of integers in a permanent cache to speed later access to common values). Also rather than * unpacking of the range iterator into a [] list... just... >>> list(range(100)) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99] >>> ... will do it. Also, if your goal is to /remove/ and entry from the list via some index, you might consider if this is more effective than copying the list and THEN popping a value. >>> full = list(range(100)) >>> import random >>> idx = random.randint(0, len(full)) >>> idx 74 >>> trim = full[:idx] + full[idx+1:] >>> trim [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99] >>> trim == full False >>> or >>> trim2 = full[:] >>> del trim2[idx] >>> trim [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99] >>> The first does two partial copies skipping the item to be removed, and joins the results into a new list. The second does a full copy and DELETES the element to be removed from the copy. >>> trim3 = full[:] >>> trim3.remove(idx) >>> trim3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99] >>> This option works because the list is sequential integers and the index matches the values in the list (.remove() removes the first MATCHING element, so if the list can have duplicates is may not remove the one AT the index position). -- Wulfraed Dennis Lee Bieber AF6VN wlfr...@ix.netcom.comhttp://wlfraed.microdiversity.freeddns.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
MRAB writes: > On 27/07/2022 16:43, Cecil Westerhof via Python-list wrote: >> "Michael F. Stemper" writes: >> >>> This is orthogonal to your question, but might be of some use to you: >>> >>> The combination of using len(to_try) as an argument to randint() and >>> saving the output to a variable named "index" suggests that you might >>> be setting up to select a random element from to_try, as in: >>> something = to_try[index] >>> >>> If that is the case, you might want to consider using random.choice() >>> instead: >>> >>> >>> from random import choice >>> >>> to_try = [2,3,5,7,11,13,"seventeen",19] >>> >>> choice(to_try) >>> 2 >>> >>> choice(to_try) >>> 'seventeen' >>> >>> choice(to_try) >>> 13 >>> >>> choice(to_try) >>> 5 >>> >>> >> Yes, I try to select a random element, but it has also to be removed, >> because an element should not be used more as once. >> This is the code I use: >> # index = randbelow(len(to_try)) >> index = randrange(len(to_try)) >> found = permutation[to_try.pop(index)] >> > > When you pop an element from the last, the elements after it need to be > moved down, which takes time. > > Try shuffling the list and then popping the now randomly-ordered > elements off the end. Would shuffling not be a lot more expensive? Especially because I do not eat the whole list. -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: More efficient code, but slower program
2qdxy4rzwzuui...@potatochowder.com writes: > On 2022-07-27 at 17:48:47 +0200, > Regarding "Re: More efficient code, but slower program," > Cecil Westerhof via Python-list wrote: > >> r...@zedat.fu-berlin.de (Stefan Ram) writes: >> >> > Cecil Westerhof writes: >> >>values = [*range(100)] >> > >> > In many cases, any iterable is just fine and a list is not >> > required, just as peudo-random numbers often are just fine and >> > real-world entropy is not required. >> >> In this case both are. I must select (several times) a random element >> from the list. So I need the list. >> I also want the randomness to be as good as possible to make the >> 'simulation' as good as possible. > > "[A]s good as possible" for simulations and tests may not require the > cryptographic quality numbers from SystemRandom. Many/most pseudo > random number generators are optimized for statistically normalized > outputs, and are repeatable as a bonus (again, often a requirement for > certain types of simulations; YMMV). > > Also, what if you shuffled the list first (e.g., with random.shuffle) > and then iterated through it directly? Repeatedly removing arbitrary > elements from the middle of a list is potentially expensive. Someone else also mentioned that. I think I should put it on my list of things to do/try. -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: Simple TCP proxy
Hi Chris. You're thinking of the backlog argument of listen? Well, STP will accept all connections, but can limit how many of the accepted connections that are active at any given time. So when I bombed it with hundreds of almost simultaneous connections, all of them were accepted, but only 25 were actively sending and receiving data at any given time. First come, first served. Regards, Morten On Wed, Jul 27, 2022 at 8:00 PM Chris Angelico wrote: > On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen > wrote: > > > > Hi. > > > > I'd like to share with you a recent project, which is a simple TCP proxy > > that can stand in front of a TCP server of some sort, queueing requests > and > > then allowing n number of connections to pass through at a time: > > How's this different from what the networking subsystem already does? > When you listen, you can set a queue length. Can you elaborate? > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- I am https://leavingnorway.info Videos at https://www.youtube.com/user/TheBlogologue Twittering at http://twitter.com/blogologue Blogging at http://blogologue.com Playing music at https://soundcloud.com/morten-w-petersen Also playing music and podcasting here: http://www.mixcloud.com/morten-w-petersen/ On Google+ here https://plus.google.com/107781930037068750156 On Instagram at https://instagram.com/morphexx/ -- I am https://leavingnorway.info Videos at https://www.youtube.com/user/TheBlogologue Twittering at http://twitter.com/blogologue Blogging at http://blogologue.com Playing music at https://soundcloud.com/morten-w-petersen Also playing music and podcasting here: http://www.mixcloud.com/morten-w-petersen/ On Instagram at https://instagram.com/morphexx/ -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Cecil Westerhof via Python-list schreef op 27/07/2022 om 17:43: "Michael F. Stemper" writes: > This is orthogonal to your question, but might be of some use to you: > > The combination of using len(to_try) as an argument to randint() and > saving the output to a variable named "index" suggests that you might > be setting up to select a random element from to_try, as in: > something = to_try[index] > > If that is the case, you might want to consider using random.choice() instead: > > >>> from random import choice > >>> to_try = [2,3,5,7,11,13,"seventeen",19] > >>> choice(to_try) > 2 > >>> choice(to_try) > 'seventeen' > >>> choice(to_try) > 13 > >>> choice(to_try) > 5 > >>> Yes, I try to select a random element, but it has also to be removed, because an element should not be used more as once. This is the code I use: # index = randbelow(len(to_try)) index = randrange(len(to_try)) found = permutation[to_try.pop(index)] Do you know in advance how many items you'll need, or maybe an upper limit on the amount? In that case it might be more efficient to use random.sample(to_try, k=nr_items_needed). -- "Honest criticism is hard to take, particularly from a relative, a friend, an acquaintance, or a stranger." -- Franklin P. Jones -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Cecil Westerhof writes: Yes, I try to select a random element, but it has also to be removed, because an element should not be used more as once. Instead of using pop to do that why not something like: def lazy_shuffle(seq): """ Generate the elements of the given sequence in a random order. """ # Delete the next line if you want to use a different version of # randrange: from random import randrange # Delete the next line if SEQ is already a mutable sequence and you # are willing to have it destroyed by this process: seq = list(seq) n = len(seq) while n: i = randrange(n) yield seq[i] n -= 1 if i < n: seq[i] = seq[n] -- Alan Bawden -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
On 27/07/2022 18:24, Cecil Westerhof via Python-list wrote: MRAB writes: On 27/07/2022 16:43, Cecil Westerhof via Python-list wrote: "Michael F. Stemper" writes: This is orthogonal to your question, but might be of some use to you: The combination of using len(to_try) as an argument to randint() and saving the output to a variable named "index" suggests that you might be setting up to select a random element from to_try, as in: something = to_try[index] If that is the case, you might want to consider using random.choice() instead: >>> from random import choice >>> to_try = [2,3,5,7,11,13,"seventeen",19] >>> choice(to_try) 2 >>> choice(to_try) 'seventeen' >>> choice(to_try) 13 >>> choice(to_try) 5 >>> Yes, I try to select a random element, but it has also to be removed, because an element should not be used more as once. This is the code I use: # index = randbelow(len(to_try)) index = randrange(len(to_try)) found = permutation[to_try.pop(index)] When you pop an element from the last, the elements after it need to be moved down, which takes time. Try shuffling the list and then popping the now randomly-ordered elements off the end. Would shuffling not be a lot more expensive? Especially because I do not eat the whole list. You won't know until you time it. -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Roel Schroeven writes: > Cecil Westerhof via Python-list schreef op 27/07/2022 om 17:43: >> "Michael F. Stemper" writes: >> >> > This is orthogonal to your question, but might be of some use to you: >> > >> > The combination of using len(to_try) as an argument to randint() and >> > saving the output to a variable named "index" suggests that you might >> > be setting up to select a random element from to_try, as in: >> > something = to_try[index] >> > >> > If that is the case, you might want to consider using random.choice() >> > instead: >> > >> > >>> from random import choice >> > >>> to_try = [2,3,5,7,11,13,"seventeen",19] >> > >>> choice(to_try) >> > 2 >> > >>> choice(to_try) >> > 'seventeen' >> > >>> choice(to_try) >> > 13 >> > >>> choice(to_try) >> > 5 >> > >>> >> >> Yes, I try to select a random element, but it has also to be removed, >> because an element should not be used more as once. >> This is the code I use: >> # index = randbelow(len(to_try)) >> index = randrange(len(to_try)) >> found = permutation[to_try.pop(index)] > Do you know in advance how many items you'll need, or maybe an upper > limit on the amount? In that case it might be more efficient to use > random.sample(to_try, k=nr_items_needed). Something else to try. :-) And yes: I will be using half of the list. -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
MRAB writes: >>> When you pop an element from the last, the elements after it need to be >>> moved down, which takes time. >>> >>> Try shuffling the list and then popping the now randomly-ordered >>> elements off the end. >> Would shuffling not be a lot more expensive? Especially because I do >> not eat the whole list. >> > You won't know until you time it. A first indication is that it doubles the needed time. -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Alan Bawden writes: > Cecil Westerhof writes: > >Yes, I try to select a random element, but it has also to be removed, >because an element should not be used more as once. > > Instead of using pop to do that why not something like: > > def lazy_shuffle(seq): > """ > Generate the elements of the given sequence in a random order. > """ > # Delete the next line if you want to use a different version of > # randrange: > from random import randrange > # Delete the next line if SEQ is already a mutable sequence and you > # are willing to have it destroyed by this process: > seq = list(seq) > n = len(seq) > while n: > i = randrange(n) > yield seq[i] > n -= 1 > if i < n: > seq[i] = seq[n] That looks interesting. But there is on problem. (I think.) The list is only partly eaten and I will eat a lot of sequences. So a lot of sequences will be left in the runtime. Or is there a way to destroy the lazy_shuffle when it is not needed anymore? -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
Cecil Westerhof writes: Alan Bawden writes: > Cecil Westerhof writes: > >Yes, I try to select a random element, but it has also to be removed, >because an element should not be used more as once. > > Instead of using pop to do that why not something like: > > def lazy_shuffle(seq): > """ > Generate the elements of the given sequence in a random order. > """ > # Delete the next line if you want to use a different version of > # randrange: > from random import randrange > # Delete the next line if SEQ is already a mutable sequence and you > # are willing to have it destroyed by this process: > seq = list(seq) > n = len(seq) > while n: > i = randrange(n) > yield seq[i] > n -= 1 > if i < n: > seq[i] = seq[n] That looks interesting. But there is on problem. (I think.) The list is only partly eaten and I will eat a lot of sequences. So a lot of sequences will be left in the runtime. Or is there a way to destroy the lazy_shuffle when it is not needed anymore? You don't have to worry about that. That's what Garbage Collection is for. After you drop the last reference to the generator, the GC will destroy it for you. Welcome to Python. BTW, what I wrote might be slightly faster if you replace it with: while n: i = randrange(n) yield seq[i] n -= 1 seq[i] = seq[n] That will save you some time spent testing and branching. -- Alan Bawden -- https://mail.python.org/mailman/listinfo/python-list
Re: Simple TCP proxy
On Thu, 28 Jul 2022 at 04:32, Morten W. Petersen wrote: > > Hi Chris. > > You're thinking of the backlog argument of listen? Yes, precisely. > Well, STP will accept all connections, but can limit how many of the accepted > connections that are active at any given time. > > So when I bombed it with hundreds of almost simultaneous connections, all of > them were accepted, but only 25 were actively sending and receiving data at > any given time. First come, first served. > Hmm. Okay. Not sure what the advantage is, but sure. If the server's capable of handling the total requests-per-minute, then a queueing system like this should help with burst load, although I would have thought that the listen backlog would do the same. What happens if the server actually gets overloaded though? Do connections get disconnected after appearing connected? What's the disconnect mode? BTW, you probably don't want to be using the _thread module - Python has a threading module which is better suited to this sort of work. Although you may want to consider asyncio instead, as that has far lower overhead when working with large numbers of sockets. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: random.SystemRandom().randint() inefficient
On Thu, 28 Jul 2022 at 05:36, Cecil Westerhof via Python-list wrote: > > Roel Schroeven writes: > > > Cecil Westerhof via Python-list schreef op 27/07/2022 om 17:43: > >> "Michael F. Stemper" writes: > >> > >> > This is orthogonal to your question, but might be of some use to you: > >> > > >> > The combination of using len(to_try) as an argument to randint() and > >> > saving the output to a variable named "index" suggests that you might > >> > be setting up to select a random element from to_try, as in: > >> > something = to_try[index] > >> > > >> > If that is the case, you might want to consider using random.choice() > >> > instead: > >> > > >> > >>> from random import choice > >> > >>> to_try = [2,3,5,7,11,13,"seventeen",19] > >> > >>> choice(to_try) > >> > 2 > >> > >>> choice(to_try) > >> > 'seventeen' > >> > >>> choice(to_try) > >> > 13 > >> > >>> choice(to_try) > >> > 5 > >> > >>> > >> > >> Yes, I try to select a random element, but it has also to be removed, > >> because an element should not be used more as once. > >> This is the code I use: > >> # index = randbelow(len(to_try)) > >> index = randrange(len(to_try)) > >> found = permutation[to_try.pop(index)] > > Do you know in advance how many items you'll need, or maybe an upper > > limit on the amount? In that case it might be more efficient to use > > random.sample(to_try, k=nr_items_needed). > > Something else to try. :-) > And yes: I will be using half of the list. > A perfect job for random.sample() then, if that's *exactly* half the list. But if you don't know for sure how many you'll need, an easy way to make it more efficient would be to take the n'th element, then move the last element down to the n'th position, and finally pop off the last element. That way, you only ever pop the last element away. The list will get reordered arbitrarily while you do this, but if the only purpose of it is to remove elements from random consideration, that won't be a problem. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Simple TCP proxy
On Wed, Jul 27, 2022 at 08:32:31PM +0200, Morten W. Petersen wrote: You're thinking of the backlog argument of listen? From my understanding, yes, when you set up the "accepter" socket (the one that you use to listen and accept new connections), you can define the length of the queue for incoming connections that are not accepted yet. This will be the equivalent of your SimpleQueue which basically puts a limits on how many incoming connections are "accepted" to do a real job. Using skt.listen(N) the incoming connections are put on hold by the OS while in your implementation are formally accepted but they are not allowed to do any meaningful work: they are put on the SimpleQueue and only when they are popped then they will work (send/recv data). The difference then between the OS and your impl is minimal. The only case that I can think is that on the clients' side it may exist a timeout for the acceptance of the connection so your proxy server will eagerly accept these connections so no timeout is possible(*) On a side note, you implementation is too thread-naive: it uses plain Python lists, integers and boolean variables which are not thread safe. It is a matter of time until your server will start behave weird. One option is that you use thread-safe objects. I'll encourage to read about thread-safety in general and then which sync mechanisms Python offers. Another option is to remove the SimpleQueue and the background function that allows a connection to be "active". If you think, the handlers are 99% independent except that you want to allow only N of them to progress (stablish and forward the connection) and when a handler finishes, another handler "waiting" is activated, "in a queue fashion" as you said. If you allow me to not have a strict queue discipline here, you can achieve the same results coordinating the handlers using semaphores. Once again, take this email as starting point for your own research. On a second side note, the use of handlers and threads is inefficient because while you have N active handlers sending/receiving data, because you are eagerly accepting new connections you will have much more handlers created and (if I'm not wrong), each will be a thread. A more efficient solution could be 1) accept as many connections as you can, saving the socket (not the handler) in the thread-safe queue. 2) have N threads in the background popping from the queue a socket and then doing the send/recv stuff. When the thread is done, the thread closes the socket and pops another from the queue. So the queue length will be the count of accepted connections but in any moment your proxy will not activate (forward) more than N connections. This idea is thread-safe, simpler, efficient and has the queue discipline (I leave aside the usefulness). I encourage you to take time to read about the different things mentioned as concurrency and thread-related stuff is not easy to master. Thanks, Martin. (*) make your proxy server slow enough and yes, you will get timeouts anyways. Well, STP will accept all connections, but can limit how many of the accepted connections that are active at any given time. So when I bombed it with hundreds of almost simultaneous connections, all of them were accepted, but only 25 were actively sending and receiving data at any given time. First come, first served. Regards, Morten On Wed, Jul 27, 2022 at 8:00 PM Chris Angelico wrote: On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen wrote: > > Hi. > > I'd like to share with you a recent project, which is a simple TCP proxy > that can stand in front of a TCP server of some sort, queueing requests and > then allowing n number of connections to pass through at a time: How's this different from what the networking subsystem already does? When you listen, you can set a queue length. Can you elaborate? ChrisA -- https://mail.python.org/mailman/listinfo/python-list -- I am https://leavingnorway.info Videos at https://www.youtube.com/user/TheBlogologue Twittering at http://twitter.com/blogologue Blogging at http://blogologue.com Playing music at https://soundcloud.com/morten-w-petersen Also playing music and podcasting here: http://www.mixcloud.com/morten-w-petersen/ On Google+ here https://plus.google.com/107781930037068750156 On Instagram at https://instagram.com/morphexx/ -- I am https://leavingnorway.info Videos at https://www.youtube.com/user/TheBlogologue Twittering at http://twitter.com/blogologue Blogging at http://blogologue.com Playing music at https://soundcloud.com/morten-w-petersen Also playing music and podcasting here: http://www.mixcloud.com/morten-w-petersen/ On Instagram at https://instagram.com/morphexx/ -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Simple TCP proxy
> On 27 Jul 2022, at 17:16, Morten W. Petersen wrote: > > Hi. > > I'd like to share with you a recent project, which is a simple TCP proxy > that can stand in front of a TCP server of some sort, queueing requests and > then allowing n number of connections to pass through at a time: > > https://github.com/morphex/stp > > I'll be developing it further, but the the files committed in this tree > seem to be stable: > > https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465 > > I just bombed that code with 700+ requests almost simultaneously, and STP > handled it well. What is the problem that this solves? Why not just increase the allowed size of the socket listen backlog if you just want to handle bursts of traffic. I do not think of this as a proxy, rather a tunnel. And the tunnel is a lot more expensive the having kernel keep the connection in the listen socket backlog. I work on a web proxy written on python that handles huge load and using backlog of the bursts. It’s async using twisted as threads are not practice at scale. Barry > > Regards, > > Morten > > -- > I am https://leavingnorway.info > Videos at https://www.youtube.com/user/TheBlogologue > Twittering at http://twitter.com/blogologue > Blogging at http://blogologue.com > Playing music at https://soundcloud.com/morten-w-petersen > Also playing music and podcasting here: > http://www.mixcloud.com/morten-w-petersen/ > On Google+ here https://plus.google.com/107781930037068750156 > On Instagram at https://instagram.com/morphexx/ > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list