OK. Well, I've worked with web hosting in the past, where proxies like squid were used to lessen the load on dynamic backends. There was also a website we ran, opensourcearticles.com, with articles on Firefox, Thunderbird and so on, and it got quite a bit of traffic.
IIRC, that website was mostly static with some dynamic bits, and heavily
cached by squid.

Most websites don't get a lot of traffic though, and don't have a big
budget for "website system administration". So maybe that's partly where
I'm going with this: making a proxy that can be put in front of a site
and deal with a lot of common situations in a reasonably good way.

If I run into problems with threads that can't be managed, then switching
to something like the queue_manager function, which holds the data along
with the functions that manage the data and connections, is an option
(I've put a rough sketch of that inline below, after the thread demo).

-Morten

On Fri, Jul 29, 2022 at 12:11 AM Chris Angelico <ros...@gmail.com> wrote:

> On Fri, 29 Jul 2022 at 07:24, Morten W. Petersen <morp...@gmail.com> wrote:
> >
> > Forwarding to the list as well.
> >
> > ---------- Forwarded message ---------
> > From: Morten W. Petersen <morp...@gmail.com>
> > Date: Thu, Jul 28, 2022 at 11:22 PM
> > Subject: Re: Simple TCP proxy
> > To: Chris Angelico <ros...@gmail.com>
> >
> > Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
> > thread whether or not the connection should become active doesn't seem
> > like a big deal.
>
> Maybe, but polling *at all* is the problem here. It shouldn't be
> hammering the other server. You'll quickly find that there are limits
> that simply shouldn't exist, because every connection is trying to
> check to see if it's active now. This is *completely unnecessary*.
> I'll reiterate the advice given earlier in this thread (of
> conversation): Look into the tools available for thread (of execution)
> synchronization, such as mutexes (in Python, threading.Lock) and
> events (in Python, threading.Condition). A poll interval enforces a
> delay before the thread notices that it's active, AND causes inactive
> threads to consume CPU, neither of which is a good thing.
>
> > And there's also some point where it is pointless to accept more
> > connections, and where maybe remedies like accepting known good IPs,
> > blocking IPs / IP blocks with more than 3 connections etc. should be
> > considered.
>
> Firewalling is its own science. Blocking IPs with too many
> simultaneous connections should be decided administratively, not
> because your proxy can't handle enough connections.
>
> > I think I'll be getting closer than most applications to an eventual
> > ceiling for what Python can handle of threads, and that's interesting
> > and could be beneficial for Python as well.
>
> Here's a quick demo of the cost of threads when they're all blocked on
> something.
>
> >>> import threading
> >>> finish = threading.Condition()
> >>> def thrd(cond):
> ...     with cond: cond.wait()
> ...
> >>> threading.active_count() # Main thread only
> 1
> >>> import time
> >>> def spawn(n):
> ...     start = time.monotonic()
> ...     for _ in range(n):
> ...         t = threading.Thread(target=thrd, args=(finish,))
> ...         t.start()
> ...     print("Spawned", n, "threads in", time.monotonic() - start, "seconds")
> ...
> >>> spawn(10000)
> Spawned 10000 threads in 7.548425202025101 seconds
> >>> threading.active_count()
> 10001
> >>> with finish: finish.notify_all()
> ...
> >>> threading.active_count()
> 1
>
> It takes a bit of time to start ten thousand threads, but after that,
> the system is completely idle again until I notify them all and they
> shut down.
>
> (Interestingly, it takes four times as long to start 20,000 threads,
> suggesting that something in thread spawning has O(n²) cost. Still,
> even that leaves the system completely idle once it's done spawning
> them.)
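Regarding the advice above about threading.Lock / threading.Condition: to
make sure I understand it, here is a rough, untested sketch of how the
"wait until the connection should become active" part of the proxy could
work without any polling. The names here (MAX_ACTIVE, relay() and so on)
are just placeholders I made up for the example, not anything from the
actual proxy code:

import socket
import threading

MAX_ACTIVE = 50       # placeholder cap on simultaneously active connections
active_count = 0
cond = threading.Condition()

def relay(client_sock):
    # Placeholder: this is where bytes would be pumped between the client
    # and the backend.
    client_sock.close()

def handle_connection(client_sock):
    global active_count
    with cond:
        # Block without polling until there is room; wait_for() returns
        # immediately if active_count is already below the cap.
        cond.wait_for(lambda: active_count < MAX_ACTIVE)
        active_count += 1
    try:
        relay(client_sock)
    finally:
        with cond:
            active_count -= 1
            cond.notify()  # wake one waiting handler, if any

def main(listen_port=8080):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", listen_port))
    srv.listen()
    while True:
        client, _addr = srv.accept()
        threading.Thread(target=handle_connection, args=(client,),
                         daemon=True).start()

if __name__ == "__main__":
    main()

That way the queued threads sleep inside cond.wait_for() and only wake when
another connection finishes and calls notify(), instead of re-checking every
0.1 or 0.2 seconds. The same structure could also live inside a
queue_manager-style object if keeping it as module-level state gets awkward.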
> If your proxy can handle 20,000 threads, I would be astonished. And
> this isn't even close to a thread limit.
>
> Obviously the cost is different if the threads are all doing things,
> but if you have thousands of active socket connections, you'll start
> finding that there are limitations in quite a few places, depending on
> how much traffic is going through them. Ultimately, yes, you will find
> that threads restrict you and asynchronous I/O is the only option; but
> you can take threads a fairly long way before they are the limiting
> factor.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list

--
I am https://leavingnorway.info

Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue

Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen

Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/

On Google+ here https://plus.google.com/107781930037068750156

On Instagram at https://instagram.com/morphexx/

--
https://mail.python.org/mailman/listinfo/python-list