Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Fri, 29 Jul 2022 at 11:42, Andrew MacIntyre  wrote:
>
> On 29/07/2022 8:08 am, Chris Angelico wrote:
> > It takes a bit of time to start ten thousand threads, but after that,
> > the system is completely idle again until I notify them all and they
> > shut down.
> >
> > (Interestingly, it takes four times as long to start 20,000 threads,
> > suggesting that something in thread spawning has O(n²) cost. Still,
> > even that leaves the system completely idle once it's done spawning
> > them.)
>
> Another cost of threads can be memory allocated as thread stack space,
> the default size of which varies by OS (see e.g.
> https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/).
>
> threading.stack_size() can be used to check and perhaps adjust the
> allocation size.
>

Yeah, they do have quite a few costs, and a naive approach of "give a
thread to every client", while very convenient, will end up limiting
throughput. (But I'll be honest: I still have a server that's built on
exactly that model, because it's much much safer than risking one
client stalling out the whole server due to a small bug. But that's a
MUD server.) Thing is, though, it'll most likely limit throughput to
something in the order of thousands of concurrent connections (or
thousands per second if it's something like HTTP where they tend to
get closed again), maybe tens of thousands. So if you have something
where every thread needs its own database connection, well, you're
gonna have database throughput problems WAY before you actually run
into thread count limitations!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Andrew MacIntyre

On 29/07/2022 8:08 am, Chris Angelico wrote:

It takes a bit of time to start ten thousand threads, but after that,
the system is completely idle again until I notify them all and they
shut down.

(Interestingly, it takes four times as long to start 20,000 threads,
suggesting that something in thread spawning has O(n²) cost. Still,
even that leaves the system completely idle once it's done spawning
them.)


Another cost of threads can be memory allocated as thread stack space, 
the default size of which varies by OS (see e.g. 
https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/).


threading.stack_size() can be used to check and perhaps adjust the 
allocation size.
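
For example (the 256 KiB figure is arbitrary; the allowed minimum and
granularity are platform-dependent):

import threading

print(threading.stack_size())     # 0 means "use the platform default"
threading.stack_size(256 * 1024)  # applies to threads created after this
t = threading.Thread(target=lambda: None)
t.start()
t.join()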


--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andy...@pcug.org.au (pref)          | Snail: PO Box 370
        andy...@bullseye.apana.org.au (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/             |        Australia
--
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Fri, 29 Jul 2022 at 07:24, Morten W. Petersen  wrote:
>
> Forwarding to the list as well.
>
> -- Forwarded message -
> From: Morten W. Petersen 
> Date: Thu, Jul 28, 2022 at 11:22 PM
> Subject: Re: Simple TCP proxy
> To: Chris Angelico 
>
>
> Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
> thread to check whether the connection should become active doesn't
> seem like a big deal.

Maybe, but polling *at all* is the problem here. It shouldn't be
hammering the other server. You'll quickly find that there are limits
that simply shouldn't exist, because every connection is trying to
check to see if it's active now. This is *completely unnecessary*.
I'll reiterate the advice given earlier in this thread (of
conversation): Look into the tools available for thread (of execution)
synchronization, such as mutexes (in Python, threading.Lock), events
(threading.Event), and condition variables (threading.Condition). A
poll interval enforces a
delay before the thread notices that it's active, AND causes inactive
threads to consume CPU, neither of which is a good thing.
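
A tiny sketch of the difference (illustrative only, not your proxy's
actual code):

import threading

activated = threading.Event()

def handler():
    # Blocks using no CPU until the manager calls set(); compare a
    # sleep/poll loop, which burns cycles and adds wake-up latency.
    activated.wait()
    print("slot granted, forwarding now")

t = threading.Thread(target=handler)
t.start()
# ... later, when a slot frees up:
activated.set()
t.join()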

> And there's also some point where it is pointless to accept more
> connections, and where maybe remedies like accepting known good IPs,
> blocking IPs / IP blocks with more than 3 connections etc. should be
> considered.

Firewalling is its own science. Blocking IPs with too many
simultaneous connections should be decided administratively, not
because your proxy can't handle enough connections.

> I think I'll be getting closer than most applications to an eventual
> ceiling for how many threads Python can handle, and that's interesting
> and could be beneficial for Python as well.

Here's a quick demo of the cost of threads when they're all blocked on
something.

>>> import threading
>>> finish = threading.Condition()
>>> def thrd(cond):
... with cond: cond.wait()
...
>>> threading.active_count() # Main thread only
1
>>> import time
>>> def spawn(n):
... start = time.monotonic()
... for _ in range(n):
... t = threading.Thread(target=thrd, args=(finish,))
... t.start()
... print("Spawned", n, "threads in", time.monotonic() - start, "seconds")
...
>>> spawn(10000)
Spawned 10000 threads in 7.548425202025101 seconds
>>> threading.active_count()
10001
>>> with finish: finish.notify_all()
...
>>> threading.active_count()
1

It takes a bit of time to start ten thousand threads, but after that,
the system is completely idle again until I notify them all and they
shut down.

(Interestingly, it takes four times as long to start 20,000 threads,
suggesting that something in thread spawning has O(n²) cost. Still,
even that leaves the system completely idle once it's done spawning
them.)

If your proxy can handle 20,000 threads, I would be astonished. And
this isn't even close to a thread limit.

Obviously the cost is different if the threads are all doing things,
but if you have thousands of active socket connections, you'll start
finding that there are limitations in quite a few places, depending on
how much traffic is going through them. Ultimately, yes, you will find
that threads restrict you and asynchronous I/O is the only option; but
you can take threads a fairly long way before they are the limiting
factor.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Morten W. Petersen
Well, it's not just code size in terms of disk space, it is also code
complexity, and the level of knowledge, skill and time it takes to make use
of something.

And if something fails in an unobvious way in Twisted, I imagine that
requires somebody highly skilled, and that costs quite a bit of money. And
people like that might also not always be available.

-Morten

On Thu, Jul 28, 2022 at 2:29 PM Barry  wrote:

>
>
> On 28 Jul 2022, at 10:31, Morten W. Petersen  wrote:
>
> 
> Hi Barry.
>
> Well, I can agree that using backlog is an option for handling bursts. But
> what if that backlog number is exceeded?  How easy is it to deal with such
> a situation?
>
>
> You can make backlog very large, if that makes sense.
> But at some point you will be forced to reject connections,
> once you cannot keep up with the average rate of connections.
>
>
>
> I just cloned twisted, and compared the size:
>
> morphex@morphex-Latitude-E4310:~$ du -s stp; du -s tmp/twisted/
> 464 stp
> 98520 tmp/twisted/
> morphex@morphex-Latitude-E4310:~$ du -sh stp/LICENSE
> 36K stp/LICENSE
>
> >>> 464/98520.0
> 0.004709703613479496
> >>>
>
> It's quite easy to get an idea of what's going on in STP; if something
> goes wrong in Twisted, the size of its codebase makes that harder. I used to
> use emacs a lot, but then I came into a period where it was more practical
> to use nano, and I mostly use nano now, unless I need to for example search
> and replace or something like that.
>
>
> I mentioned twisted for context. Depending on your needs, the built-in
> Python 3 async support may well be sufficient. Using threads is not
> scalable.
>
> In the places I code disk space of a few MiB is not an issue.
>
> Barry
>
>
> -Morten
>
> On Thu, Jul 28, 2022 at 8:31 AM Barry  wrote:
>
>>
>>
>> > On 27 Jul 2022, at 17:16, Morten W. Petersen  wrote:
>> >
>> > Hi.
>> >
>> > I'd like to share with you a recent project, which is a simple TCP proxy
>> > that can stand in front of a TCP server of some sort, queueing requests
>> and
>> > then allowing n number of connections to pass through at a time:
>> >
>> > https://github.com/morphex/stp
>> >
>> > I'll be developing it further, but the files committed in this tree
>> > seem to be stable:
>> >
>> >
>> https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465
>> >
>> > I just bombed that code with 700+ requests almost simultaneously, and
>> STP
>> > handled it well.
>>
>> What is the problem that this solves?
>>
>> Why not just increase the allowed size of the socket listen backlog if
>> you just want to handle bursts of traffic?
>>
>> I do not think of this as a proxy, rather a tunnel.
>> And the tunnel is a lot more expensive than having the kernel keep
>> the connection in the listen socket backlog.
>>
>> I work on a web proxy written in Python that handles huge load,
>> using the backlog for the bursts.
>>
>> It’s async using twisted, as threads are not practical at scale.
>>
>> Barry
>>
>> >
>> > Regards,
>> >
>> > Morten
>> >
>> > --
>> > I am https://leavingnorway.info
>> > Videos at https://www.youtube.com/user/TheBlogologue
>> > Twittering at http://twitter.com/blogologue
>> > Blogging at http://blogologue.com
>> > Playing music at https://soundcloud.com/morten-w-petersen
>> > Also playing music and podcasting here:
>> > http://www.mixcloud.com/morten-w-petersen/
>> > On Google+ here https://plus.google.com/107781930037068750156
>> > On Instagram at https://instagram.com/morphexx/
>> > --
>> > https://mail.python.org/mailman/listinfo/python-list
>> >
>>
>>
>
> --
> I am https://leavingnorway.info
> Videos at https://www.youtube.com/user/TheBlogologue
> Twittering at http://twitter.com/blogologue
> Blogging at http://blogologue.com
> Playing music at https://soundcloud.com/morten-w-petersen
> Also playing music and podcasting here:
> http://www.mixcloud.com/morten-w-petersen/
> On Google+ here https://plus.google.com/107781930037068750156
> On Instagram at https://instagram.com/morphexx/
>
>

-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/
-- 
https://mail.python.org/mailman/listinfo/python-list


Fwd: Simple TCP proxy

2022-07-28 Thread Morten W. Petersen
Forwarding to the list as well.

-- Forwarded message -
From: Morten W. Petersen 
Date: Thu, Jul 28, 2022 at 11:22 PM
Subject: Re: Simple TCP proxy
To: Chris Angelico 


Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
thread to check whether the connection should become active doesn't seem
like a big deal.

And there's also some point where it is pointless to accept more
connections, and where maybe remedies like accepting known good IPs,
blocking IPs / IP blocks with more than 3 connections etc. should be
considered.

I think I'll be getting closer than most applications to an eventual
ceiling for how many threads Python can handle, and that's interesting and
could be beneficial for Python as well.

-Morten

On Thu, Jul 28, 2022 at 2:31 PM Chris Angelico  wrote:

> On Thu, 28 Jul 2022 at 21:01, Morten W. Petersen 
> wrote:
> >
> > Well, I was thinking of following the socketserver / handle layout of
> code and execution, for now anyway.
> >
> > It wouldn't be a big deal to make them block, but another option is to
> increase the sleep period 100% for every 200 waiting connections while
> waiting in handle.
>
> Easy denial-of-service attack then. Spam connections and the queue
> starts blocking hard. The sleep loop seems like a rather inefficient
> way to do things.
>
> > Another thing is that it's nice to see Python handling 500+ threads
> without problems. :)
>
> Yeah, well, that's not all THAT many threads, ultimately :)
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>


-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/


-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fwd: timedelta object recursion bug

2022-07-28 Thread Dieter Maurer
Please stay on the list (such that others can help, too)

Ben Hirsig wrote at 2022-7-29 06:53 +1000:
>Thanks for the replies, I'm just trying to understand why this would be
>useful?
>
>E.g. why does max need a min/max/resolution, and why would these attributes
>themselves need a min/max/resolution, etc, etc?

`max` is a `timedelta` and as such inherits (e.g.) `resolution`
from the class (as any other `timedelta` instance).

Note that `timedelta` instances do not have a `max` (`min|resolution`)
slot. When `max` is looked up, it is first searched in the instance
(and not found), then in the class where it is found:
all `max` accesses result in the same object.
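
You can verify this yourself (standard `datetime`, nothing else assumed):

from datetime import timedelta

d = timedelta(days=365)
print(d.max is timedelta.max)      # True: found on the class, not the instance
print(timedelta.max is d.max.max)  # True: every access is the same object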
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fwd: timedelta object recursion bug

2022-07-28 Thread Dieter Maurer
Ben Hirsig wrote at 2022-7-28 19:54 +1000:
>Hi, I noticed this when using the requests library in the response.elapsed
>object (type timedelta). Tested using the standard datetime library alone
>with the example displayed on
>https://docs.python.org/3/library/datetime.html#examples-of-usage-timedelta
>
>
>
>It appears as though the timedelta object recursively adds its own
>attributes (min, max, resolution) as further timedelta objects. I’m not
>sure how deep they go, but presumably hitting the recursion limit.

If you look at the source, you will see that `min`, `max`, `resolution`
are class level attributes. Their values are `timedelta` instances.
Therefore, you can access e.g. `timedelta(days=365).min.max.resolution`.
But this is nothing to worry about.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fwd: timedelta object recursion bug

2022-07-28 Thread Jon Ribbens via Python-list
On 2022-07-28, Ben Hirsig  wrote:
> Hi, I noticed this when using the requests library in the response.elapsed
> object (type timedelta). Tested using the standard datetime library alone
> with the example displayed on
> https://docs.python.org/3/library/datetime.html#examples-of-usage-timedelta
>
> It appears as though the timedelta object recursively adds its own
> attributes (min, max, resolution) as further timedelta objects. I’m not
> sure how deep they go, but presumably hitting the recursion limit.
>
>>from datetime import timedelta
>>year = timedelta(days=365)
>>print(year.max)
>   999999999 days, 23:59:59.999999
>>print(year.max.min.max.resolution.max.min)
>   -999999999 days, 0:00:00

Why do you think this is a bug?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fwd: timedelta object recursion bug

2022-07-28 Thread MRAB

On 28/07/2022 10:54, Ben Hirsig wrote:

Hi, I noticed this when using the requests library in the response.elapsed
object (type timedelta). Tested using the standard datetime library alone
with the example displayed on
https://docs.python.org/3/library/datetime.html#examples-of-usage-timedelta



It appears as though the timedelta object recursively adds its own
attributes (min, max, resolution) as further timedelta objects. I’m not
sure how deep they go, but presumably hitting the recursion limit.

from datetime import timedelta
year = timedelta(days=365)
print(year.max)
   999999999 days, 23:59:59.999999
print(year.max.min.max.resolution.max.min)
   -999999999 days, 0:00:00

I’m using 3.10.3


It's not recursion, it's a reference cycle. In fact, more than one:

>>> from datetime import timedelta
>>> year = timedelta(days=365)
>>> type(year)
<class 'datetime.timedelta'>
>>> type(year.max)
<class 'datetime.timedelta'>
>>> year.max is year.max.max
True
>>> type(year.min)
<class 'datetime.timedelta'>
>>> year.min is year.min.min
True
--
https://mail.python.org/mailman/listinfo/python-list


Re: poetry script fails to find module

2022-07-28 Thread Loris Bennett
"Loris Bennett"  writes:

> Hi,
>
> The following is a little bit involved, but I hope I can make the problem clear.
>
> Using poetry I have written a dummy application which just uses typer
> to illustrate a possible interface design.  The directory structure is as
> follows:
>
>   $ tree -P *.py
>   .
>   |-- dist
>   |-- stoat
>   |   |-- hpc
>   |   |   |-- database.py
>   |   |   |-- group.py
>   |   |   |-- __init__.py
>   |   |   |-- main.py
>   |   |   |-- owner.py
>   |   |   `-- user.py
>   |   |-- __init__.py
>   |   |-- main.py
>   |   `-- storage
>   |   |-- database.py
>   |   |-- group.py
>   |   |-- __init__.py
>   |   |-- main.py
>   |   |-- owner.py
>   |   |-- share.py
>   |   `-- user.py
>   `-- tests
>   |-- __init__.py
>   `-- test_stoat.py
>
> Within the poetry shell I can run the application successfully:
>
>   $ python stoat/main.py hpc user --help
>   Usage: main.py hpc user [OPTIONS] COMMAND [ARGS]...
>
> manage HPC users
>
>   Options:
> --help  Show this message and exit.
>
>   Commands:
> add add a user
> remove  remove a user
>
> I then install this in a non-standard path (because the OS Python3 is
> 3.6.8) and can run the installed version successfully:
>
>   $ PYTHONPATH=/trinity/shared/zedat/lib/python3.9/site-packages python 
> /trinity/shared/zedat/lib/python3.9/site-packages/stoat/main.py hpc user 
> --help
>   Usage: main.py hpc user [OPTIONS] COMMAND [ARGS]...
>
> manage HPC users
>
>   Options:
> --help  Show this message and exit.
>
>   Commands:
> add add a user
> remove  remove a user
>
> However, poetry creates a script 'stoat' from the entry
>
>   [tool.poetry.scripts]
>   stoat = "stoat.main:main"
>
> in pyproject.toml, which looks like
>
>   #!/trinity/shared/easybuild/software/Python/3.9.6-GCCcore-11.2.0/bin/python3.9
>   # -*- coding: utf-8 -*-
>   import re
>   import sys
>   from stoat.main import main
>   if __name__ == '__main__':
>       sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
>       sys.exit(main())
>
> If I run that I get 
>
>   $ PYTHONPATH=/trinity/shared/zedat/lib/python3.9/site-packages stoat hpc 
> user --help
>   Traceback (most recent call last):
> File "/trinity/shared/zedat/bin/stoat", line 5, in 
>   from stoat.main import main
> File "/trinity/shared/zedat/lib/python3.9/site-packages/stoat/main.py", 
> line 3, in <module>
>   import hpc.main
>   ModuleNotFoundError: No module named 'hpc'
>
> Why is the module 'hpc' not found by the poetry script?

Never mind, I worked it out.  I had to replace 

  import hpc.main

with

  import stoat.hpc.main

However, this raises the question of why it worked in the first place
in the poetry shell. 
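
Presumably that is because running "python stoat/main.py" puts the stoat/
directory itself at the front of sys.path, so 'hpc' resolves as a
top-level package, whereas the installed script imports stoat.main with
only site-packages on the path. A quick way to check that guess:

  import sys
  print(sys.path[0])  # the script's own directory when run directly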

Cheers,

Loris

-- 
This signature is currently under construction.
-- 
https://mail.python.org/mailman/listinfo/python-list


poetry script fails to find module

2022-07-28 Thread Loris Bennett
Hi,

The following is a little bit involved, but I hope I can make the problem clear.

Using poetry I have written a dummy application which just uses typer
to illustrate a possible interface design.  The directory structure is as
follows:

  $ tree -P *.py
  .
  |-- dist
  |-- stoat
  |   |-- hpc
  |   |   |-- database.py
  |   |   |-- group.py
  |   |   |-- __init__.py
  |   |   |-- main.py
  |   |   |-- owner.py
  |   |   `-- user.py
  |   |-- __init__.py
  |   |-- main.py
  |   `-- storage
  |   |-- database.py
  |   |-- group.py
  |   |-- __init__.py
  |   |-- main.py
  |   |-- owner.py
  |   |-- share.py
  |   `-- user.py
  `-- tests
  |-- __init__.py
  `-- test_stoat.py

Within the poetry shell I can run the application successfully:

  $ python stoat/main.py hpc user --help
  Usage: main.py hpc user [OPTIONS] COMMAND [ARGS]...

manage HPC users

  Options:
--help  Show this message and exit.

  Commands:
add add a user
remove  remove a user

I then install this in a non-standard path (because the OS Python3 is
3.6.8) and can run the installed version successfully:

  $ PYTHONPATH=/trinity/shared/zedat/lib/python3.9/site-packages python 
/trinity/shared/zedat/lib/python3.9/site-packages/stoat/main.py hpc user --help
  Usage: main.py hpc user [OPTIONS] COMMAND [ARGS]...

manage HPC users

  Options:
--help  Show this message and exit.

  Commands:
add add a user
remove  remove a user

However, poetry creates a script 'stoat' from the entry

  [tool.poetry.scripts]
  stoat = "stoat.main:main"

in pyproject.toml, which looks like

  #!/trinity/shared/easybuild/software/Python/3.9.6-GCCcore-11.2.0/bin/python3.9
  # -*- coding: utf-8 -*-
  import re
  import sys
  from stoat.main import main
  if __name__ == '__main__':
      sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
      sys.exit(main())

If I run that I get 

  $ PYTHONPATH=/trinity/shared/zedat/lib/python3.9/site-packages stoat hpc user 
--help
  Traceback (most recent call last):
File "/trinity/shared/zedat/bin/stoat", line 5, in 
  from stoat.main import main
File "/trinity/shared/zedat/lib/python3.9/site-packages/stoat/main.py", 
line 3, in <module>
  import hpc.main
  ModuleNotFoundError: No module named 'hpc'

Why is the module 'hpc' not found by the poetry script?

Cheers,

Loris

-- 
This signature is currently under construction.
-- 
https://mail.python.org/mailman/listinfo/python-list


Fwd: timedelta object recursion bug

2022-07-28 Thread Ben Hirsig
Hi, I noticed this when using the requests library in the response.elapsed
object (type timedelta). Tested using the standard datetime library alone
with the example displayed on
https://docs.python.org/3/library/datetime.html#examples-of-usage-timedelta



It appears as though the timedelta object recursively adds its own
attributes (min, max, resolution) as further timedelta objects. I’m not
sure how deep they go, but presumably hitting the recursion limit.



>from datetime import timedelta
>year = timedelta(days=365)
>print(year.max)
  999999999 days, 23:59:59.999999
>print(year.max.min.max.resolution.max.min)
  -999999999 days, 0:00:00

I’m using 3.10.3



Cheers
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Barry


> On 28 Jul 2022, at 10:31, Morten W. Petersen  wrote:
> 
> 
> Hi Barry.
> 
> Well, I can agree that using backlog is an option for handling bursts. But 
> what if that backlog number is exceeded?  How easy is it to deal with such a 
> situation?

You can make backlog very large, if that makes sense.
But at some point you will be forced to reject connections,
once you cannot keep up with the average rate of connections.
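
For example (the port and backlog value here are arbitrary):

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
srv.listen(1024)  # kernel queues up to ~1024 un-accepted connections;
                  # the OS caps this (e.g. net.core.somaxconn on Linux)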


> 
> I just cloned twisted, and compared the size:
> 
> morphex@morphex-Latitude-E4310:~$ du -s stp; du -s tmp/twisted/
> 464 stp
> 98520 tmp/twisted/
> morphex@morphex-Latitude-E4310:~$ du -sh stp/LICENSE 
> 36K stp/LICENSE
> 
> >>> 464/98520.0
> 0.004709703613479496
> >>> 
> 
> It's quite easy to get an idea of what's going on in STP; if something goes
> wrong in Twisted, the size of its codebase makes that harder. I used to use
> emacs a lot, but then I came into a period where it was more practical to use 
> nano, and I mostly use nano now, unless I need to for example search and 
> replace or something like that.

I mentioned twisted for context. Depending on your needs, the built-in
Python 3 async support may well be sufficient. Using threads is not
scalable.

In the places I code disk space of a few MiB is not an issue.

Barry

> 
> -Morten
> 
>> On Thu, Jul 28, 2022 at 8:31 AM Barry  wrote:
>> 
>> 
>> > On 27 Jul 2022, at 17:16, Morten W. Petersen  wrote:
>> > 
>> > Hi.
>> > 
>> > I'd like to share with you a recent project, which is a simple TCP proxy
>> > that can stand in front of a TCP server of some sort, queueing requests and
>> > then allowing n number of connections to pass through at a time:
>> > 
>> > https://github.com/morphex/stp
>> > 
>> > I'll be developing it further, but the files committed in this tree
>> > seem to be stable:
>> > 
>> > https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465
>> > 
>> > I just bombed that code with 700+ requests almost simultaneously, and STP
>> > handled it well.
>> 
>> What is the problem that this solves?
>> 
>> Why not just increase the allowed size of the socket listen backlog if you 
>> just want to handle bursts of traffic?
>> 
>> I do not think of this as a proxy, rather a tunnel.
>> And the tunnel is a lot more expensive than having the kernel keep
>> the connection in the listen socket backlog.
>> 
>> I work on a web proxy written in Python that handles huge load,
>> using the backlog for the bursts.
>> 
>> It’s async using twisted, as threads are not practical at scale.
>> 
>> Barry
>> 
>> > 
>> > Regards,
>> > 
>> > Morten
>> > 
>> > -- 
>> > I am https://leavingnorway.info
>> > Videos at https://www.youtube.com/user/TheBlogologue
>> > Twittering at http://twitter.com/blogologue
>> > Blogging at http://blogologue.com
>> > Playing music at https://soundcloud.com/morten-w-petersen
>> > Also playing music and podcasting here:
>> > http://www.mixcloud.com/morten-w-petersen/
>> > On Google+ here https://plus.google.com/107781930037068750156
>> > On Instagram at https://instagram.com/morphexx/
>> > -- 
>> > https://mail.python.org/mailman/listinfo/python-list
>> > 
>> 
> 
> 
> -- 
> I am https://leavingnorway.info
> Videos at https://www.youtube.com/user/TheBlogologue
> Twittering at http://twitter.com/blogologue
> Blogging at http://blogologue.com
> Playing music at https://soundcloud.com/morten-w-petersen
> Also playing music and podcasting here: 
> http://www.mixcloud.com/morten-w-petersen/
> On Google+ here https://plus.google.com/107781930037068750156
> On Instagram at https://instagram.com/morphexx/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Thu, 28 Jul 2022 at 21:01, Morten W. Petersen  wrote:
>
> Well, I was thinking of following the socketserver / handle layout of code 
> and execution, for now anyway.
>
> It wouldn't be a big deal to make them block, but another option is to 
> increase the sleep period 100% for every 200 waiting connections while 
> waiting in handle.

Easy denial-of-service attack then. Spam connections and the queue
starts blocking hard. The sleep loop seems like a rather inefficient
way to do things.

> Another thing is that it's nice to see Python handling 500+ threads without 
> problems. :)

Yeah, well, that's not all THAT many threads, ultimately :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Morten W. Petersen
Well, I was thinking of following the socketserver / handle layout of code
and execution, for now anyway.

It wouldn't be a big deal to make them block, but another option is to
increase the sleep period 100% for every 200 waiting connections while
waiting in handle.

Another thing is that it's nice to see Python handling 500+ threads without
problems. :)

-Morten

On Thu, Jul 28, 2022 at 11:45 AM Chris Angelico  wrote:

> On Thu, 28 Jul 2022 at 19:41, Morten W. Petersen 
> wrote:
> >
> > Hi Martin.
> >
> > I was thinking of doing something with the handle function, but just this
> > little tweak:
> >
> >
> https://github.com/morphex/stp/commit/9910ca8c80e9d150222b680a4967e53f0457b465
> >
> > made a huge difference in CPU usage.  Hundreds of waiting sockets are now
> > using 20-30% of CPU instead of 10x that.
>
>  wait, what?
>
> Why do waiting sockets consume *any* measurable amount of CPU? Why
> don't the threads simply block until it's time to do something?
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>


-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Thu, 28 Jul 2022 at 19:41, Morten W. Petersen  wrote:
>
> Hi Martin.
>
> I was thinking of doing something with the handle function, but just this
> little tweak:
>
> https://github.com/morphex/stp/commit/9910ca8c80e9d150222b680a4967e53f0457b465
>
> made a huge difference in CPU usage.  Hundreds of waiting sockets are now
> using 20-30% of CPU instead of 10x that.

 wait, what?

Why do waiting sockets consume *any* measurable amount of CPU? Why
don't the threads simply block until it's time to do something?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Morten W. Petersen
Hi Martin.

I was thinking of doing something with the handle function, but just this
little tweak:

https://github.com/morphex/stp/commit/9910ca8c80e9d150222b680a4967e53f0457b465

made a huge difference in CPU usage.  Hundreds of waiting sockets are now
using 20-30% of CPU instead of 10x that.  So for example making the handle
function exit / stop and wait isn't necessary at this point. It also opens
up the possibility of sending a noop that is appropriate for the given
protocol.

I've not done a lot of thread programming before, but yes, locks can be
used and will be used if necessary. I wasn't sure which data types are
thread safe in Python; it might be acceptable for some variables to be off
by 1 or more, if using <= or >= checks is an option and there is no risk of
the variable containing "garbage".

I think a simple focus, with the project aimed at one task, will make it
easier to manage even complex matters such as concurrency and threads.

-Morten

On Wed, Jul 27, 2022 at 11:00 PM Martin Di Paola 
wrote:

>
> On Wed, Jul 27, 2022 at 08:32:31PM +0200, Morten W. Petersen wrote:
> >You're thinking of the backlog argument of listen?
>
>  From my understanding, yes, when you set up the "accepter" socket (the
> one that you use to listen and accept new connections), you can define
> the length of the queue for incoming connections that are not accepted
> yet.
>
> This will be the equivalent of your SimpleQueue which basically puts a
> limit on how many incoming connections are "accepted" to do a real job.
>
> Using skt.listen(N) the incoming connections are put on hold by the OS,
> while in your implementation they are formally accepted but not allowed
> to do any meaningful work: they are put on the SimpleQueue and only when
> they are popped will they work (send/recv data).
>
> The difference then between the OS and your impl is minimal. The only
> case I can think of is that on the clients' side there may be a timeout
> for the acceptance of the connection, so your proxy server will eagerly
> accept these connections and no timeout is possible(*)
>
> On a side note, your implementation is too thread-naive: it uses plain
> Python lists, integers and boolean variables which are not thread safe.
> It is a matter of time until your server starts to behave weirdly.
>
> One option is that you use thread-safe objects. I'd encourage you to read
> about thread-safety in general and then about which sync mechanisms Python
> offers.
>
> Another option is to remove the SimpleQueue and the background function
> that allows a connection to be "active".
>
> If you think, the handlers are 99% independent except that you want to
> allow only N of them to progress (establish and forward the connection)
> and when a handler finishes, another handler "waiting" is activated, "in
> a queue fashion" as you said.
>
> If you allow me to not have a strict queue discipline here, you can achieve
> the same results by coordinating the handlers using semaphores. Once again,
> take this email as a starting point for your own research.
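>
> A rough sketch of the semaphore idea (illustrative; "forward" stands in
> for your send/recv work):
>
> import threading
>
> slots = threading.BoundedSemaphore(25)  # at most 25 handlers progress
>
> def handler(conn):
>     with slots:        # blocks, consuming no CPU, until a slot frees up
>         forward(conn)  # establish + forward; the slot is released on exit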
>
> On a second side note, the use of handlers and threads is inefficient:
> while you have N active handlers sending/receiving data, you are eagerly
> accepting new connections, so many more handlers will be created and (if
> I'm not wrong) each will be a thread.
>
> A more efficient solution could be
>
> 1) accept as many connections as you can, saving the socket (not the
> handler) in the thread-safe queue.
> 2) have N threads in the background popping a socket from the queue and
> then doing the send/recv stuff. When the thread is done, it closes the
> socket and pops another from the queue.
>
> So the queue length will be the count of accepted connections, but at any
> moment your proxy will not activate (forward) more than N connections.
>
> This idea is thread-safe, simpler, efficient and has the queue
> discipline (I leave aside the usefulness).
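>
> A minimal sketch of (1) and (2), assuming an already-listening socket
> named "accepter" and a "forward" function for the send/recv stuff:
>
> import queue
> import threading
>
> N = 25
> pending = queue.Queue()        # thread-safe; holds accepted sockets
>
> def worker():
>     while True:
>         conn = pending.get()   # blocks idle until a socket is queued
>         try:
>             forward(conn)      # do the send/recv work
>         finally:
>             conn.close()
>
> for _ in range(N):
>     threading.Thread(target=worker, daemon=True).start()
>
> while True:
>     conn, addr = accepter.accept()  # accept eagerly
>     pending.put(conn)               # held here until a worker pops it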
>
> I encourage you to take time to read about the different things
> mentioned, as concurrency and thread-related stuff is not easy to
> master.
>
> Thanks,
> Martin.
>
> (*) make your proxy server slow enough and yes, you will get timeouts
> anyways.
>
> >
> >Well, STP will accept all connections, but can limit how many of the
> >accepted connections are active at any given time.
> >
> >So when I bombed it with hundreds of almost simultaneous connections, all
> >of them were accepted, but only 25 were actively sending and receiving
> data
> >at any given time. First come, first served.
> >
> >Regards,
> >
> >Morten
> >
> >On Wed, Jul 27, 2022 at 8:00 PM Chris Angelico  wrote:
> >
> >> On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen 
> >> wrote:
> >> >
> >> > Hi.
> >> >
> >> > I'd like to share with you a recent project, which is a simple TCP
> proxy
> >> > that can stand in front of a TCP server of some sort, queueing
> requests
> >> and
> >> > then allowing n number of connections to pass through at a time:
> >>
> >> How

Re: Simple TCP proxy

2022-07-28 Thread Morten W. Petersen
Hi Barry.

Well, I can agree that using backlog is an option for handling bursts. But
what if that backlog number is exceeded?  How easy is it to deal with such
a situation?

I just cloned twisted, and compared the size:

morphex@morphex-Latitude-E4310:~$ du -s stp; du -s tmp/twisted/
464 stp
98520 tmp/twisted/
morphex@morphex-Latitude-E4310:~$ du -sh stp/LICENSE
36K stp/LICENSE

>>> 464/98520.0
0.004709703613479496
>>>

It's quite easy to get an idea of what's going on in STP; if something
goes wrong in Twisted, the size of its codebase makes that harder. I used to
use emacs a lot, but then I came into a period where it was more practical
to use nano, and I mostly use nano now, unless I need to for example search
and replace or something like that.

-Morten

On Thu, Jul 28, 2022 at 8:31 AM Barry  wrote:

>
>
> > On 27 Jul 2022, at 17:16, Morten W. Petersen  wrote:
> >
> > Hi.
> >
> > I'd like to share with you a recent project, which is a simple TCP proxy
> > that can stand in front of a TCP server of some sort, queueing requests
> and
> > then allowing n number of connections to pass through at a time:
> >
> > https://github.com/morphex/stp
> >
> > I'll be developing it further, but the files committed in this tree
> > seem to be stable:
> >
> >
> https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465
> >
> > I just bombed that code with 700+ requests almost simultaneously, and STP
> > handled it well.
>
> What is the problem that this solves?
>
> Why not just increase the allowed size of the socket listen backlog if you
> just want to handle bursts of traffic?
>
> I do not think of this as a proxy, rather a tunnel.
> And the tunnel is a lot more expensive than having the kernel keep the
> connection in the listen socket backlog.
>
> I work on a web proxy written in Python that handles huge load,
> using the backlog for the bursts.
>
> It’s async using twisted, as threads are not practical at scale.
>
> Barry
>
> >
> > Regards,
> >
> > Morten
> >
> > --
> > I am https://leavingnorway.info
> > Videos at https://www.youtube.com/user/TheBlogologue
> > Twittering at http://twitter.com/blogologue
> > Blogging at http://blogologue.com
> > Playing music at https://soundcloud.com/morten-w-petersen
> > Also playing music and podcasting here:
> > http://www.mixcloud.com/morten-w-petersen/
> > On Google+ here https://plus.google.com/107781930037068750156
> > On Instagram at https://instagram.com/morphexx/
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
>
>

-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Morten W. Petersen
OK, I'll have a look at using something else than _threading.

I quickly saw a couple of points where the code could be optimized for
speed, and the loop that transfers data back and forth also has low
throughput, but the first priority was getting it working and seeing that
it is fairly stable.

Regards,

Morten
--
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Instagram at https://instagram.com/morphexx/



On Wed, Jul 27, 2022 at 9:57 PM Chris Angelico  wrote:

> On Thu, 28 Jul 2022 at 04:32, Morten W. Petersen 
> wrote:
> >
> > Hi Chris.
> >
> > You're thinking of the backlog argument of listen?
>
> Yes, precisely.
>
> > Well, STP will accept all connections, but can limit how many of the
> accepted connections are active at any given time.
> >
> > So when I bombed it with hundreds of almost simultaneous connections,
> all of them were accepted, but only 25 were actively sending and receiving
> data at any given time. First come, first served.
> >
>
> Hmm. Okay. Not sure what the advantage is, but sure.
>
> If the server's capable of handling the total requests-per-minute,
> then a queueing system like this should help with burst load, although
> I would have thought that the listen backlog would do the same. What
> happens if the server actually gets overloaded though? Do connections
> get disconnected after appearing connected? What's the disconnect
> mode?
>
> BTW, you probably don't want to be using the _thread module - Python
> has a threading module which is better suited to this sort of work.
> Although you may want to consider asyncio instead, as that has far
> lower overhead when working with large numbers of sockets.
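>
> A minimal asyncio sketch of the same idea (the upstream host/port and
> the limit of 25 are made-up values):
>
> import asyncio
>
> async def pump(reader, writer):
>     # Copy bytes one way until EOF.
>     while data := await reader.read(65536):
>         writer.write(data)
>         await writer.drain()
>     writer.close()
>
> async def handle(sem, cr, cw):
>     async with sem:  # excess connections wait here, fully idle
>         ur, uw = await asyncio.open_connection("127.0.0.1", 8000)
>         await asyncio.gather(pump(cr, uw), pump(ur, cw))
>
> async def main():
>     sem = asyncio.Semaphore(25)  # at most 25 active forwards
>     server = await asyncio.start_server(
>         lambda r, w: handle(sem, r, w), "0.0.0.0", 8080)
>     async with server:
>         await server.serve_forever()
>
> asyncio.run(main())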
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>


-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/
-- 
https://mail.python.org/mailman/listinfo/python-list