Hi Henrik,

Thanks for sharing. I get the following when running the ws-demo:

./pil pl-web/ws-demo/main.l -go
..
!? (wsServer)
wsServer -- Undefined

I can't find the definition of wsServer anywhere. Is it missing from the repo?

Thanks,
Joe

On Mon, Jan 4, 2016 at 4:27 PM, Henrik Sarvell <hsarv...@gmail.com> wrote:
> Update:
>
> The socketserver is now completely reliant on Redis, using Redis' pub
> / sub functionality: http://redis.io/topics/pubsub
>
> The reason for this is that I was using the websocket server to handle
> all websockets functionality for the site I'm being paid to work on
> and it started running into problems as the site grew. The first issue
> was an easy fix after Alex pointed me to it: increasing the number of
> file descriptors in src64/sys/x86-64.linux.defs.l. My line #115 now
> looks like this: (equ FD_SET 1024) # 1024 bit
>
> After re-compiling I could easily handle more than 500 clients and all
> was well for a while.
>
> Unfortunately the site is growing so fast that just a few months
> later the parent / root process started intermittently running at 100%
> CPU utilization and the service stopped working for perhaps 10-20
> minutes before resolving on its own. At this point peak usage involved
> 2000 clients being connected at the same time.
>
> Alex suspects that the issue has to do with how the internal logic
> handles new processes being created when there are already a lot of
> them present. In a normal HTTP server scenario this probably never
> happens. Imagine that every request takes on average 1 second to
> perform before the socket closes: you would then need about 2000
> requests per second in order to trigger the CPU problem. You'll run
> into many other issues long before that happens in a non-trivial
> scenario (trust me, I've tested).
>
> In the end we switched over to a node.js based solution that also
> relies on Redis' pub / sub functionality (that's where I got the idea
> to make the PL based solution use it too).
>
> I have tried to replicate the real world situation, both load wise and
> number of clients wise, but have not been able to trigger the CPU issue
> (this also seems to imply that Alex's suspicion is not completely on
> target). It's impossible for me to replicate the real world situation
> since I can't commandeer hundreds of machines all over the world to
> connect to my test server. What I did manage to trigger, though, was
> fairly high CPU usage in the child processes, a situation that
> also involved loss of service. After the switch to using pub / sub I
> haven't been able to trigger it, so that's a win at least.
>
> Now for the real improvement: actually making HTTP requests to publish
> something becomes redundant when publishing from server to client,
> since it's just a matter of issuing a publish call directly to Redis
> instead. That lowers the amount of process creation by more than 90%
> in my use case.
>
> Even though I can't be 100% sure, as it currently stands I believe that
> if I had implemented the websocket server using Redis' pub / sub to
> begin with, the CPU issue would probably never have happened and there
> would've been no need to switch over to node.js.
>
> That being said, this type of service / application is better suited
> for threads since the cost in RAM etc. is lower.
>
> Final note: my decision to use one socket per feature was poor. It
> allowed me a simpler architecture, but had I opted for one socket with
> "routing" logic implemented in the browser instead, I could have
> reduced the number of simultaneous sockets by up to 8 times. Peak usage
> would then have been 2000 / 8 = 250 processes. Not only that, it turns
> out that IE (yes, even version 11 / Edge) only allows 6 simultaneous
> sockets (including in iframes) per page. We've therefore been forced
> to turn off, for instance, the tournament functionality for IE users.
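
On the one-socket idea above: here is a minimal sketch of what browser-side routing could look like, assuming every message is a JSON envelope that names its feature. The envelope shape, the FeatureRouter name and the placeholder URL are my own assumptions for illustration, not anything taken from pl-web. It is in TypeScript because the routing would live in the browser.

```typescript
// Hypothetical envelope: each frame names the feature it belongs to.
interface Envelope {
  feature: string;
  payload: unknown;
}

type Handler = (payload: unknown) => void;

class FeatureRouter {
  private handlers = new Map<string, Handler>();

  // Register the handler for one feature, e.g. "notifications".
  on(feature: string, handler: Handler): void {
    this.handlers.set(feature, handler);
  }

  // Parse one raw frame and pass its payload to the matching handler.
  // Returns false when the frame is malformed or no handler is registered.
  dispatch(raw: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return false;
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const { feature, payload } = parsed as Partial<Envelope>;
    if (typeof feature !== "string") return false;
    const handler = this.handlers.get(feature);
    if (handler === undefined) return false;
    handler(payload);
    return true;
  }
}

// In the browser, a single socket would feed the router
// (the URL below is a placeholder):
//   const router = new FeatureRouter();
//   router.on("notifications", (p) => console.log("notification", p));
//   const ws = new WebSocket("wss://example.invalid/ws");
//   ws.onmessage = (ev) => router.dispatch(String(ev.data));
```

With something like this, each feature registers a handler on the one shared socket instead of opening its own, which is where the 2000 / 8 = 250 figure would come from.
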
>
>
>
> On Fri, Jun 26, 2015 at 9:30 PM, Henrik Sarvell <hsarv...@gmail.com>
> wrote:
> > Hi all, after over a month without any of the prior issues I now
> > consider the websockets part of pl-web stable:
> > https://bitbucket.org/hsarvell/pl-web
> > Gone are the days of 100% CPU usage and zombie processes.
> >
> > With Alex's help the main web server is now more stable (he made me
> > throw away a few throws in favour of a few byes). The throws were
> > causing the zombies.
> >
> > I was also including dbg.l (it was causing hung processes at 100%
> > CPU). It's basically been deprecated or something, I'll leave it up to
> > him to elaborate. It's just something I've been including by habit
> > since years ago, when at some point I needed to include it to do some
> > kind of debugging.
> >
> > Anyway, atm the WS router is regularly routing up to 40 messages per
> > second to upwards of 300-500 clients, which means that roughly 20,000
> > messages are being pushed out per second during peak hours.
> >
> > The PL processes show up with 0 CPU and 0 RAM usage when I run top,
> > sometimes 1% CPU :) They hardly register even in aggregate; the server
> > would be running 99% idle if it was only running the WS server.
> >
> > To work around the inter-process limit of 4096 byte long messages the
> > router now supports storing the messages in Redis (raw disk is also
> > supported if Redis is not available). This is also in effect in
> > production and has been working flawlessly for months.
> >
> > This is how I start the WS server in production:
> >
> > (load "pl-web/pl-web.l")
> >
> > (setq *Mobj (new '(+Redis) "pl-ws-"))
> >
> > (undef 'app)
> >
> > (setq *WsAuth '(("notifications" (("send" ("put your password/key here"))))))
> >
> > (de app ()
> >    (splitPath)
> >    (wsApp)
> >    (bye))
> >
> > (de go ()
> >    (wsServer)
> >    (server 9090) )
> --
> UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
>