Re: Websockets now considered stable
Hi Joe, thanks for the pointer, I had totally forgotten about that demo app. I've removed the call from the demo; the wsServer logic has been removed completely, since Redis is now responsible for routing messages through its pub/sub handling. I've also updated the readme to note that Redis is now a required dependency.

On Thu, Jan 14, 2016 at 6:17 PM, Joe Bogner wrote:
> Hi Henrik,
>
> Thanks for sharing. I get the following when running the ws-demo:
>
> ./pil pl-web/ws-demo/main.l -go
> ...
> !? (wsServer)
> wsServer -- Undefined
>
> I can't find the definition of wsServer anywhere. Is it missing from the repo?
>
> Thanks,
> Joe
>
> On Mon, Jan 4, 2016 at 4:27 PM, Henrik Sarvell wrote:
>> [...]
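For readers unfamiliar with the pattern Henrik switched to, here is a minimal, hypothetical sketch of publish/subscribe routing — the job Redis now does instead of wsServer. Plain Python with an in-memory router standing in for Redis; all names are illustrative, nothing here is from pl-web:

```python
from collections import defaultdict

class PubSub:
    """Minimal in-memory publish/subscribe router (illustrative only)."""

    def __init__(self):
        # channel name -> list of subscriber callbacks
        self.channels = defaultdict(list)

    def subscribe(self, channel, callback):
        self.channels[channel].append(callback)

    def publish(self, channel, message):
        # Fan the message out to every subscriber of the channel
        for callback in self.channels[channel]:
            callback(message)
        return len(self.channels[channel])

# Example: two clients on one channel, one on another
router = PubSub()
received = []
router.subscribe("scores", lambda m: received.append(("a", m)))
router.subscribe("scores", lambda m: received.append(("b", m)))
router.subscribe("chat",   lambda m: received.append(("c", m)))
n = router.publish("scores", "1-0")
print(n, received)  # "chat" subscriber is untouched
```

The key property is that the publisher never enumerates clients itself; with Redis doing this, a server process can publish once and let Redis fan the message out, which is why the HTTP-request-per-publish round trip became redundant.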
Re: Websockets now considered stable
Hi Henrik,

Thanks for sharing. I get the following when running the ws-demo:

./pil pl-web/ws-demo/main.l -go
...
!? (wsServer)
wsServer -- Undefined

I can't find the definition of wsServer anywhere. Is it missing from the repo?

Thanks,
Joe

On Mon, Jan 4, 2016 at 4:27 PM, Henrik Sarvell wrote:
> [...]
Re: Websockets now considered stable
Update:

The socketserver is now completely reliant on Redis, using Redis' pub/sub functionality: http://redis.io/topics/pubsub

The reason for this is that I was using the websocket server to handle all websockets functionality for the site I'm being paid to work on, and it started running into problems as the site grew. The first issue was an easy fix after Alex pointed me to it: increasing the number of file descriptors in src64/sys/x86-64.linux.defs.l. My line #115 now looks like this:

(equ FD_SET 1024)  # 1024 bit

After re-compiling I could easily handle more than 500 clients, and all was well for a while.

Unfortunately the site is growing so fast that just some month(s) later the parent / root process started intermittently running at 100% CPU utilization, and the service stopped working for perhaps 10-20 minutes before resolving on its own. At this point peak usage involved 2000 clients being connected at the same time.

Alex suspects that the issue has to do with how the internal logic handles new processes being created when there are already a lot of them present. In a normal HTTP server scenario this probably never happens: imagine that every request takes on average 1 second to perform before the socket closes; you would then need about 2000 requests per second in order to trigger the CPU problem, and you'll run into many other issues long before that happens in a non-trivial scenario (trust me, I've tested).

In the end we switched over to a node.js based solution that also relies on Redis' pub/sub functionality (that's where I got the idea from to make the PL based solution also use it).

I have tried to replicate the real world situation load-wise and number-of-clients-wise, but have not been able to trigger the CPU issue (this also seems to imply that Alex's suspicion is not completely on target). It's impossible for me to replicate the real world situation, since I can't commandeer hundreds of machines all over the world to connect to my test server.
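The capacity arithmetic above (2000 concurrent sockets at 1 second each would require about 2000 requests per second) is an instance of Little's law: concurrency = arrival rate × average holding time. A quick check of the numbers from the mail, in a hypothetical helper:

```python
def required_arrival_rate(concurrent, avg_duration_s):
    """Little's law L = lambda * W, solved for lambda (requests/second)."""
    return concurrent / avg_duration_s

# 2000 concurrently open connections, each held ~1 s, implies an
# arrival rate of 2000 requests/second -- far beyond a typical HTTP
# workload, but trivially reached by long-lived websockets.
rate = required_arrival_rate(2000, 1.0)
print(rate)
```

This is why websockets hit the many-processes regime so much earlier than plain HTTP: a websocket's holding time is minutes or hours rather than a second, so even a modest arrival rate accumulates thousands of live connections.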
What I did manage to trigger was fairly high CPU usage in the child processes, a situation that also involved loss of service. After the switch to using pub/sub I haven't been able to trigger it, so that's a win at least.

Now for the real improvement: actually making HTTP requests to publish something becomes redundant when publishing from server to client, since it's just a matter of issuing a publish call directly to Redis instead. That lowers the amount of process creation by more than 90% in my use case.

Even though I can't be 100% sure as it currently stands, I believe that if I had implemented the websocket server using Redis' pub/sub to begin with, the CPU issue would probably never have happened and there would've been no need to switch over to node.js.

That being said, this type of service / application is better suited for threads, since the cost in RAM etc. is lower.

Final note: my decision to use one socket per feature was poor. It allowed me a simpler architecture, but had I opted for one socket with "routing" logic implemented in the browser instead, I could have lowered the number of simultaneous sockets by up to a factor of 8. Peak usage would then have been 2000 / 8 = 250 processes. Not only that, it turns out that IE (yes, even version 11 / Edge) only allows 6 simultaneous sockets (including in iframes) per page. We've therefore been forced to turn off, for instance, the tournament functionality for IE users.

On Fri, Jun 26, 2015 at 9:30 PM, Henrik Sarvell wrote:
> [...]
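The "one socket with routing logic in the browser" idea from the final note can be sketched as a dispatch on a channel field inside each message; one multiplexed WebSocket then serves every feature, staying under IE's 6-sockets-per-page cap. Field and handler names below are hypothetical, not from pl-web:

```python
import json

# One handler per feature, all sharing a single socket connection
handlers = {}

def on_feature(name, fn):
    handlers[name] = fn

def on_socket_message(raw):
    """Dispatch one multiplexed wire message to the right feature handler."""
    msg = json.loads(raw)
    handler = handlers.get(msg["channel"])
    if handler:
        handler(msg["data"])

log = []
on_feature("tournament", lambda d: log.append(("tournament", d)))
on_feature("chat", lambda d: log.append(("chat", d)))

# Two features, one socket: each message carries its own routing tag
on_socket_message(json.dumps({"channel": "chat", "data": "hello"}))
on_socket_message(json.dumps({"channel": "tournament", "data": {"round": 2}}))
print(log)
```

In a browser this dispatch would live in the single WebSocket's onmessage callback; the trade-off Henrik describes is exactly this small routing layer versus N independent sockets and N server processes.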
Re: Websockets now considered stable
> Hi Rick, seems like a fix would be a check there: if the sessions dir doesn't exist (and Redis isn't used to store the session), create it and move on instead of breaking down in tears.

Hi Henrik! Yes, I agree. BTW, thanks. I forgot to thank you before for sharing this!

-- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: Websockets now considered stable
Hi Rick, seems like a fix would be a check there: if the sessions dir doesn't exist (and Redis isn't used to store the session), create it and move on instead of breaking down in tears.

On Sun, Jun 28, 2015 at 10:47 PM, Rick Hanson cryptor...@gmail.com wrote:
> I downloaded pl-web and ext and ran the demo-app. When I went to the login page, I got this Open error in the console:
>
> # excmd redefined
> # exlst redefined
> # exlst redefined
> !? (out Sf (print (list (list sid *Sid
> /home/rick/projects/pl-web-demo/./sessions/6d61fa61b9cc1d8fd878f4b534703473 -- Open error: No such file or directory
> ?
>
> But after I quit, issued a:
>
> $ mkdir sessions
>
> and re-started the server, I never got the error again -- everything worked as expected. (I got the login creds from main.l.)
Re: Websockets now considered stable
What gives?! This stuff is broken!!!

$ git clone https://bitbucket.org/hsarvell/pl-web
Cloning into 'pl-web'...
fatal: repository 'https://bitbucket.org/hsarvell/pl-web/' not found

Just yanking your chain. I know this is a mercurial repo. :)

Thanks, man. Looks good. I'll study the code when I get time in the next few days.

On Fri, Jun 26, 2015 at 3:30 PM, Henrik Sarvell hsarv...@gmail.com wrote:
> [...]
Re: Websockets now considered stable
Hi Henrik, hi Andreas,

> Question:
> > To work around the inter-process limit of 4096 byte long messages
> > the router now supports storing the messages in Redis
>
> 1) Where does this limit come from? POSIX IPC? PicoLisp IPC?

> 1) As far as I remember from a discussion with Alex it's a hard limit (OS related).

I think this was about the constant PIPE_BUF in /usr/include/linux/limits.h:

#define PIPE_BUF 4096  /* # bytes in atomic write to a pipe */

Used to be just 512 bytes on older Unixes.

♪♫ Alex
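The PIPE_BUF constant Alex points to can be queried at runtime: POSIX guarantees that writes of up to PIPE_BUF bytes to a pipe are atomic, and Linux defines the limit as 4096 bytes. A small sketch checking the value on the running system (the exact number is platform-dependent; POSIX only guarantees a minimum of 512):

```python
import os

# Create a pipe and ask the OS for its atomic-write limit
r, w = os.pipe()
try:
    pipe_buf = os.fpathconf(r, "PC_PIPE_BUF")
    print(pipe_buf)  # typically 4096 on Linux
finally:
    os.close(r)
    os.close(w)
```

Writes larger than this value may be interleaved with writes from other processes, which is why a multi-process message router cannot simply push bigger payloads through the pipe and must spool them elsewhere.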
Re: Websockets now considered stable
On Fri, Jun 26, 2015 at 09:30:58PM +0200, Henrik Sarvell wrote:
> I was also including dbg.l (it was causing hung processes at 100%
> CPU), it's basically been deprecated or something, I'll leave it up to
> him to elaborate.

IIRC, the problem was not so much including dbg.l, but starting the application without stdio redirection to some log file. As a result, errors or other messages in the background server caused broken pipe exceptions.

♪♫ Alex
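The fix Alex describes — making sure a backgrounded server's stdio lands in a log file rather than a terminal that may go away — can be sketched like this. This is a hypothetical Python equivalent of the idea, not the PicoLisp mechanism itself:

```python
import os
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "server.log")

# Re-point stdout/stderr at a line-buffered log file, as one would do
# before forking a background server; diagnostic writes then no longer
# hit a (possibly closed) controlling terminal.
log = open(log_path, "a", buffering=1)
old_stdout, old_stderr = sys.stdout, sys.stderr
sys.stdout, sys.stderr = log, log

print("background server started")  # goes to the log, not the terminal

# Restore the originals so the rest of this sketch behaves normally
sys.stdout, sys.stderr = old_stdout, old_stderr
log.close()

with open(log_path) as f:
    contents = f.read()
print(repr(contents))
```

In shell terms the same idea is the usual `./pil app.l -go >log 2>&1 &` pattern: without the redirection, a write to a closed stdout in the detached server raises the broken-pipe errors described above.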
RE: Websockets now considered stable
Hi Henrik

Awesome! That's really cool, thank you for your effort and for sharing the code :-) 20k messages with nearly zero server load sounds very impressive.

Question:

> To work around the inter-process limit of 4096 byte long messages
> the router now supports storing the messages in Redis

1) Where does this limit come from? POSIX IPC? PicoLisp IPC?
2) I couldn't find the Redis part in the code, maybe you can give me a hint where to look?

Thanks, your work on websockets will definitely help me in the future :-)

- beneroth

- Original Message -
From: Henrik Sarvell [mailto:hsarv...@gmail.com]
To: picolisp@software-lab.de
Sent: Fri, 26 Jun 2015 21:30:58 +0200
Subject:

> [...]
Websockets now considered stable
Hi all, after over a month without any of the prior issues I now consider the websockets part of pl-web stable: https://bitbucket.org/hsarvell/pl-web

Gone are the days of 100% CPU usage and zombie processes.

With Alex's help the main web server is now more stable (he made me throw away a few throws in favour of a few byes). The throws were causing the zombies.

I was also including dbg.l (it was causing hung processes at 100% CPU); it's basically been deprecated or something, I'll leave it up to him to elaborate. It's just something I've been including by habit since years ago, when at some point I needed to include it to do some kind of debugging.

Anyway, atm the WS router is regularly routing up to 40 messages per second to upwards of 300-500 clients, which means that roughly 20,000 messages are being pushed out per second during peak hours.

The PL processes show up with 0 CPU and 0 RAM usage when I run top, sometimes 1% CPU :) They hardly register even in aggregate; the server would be running 99% idle if it was only running the WS server.

To work around the inter-process limit of 4096 byte long messages, the router now supports storing the messages in Redis (raw disk is also supported if Redis is not available). This is also in effect in production and has been working flawlessly for months.

This is how I start the WS server in production:

(load "pl-web/pl-web.l")

(setq *Mobj (new '(+Redis) "pl-ws-"))

(undef 'app)

(setq *WsAuth '((notifications ((send (put your password/key here))))))

(de app ()
   (splitPath)
   (wsApp)
   (bye) )

(de go ()
   (wsServer)
   (server 9090) )
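The workaround for the 4096-byte pipe limit described above — spool the payload in Redis (or on raw disk) and pass only a short key through the pipe — can be sketched with a plain dict standing in for Redis. All names here are hypothetical, not pl-web's actual API:

```python
import uuid

PIPE_LIMIT = 4096
store = {}  # stands in for Redis (or a spool directory on raw disk)

def send(payload):
    """Return what actually travels through the pipe."""
    if len(payload) <= PIPE_LIMIT:
        return payload                    # small: send inline as before
    key = "msg:" + uuid.uuid4().hex       # large: spool it, send a key
    store[key] = payload
    return key

def receive(wire):
    """Dereference a spool key, or pass a small inline message through."""
    return store.pop(wire, wire)

big = "x" * 10000
wire = send(big)
out = receive(wire)
print(len(wire), out == big)  # only a short key crossed the pipe
```

Every inter-process write stays comfortably under PIPE_BUF and therefore remains atomic; the cost is one out-of-band store and fetch for oversized messages only.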
Re: Websockets now considered stable
Hi Andreas.

1) As far as I remember from a discussion with Alex, it's a hard limit (OS related).

2) Lines 369-372 here: https://bitbucket.org/hsarvell/pl-web/src/c445ca3861159d0b28ea779a183572c91b7b8458/pl-web.l?at=default

On Fri, Jun 26, 2015 at 9:50 PM, andr...@itship.ch wrote:
> [...]