Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-10 Thread Jon Zeppieri
On Sat, Sep 9, 2017 at 6:25 PM, Jon Zeppieri  wrote:
> It looks like after roughly 2^14 requests are
> `accept`-ed, there's a *long* delay before the next one succeeds.

Okay, the above happens when the host runs out of ephemeral ports. So,
not a big deal.
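
The ~2^14 figure lines up with the size of the default ephemeral-port range. Assuming the macOS defaults (which match the IANA dynamic/private range of 49152-65535), a quick illustrative check in Python:

```python
# IANA dynamic/private port range, also the macOS default
# (net.inet.ip.portrange.first / .last):
first, last = 49152, 65535
ephemeral_ports = last - first + 1

print(ephemeral_ports)            # 16384
print(ephemeral_ports == 2 ** 14) # True
```

So once ~16384 client connections are stuck in TIME_WAIT, the benchmark host has no source ports left to `connect` from until some are recycled.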
---

My tests suggest the same thing (w.r.t. places) that D. Bohdan's do:
that using places consistently lowers the server throughput (even when
there are multiple cores available). Don't know why, though.

I wanted to see if inter-place communication was the bottleneck, so I
made some changes to allow the individual places to do their work
without needing to communicate:

- First, I made tcp-listeners able to be sent over place-channels, so
the only inter-place communication would be at initialization time.

- Then I realized I could accomplish the same goal with a lot less
fuss by changing the meaning of tcp-listen's reuse? parameter so that
it would set SO_REUSEPORT (instead of SO_REUSEADDR) on the listening
socket. (This allows each place to bind to the same port without
needing any inter-place communication at all.)
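
The effect of SO_REUSEPORT can be sketched outside of Racket/rktio. This is an illustrative Python example (not the actual rktio change): two listeners bind the same address and port, and the kernel distributes incoming connections between them. Note that the semantics differ somewhat between platforms (Linux >= 3.9 load-balances new connections; BSD/macOS behavior is similar but not identical).

```python
import socket

def reuseport_listener(host, port):
    """A TCP listener with SO_REUSEPORT set, so several processes (or
    places) can each bind the same address:port and let the kernel
    spread incoming connections among them."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set *before* bind on every socket sharing the port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    return s

# Bind one listener on an ephemeral port, then a second on the *same* port.
a = reuseport_listener("127.0.0.1", 0)
port = a.getsockname()[1]
b = reuseport_listener("127.0.0.1", port)  # succeeds; without SO_REUSEPORT
                                           # this bind would raise EADDRINUSE
```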

This did not improve throughput. But it doesn't exactly prove that
inter-place communication isn't a bottleneck, since both of the above
changes required some other changes to rktio, which, for all I know,
may have caused different performance problems. (If multiple OS
threads are polling the same socket, you need to handle the race
between them to accept an incoming connection.)
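
That race can be sketched as follows (illustrative Python, not the rktio code): with a shared non-blocking listener, the kernel may report readiness to every poller, but only one `accept` wins; the losers see EAGAIN/EWOULDBLOCK and must go back to polling rather than treat it as an error.

```python
import socket

def try_accept(listener):
    """Attempt a non-blocking accept. Returns a connected socket, or
    None when there is nothing to accept -- e.g. because another
    thread/process polling the same socket won the race."""
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError:
        return None  # lost the race: return to poll()/select()

lst = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lst.bind(("127.0.0.1", 0))
lst.listen(8)
lst.setblocking(False)

print(try_accept(lst))  # prints None: no pending connection yet
```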

So, I'm still puzzled by this.

-J

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-09 Thread Jon Zeppieri
On Sat, Sep 9, 2017 at 8:05 PM, Jon Zeppieri  wrote:
> On Sat, Sep 9, 2017 at 7:52 PM, Jon Zeppieri  wrote:
>>
>> It does seem odd, though, that the server seems to *favor* sending
>> ACKs to clients it can't service over responding to the ones it can.
>
> No, there has to be something else wrong. The tcpdump output shows
> significant gaps in time while this ACK/RST game is going on (I'm
> looking at a gap of 8 seconds right now), so there's plenty of time
> where the server is just sitting idle. But, for whatever reason, the
> `poll`-ing Racket program isn't waking up. -J


As it turns out, the same thing happens with the Caddy example, too,
so it seems to be an OS X thing, rather than a Racket thing. -J



Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-09 Thread Jon Zeppieri
On Sat, Sep 9, 2017 at 7:52 PM, Jon Zeppieri  wrote:
>
> It does seem odd, though, that the server seems to *favor* sending
> ACKs to clients it can't service over responding to the ones it can.

No, there has to be something else wrong. The tcpdump output shows
significant gaps in time while this ACK/RST game is going on (I'm
looking at a gap of 8 seconds right now), so there's plenty of time
where the server is just sitting idle. But, for whatever reason, the
`poll`-ing Racket program isn't waking up. -J



Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-09 Thread Jon Zeppieri
On Sat, Sep 9, 2017 at 6:25 PM, Jon Zeppieri  wrote:
> When I ran experiments similar to yours on OS X I saw some odd
> scheduling behavior. It looks like after roughly 2^14 requests are
> `accept`-ed, there's a *long* delay before the next one succeeds. It
> appears that the program is `poll`-ing, waiting for activity, but, for
> whatever reason, it doesn't receive notice of any for a long time.
>
> - Jon


Okay, it seems this occurs when the listen backlog fills up (the
listen(2) man page on OS X says that the backlog is limited to 128),
at which point the server stops sending SYN-ACKs (it appears to send
ACKs instead), and the clients respond with RSTs. It looks like the
server and clients play this game for some time, with the clients
backing off exponentially, so that their retried requests come less
frequently, until the server can manage to start processing requests
again.
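
The client-side pattern described above matches TCP's exponential backoff: each retransmission waits roughly twice as long as the previous one. A minimal sketch of such a schedule (illustrative, not taken from any particular TCP stack):

```python
def retry_schedule(base=1.0, attempts=6):
    """Exponential backoff, as in a TCP client's retransmission timer:
    each retry waits twice as long as the one before it."""
    return [base * (2 ** i) for i in range(attempts)]

print(retry_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

After a handful of doublings the retries are tens of seconds apart, which is consistent with the multi-second idle gaps visible in the tcpdump output.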

It does seem odd, though, that the server seems to *favor* sending
ACKs to clients it can't service over responding to the ones it can.



Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-09 Thread Jon Zeppieri
When I ran experiments similar to yours on OS X I saw some odd
scheduling behavior. It looks like after roughly 2^14 requests are
`accept`-ed, there's a *long* delay before the next one succeeds. It
appears that the program is `poll`-ing, waiting for activity, but, for
whatever reason, it doesn't receive notice of any for a long time.

- Jon



Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-03 Thread antoine
A while ago I implemented a minimal FastCGI protocol and compared it
against various other implementations.


http://antoineb.github.io/blog/2015/06/02/basic-fastcgi-with-racket/


On 09/02/2017 10:12 PM, Neil Van Dyke wrote:

dbohdan wrote on 09/02/2017 03:12 PM:
I rather like the SCGI protocol. It's a pity that it isn't as widely 
supported as FastCGI, considering that it's much simpler to implement 
(second only to plain old CGI), but still has a performance profile 
similar to FastCGI's. 


I mostly implemented FastCGI in Racket at one point, but then I read 
about the FastCGI implementation in my target HTTP server having hard 
bugs, so I abandoned that.


I also think there are faster ways to serve HTTP from Racket, but I'd 
have to find funding to work through them.


And we have to keep in mind that, unlike benchmarks for LINPACK or 
standard transaction processing, the real-world applications of HTTP 
servers are messier.  And also, I don't think many people have been 
tuning for Web application benchmarks, unlike was once done for 
LINPACK and TP.  I think the Racket community has enough skill to make 
a respectable showing in a benchmark tuning war, or in general 
platform performance for real-world Web applications, but I'm not 
aware of any funding going into that right now.






Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-02 Thread Neil Van Dyke

dbohdan wrote on 09/02/2017 03:12 PM:
I rather like the SCGI protocol. It's a pity that it isn't as widely 
supported as FastCGI, considering that it's much simpler to implement 
(second only to plain old CGI), but still has a performance profile 
similar to FastCGI's. 


I mostly implemented FastCGI in Racket at one point, but then I read 
about the FastCGI implementation in my target HTTP server having hard 
bugs, so I abandoned that.


I also think there are faster ways to serve HTTP from Racket, but I'd 
have to find funding to work through them.


And we have to keep in mind that, unlike benchmarks for LINPACK or 
standard transaction processing, the real-world applications of HTTP 
servers are messier.  And also, I don't think many people have been 
tuning for Web application benchmarks, unlike was once done for LINPACK 
and TP.  I think the Racket community has enough skill to make a 
respectable showing in a benchmark tuning war, or in general platform 
performance for real-world Web applications, but I'm not aware of any 
funding going into that right now.




Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-02 Thread dbohdan
On Friday, September 1, 2017 at 8:19:19 PM UTC+3, dbohdan wrote:
> My exceptions were [...]

This, of course, should say "expectations".


On Friday, September 1, 2017 at 9:38:25 PM UTC+3, Neil Van Dyke wrote:
> Thank you very much for doing this work, D. Bohdan.

You're welcome! I had fun doing it.

> This performance of Racket SCGI+nginx relative to the others you tested
> is surprising to me, since I made the Racket `scgi` package for
> particular non-performance requirements, and performance was really
> secondary.

Thanks for making the 'scgi package. I rather like the SCGI protocol. It's a 
pity that it isn't as widely supported as FastCGI, considering that it's much 
simpler to implement (second only to plain old CGI), but still has a 
performance profile similar to FastCGI's.

> Not to look a gift horse in the mouth, [...]

No worries. The horse is given very much with that in mind. :-) To address your 
specific concerns:

> errors can cause good performance numbers. Sometimes I used
> JMeter instead of `ab` to rule out that cause of bad numbers in
> performance testing (as well as to implement some testing).  

I think the SCGI benchmark works correctly because of the data sizes that 
ApacheBench reports. For example, here is the request data from one run:

> Complete requests:  178572
> Failed requests:0
> Total transferred:  755002416 bytes
> HTML transferred:   733038060 bytes

733038060 / 178572 = 4105, which is exactly the size of the text message the 
application serves. The same is true of other data I've examined so far (5 
runs). To help detect errors, the benchmark is also programmed to abort if the 
first request to an application doesn't serve exactly the right text (see 
`run-when-ready.sh`) or if ApacheBench sees enough of nginx's status 502 pages, 
which are served when the SCGI server doesn't respond correctly or at all.
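
The sanity arithmetic above is easy to reproduce: with zero failed requests, the HTML transferred divided by the number of complete requests must equal the payload size exactly, with no remainder.

```python
# Figures from the ApacheBench run quoted above.
complete_requests = 178572
html_transferred = 733038060
payload_bytes = 4105  # size of the text the application serves

# Every byte is accounted for: the division is exact.
assert html_transferred % complete_requests == 0
assert html_transferred // complete_requests == payload_bytes
print("ok")  # prints ok
```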

I'll look into using JMeter in addition to ApacheBench.

> the OS pushing into swap

Good point. I thought I'd already disabled the containers' access to swap, but 
apparently it didn't work because of a thing with cgroups. The "benchmarked" 
container still must have used swap, because it began to run out of memory for 
some applications when I disabled the swap on the VM itself. I've increased 
"benchmarked's" memory quota to 768 MiB and added a recommendation to disable 
the swap system-wide in README.md.

> sporadic network latency (though looks like you might've controlled for
> that one)

The application and the load generator communicate through a virtual network 
between two Docker containers on the same host, so this should not be an issue.

> some other OS/hardware/network burp outside of your Racket
> process(es).

Such burps are possible, and even likely, because I run the VM on a machine I 
use for other tasks. I try to ensure no taxing tasks run alongside the 
benchmark and mitigate the inevitable CPU spikes by simply benchmarking every 
application for longer (three minutes by default).

On Friday, September 1, 2017 at 9:51:13 PM UTC+3, Neil Van Dyke wrote:
> `#:scgi-max-allow-wait`

Thanks for the suggestion. This turned out to be the key to SCGI performance. 
Increasing #:scgi-max-allow-wait from 1 to 4 (default), 16, 64, 256 gives a 
moderate increase in throughput (from ~2350 req/s to ~2900 req/s), but 
*decreases* the maximum latency in a very major way (from ~5 s to ~250
ms). See scgi-max-allow.md in the attachments for some detailed data samples.
The effect levels out at 256. There isn't an obvious difference between 256, 
1024, 4096, and 16384. I've pushed the update to run the tests at 
#:scgi-max-allow-wait 256.

Besides scgi-max-allow.md, I've also attached the results for 1) a five-minute 
benchmark with one concurrent connection, 768 MiB RAM, no swap, 
#:scgi-max-allow-wait 4; 2) a rerun of the first benchmark with the updated 
settings (three minutes, 100 connections, 768 MiB RAM, no swap, 
#:scgi-max-allow-wait 256).

> grep Requests results/*
results/caddy.txt:Requests per second:4117.47 [#/sec] (mean)
results/compojure.txt:Requests per second:5127.98 [#/sec] (mean)
results/flask.txt:Requests per second:1077.55 [#/sec] (mean)
results/guile.txt:Requests per second:2053.89 [#/sec] (mean)
results/plug.txt:Requests per second:5042.51 [#/sec] (mean)
results/scgi.txt:Requests per second:2760.30 [#/sec] (mean)
results/sinatra.txt:Requests per second:312.66 [#/sec] (mean)
results/stateful.txt:Requests per second:529.93 [#/sec] (mean)
results/stateless.txt:Requests per second:620.32 [#/sec] (mean)
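
Summaries like the grep output above can be reduced programmatically. A small hypothetical helper (the function name and regex are my own, not part of the benchmark repo) that pulls the throughput figure out of an ApacheBench summary line:

```python
import re

def parse_rps(line):
    """Extract the requests-per-second figure from an ApacheBench
    summary line, e.g. as produced by `grep Requests results/*`.
    Returns None for lines without that figure."""
    m = re.search(r"Requests per second:\s*([\d.]+)", line)
    return float(m.group(1)) if m else None

line = "results/caddy.txt:Requests per second:4117.47 [#/sec] (mean)"
print(parse_rps(line))  # 4117.47
```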

> grep -A 29 'Concurrency Level' results/*
results/caddy.txt:Concurrency Level:  100
results/caddy.txt-Time 

Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-01 Thread Neil Van Dyke
Oh yeah, contention from simultaneous requests, if you're doing that, 
can also complicate your numbers.  Adjusting `#:scgi-max-allow-wait` 
might be a quick way to see whether that changes your numbers.  (Hitting 
a limit here could give you better numbers, or worse numbers, but 
removing a limit being hit should change the numbers.)  You can also run 
Wireshark or "tcpdump", to be certain of what's going on at the packet 
level, but that can be time-consuming to trace through.


Neil Van Dyke wrote on 09/01/2017 02:38 PM:

Thank you very much for doing this work, D. Bohdan.

If I'm reading these results quickly, and if my guess about the 
distribution is correct, then it looks like Racket SCGI+nginx *might* 
actually have the best times of any of your tested combinations 
*except when a GC cycle kicks in*.


results/scgi.txt-Percentage of the requests served within a certain 
time (ms)

results/scgi.txt-  50%  3
results/scgi.txt-  66%  4
results/scgi.txt-  75%  4
results/scgi.txt-  80%  5
results/scgi.txt-  90%  7
results/scgi.txt-  95% 11
results/scgi.txt-  98%   1018
results/scgi.txt-  99%   1030
results/scgi.txt- 100%  55256 (longest request)

If GC is indeed the cause, if you avoid or reduce the GC hits that are 
killing you 5% of the time, then maybe 100% of your requests are 
fast.  (I suggest looking at avoiding/reducing GC hits holistically, 
in an application-specific way, since there's lots of things you can 
do, depending, and there are costs and benefits.  One very likely 
situation is that there are inefficiencies in the application code 
itself that are the bottleneck, and it's best to take a look at those 
before focusing on where the bottleneck moves next.  That 
familiarization with the application code can also help you decide how 
to address any bottlenecks external to it.)


This performance of Racket SCGI+nginx relative to the others you 
tested is surprising to me, since I made the Racket `scgi` package for 
particular non-performance requirements, and performance was really 
secondary.  (If I were prioritizing speed higher, I suspect I could 
make serving much faster, doing it a different way, and then 
micro-optimizing on top of that.)


Not to look a gift horse in the mouth, but it's possible that 
something else was going on, to give surprisingly good numbers. For 
example, often, errors can cause good performance numbers. Sometimes I 
used JMeter instead of `ab` to rule out that cause of bad numbers in 
performance testing (as well as to implement some testing).  Also, the 
bad numbers could be something else, like the OS pushing into swap, 
sporadic network latency (though looks like you might've controlled 
for that one), or some other OS/hardware/network burp outside of your 
Racket process(es).


I'd want to have a better understanding of these numbers before 
Racketeers started either bragging or donning burlap sacks of shame. :)






Re: [racket-users] Racket Web servlet performance benchmarked and compared

2017-09-01 Thread Neil Van Dyke

Thank you very much for doing this work, D. Bohdan.

If I'm reading these results quickly, and if my guess about the 
distribution is correct, then it looks like Racket SCGI+nginx *might* 
actually have the best times of any of your tested combinations *except 
when a GC cycle kicks in*.


results/scgi.txt-Percentage of the requests served within a certain time 
(ms)

results/scgi.txt-  50%  3
results/scgi.txt-  66%  4
results/scgi.txt-  75%  4
results/scgi.txt-  80%  5
results/scgi.txt-  90%  7
results/scgi.txt-  95% 11
results/scgi.txt-  98%   1018
results/scgi.txt-  99%   1030
results/scgi.txt- 100%  55256 (longest request)

If GC is indeed the cause, if you avoid or reduce the GC hits that are 
killing you 5% of the time, then maybe 100% of your requests are fast.  
(I suggest looking at avoiding/reducing GC hits holistically, in an 
application-specific way, since there's lots of things you can do, 
depending, and there are costs and benefits.  One very likely situation 
is that there are inefficiencies in the application code itself that are 
the bottleneck, and it's best to take a look at those before focusing on 
where the bottleneck moves next.  That familiarization with the 
application code can also help you decide how to address any bottlenecks 
external to it.)


This performance of Racket SCGI+nginx relative to the others you tested 
is surprising to me, since I made the Racket `scgi` package for 
particular non-performance requirements, and performance was really 
secondary.  (If I were prioritizing speed higher, I suspect I could make 
serving much faster, doing it a different way, and then micro-optimizing 
on top of that.)


Not to look a gift horse in the mouth, but it's possible that something 
else was going on, to give surprisingly good numbers.  For example, 
often, errors can cause good performance numbers. Sometimes I used 
JMeter instead of `ab` to rule out that cause of bad numbers in 
performance testing (as well as to implement some testing).  Also, the 
bad numbers could be something else, like the OS pushing into swap, 
sporadic network latency (though looks like you might've controlled for 
that one), or some other OS/hardware/network burp outside of your Racket 
process(es).


I'd want to have a better understanding of these numbers before 
Racketeers started either bragging or donning burlap sacks of shame. :)




[racket-users] Racket Web servlet performance benchmarked and compared

2017-09-01 Thread dbohdan
Hi, everyone. Long time (occasional) reader, first time writer here.

In the 5.x days I played with Racket's Web servlets and found them slower than 
I'd expected. (My exceptions were, admittedly, quite high after seeing how much 
better Racket performed at other tasks than your average scripting language.) 
I've decided to try Web servlets out again, but this time to put some rough 
numbers on the performance with a reproducible benchmark.

My benchmark compares Racket's stateful and stateless servlets against the SCGI 
package for Racket, Caddy (HTTP server written in Go), Flask (Python web 
microframework), GNU Guile's Web server module, Ring/Compojure (Clojure HTTP 
middleware/routing library), Plug (Elixir HTTP middleware), and Sinatra (Ruby 
web microframework). On each of these platforms the benchmark implements a 
trivial web application that serves around 4K of plain text. It uses 
ApacheBench to stress it with a configurable number of concurrent connections. 
The application and ApacheBench are run in separate Docker containers, which 
lets you tune the memory and the CPU time available to them. I've published the 
source code for the benchmark at 
https://gitlab.com/dbohdan/racket-vs-the-world/. It should be straightforward 
to run on Linux with Docker (but please report any difficulties!).

I've attached the results I got on a two-core VM. According to them, Racket's 
servlets do lag behind everything else but Sinatra. The results are for 100 
concurrent connections, which is the default, but the differences in throughput 
are still very similar with 20 connections and quite similar with just one. I'd 
appreciate any feedback on these results (do they look reasonable to you?) and 
the code behind the benchmark (did I miss any crucial bits of configuration for 
the servlet?).

Best,
D. Bohdan

> grep -A 21 'Requests per second' results/*
results/caddy.txt:Requests per second:4294.27 [#/sec] (mean)
results/caddy.txt-Time per request:   23.287 [ms] (mean)
results/caddy.txt-Time per request:   0.233 [ms] (mean, across all 
concurrent requests)
results/caddy.txt-Transfer rate:  18141.66 [Kbytes/sec] received
results/caddy.txt-
results/caddy.txt-Connection Times (ms)
results/caddy.txt-  min  mean[+/-sd] median   max
results/caddy.txt-Connect:01   0.9  0  12
results/caddy.txt-Processing: 0   23  11.2 21 106
results/caddy.txt-Waiting:0   21  10.7 20 102
results/caddy.txt-Total:  0   23  11.0 22 106
results/caddy.txt-WARNING: The median and mean for the initial connection time 
are not within a normal deviation
results/caddy.txt-These results are probably not that reliable.
results/caddy.txt-
results/caddy.txt-Percentage of the requests served within a certain time (ms)
results/caddy.txt-  50% 22
results/caddy.txt-  66% 27
results/caddy.txt-  75% 30
results/caddy.txt-  80% 32
results/caddy.txt-  90% 38
results/caddy.txt-  95% 43
results/caddy.txt-  98% 50
--
results/compojure.txt:Requests per second:4989.57 [#/sec] (mean)
results/compojure.txt-Time per request:   20.042 [ms] (mean)
results/compojure.txt-Time per request:   0.200 [ms] (mean, across all 
concurrent requests)
results/compojure.txt-Transfer rate:  20659.95 [Kbytes/sec] received
results/compojure.txt-
results/compojure.txt-Connection Times (ms)
results/compojure.txt-  min  mean[+/-sd] median   max
results/compojure.txt-Connect:09  92.2  03048
results/compojure.txt-Processing: 0   11   7.2  9 228
results/compojure.txt-Waiting:0   10   7.1  9 228
results/compojure.txt-Total:  1   20  92.8 103067
results/compojure.txt-
results/compojure.txt-Percentage of the requests served within a certain time 
(ms)
results/compojure.txt-  50% 10
results/compojure.txt-  66% 13
results/compojure.txt-  75% 14
results/compojure.txt-  80% 16
results/compojure.txt-  90% 20
results/compojure.txt-  95% 25
results/compojure.txt-  98% 33
results/compojure.txt-  99% 53
results/compojure.txt- 100%   3067 (longest request)
--
results/flask.txt:Requests per second:1153.20 [#/sec] (mean)
results/flask.txt-Time per request:   86.715 [ms] (mean)
results/flask.txt-Time per request:   0.867 [ms] (mean, across all 
concurrent requests)
results/flask.txt-Transfer rate:  4799.74 [Kbytes/sec] received
results/flask.txt-
results/flask.txt-Connection Times (ms)
results/flask.txt-  min  mean[+/-sd] median   max
results/flask.txt-Connect:00   0.2  0  12
results/flask.txt-Processing: 2   87