On Sat, Dec 25, 2010 at 12:10 AM, Alice Bevan–McGregor
<[email protected]> wrote:
> Interesting idea, though as far as I can tell it's HTTP/1.0 only.  :/

To be fair he does say "Not HTTP/1.1 capable (yet)." Is it the
pipelining that you'd miss?

> Does anyone have any numbers they could share?  I'm having some difficulty
> compiling it against ports' libev installation. (Hello world comparisons
> against Paste and Tornado interest me, though comparisons against
> marrow.server.http are my final goal.)

http://nichol.as/benchmark-of-python-web-servers told me to use
`gevent` as it scored well, and greenlets sounded attractive for a
potential future move to a [stackless] PyPy (whose most recent
release has surpassed CPython in at least a few areas of performance).

Since then I have crossed paths with bjoern as many others have (YC
News, Proggit, here).

- - -

My application consists of a modified clone of web.py sans the bloat.
I never thought I'd say such a thing of the anti-framework framework.
Full disclosure: I've implemented my own `web.application`, borrowed
heavily from `web.utils`/`web.http`, reimplemented most of
`web.webapi` keeping with the style, and carried over the holy
`web.template`.

While transitioning away from lighty/web.py toward a more performant
combination of one of at least a dozen Python servers and a
web.py-esque application style, I have temporarily reduced my
project's code base to its bare essentials. Profiling, benchmarking,
and browser auditing will follow as I put each feature back into its
proper place. Bjoern and this e-mail have both come at the start of
this process. I'll be leaving the spawn code for `marrow`, `gevent`,
and `bjoern` in place for stress testing alongside all other testing.
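For reference, a spawn harness of the kind described can look roughly
like the sketch below. The WSGI app and the `serve` dispatcher are
placeholders, not my actual project code; the imports match the
current (2010-era) public APIs of bjoern and gevent, where
`gevent.pywsgi` is the pure-Python handler and `gevent.wsgi` the
libevent-http (C) one.

```python
# Sketch of a spawn harness for the servers under test; the app and
# server names are illustrative placeholders.

def app(environ, start_response):
    # minimal WSGI app standing in for the real web.py-esque application
    body = b"Hello, world!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def serve(server, host="127.0.0.1", port=8080):
    """Dispatch to one of the servers under test; imports are deferred
    so the harness loads even when a given server isn't installed."""
    if server == "bjoern":
        import bjoern
        bjoern.run(app, host, port)
    elif server == "gevent-py":
        from gevent.pywsgi import WSGIServer   # pure-Python WSGI handler
        WSGIServer((host, port), app).serve_forever()
    elif server == "gevent-c":
        from gevent.wsgi import WSGIServer     # libevent-http (C) handler
        WSGIServer((host, port), app).serve_forever()
    else:
        raise ValueError("unknown server: %r" % server)
```

Deferring the imports into `serve` keeps a single harness usable even
when only some of the servers are installed on a given box.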

A bit about the current level of complexity will help in
understanding the results:

Currently a middleware sanitizes the `environ` into a nested storage
of request and response data: parsing `accept*` headers, matching
`user-agent` against `browscap` to determine browser capabilities,
geolocating by `REMOTE_ADDR`, and checking for an AJAX request via
`X-Requested-With`. No cookie/session handling has been reintroduced
yet.
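In WSGI terms, that middleware is shaped roughly like the sketch
below. The class name, the `myapp.request` environ key, and the
storage layout are hypothetical; only the WSGI spelling of the headers
(`HTTP_X_REQUESTED_WITH`, `HTTP_ACCEPT`) is from the spec.

```python
# Hypothetical sketch of a sanitizing middleware of the kind described;
# key names and storage layout are illustrative, not the project's own.

class Sanitizer(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        request = {
            # WSGI spells X-Requested-With as HTTP_X_REQUESTED_WITH
            "ajax": environ.get("HTTP_X_REQUESTED_WITH", "").lower()
                    == "xmlhttprequest",
            "ip": environ.get("REMOTE_ADDR", ""),
            # crude accept parsing: drop q-values, keep media types
            "accept": [part.split(";")[0].strip()
                       for part in environ.get("HTTP_ACCEPT", "").split(",")
                       if part.strip()],
        }
        environ["myapp.request"] = request  # nested storage for handlers
        return self.app(environ, start_response)
```

Downstream handlers then read `environ["myapp.request"]` instead of
re-parsing raw headers on every access.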

The landing page does one of two things depending on context and
allows for two separate testing environments.

1. If it is requested by a javascript-capable user-agent it responds
with a skeleton and jQuery. jQuery is set to never expire from cache.
Upon the document's ready state, four AJAX requests are made for the
four main content areas of the page. One of the requests is set to
stream using generator syntax (yield), which triggers the appropriate
chunked transfer-encoding and is handled properly by supplying jQuery
with a modified XHR object (a la Facebook's `BigPipe`).

2. If it is requested by a non-javascript-capable user-agent it
precompiles the output from the four page handlers internally and
flushes a complete page in one shot. jQuery is obviously /not/ fetched
(in `ab` or `lynx`).
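The streaming handler in (1) boils down to yielding from a WSGI app:
with no Content-Length header, a compliant HTTP/1.1 server delimits
the body with chunked transfer-encoding, so each yielded fragment can
reach the browser as it is produced. A minimal sketch (the handler
name and fragments are placeholders):

```python
# Hedged sketch of a streamed WSGI handler; omitting Content-Length
# means the server must chunk (HTTP/1.1) or close the connection
# (HTTP/1.0) to delimit the body.

def streamed_panel(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    for fragment in ("<div>north</div>", "<div>south</div>",
                     "<div>east</div>", "<div>west</div>"):
        yield fragment.encode("utf-8")  # one chunk per content area
```

This is also why an HTTP/1.0-only server is a real limitation for this
page: without chunked encoding the pipelined-pane trick degrades to
one response per connection.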

My data backend is memcached[b!] run in threaded mode, which boasts
something like 60k reads/sec. I am /not/ aware of any server-level
internal caching, and I'm using the standard (non-C) python-memcache
lib to interface. A single key of ~0.3 KB is fetched per request.
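The per-request fetch through python-memcache is roughly the sketch
below. The key name and fallback value are placeholders; the client is
injectable here only so the lookup logic can be exercised without a
live memcached.

```python
# Sketch of the single-key-per-request fetch; key name and default
# are hypothetical, not the project's own.

def fetch_page_data(client, key="landing:data"):
    data = client.get(key)        # the single ~0.3 KB key per request
    if data is None:              # miss: repopulate with a default
        data = {"panes": []}
        client.set(key, data)
    return data

def make_client():
    # in production: the standard (non-C) python-memcache client
    import memcache
    return memcache.Client(["127.0.0.1:11211"])
```

With one small key per request, the round-trip latency of the client
library matters more than memcached's raw 60k reads/sec ceiling.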

All `web.template`s are *precompiled* at application boot. Each of the
four content handlers is produced via templates, and the generator
yields 4 templates; that is 7 in total. The landing page utilizes
these according to the rules above, and the entire product is wrapped
in a base template just before the response: 9 template calls in
total.

- - -

To run the following dirty benchmark on a single-core, eight-year-old
Dell with 1.5GB of generic RAM that's been running Ubuntu for ~60
hours with a leaking video card, while watching (decoding) an Xvid and
leeching/seeding with ~12 peers over BitTorrent, with Opera and Chrome
open as well as Firefox, with 50 (fifty) tabs open, including Gmail,
and developer panes open in all three, with `screen` open in a
terminal and twelve separate user sessions active, including one
`htop` showing VLC and Firefox beating each other up for 50-90% of my
CPU and 833/1127MB of RAM currently consumed, I'm going to run `ab`
with 25 concurrent connections (-c 25).
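For the record, the invocation was roughly the following (the URL and
port are placeholders for the local test instance; `-n` was 1000 for
the marrow and gevent/py-wsgi runs and 10000 for the faster servers):

```shell
# N requests at concurrency 25 against the landing page of the
# locally-running server under test
ab -n 10000 -c 25 http://127.0.0.1:8080/
```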

Without further ado..

- - -

## MARROW FOUR WORKERS #####

FF 3.6.1 (AJAX)
unprimed: 6 requests, 77.8 KB (0 from cache), 4.68s (onload: 4.89s)
primed: 6 requests, 77.8 KB (76.8 KB from cache), 3.93s (onload: 4.11s)

AB (1K)
Document Path:          /
Document Length:        944 bytes

Concurrency Level:      25
Time taken for tests:   170.002 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1015000 bytes
HTML transferred:       944000 bytes
Requests per second:    5.88 [#/sec] (mean)
Time per request:       4250.044 [ms] (mean)
Time per request:       170.002 [ms] (mean, across all concurrent requests)
Transfer rate:          5.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   5.8      0      46
Processing:   699 4227 1497.8   4276    7654
Waiting:        4   49  81.9     35     596
Total:        699 4228 1498.1   4276    7654

Percentage of the requests served within a certain time (ms)
  50%   4276
  66%   4984
  75%   5335
  80%   5545
  90%   6238
  95%   6691
  98%   7164
  99%   7401
 100%   7654 (longest request)

## GEVENT (GREENLETS, 10K MAX POOL, PYTHON WSGI) #####

FF 3.6.1 (AJAX)
unprimed: 6 requests, 77.8 KB (0 from cache), 4.04s (onload: 4.26s)
primed: 6 requests, 77.8 KB (76.8 KB from cache), 4.26s (onload: 4.53s)

AB (1K)
Document Path:          /
Document Length:        944 bytes

Concurrency Level:      25
Time taken for tests:   158.792 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1092000 bytes
HTML transferred:       944000 bytes
Requests per second:    6.30 [#/sec] (mean)
Time per request:       3969.790 [ms] (mean)
Time per request:       158.792 [ms] (mean, across all concurrent requests)
Transfer rate:          6.72 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       8
Processing:   170 3901 914.4   3432    5765
Waiting:       14 3750 886.1   3305    5559
Total:        178 3901 914.2   3432    5765

Percentage of the requests served within a certain time (ms)
  50%   3432
  66%   3686
  75%   4481
  80%   4945
  90%   5553
  95%   5633
  98%   5687
  99%   5730
 100%   5765 (longest request)

## GEVENT (GREENLETS, 10K MAX POOL, C WSGI) #####

FF 3.6.1 (AJAX)
unprimed: 6 requests, 77.7 KB (0 from cache), 1.91s (onload: 2.19s)
primed: 6 requests, 77.7 KB (76.8 KB from cache), 1.7s (onload: 1.98s)

AB (10K)
Document Path:          /
Document Length:        944 bytes

Concurrency Level:      25
Time taken for tests:   81.476 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      10550000 bytes
HTML transferred:       9440000 bytes
Requests per second:    122.74 [#/sec] (mean)
Time per request:       203.690 [ms] (mean)
Time per request:       8.148 [ms] (mean, across all concurrent requests)
Transfer rate:          126.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.3      0      30
Processing:    11  203  60.9    196     417
Waiting:        1  202  60.0    194     417
Total:         27  203  61.0    196     417

Percentage of the requests served within a certain time (ms)
  50%    196
  66%    238
  75%    244
  80%    251
  90%    280
  95%    316
  98%    345
  99%    359
 100%    417 (longest request)

## BJOERN FOUR WORKERS #####

FF 3.6.1 (AJAX)
unprimed: 6 requests, 77.8 KB (0 from cache), 1.8s (onload: 2.05s)
primed: 6 requests, 77.8 KB (76.8 KB from cache), 1.53s (onload: 1.78s)

## BJOERN SINGLE WORKER #####

FF 3.6.1 (AJAX)
unprimed: 6 requests, 77.8 KB (0 from cache), 1.82s (onload: 2.03s)
primed: 6 requests, 77.8 KB (76.8 KB from cache), 1.56s (onload: 1.78s)

AB (10K)
Document Path:          /
Document Length:        944 bytes

Concurrency Level:      25
Time taken for tests:   51.039 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      10150000 bytes
HTML transferred:       9440000 bytes
Requests per second:    195.93 [#/sec] (mean)
Time per request:       127.597 [ms] (mean)
Time per request:       5.104 [ms] (mean, across all concurrent requests)
Transfer rate:          194.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.3      0      28
Processing:     4  127  38.1    106     278
Waiting:        1  127  37.6    106     275
Total:         29  127  38.3    106     278

Percentage of the requests served within a certain time (ms)
  50%    106
  66%    130
  75%    163
  80%    167
  90%    179
  95%    197
  98%    232
  99%    248
 100%    278 (longest request)

- - -

"BJOERN IS SCREAMINGLY FAST AND ULTRA-LIGHTWEIGHT." He's right.

bjoern is roughly 33 times faster than marrow (195.93 vs 5.88 req/s),
31 times faster than gevent w/ py-wsgi (6.30 req/s), and 1.6 times
faster than gevent w/ c-wsgi (122.74 req/s). C truly is a requirement
for any kind of real speed.
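Working those ratios directly from the requests/second figures
reported by `ab` above:

```python
# Speedups computed from the mean requests/second reported above.
rps = {
    "marrow (4 workers)": 5.88,
    "gevent py-wsgi": 6.30,
    "gevent c-wsgi": 122.74,
    "bjoern (1 worker)": 195.93,
}
baseline = rps["marrow (4 workers)"]
for name, rate in sorted(rps.items(), key=lambda kv: kv[1]):
    print("%-20s %7.2f req/s  %5.1fx vs marrow" % (name, rate,
                                                   rate / baseline))
```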

- - -

> I'm mentioning this because in my tests m.s.http handles C10K, is able to
> process 10K req/sec at C6K or so, is fully unit tested, and has complete
> documentation.

Which `ab` arguments are you using for "C10K"?

> Unfortunately for most people, it currently only supports PEP 444[2] (with
> modifications[3]) and, soon, my draft rewrite[4].
>
> Hell, creating a middleware-based adapter from WSGI 1 to my version of PEP
> 444 should be relatively straight-forward, considering the same
> incompatibility that bjoern takes exception to (the writer function returned
> by start_response, or rather, the lack of it).

I'm not yet intimate with the details of WSGI so I've learned a great
deal already just by hearing that a PEP 444 exists. To clarify, PEP
3333 is `WSGI 1.1` and PEP 444 is `web3`, correct? Would it be
accurate to say that your server supports `web3` rather than `WSGI 2`?

If anyone is interested in future benchmarks as I continue to
reintroduce complexity, I'd be willing to repeat the above
periodically. And last but not least, the relevance of this post to
web.py is that my project will use web.py apps as "extensions" atop a
decentralized social framework. Think of it as retaining the
high-level features of web.py while constraining and optimizing the
lower-level features (in no small part by marrying the framework to
the most optimal server implementation).

-- 
Angelo Gladding
[email protected]
