On Wed, Jun 6, 2012 at 4:03 AM, Vlad K. <[email protected]> wrote:
> On 06/06/2012 07:49 AM, Andi wrote:
>>
>> just a benchmark, but better than nothing. found that during our research.
>> http://blog.curiasolutions.com/the-great-web-framework-shootout/
>>
>> andi
>>
>> (sent right out of my head)
>
> The Curiasolutions shootout is interesting. However, even for a synthetic
> benchmark it is highly unbalanced. For example, it shows Pyramid yielding
> more rps than Bottle on Hello World, but then throws Pyramid way lower
> than Bottle on the templated db task.
>
> If you take a look at the benchmark code, you'll notice:
>
> - both use SQLite with a local file db, which is ok
> - Bottle uses the SQLite driver directly
> - Pyramid uses SQLAlchemy, which incurs significant overhead
>
> You can use database drivers directly in Pyramid. You don't need
> SQLAlchemy or the transaction extensions; they are not required by
> Pyramid, just a chosen default.
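For reference, "using the driver directly" just means plain DB-API calls, with no library in between. A minimal sqlite3 sketch (the table and data here are made up for illustration, not from the benchmark):

```python
# Hitting SQLite through the stdlib DB-API driver directly,
# the way the Bottle benchmark does -- no SQLAlchemy layer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO messages (body) VALUES (?)",
                 [("hello",), ("world",)])
rows = conn.execute("SELECT body FROM messages ORDER BY id").fetchall()
print(rows)   # [('hello',), ('world',)]
conn.close()
```

Nothing Pyramid-specific is needed for this; a view callable can open and query a connection the same way Bottle's handler does.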
SQLAlchemy's overhead is not necessarily significant, and there are ways to use SQLAlchemy that minimize it. That comes at the expense of convenience, of course, so you'd want to do a side-by-side comparison of a typical task to see how much the overhead matters.

The biggest performance improvement comes from doing bulk queries in the database: one SQL statement that does some calculations and returns a small number of result records. That avoids the overhead of loading every record into a Python data type. You can also do bulk updates and deletes. Optimized bulk processing at the C level is a feature of all SQL-compatible databases; it often is not a feature of non-SQL databases. For instance, CouchDB can perform a Javascript query and return some records, but I don't know whether that's any faster than going through all the records in Python.

The second-largest optimization is to use SQLAlchemy's SQL builder level rather than the ORM level. You can send it a SQL string to execute, or use the builder methods to construct the SQL statement. This essentially runs at the same speed as raw DB-API because it's doing the same thing. The result is an iterable of RowProxys, which incur some minimal overhead. If you access fields by key rather than by attribute (x[0] or x["foo"] vs x.foo), it's supposed to be faster. RowProxy uses lazy evaluation, so it avoids processing the underlying row tuple except as necessary.

The ORM has to instantiate a Python object for every record, and keep track of which objects have had their attributes changed in Python. But again, it does lazy evaluation, so it's not like it sets every attribute on initialization. Recent versions of SQLAlchemy also have a feature where you can construct a query using the ORM methods, but if it's a query on certain fields rather than on an entire ORM object, it returns RowProxys just like the SQL level, so it bypasses most of the ORM's overhead.
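To make that concrete, here's a small sketch of the SQL-string and SQL-builder levels, plus a bulk update done in one statement. It assumes a reasonably recent SQLAlchemy (the select() calling style has shifted across versions), and the table and data are invented for illustration:

```python
# Sketch: SQLAlchemy's lower levels -- SQL builder and raw SQL strings.
# Both run at essentially DB-API speed; no ORM objects are created.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")   # in-memory stand-in for a real db
metadata = sa.MetaData()
users = sa.Table(
    "users", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(50)),
)
metadata.create_all(engine)

with engine.connect() as conn:
    conn.execute(users.insert(), [{"name": "alice"}, {"name": "bob"}])

    # SQL builder: constructs the same statement a raw string would.
    stmt = sa.select(users.c.name).where(users.c.id == 1)
    row = conn.execute(stmt).first()
    print(row[0])                        # index access: "alice"

    # Or hand it a SQL string directly:
    total = conn.execute(sa.text("SELECT COUNT(*) FROM users")).scalar()
    print(total)                         # 2

    # Bulk update in one statement -- no per-row Python objects:
    conn.execute(sa.update(users).values(name=sa.func.upper(users.c.name)))
    names = [r[0] for r in
             conn.execute(sa.select(users.c.name).order_by(users.c.id))]
    print(names)                         # ['ALICE', 'BOB']
```

The bulk update is the point of the first paragraph above: the uppercasing happens inside the database engine, not by loading each row into Python.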
But again, you should do a side-by-side comparison to see how much the overhead actually is, because sometimes it surprises you. I have an import routine that reads 10,000+ records from CSV files and puts them in an empty database, and it takes 30 seconds to run either with or without the ORM. On the other hand, I have some log-processing scripts that process hundreds of thousands or millions of records, and the speedup is significant when I switch from ORM-level processing to SQL-level processing. (But again, I can use the ORM methods to construct the queries; I just avoid returning or inserting ORM objects.)

The third thing you can do is hold a long-lived connection throughout the application, rather than letting the engine check out a connection on every query. That avoids the overhead of the connection pool, but it probably makes little difference. The purpose of the connection pool is to avoid the larger overhead of actually connecting to the database on every query. That's slow on some databases like PostgreSQL but fast on others like SQLite, so the pool actually improves performance, and raw DB-API does not have a connection pool. This again depends on your application. A short-lived, single-threaded utility can just hold a connection for simplicity, but a multithreaded web application really benefits from a pool, so that you don't have to manage connections. (Or at most, you hold a connection open for a single function or single request.)

SQLAlchemy is wonderful because it's multi-level: you can give it SQL strings, use the SQL builder, or use the ORM, depending on the application, or even mix them in different places in the same application. Python never had a multi-level database library before SQLAlchemy, and I don't know how common that is in other programming languages. Michael Bayer (who wrote SQLAlchemy and Mako) also writes excellent documentation. So I would definitely recommend using it.
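The two connection-handling styles above look roughly like this in SQLAlchemy (the pool class names are real; "sqlite://" is just an in-memory stand-in for your actual database URL):

```python
# Sketch: pooled engine vs. one long-lived connection.
import sqlalchemy as sa
from sqlalchemy.pool import NullPool

# Default engine: each unit of work checks a connection out of the pool,
# which avoids a fresh connect to databases where connecting is slow
# (e.g. PostgreSQL). This is what a multithreaded web app wants.
pooled = sa.create_engine("sqlite://")

# A short-lived, single-threaded utility can disable pooling and simply
# hold one connection for its whole run:
engine = sa.create_engine("sqlite://", poolclass=NullPool)
conn = engine.connect()
one = conn.execute(sa.text("SELECT 1")).scalar()
print(one)   # 1
conn.close()
```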
The other issue is that raw benchmarks of "framework X does N requests per hour" are usually unrealistic. They use an empty application to measure the framework's overhead, but in the real world of complex applications using databases and performing calculations, the overhead of the framework is dwarfed by the overhead of the application code. If you have a very small, simple application like Twitter with a huge number of requests, then the framework's performance would be close to the benchmark. Otherwise it will degrade in ways that aren't framework-specific. (I.e., they'd be the same in Pyramid or Flask.)

If your application is so high-volume that it approaches the hardware's capacity, then you should look at parallel servers as well as different frameworks or languages. It may be that the cost of a second server is less than the programming-time cost or inconvenience of using a "simple, streamlined" framework or a C-like language. Unless the application is very simple, in which case a minimalistic framework may be perfect for it.

In terms of Python WSGI applications, there are two separate overheads: the WSGI server, and the framework/application. The CherryPy server is considered the most robust at high loads compared to other multithreaded Python servers. You can use it with Pyramid; just set the "[server:main]" section in the INI file. Asynchronous servers may have higher performance than multithreaded ones, but the difficulty of making an application asynchronous-safe may outweigh the advantages. You can also use a module like mod_wsgi to avoid the overhead of a separate WSGI HTTP server.

-- 
Mike Orr <[email protected]>

-- 
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en.
