Hi Mike,

Thanks for the detailed information. It certainly helps.
We will certainly consider the information on SQLAlchemy for DB-related performance (there are other mechanisms to improve performance as well, such as bulk queries and probably stored procedures). The platform we are building is not simple: it's a media (mostly video) content management and delivery platform with multiple features.

As far as my understanding of Pylons/Pyramid goes, request handling and dispatch (to the proper controller and action), and the management of threads (not sure if this is applicable), are handled by the framework. HTTP-protocol-level transaction handling, the detailed implementation of the stack, etc. are handled by the web server/framework. The job remaining for the application (or the application developer) is processing requests and building responses (which the framework also helps with, via various libraries). As you mentioned, the application code will certainly play a role in the performance of the overall system.

I agree that ease of development is important, but probably not at the cost of system performance. In choosing the overall framework for our platform, we just want to make sure we are choosing the right one.

Thanks,
Pinakee
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mike Orr
Sent: 07 June 2012 01:41
To: [email protected]
Subject: Re: Pylons/Pyramid Performance

On Wed, Jun 6, 2012 at 4:03 AM, Vlad K. <[email protected]> wrote:
> On 06/06/2012 07:49 AM, Andi wrote:
>>
>> just a benchmark, but better than nothing. found that during our research.
>> http://blog.curiasolutions.com/the-great-web-framework-shootout/
>>
>> andi
>>
>> (sent right out of my head)
>>
>
> The Curiasolutions shootout is interesting. However, even for a
> synthetic benchmark it is highly unbalanced. For example, it shows
> Pyramid yielding more rps than Bottle on Hello World, but then puts
> Pyramid way below Bottle on the templated DB task.
>
> If you take a look at the benchmark code, you'll notice:
>
> - both use SQLite with a local file db, which is ok
> - Bottle uses the SQLite driver directly
> - Pyramid uses SQLAlchemy, which incurs significant overhead
>
> You can use database drivers directly in Pyramid. You don't need
> SQLAlchemy or the transaction extensions; they are not required by
> Pyramid, just a chosen default.

SQLAlchemy's overhead is not necessarily significant, and there are ways to use SQLAlchemy to minimize it. That comes at the expense of convenience, of course, so you'd want to do a side-by-side comparison of a typical task to see how much the overhead matters.

The biggest performance improvement is when you do bulk queries in the database.
A single SQL statement that does the calculations and returns a small number of result records avoids the overhead of loading every record into a Python data type. You can also do bulk updates and deletes. Optimized bulk processing at the C level is a feature of all SQL-compatible databases; it often is not a feature of non-SQL databases. For instance, CouchDB can perform a JavaScript query and return some records, but I don't know whether that's any faster than going through all the records in Python.

The second-largest optimization is to use SQLAlchemy's SQL builder level rather than the ORM level. You can send it a SQL string to execute, or use the builder methods to construct the SQL statement. This runs at essentially the same speed as raw DB-API because it's doing the same thing. The result is an iterable of RowProxys, which incur some minimal overhead. If you access fields by key rather than by attribute (x[0] or x["foo"] vs. x.foo), it's supposed to be faster. RowProxy uses lazy evaluation, so it avoids processing the underlying row tuple except as necessary.

The ORM has to instantiate a Python object for every record and keep track of which objects have had their attributes changed in Python. But again, it does lazy evaluation, so it's not as if it sets every attribute on initialization. Recent versions of SQLAlchemy also have a feature where you can construct a query using the ORM methods, but if it's a query on certain fields rather than on an entire ORM object, it returns RowProxys just like the SQL level, so it bypasses most of the ORM's overhead.

But again, you should do a side-by-side comparison to see how much the overhead actually is, because sometimes it surprises you. I have an import routine that reads 10,000+ records from CSV files and puts them in an empty database, and it takes 30 seconds to run either with or without the ORM.
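The bulk-query point can be sketched with the stdlib sqlite3 driver (the table and numbers are made up for illustration; SQLAlchemy's SQL layer ends up executing the same statements):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (page TEXT, ms INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)",
                 [("home", 12), ("home", 30), ("about", 7)])

# Row-by-row: every matching record crosses into Python first.
total = sum(ms for (ms,) in
            conn.execute("SELECT ms FROM hits WHERE page = 'home'"))

# Bulk: the database does the arithmetic and returns one small row.
(total_bulk,) = conn.execute(
    "SELECT SUM(ms) FROM hits WHERE page = 'home'").fetchone()

assert total == total_bulk  # same answer, far fewer rows moved
```

With three rows the difference is invisible, but the second form stays one small result row no matter how large the table grows.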
On the other hand, I have some log-processing scripts that process hundreds of thousands or millions of records, and the speedup is significant when I switch from ORM-level processing to SQL-level processing. (But again, I can use the ORM methods to construct the queries; I just avoid returning or inserting ORM objects.)

The third thing you can do is hold a long-lived connection throughout the application, rather than letting the engine check out a connection on every query. That avoids the overhead of the connection pool, but it probably makes little difference. The purpose of the connection pool is to avoid the larger overhead of actually connecting to the database on every query. That's slow on some databases, like PostgreSQL, and fast on others, like SQLite. So the pool actually improves performance, and raw DB-API does not have a connection pool. This again depends on your application: a short-lived, single-threaded utility can just hold a connection for simplicity, but a multithreaded web application really benefits from a pool, so that you don't have to manage connections. (Or at most, you hold a connection open for a single function or a single request.)

SQLAlchemy is wonderful because it's multi-level: you can give it SQL strings, use the SQL builder, or use the ORM, depending on the application, or even in different places in the same application. Python never had a multi-level database library before SQLAlchemy; I don't know how common that is in other programming languages. Michael Bayer (who wrote SQLAlchemy and Mako) also writes excellent documentation. So I would definitely recommend using it.

The other issue is that raw benchmarks of "framework X does N requests per hour" are usually unrealistic. They use an empty application to measure the framework's overhead. But in the real world of complex applications using databases and performing calculations, the overhead of the framework is dwarfed by the overhead of the application code.
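The pool idea above can be sketched with a toy pool built from stdlib pieces (TinyPool is a made-up illustration, not SQLAlchemy's actual pool class, which adds recycling, overflow, and thread safety details on top of the same basic shape):

```python
import queue
import sqlite3

class TinyPool:
    """Toy connection pool: hand out already-open connections instead
    of reconnecting on every query."""
    def __init__(self, db, size=1):
        self._q = queue.Queue()
        for _ in range(size):
            # Connecting happens once, up front, not per query.
            self._q.put(sqlite3.connect(db, check_same_thread=False))

    def checkout(self):
        return self._q.get()       # blocks if all connections are out

    def checkin(self, conn):
        self._q.put(conn)

pool = TinyPool(":memory:", size=1)
c1 = pool.checkout()
pool.checkin(c1)
c2 = pool.checkout()
assert c1 is c2  # the same connection is reused, no reconnect
```

Holding one connection for the life of a script is the degenerate case of this (a pool of size one that is never checked back in), which is why it only helps short-lived, single-threaded programs.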
If you have a very small, simple application with a huge number of requests, like Twitter, then the framework's performance would be close to the benchmark. Otherwise it will degrade in ways that aren't framework-specific (i.e., they'd be the same in Pyramid or Flask). If your application is so high-volume that it approaches the hardware's capacity, then you should look at parallel servers as well as different frameworks or languages. It may be that the cost of a second server is less than the programming-time cost or inconvenience of using a "simple, streamlined" framework or a C-like language. Unless the application is very simple, in which case a minimalist framework may be perfect for it.

In terms of Python WSGI applications, there are two separate overheads: the WSGI server, and the framework/application. The CherryPy server is considered the most robust at high loads compared to other multithreaded Python servers. You can use it with Pyramid; just set the "[server:main]" section in the INI file. Asynchronous servers may have higher performance than multithreaded ones, but the difficulty of making an application asynchronous-safe may outweigh the advantages. You can also use a module like mod_wsgi to avoid the overhead of a separate WSGI HTTP server.

--
Mike Orr <[email protected]>

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en.
