Hi Mike,

Thanks for the detailed information. It certainly helps.
We will certainly consider the information on SQLAlchemy for DB-related performance (there are other mechanisms to improve performance as well, such as bulk queries and probably stored procedures). The platform we are building is not simple: it's a media (mostly video) content management and delivery platform with multiple features.

As far as my understanding of Pylons/Pyramid goes, request handling and dispatch (to the proper controller and action), and the management of threads (not sure if this is applicable), are handled by the framework. HTTP-protocol-level transaction handling, the detailed implementation of the stack, etc. are handled by the web server/framework. The job remaining for the application (or the application developer) is processing requests and building responses (which the framework also helps with, via various libraries). As you mentioned, the application code will certainly play a role in the performance of the overall system.

I agree that ease of development is important, but probably not at the cost of system performance. In choosing the overall framework for our platform, we just want to make sure we are choosing the right one.

Thanks,
Pinakee
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mike Orr
Sent: 07 June 2012 01:41
To: [email protected]
Subject: Re: Pylons/Pyramid Performance

On Wed, Jun 6, 2012 at 4:03 AM, Vlad K. <[email protected]> wrote:
> On 06/06/2012 07:49 AM, Andi wrote:
>>
>> just a benchmark, but better than nothing. found that during our research.
>> http://blog.curiasolutions.com/the-great-web-framework-shootout/
>>
>> andi
>>
>> (sent right out of my head)
>>
>
> The Curiasolutions shootout is interesting. However, even for a
> synthetic benchmark it is highly unbalanced. For example, it shows
> Pyramid yielding more rps than Bottle on Hello World, but then puts
> Pyramid way below Bottle on the templated DB task.
>
> If you take a look at the benchmark code, you'll notice:
>
> - both use SQLite with a local file db, which is ok
> - Bottle uses the SQLite driver directly
> - Pyramid uses SQLAlchemy, which incurs significant overhead
>
> You can use database drivers directly in Pyramid. You don't need
> SQLAlchemy or the transaction extensions; they are not required by
> Pyramid, just a chosen default.

SQLAlchemy's overhead is not necessarily significant, and there are ways to use SQLAlchemy to minimize it. That comes at the expense of convenience, of course, so you'd want to do a side-by-side comparison of a typical task to see how much the overhead matters.

The biggest performance improvement is when you do bulk queries in the database.
A single SQL statement that does the calculations and returns a small number of result records avoids the overhead of loading every record into a Python data type. You can also do bulk updates and deletes. Optimized bulk processing at the C level is a feature of all SQL-compatible databases; it often is not a feature of non-SQL databases. For instance, CouchDB can perform a JavaScript query and return some records, but I don't know whether that's any faster than going through all the records in Python.

The second-largest optimization is to use SQLAlchemy's SQL builder level rather than the ORM level. You can send it a SQL string to execute, or use the builder methods to construct the SQL statement. This runs at essentially the same speed as raw DB-API because it's doing the same thing. The result is an iterable of RowProxys, which incur some minimal overhead. If you access fields by key rather than by attribute (x[0] or x["foo"] vs. x.foo), it's supposed to be faster. RowProxy uses lazy evaluation, so it avoids processing the underlying row tuple except as necessary.

The ORM has to instantiate a Python object for every record and keep track of which objects have had their attributes changed in Python. But again, it does lazy evaluation, so it's not as if it sets every attribute on initialization. Recent versions of SQLAlchemy also have a feature where you can construct a query using the ORM methods, but if it's a query on certain fields rather than on an entire ORM object, it returns RowProxys just like the SQL level, so it bypasses most of the ORM's overhead.

But again, you should do a side-by-side comparison to see how much the overhead actually is, because sometimes it surprises you. I have an import routine that reads 10,000+ records from CSV files and puts them in an empty database, and it takes 30 seconds to run either with or without the ORM.
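The bulk-query point can be sketched with the stdlib sqlite3 driver (the table and numbers are made up for illustration; SQLAlchemy's SQL layer ends up executing the same statements):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (page TEXT, ms INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)",
                 [("home", 12), ("home", 30), ("about", 7)])

# Row-by-row: every matching record crosses into Python first.
total = sum(ms for (ms,) in
            conn.execute("SELECT ms FROM hits WHERE page = 'home'"))

# Bulk: the database does the arithmetic and returns one small row.
(total_bulk,) = conn.execute(
    "SELECT SUM(ms) FROM hits WHERE page = 'home'").fetchone()

assert total == total_bulk  # same answer, far fewer rows moved
```

With three rows the difference is invisible, but the second form stays one small result row no matter how large the table grows.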
On the other hand, I have some log-processing scripts that process hundreds of thousands or millions of records, and the speedup is significant when I switch from ORM-level processing to SQL-level processing. (But again, I can use the ORM methods to construct the queries; I just avoid returning or inserting ORM objects.)

The third thing you can do is hold a long-lived connection throughout the application, rather than letting the engine check out a connection on every query. That avoids the overhead of the connection pool, but it probably makes little difference. The purpose of the connection pool is to avoid the larger overhead of actually connecting to the database on every query. That's slow on some databases, like PostgreSQL, and fast on others, like SQLite. So the pool actually improves performance, and raw DB-API does not have a connection pool. This again depends on your application: a short-lived, single-threaded utility can just hold a connection for simplicity, but a multithreaded web application really benefits from a pool, so that you don't have to manage connections. (Or at most, you hold a connection open for a single function or a single request.)

SQLAlchemy is wonderful because it's multi-level: you can give it SQL strings, use the SQL builder, or use the ORM, depending on the application, or even in different places in the same application. Python never had a multi-level database library before SQLAlchemy; I don't know how common that is in other programming languages. Michael Bayer (who wrote SQLAlchemy and Mako) also writes excellent documentation. So I would definitely recommend using it.

The other issue is that raw benchmarks of "framework X does N requests per hour" are usually unrealistic. They use an empty application to measure the framework's overhead. But in the real world of complex applications using databases and performing calculations, the overhead of the framework is dwarfed by the overhead of the application code.
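The pool idea above can be sketched with a toy pool built from stdlib pieces (TinyPool is a made-up illustration, not SQLAlchemy's actual pool class, which adds recycling, overflow, and thread safety details on top of the same basic shape):

```python
import queue
import sqlite3

class TinyPool:
    """Toy connection pool: hand out already-open connections instead
    of reconnecting on every query."""
    def __init__(self, db, size=1):
        self._q = queue.Queue()
        for _ in range(size):
            # Connecting happens once, up front, not per query.
            self._q.put(sqlite3.connect(db, check_same_thread=False))

    def checkout(self):
        return self._q.get()       # blocks if all connections are out

    def checkin(self, conn):
        self._q.put(conn)

pool = TinyPool(":memory:", size=1)
c1 = pool.checkout()
pool.checkin(c1)
c2 = pool.checkout()
assert c1 is c2  # the same connection is reused, no reconnect
```

Holding one connection for the life of a script is the degenerate case of this (a pool of size one that is never checked back in), which is why it only helps short-lived, single-threaded programs.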
If you have a very small, simple application with a huge number of requests, like Twitter, then the framework's performance would be close to the benchmark. Otherwise it will degrade in ways that aren't framework-specific (i.e., they'd be the same in Pyramid or Flask). If your application is so high-volume that it approaches the hardware's capacity, then you should look at parallel servers as well as different frameworks or languages. It may be that the cost of a second server is less than the programming-time cost or inconvenience of using a "simple, streamlined" framework or a C-like language. Unless the application is very simple, in which case a minimalist framework may be perfect for it.

In terms of Python WSGI applications, there are two separate overheads: the WSGI server, and the framework/application. The CherryPy server is considered the most robust at high loads compared to other multithreaded Python servers. You can use it with Pyramid; just set the "[server:main]" section in the INI file. Asynchronous servers may have higher performance than multithreaded ones, but the difficulty of making an application asynchronous-safe may outweigh the advantages. You can also use a module like mod_wsgi to avoid the overhead of a separate WSGI HTTP server.

--
Mike Orr <[email protected]>

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en.
