Note, you can do:

db(query).select(..., processor=myprocessor)

where myprocessor is a function that takes rows, fields, and 
colnames arguments. Perhaps, as you suggest, we could include a built-in 
quick processor that simply generates a list of dictionaries or Storage 
objects. We might also include options to turn various features on and off 
(e.g., turn on parsing of dates, but turn off generation of Reference 
objects for reference fields).
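For illustration, a processor along those lines could be as small as this (a sketch; the name rows_to_dicts is made up, and it assumes the processor receives the raw rows plus the fields and colnames, as described above):

```python
def rows_to_dicts(rows, fields, colnames):
    # colnames are 'table.field' strings; build one plain dict per raw
    # row, skipping Rows/Reference construction entirely
    return [dict(zip(colnames, row)) for row in rows]
```

It would then be passed as db(query).select(..., processor=rows_to_dicts).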

Note, you might also see if .select(..., cacheable=True) speeds things up 
a bit (the resulting Rows object is lighter, as it skips the per-record 
update_record/delete_record machinery).

Anthony

On Tuesday, November 19, 2013 3:00:24 PM UTC-5, Arnon Marcus wrote:
>
> We recently did some profiling on some slow queries and found out 
> something pretty surprising:
> The parsing of the result-set into a *Rows* object incurs an overhead that 
> is many times the time of the actual query (including the entire round-trip 
> to the database).
>
> We are using web2py 2.7.2 and a PostgreSQL database (9.3x64 on 
> CentOS-6.4x64) using python 2.6x64 (on Win7x64) and profiling with cProfile 
> and RunSnakeRun.
>
> It's a moderately-complex query returning about 1,000 rows in a 
> single query.
> Here is the result:
>
>
> <https://lh6.googleusercontent.com/-ZHKeRiR6Ytw/Uou0mhcE76I/AAAAAAAAAGw/wHLjnuT6IwI/s1600/NormalSelect.jpg>
>
>
> The time it takes the database to fetch them is 0.02s.
> The entire time of the select, including the parse, is about 0.79s (In 
> other tests it was 1.07s).
> It is just an example, but we had queries that took around 5.x-9.x seconds 
> (!) in which the database round-trip was a mere 0.04s...
>
> This is an order-of-magnitude difference between the parsing and the 
> actual query.
> The gap gets worse the more results there are in the result-set.
>
> Now, in any discussion about the performance of server-side web-frameworks, 
> you almost unanimously hear this argument:
> "...the database always ends up being the bottleneck anyways..."
> So, this is not supposed to happen, right?
> Either we have a really slow framework, or a really fast database...
>
> I then did a little digging to see if we could use this to our advantage 
> (since we rarely need complex usage-patterns of the *Rows* object 
> anyway...)
>
> The first thing I did was to see if the *executesql* method could return 
> something just a bit more useful than it usually does when using 
> *as_dict=True* - I changed this:
>     def executesql(self, query, placeholders=None, as_dict=False,
>                    fields=None, colnames=None):
>         ....
>         if as_dict:
>             if not hasattr(adapter.cursor, 'description'):
>                 raise RuntimeError("database does not support "
>                                    "executesql(..., as_dict=True)")
>             columns = adapter.cursor.description
>             fields = [f[0] for f in columns]
>             data = adapter._fetchall()
>             return [dict(zip(fields, row)) for row in data]
> into this:
>     def executesql(self, query, placeholders=None, as_dict=False,
>                    fields=None, colnames=None):
>         ....
>         if as_dict:
>             if not hasattr(adapter.cursor, 'description'):
>                 raise RuntimeError("database does not support "
>                                    "executesql(..., as_dict=True)")
>             columns = adapter.cursor.description
>             fields = colnames or [f[0] for f in columns]
>             data = adapter._fetchall()
>             return [dict(zip(fields, row)) for row in data]
> 1 line of code...
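> To illustrate what that change buys (a standalone sketch with made-up 
> data, not web2py itself):

```python
# what adapter.cursor.description would provide: bare column names
columns = [('id',), ('name',)]
# caller-supplied colnames in 'table.field' form
colnames = ['person.id', 'person.name']
data = [(1, 'Alice'), (2, 'Bob')]

# the one-line change: prefer caller-supplied colnames when given
fields = colnames or [c[0] for c in columns]
rows = [dict(zip(fields, row)) for row in data]
# keys are now 'person.id'/'person.name' instead of bare 'id'/'name'
```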
>
> This single change makes the function able to return a list of 
> dictionaries, each with *practically-usable* keys.
> We went with key names of the form '<TableName>.<FieldName>', as in the 
> actual SQL string.
> Then I thought, why not use the SQL string itself?
> And thus the *quickSelect* method was born into the *Set* class:
>
>     def select(self, *fields, **attributes):
>         adapter = self.db._adapter
>         tablenames = adapter.tables(self.query,
>                                     attributes.get('join',None),
>                                     attributes.get('left',None),
>                                     attributes.get('orderby',None),
>                                     attributes.get('groupby',None))
>         fields = adapter.expand_all(fields, tablenames)
>         return adapter.select(self.query,fields,attributes)
> ###############################################################
>     def quickSelect(self, *fields, **attributes):
>         query = self._select(*fields, **attributes)
>         colnames = query.split(' FROM')[0][7:-1].split(', ')
>         return self._db.executesql(query, colnames=colnames, as_dict=True)
> ###############################################################
>     def nested_select(self,*fields,**attributes):
>         return Expression(self.db,self._select(*fields,**attributes))
> 3 lines of code... Could even be shortened to 2...
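> (The colname extraction just peels the field list off the front of the 
> generated SQL. With a plain SQL string the idea looks like this - a 
> sketch; the exact slice offsets used in quickSelect depend on how 
> web2py's _select formats the statement:)

```python
sql = "SELECT person.id, person.name FROM person WHERE person.id > 0;"
# everything between 'SELECT ' and ' FROM' is the field list
field_list = sql.split(' FROM')[0][len('SELECT '):]
colnames = field_list.split(', ')
# colnames == ['person.id', 'person.name']
```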
>
> What was the outcome?
> Well, let's see:
>
>
> <https://lh4.googleusercontent.com/-A_2xljL5gJQ/Uou5nSpH4lI/AAAAAAAAAHA/NtCjwJQxICc/s1600/QuickSelect.jpg>
> Voila!
> 0.04s for a *practically-usable* list of dictionaries(!)
> An order-of-magnitude improvement in less than 5 lines of code...
>
> I *thought* I could do *better* and wrap the result-set myself with a 
> simple class, injecting the dictionaries into the object's __dict__ 
> attribute, so I could *emulate*, and perhaps even replace in-place, 
> the "real" *select()* method itself (including support for *JOIN*s).
> This way, I could leave all our app-code unchanged:
>
> class SimpleObject:
>     def __init__(self, attributes=None):
>         self.__dict__ = attributes or {}
>
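> The trick is that assigning the dict to __dict__ makes every key 
> readable as an attribute in one step, with no per-field work. A tiny 
> standalone illustration (restating the class so the snippet runs on 
> its own):

```python
class SimpleObject:
    def __init__(self, attributes=None):
        # expose dictionary keys as object attributes in one assignment
        self.__dict__ = attributes or {}

row = SimpleObject({'id': 1, 'name': 'Alice'})
# row.id == 1 and row.name == 'Alice'
```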
>
> class Set(object):
>     ...
>     def select(self, *fields, **attributes):
>         query = self._select(*fields, **attributes)
>         colnames = query.split(' FROM')[0][7:-1].split(', ')
>         tableNames = [colname.split('.')[0] for colname in colnames]
>         uniqueTableNames = set(tableNames)
>         records = self._db.executesql(query, colnames=colnames, as_dict=True)
>         if len(uniqueTableNames) == 1:  # Single table
>             tableNameLength = len(next(iter(uniqueTableNames))) + 1
>             return [
>                 SimpleObject(dict(
>                     (key[tableNameLength:-1], value)
>                     for key, value in record.iteritems()
>                 ))
>                 for record in records
>             ]
>         else:  # Multiple tables (JOIN)
>             # A dictionary mapping table-names to their length + 1,
>             # for later "easy" splicing of the key-names
>             tableNamesLength = dict(
>                 (tableName, len(tableName) + 1)
>                 for tableName in uniqueTableNames
>             )
>             return [
>                 SimpleObject(dict(
>                     (tableName, SimpleObject(dict(
>                         (key[tableNamesLength[tableName]:-1], value)
>                         for key, value in record.iteritems()
>                         if key[:tableNamesLength[tableName] - 1] == tableName
>                     )))
>                     for tableName in tableNames
>                 ))
>                 for record in records
>             ]
>
> But that didn't work as I expected... I got an even worse result than the 
> original web2py *Rows* parsing...
> I tried removing the SimpleObject class from the equation and just 
> returning nested dictionaries - no luck... same result...
> So I guess the *Rows* parsing isn't "slow" after all, as it's doing a 
> better job than I can...
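> (For what it's worth, the flat-to-nested key transform itself can be 
> written without the length bookkeeping by splitting each key on the 
> dot - a sketch for clarity, not speed:)

```python
def nest(record):
    # turn {'person.id': 1, 'pet.name': 'Rex'} into
    # {'person': {'id': 1}, 'pet': {'name': 'Rex'}}
    nested = {}
    for key, value in record.items():
        table, field = key.split('.', 1)
        nested.setdefault(table, {})[field] = value
    return nested
```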
>
> However, I still think there is room for my *quickSelect()* method, 
> in conjunction with the small addition to the *executesql()* method - for 
> people who want to dramatically speed up their application and are willing 
> to pay the price of getting back a flat list of dictionaries with 
> "table.field" key-names.
>
> We're finding it potentially useful enough that we are prepared to 
> experiment with this across our entire app...
> The speed-up is just too impressive (and tempting) to overlook...
>
> What say you?
>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
