You can also always do:

db.executesql(db(query)._select(), as_dict=True)

to get just a list of dictionaries for any DAL select. Allowing custom 
column names, as you suggest, sounds like a good idea.
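
Under the hood, as_dict=True just zips the cursor's column names with each 
fetched row. A minimal standalone sketch of that same pattern using the stdlib 
sqlite3 module (the table and data here are made up purely for illustration):

```python
import sqlite3

# Throwaway in-memory database (illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob')")

cursor = conn.execute("SELECT id, name FROM person ORDER BY id")
# cursor.description holds one 7-tuple per column; element [0] is the name
fields = [d[0] for d in cursor.description]
rows_as_dicts = [dict(zip(fields, row)) for row in cursor.fetchall()]
print(rows_as_dicts)
# → [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```

This is essentially all executesql(..., as_dict=True) does after running the 
query, which is why it skips the Rows-parsing cost entirely.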

Anthony

On Tuesday, November 19, 2013 3:24:52 PM UTC-5, Anthony wrote:
>
> Note, you can do:
>
> db(query).select(..., processor=myprocessor)
>
> where myprocessor is a function that takes rows, fields, and 
> colnames arguments. Perhaps, as you suggest, we could include a built-in 
> quick processor that simply generates a list of dictionaries or Storage 
> objects. We might also include options to turn various features on and off 
> (e.g., turn on parsing of dates, but turn off generation of Reference 
> objects for reference fields).
>
> Note, you might also see if .select(..., cacheable=False) speeds things 
> up a bit.
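>
> To make that concrete, here is a hedged sketch of what such a processor 
> might look like - a plain function that skips the Rows machinery and zips 
> colnames with each raw row (web2py may pass additional arguments such as 
> blob_decode, hence the **kwargs catch-all; the sample data is synthetic):

```python
def dict_processor(rows, fields, colnames, **kwargs):
    """Skip Rows construction entirely: return a plain list of
    dictionaries keyed by the 'table.field' column names."""
    return [dict(zip(colnames, row)) for row in rows]

# Illustrative call, mimicking what the adapter would pass in:
raw_rows = [(1, 'Alice'), (2, 'Bob')]
colnames = ['person.id', 'person.name']
result = dict_processor(raw_rows, fields=None, colnames=colnames)
print(result)
```

> It would then be passed as db(query).select(..., processor=dict_processor).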
>
> Anthony
>
> On Tuesday, November 19, 2013 3:00:24 PM UTC-5, Arnon Marcus wrote:
>>
>> We recently did some profiling on some slow queries, and found out 
>> something pretty surprising:
>> The parsing of the result-set into a *Rows *object incurs an overhead 
>> that is many times the duration of the actual query (including the entire 
>> round-trip to the database).
>>
>> We are using web2py 2.7.2 and a PostgreSQL database (9.3x64 on 
>> CentOS-6.4x64) using python 2.6x64 (on Win7x64) and profiling with cProfile 
>> and RunSnakeRun.
>>
>> It's a moderately-complex query with a result-set of about 1,000 rows in a 
>> single select.
>> Here is the result:
>>
>>
>> <https://lh6.googleusercontent.com/-ZHKeRiR6Ytw/Uou0mhcE76I/AAAAAAAAAGw/wHLjnuT6IwI/s1600/NormalSelect.jpg>
>>
>>
>> The time it takes the database to fetch them, is 0.02s.
>> The entire time of the select, including the parse, is about 0.79s (In 
>> other tests it was 1.07s).
>> This is just one example; we have had queries that took around 5.x-9.x 
>> seconds (!) in which the database round-trip was a mere 0.04s...
>>
>> This is an order-of-magnitude difference between the parsing and the 
>> actual query.
>> The gap gets worse the more results there are in the result-set.
>>
>> Now, in any discussion about the performance of server-side web-frameworks, 
>> you almost unanimously hear this argument:
>> "...the database always ends up being the bottleneck anyways..."
>> So, this is not supposed to happen, right?
>> Either we have a really slow framework, or a really fast database...
>>
>> I then did a little digging, to see if we could use this to our advantage 
>> (since we rarely need complex usage-patterns of the *Rows *object 
>> anyways...)
>>
>> First thing I did was to see if the *executesql *method could return 
>> something just a bit more useful than it usually does when using *as_dict=True 
>> *on it - I changed this:
>>     def executesql(self, query, placeholders=None, as_dict=False,
>>                    fields=None, colnames=None):
>>         ....
>>         if as_dict:
>>             if not hasattr(adapter.cursor, 'description'):
>>                 raise RuntimeError("database does not support executesql(...,as_dict=True)")
>>             columns = adapter.cursor.description
>>             fields = [f[0] for f in columns]
>>             data = adapter._fetchall()
>>             return [dict(zip(fields, row)) for row in data]
>> into this:
>>     def executesql(self, query, placeholders=None, as_dict=False,
>>                    fields=None, colnames=None):
>>         ....
>>         if as_dict:
>>             if not hasattr(adapter.cursor, 'description'):
>>                 raise RuntimeError("database does not support executesql(...,as_dict=True)")
>>             columns = adapter.cursor.description
>>             fields = *colnames or* [f[0] for f in columns]
>>             data = adapter._fetchall()
>>             return [dict(zip(fields, row)) for row in data]
>> 1 line of code...
>>
>> This single change makes the function able to return a 
>> list of dictionaries, each with *practically-usable* keys.
>> We opted for key names of the form '<TableName>.<FieldName>', as in the 
>> actual SQL string.
>> Then I thought, why not use the SQL string itself?
>> And thus the *quickSelect *method was born into the *Set* class:
>>
>>     def select(self, *fields, **attributes):
>>         adapter = self.db._adapter
>>         tablenames = adapter.tables(self.query,
>>                                     attributes.get('join',None),
>>                                     attributes.get('left',None),
>>                                     attributes.get('orderby',None),
>>                                     attributes.get('groupby',None))
>>         fields = adapter.expand_all(fields, tablenames)
>>         return adapter.select(self.query,fields,attributes)
>> ###############################################################
>>     def quickSelect(self, *fields, **attributes):
>>         query = self._select(*fields, **attributes)
>>         colnames = query.split(' FROM')[0][7:-1].split(', ')
>>         return self._db.executesql(query, colnames=colnames, as_dict=True)
>> ###############################################################
>>     def nested_select(self,*fields,**attributes):
>>         return Expression(self.db,self._select(*fields,**attributes))
>> 3 lines of code... Could even be shortened to 2...
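>>
>> The colname extraction there simply slices the field list out of the 
>> generated SQL. A standalone sketch of that idea (the sample SQL string is 
>> hand-written; the exact slice depends on how web2py's _select() formats 
>> and terminates the statement, which is why the version above uses [7:-1]):

```python
def parse_colnames(sql):
    """Pull 'table.field' column names out of a 'SELECT ... FROM ...' string."""
    field_list = sql.split(' FROM')[0][len('SELECT '):]
    return field_list.split(', ')

sql = "SELECT person.id, person.name FROM person WHERE person.id > 0;"
print(parse_colnames(sql))
# → ['person.id', 'person.name']
```

>> Those names are then handed to executesql(colnames=...) so the returned 
>> dictionaries are keyed the same way as the SQL itself.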
>>
>> What was the outcome?
>> Well, let's see:
>>
>>
>> <https://lh4.googleusercontent.com/-A_2xljL5gJQ/Uou5nSpH4lI/AAAAAAAAAHA/NtCjwJQxICc/s1600/QuickSelect.jpg>
>> Voila!
>> 0.04s for a *practically-usable *list of dictionaries(!)
>> An order-of-magnitude improvement in less than 5 lines of code...
>>
>> I *thought *I could do even *better*, and wrap the result-set myself with a 
>> simple class, injecting the dictionaries into the object.__dict__ 
>> attribute, so I could *emulate*, and perhaps even *replace in-place*, 
>> the "real" *select() *method itself (including support for *JOIN*s).
>> This way, I could leave all our app-code unchanged:
>>
>> class SimpleObject:
>>     def __init__(self, attributes=None):
>>         self.__dict__ = attributes or {}
>>
>>
>> class Set(object):
>>     ...
>>     def select(self, *fields, **attributes):
>>         query = self._select(*fields, **attributes)
>>         colnames = query.split(' FROM')[0][7:-1].split(', ')
>>         tableNames = [colname.split('.')[0] for colname in colnames]
>>         uniqueTableNames = set(tableNames)
>>         records = self._db.executesql(query, colnames=colnames, as_dict=True)
>>         if len(uniqueTableNames) == 1:  # Single Table
>>             tableNameLength = len(next(iter(uniqueTableNames))) + 1
>>             return [
>>                 SimpleObject(
>>                     dict(
>>                         (key[tableNameLength:-1], value)
>>                         for key, value in record.iteritems()
>>                     )
>>                 )
>>                 for record in records
>>             ]
>>         else:  # Multiple Tables (JOIN)
>>             tableNamesLength = dict(       # A dictionary mapping table-names
>>                 (tableName, len(tableName) + 1)    # to their length+1, for later
>>                 for tableName in uniqueTableNames  # "easy" splicing of the key-names
>>             )
>>             return [
>>                 SimpleObject(
>>                     dict(
>>                         (
>>                             tableName,
>>                             SimpleObject(
>>                                 dict(
>>                                     (key[tableNamesLength[tableName]:-1], value)
>>                                     for key, value in record.iteritems()
>>                                     if key[:tableNamesLength[tableName] - 1] == tableName
>>                                 )
>>                             )
>>                         )
>>                         for tableName in tableNames
>>                     )
>>                 )
>>                 for record in records
>>             ]
>>
>> But that didn't work as I expected... I got an even worse result than the 
>> original web2py *Rows *parsing...
>> I tried removing the SimpleObject class from the equation and just returning 
>> nested dictionaries - no luck... Same result...
>> So I guess the *Rows *parsing isn't "slow" after all, as it's doing a 
>> better job than I can...
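>>
>> For what it's worth, the nesting itself can be written more compactly as 
>> plain dictionaries; a minimal sketch of grouping flat 'table.field' keys 
>> per table (the record shown is synthetic):

```python
def nest_record(flat):
    """Turn {'table.field': value, ...} into {'table': {'field': value}, ...}."""
    nested = {}
    for key, value in flat.items():
        table, field = key.split('.', 1)  # split only on the first dot
        nested.setdefault(table, {})[field] = value
    return nested

record = {'person.id': 1, 'person.name': 'Alice', 'pet.name': 'Rex'}
print(nest_record(record))
# → {'person': {'id': 1, 'name': 'Alice'}, 'pet': {'name': 'Rex'}}
```

>> Whether this beats the comprehension-based version above on 1000+ rows 
>> would need profiling, of course.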
>>
>> However, I still think there is room for my *quickSelect()* method, 
>> in conjunction with the small addition to the *executesql()* method - 
>> for people who want to dramatically speed up their application and are 
>> willing to pay the price of getting back a flat list of dictionaries with 
>> "table.field" key-names.
>>
>> We're finding it potentially useful enough that we are prepared to 
>> experiment with this across our entire app...
>> The speed-up is just too impressive (and tempting) to overlook...
>>
>> What say you?
>>
>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
