well, when you plan to use a cache you do it because:
a) the additional roundtrip is negligible
b) using the cache means your database is not used
I'll explain it further....
A query that takes 0.18ms to get computed on the database side can very
well end up taking 18s as soon as your database gets hit by several
concurrent requests.
A cached query that takes 0.18ms to get stored and 0.18ms to get retrieved
(unpacked, decompressed, unpickled, whatever term suits the job) will, if
the cache backend is "proper", take 0.36ms the first time and 0.18ms from
the 2nd to the nth.
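To make that arithmetic concrete, here's a minimal sketch (a plain dict
stands in for memcache, and `expensive_query` is a made-up placeholder, not
anything from this thread): the first call pays compute + store, every later
call within the ttl pays retrieve only, and the database is never touched
again:

```python
import time

_cache = {}  # plain dict standing in for a memcache/redis backend

def cached(key, func, ttl=3600):
    """First call: compute + store. Later calls within ttl: retrieve only."""
    now = time.time()
    entry = _cache.get(key)
    if entry is not None and now - entry[0] < ttl:
        return entry[1]             # retrieve only, the db is not used
    value = func()                  # hit the database
    _cache[key] = (now, value)      # store for the next request
    return value

calls = []
def expensive_query():
    calls.append(1)                 # count how often the "db" is hit
    return [1, 2, 3]

first = cached('q1', expensive_query)    # computes and stores
second = cached('q1', expensive_query)   # served from cache, db untouched
```

The point is b) above: after the first request, `expensive_query` is never
called again, no matter how many requests arrive.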
Even so, some complicated queries (or uncomplicated for you, but
complicated for the db backend) can be sped up simply by requesting fewer
fields or by computing in raw python something that would usually be done
on the backend.
This kind of optimization (i.e. relying on python's datetime operations vs
an extract(epoch from datefield) on the db) becomes less and less useful as
the result set's cardinality grows (i.e. the more rows you have, the better
the chance that the db returns the results faster than running a basic
query and post-processing the rows in python).
Assuming your "python filter" cuts out 5% of the total rows, as the number
of rows grows you'll just waste time unpacking that additional 5%, and the
time difference will become more pronounced.
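A rough sketch of that waste (pure python, with a hypothetical resultset
where 5% of the rows violate the one-day constraint): every row is fetched
and unpacked before the filter gets a chance to drop it, so the doomed 5%
still costs you transfer and unpickling time on every request:

```python
import datetime

ONE_DAY = datetime.timedelta(days=1)

def filter_in_python(rows):
    # every row has already been fetched/unpickled before this line runs,
    # whether or not it survives the filter
    return [r for r in rows if r['end'] - r['start'] < ONE_DAY]

base = datetime.datetime(2013, 3, 25)
# hypothetical resultset: 1 row in 20 (5%) spans more than a day
rows = [{'start': base,
         'end': base + datetime.timedelta(hours=26 if i % 20 == 0 else 2)}
        for i in range(1000)]

kept = filter_in_python(rows)   # 950 of the 1000 rows survive
```

Had the db applied the constraint, only the 950 surviving rows would have
been transferred and unpacked at all.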
On a totally unrelated note, remember that Python is fast, but dbs are fast
too, especially when you leverage all their features (e.g. you could set an
index on the "computed" epoch and that query would probably gain a lot of
speed).
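For example (a sketch only, using SQLite's expression indexes as a
stand-in; on PostgreSQL the equivalent would be an index on extract(epoch
from gathered_on), and the table/index names here are made up, not from
this thread): the db can then answer the epoch comparison from the index
instead of recomputing the expression per row on every scan:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, gathered_on TEXT)")

# index on the *computed* epoch, so a WHERE clause comparing the epoch
# can use the index rather than re-deriving it row by row
con.execute(
    "CREATE INDEX record_epoch_idx ON record "
    "(CAST(strftime('%s', gathered_on) AS INTEGER))"
)

con.execute("INSERT INTO record (gathered_on) VALUES ('2013-03-25 09:48:14')")
n = con.execute(
    "SELECT count(*) FROM record "
    "WHERE CAST(strftime('%s', gathered_on) AS INTEGER) < 2000000000"
).fetchone()[0]

names = [r[0] for r in
         con.execute("SELECT name FROM sqlite_master WHERE type='index'")]
```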
This is meant to explain why Q2 is faster than Q1.
As for why Q1 with cache isn't faster than Q2 with cache: since the time
difference is negligible, it's possible that the difference gets spent
calculating the automatic cache key for the query (the first
implementation's key needs to include the "elapsed time" part). Just to be
sure, try timing
def Q1():
    # your first implementation, without cache

def Q2():
    # your second implementation, without cache, without the check in python
with
rows1 = cache('q1', lambda : Q1())
..... and, separately
rows2 = cache('q2', lambda: Q2())
rows2 = [r for r in rows2 ...]
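Put together, the comparison might look like the sketch below (a dict-backed
cache and placeholder Q1/Q2 bodies stand in for memcache and the real
selects, which aren't reproduced here): time each variant separately and see
where the milliseconds go.

```python
import time

_cache = {}

def cache(key, func):
    # minimal stand-in for (cache.memcache, 3600): compute once, then reuse
    if key not in _cache:
        _cache[key] = func()
    return _cache[key]

def Q1():
    # placeholder for the first select (constraint inside the query)
    return [i for i in range(100) if i < 95]

def Q2():
    # placeholder for the second select (no constraint in the query)
    return list(range(100))

def timed(label, func):
    t0 = time.time()
    result = func()
    print('%s took %.6fs' % (label, time.time() - t0))
    return result

rows1 = timed('q1 cached', lambda: cache('q1', Q1))
rows2 = timed('q2 cached', lambda: cache('q2', Q2))
rows2 = [r for r in rows2 if r < 95]   # the python-side constraint check
```

If the gap persists with identical placeholder bodies, the difference is in
the caching machinery (key computation, pickling), not in the queries.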
On Monday, March 25, 2013 11:12:50 AM UTC+1, Paolo valleri wrote:
>
> Hi Niphlod, thanks for the answer. I am not blaming web2py, I am just
> sharing all these results with you.
> In both cases I have the time constraint; in one case it is in the
> query (Q1) while in the second it is explicit (Q2). With cache enabled, Q1
> is no longer checking the constraint while Q2 is still checking the
> constraint for all rows. Moreover, computing .epoch() should take a well
> defined amount of time only for queries without cache; with cache, the
> computed result should be stored.
>
> To sum up, when cache is enabled, with the same amount of rows (nearly
> 300):
> - Q2 takes less time than Q1 even though it has to compute the time
> constraint explicitly
> - Q1 takes more time than Q2, even though it has to retrieve from cache
> only one field more than Q2 among a dozen fields
>
> Given that, retrieving more data from memcached can be slower than
> re-computing it for each request.
>
>
> Paolo
>
>
> 2013/3/25 [email protected] <javascript:> <[email protected]<javascript:>
> >
>
>>
>>
>> Paolo
>>
>>
>> 2013/3/25 Niphlod <[email protected]>
>>
>>> I didn't get the point. ....
>>> Q1 is slower (obviously) because of the epoch() function and the
>>> additional filtering.
>>> Q1 with cache shows more or less the same duration as Q2 with cache.
>>> The key point is "even when the cache is enabled the first query is
>>> taking longer" ... longer than what?
>>> Q1 with cache is speedier than:
>>> - Q1 without cache, and that's ok
>>> - Q2 without cache, and that's ok
>>> but it's slower than
>>> - Q2 with cache.
>>> However, 3 ms (assuming those timings are in seconds), with all the
>>> moving parts of a memcache instance, seems unfair to blame entirely on
>>> web2py.
>>>
>>>
>>>
>>> On Monday, March 25, 2013 9:48:14 AM UTC+1, Paolo valleri wrote:
>>>>
>>>> Dear all, I want to share with you some results. I implemented a query
>>>> in two different ways. Given the following common code:
>>>> start = db.record.with_alias('start_point')
>>>> end = db.record.with_alias('end_point')
>>>> elapsed_time = end.gathered_on.epoch() - start.gathered_on.epoch()
>>>>
>>>> The first query is (the constraint is in the query):
>>>> rows = db( query & (elapsed_time < 86400) ).select(
>>>> start.ALL,
>>>> end.ALL,
>>>> start.gathered_on.epoch(),
>>>> end.gathered_on.epoch(),
>>>> elapsed_time,
>>>> orderby=start.gathered_on.epoch(),
>>>> left=start.on( (start.mac == end.mac) & (start.gathered_on <
>>>> end.gathered_on)),
>>>> cache=(cache.memcache, 3600),
>>>> cacheable = True
>>>> )
>>>> The second one is (the constraint is tested explicitly later):
>>>> rows = db( query ).select(
>>>> start.ALL,
>>>> end.ALL,
>>>> start.gathered_on.epoch(),
>>>> end.gathered_on.epoch(),
>>>> elapsed_time,
>>>> orderby=start.gathered_on.epoch(),
>>>> left=start.on( (start.mac == end.mac) & (start.gathered_on <
>>>> end.gathered_on)),
>>>> cache=(cache.memcache, 3600),
>>>> cacheable = True
>>>> )
>>>> rows2 = [r for r in rows if (r.end_point.gathered_on -
>>>> r.start_point.gathered_on < datetime.timedelta(days=1)) ]
>>>>
>>>> From the timing results I got that the second query is always faster,
>>>> with or without cache:
>>>> Q_1: 0.273243904114
>>>> Q_1 with cache: 0.0182011127472
>>>> Q_2: 0.250607967377
>>>> Q_2 with cache: 0.0158171653748
>>>>
>>>> Besides the fact that there are just a few milliseconds of difference
>>>> and that all the rows satisfy the constraint, what is not clear to me is
>>>> why, even when the cache is enabled, the first query takes longer. The
>>>> question that came to my mind is about computed columns: are they cached?
>>>>
>>>> Paolo
>>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "web2py-users" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/web2py/9RfFmNQc-tE/unsubscribe?hl=en.
>>> To unsubscribe from this group and all its topics, send an email to
>>> [email protected] <javascript:>.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
>>
>