Thanks for the very clear explanation.
I ran the following:
t0 = time.time()
rows = cache.memcache('q1', lambda: q1(query))
t1 = time.time()
rows2 = cache.memcache('q2', lambda: q2(query))
rows2 = [r for r in rows2 if (r.end_point.gathered_on -
                              r.start_point.gathered_on < datetime.timedelta(days=1))]
t2 = time.time()
print 'q1:', t1 - t0
print 'q2:', t2 - t1
and I got from three different requests:
q1: 0.0104899406433
q2: 0.0106620788574
q1: 0.011244058609
q2: 0.0156059265137
q1: 0.010370016098
q2: 0.0133299827576
Beside memcache, I tried with cache.ram getting:
q1: 9.77516174316e-06
q2: 0.00217413902283
q1: 1.62124633789e-05
q2: 0.00206780433655
q1: 8.10623168945e-06
q2: 0.00120091438293
Given that, in a local environment cache.ram is faster than memcache.
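The cache.ram result makes sense: a RAM cache is a dictionary lookup in the same process, with no network round trip and no unpickling. A minimal sketch of that idea (a hypothetical stand-in for web2py's cache.ram, not its real implementation):

```python
import time

class RamCache:
    """Minimal in-process cache sketch; a hypothetical stand-in for
    web2py's cache.ram, not its real implementation."""
    def __init__(self):
        self._store = {}  # key -> (stored_at, value)

    def __call__(self, key, func, time_expire=300):
        now = time.time()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < time_expire:
            return hit[1]               # hit: no round trip, no recompute
        value = func()                  # miss: compute and store
        self._store[key] = (now, value)
        return value

cache_ram = RamCache()
calls = []

def q1():
    calls.append(1)                     # count how often the "query" runs
    return [1, 2, 3]

first = cache_ram('q1', q1)             # miss: executes q1
second = cache_ram('q1', q1)            # hit: served from process memory
```

The second call never touches q1, which is why the cache.ram timings above drop to microseconds.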
Paolo
On Monday, March 25, 2013 11:54:07 AM UTC+1, Niphlod wrote:
>
> well, when you plan to use cache you do it because:
> a) the additional roundtrip is negligible
> b) using the cache means your database is not used
> I'll explain it further....
> A query that takes 0.18ms to get "computed" on the database side can very
> well end up in 18sec as soon as your database gets "hit" by several
> requests.
> A cached query that takes 0.18ms to get stored and 0.18ms to get retrieved
> (unpacked, decompressed, unpickled, choose your own "term" that suits the
> job) will take, if the cache backend is "proper", 0.36ms the first time and
> 0.18ms from the 2nd to the nth.
> Even so, some complicated queries (or uncomplicated for you, but
> complicated for the db backend) can be sped up just by requesting fewer
> fields or computing in raw Python something that can usually be done on
> the backend.
> This kind of optimization (i.e. relying on Python's datetime operations vs
> an extract(epoch from datefield) on the db) will be less and less useful as
> the result set's cardinality grows (i.e. the more rows you have, the
> better the chance for the db to get the results faster than having a basic
> query and processing the results in Python).
> Assuming your "python filter" cuts out 5% of the total rows, as
> soon as the number of rows grows you'll just waste time unpacking that
> additional 5%, and the time difference will become more pronounced.
> On a totally unrelated note, remember that Python is fast, but dbs are fast
> too, especially when leveraged by all their features (e.g. you could set an
> index on the "computed" epoch and that query will probably gain a lot of
> speedup).
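The index-on-the-computed-epoch suggestion can be sketched with SQLite's expression indexes (the thread never names the actual backend, so the SQL dialect here is an assumption, and the sample dates are made up; SQLite >= 3.9 is required for indexes on expressions):

```python
import sqlite3

# Illustrative table mirroring the thread's "record" table with a
# gathered_on timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, gathered_on TEXT)")
conn.executemany(
    "INSERT INTO record (gathered_on) VALUES (?)",
    [("2013-03-24 10:00:00",),
     ("2013-03-25 09:00:00",),
     ("2013-03-25 11:00:00",)],
)

# Expression index on the computed epoch value, so the db can filter
# on it without recomputing strftime for a full scan.
conn.execute(
    "CREATE INDEX ix_record_epoch "
    "ON record (CAST(strftime('%s', gathered_on) AS INTEGER))"
)

# A filter written with the exact same expression can use that index.
rows = conn.execute(
    "SELECT id FROM record "
    "WHERE CAST(strftime('%s', gathered_on) AS INTEGER) >= "
    "      CAST(strftime('%s', '2013-03-25 00:00:00') AS INTEGER) "
    "ORDER BY id"
).fetchall()
```

The filter expression must match the indexed expression exactly for the index to be usable.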
> This is meant to explain why Q2 is faster than Q1.
> On the reason why Q1 with cache isn't faster than Q2 with cache: since the
> time difference is negligible, it's possible that the time difference gets
> "spent" on calculating the automatic key for the cache of the query (the
> first implementation needs to include the "elapsed time" part). Just to be
> sure, try timing
>
> def Q1():
>     your first implementation, without cache
> def Q2():
>     your second implementation, without cache, without the check in python
>
> with
> rows1 = cache('q1', lambda : Q1())
>
>
> ..... and, separately
> rows2 = cache('q2', lambda: Q2())
> rows2 = [r for r in rows2 ...]
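Outside web2py, the suggested experiment could look like this, with a plain dictionary standing in for the cache and placeholder query bodies (all names here are hypothetical, not the real implementations):

```python
import time

store = {}

def cache(key, func):
    # Simplistic stand-in for web2py's cache call: no expiry,
    # just key -> result.
    if key not in store:
        store[key] = func()
    return store[key]

def Q1():
    # placeholder for the first implementation, without cache
    return list(range(1000))

def Q2():
    # placeholder for the second implementation, without cache and
    # without the check in python
    return list(range(1000))

t0 = time.time()
rows1 = cache('q1', lambda: Q1())
t1 = time.time()
rows2 = cache('q2', lambda: Q2())
rows2 = [r for r in rows2 if r < 500]   # the python-side filter
t2 = time.time()
print('q1:', t1 - t0)
print('q2:', t2 - t1)
```

Timing the two halves separately, as suggested, isolates the cache-key overhead from the filter overhead.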
>
>
>
>
> On Monday, March 25, 2013 11:12:50 AM UTC+1, Paolo valleri wrote:
>>
>> Hi Niphlod, thanks for the answer. I am not blaming web2py, I am just
>> sharing these results with you all.
>> In both cases I have the time constraint; in one case it is in the
>> query (Q1) while in the second it is explicit (Q2). With cache enabled, Q1
>> is no longer checking the constraint while Q2 is still checking the
>> constraint for all rows. Moreover, computing .epoch() should take a well
>> defined amount of time only for queries without cache; with cache, the
>> computed result should be stored.
>>
>> To sum up, when cache is enabled, with the same amount of rows (nearly
>> 300):
>> - Q2 is taking less time than Q1 even if it has to compute the time
>> constraint explicitly
>> - Q1 is taking more time than Q2, but it has to retrieve from cache only
>> one field more than Q2 among a dozen fields
>>
>> Given that, retrieving more data from memcached can be slower than
>> re-computing it for each request.
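Paolo's point can be made concrete: caching a precomputed field makes the pickled payload larger, so there are more bytes to transfer and unpickle on every request. A small sketch (field names and row shape are made up; the real rows are web2py Row objects, not dicts):

```python
import pickle

# ~300 rows, roughly matching the thread's result set size.
rows_plain = [{"start": i, "end": i + 86000} for i in range(300)]

# The same rows plus a precomputed "elapsed" field, as a cached Q1
# would store it.
rows_extra = [dict(r, elapsed=r["end"] - r["start"]) for r in rows_plain]

size_plain = len(pickle.dumps(rows_plain))
size_extra = len(pickle.dumps(rows_extra))
```

The extra field costs bytes on every retrieval, while recomputing it is a single subtraction per row.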
>>
>>
>> Paolo
>>
>>
>> 2013/3/25 [email protected] <[email protected]>
>>
>>>
>>>
>>> Paolo
>>>
>>>
>>> 2013/3/25 Niphlod <[email protected]>
>>>
>>>> I didn't get the point...
>>>> Q1 is slower (obviously) because of the epoch() function and the
>>>> additional filtering.
>>>> Q1 with cache shows more or less the same duration as Q2 with cache.
>>>> The key point is "even when the cache is enabled the first query is
>>>> taking longer" ... longer than what?
>>>> Q1 with cache is speedier than:
>>>> - Q1 without cache, and that's ok
>>>> - Q2 without cache, and that's ok
>>>> but it's slower than
>>>> - Q2 with cache
>>>> However, 3 ms (assuming those timings are in seconds) with all the
>>>> moving parts of a memcache instance seems unfair to blame entirely on
>>>> web2py.
>>>>
>>>>
>>>>
>>>> On Monday, March 25, 2013 9:48:14 AM UTC+1, Paolo valleri wrote:
>>>>>
>>>>> Dear all, I want to share with you some results. I implemented a query
>>>>> in two different ways. Given the following common code:
>>>>> start = db.record.with_alias('start_point')
>>>>> end = db.record.with_alias('end_point')
>>>>> elapsed_time = end.gathered_on.epoch() - start.gathered_on.epoch()
>>>>>
>>>>> The first query is (the constraint is in the query):
>>>>> rows = db( query & (elapsed_time < 86400) ).select(
>>>>>     start.ALL,
>>>>>     end.ALL,
>>>>>     start.gathered_on.epoch(),
>>>>>     end.gathered_on.epoch(),
>>>>>     elapsed_time,
>>>>>     orderby=start.gathered_on.epoch(),
>>>>>     left=start.on((start.mac == end.mac) & (start.gathered_on <
>>>>>                   end.gathered_on)),
>>>>>     cache=(cache.memcache, 3600),
>>>>>     cacheable=True
>>>>> )
>>>>> The second one is (the constraint is explicitly tested later):
>>>>> rows = db( query ).select(
>>>>>     start.ALL,
>>>>>     end.ALL,
>>>>>     start.gathered_on.epoch(),
>>>>>     end.gathered_on.epoch(),
>>>>>     elapsed_time,
>>>>>     orderby=start.gathered_on.epoch(),
>>>>>     left=start.on((start.mac == end.mac) & (start.gathered_on <
>>>>>                   end.gathered_on)),
>>>>>     cache=(cache.memcache, 3600),
>>>>>     cacheable=True
>>>>> )
>>>>> rows2 = [r for r in rows if (r.end_point.gathered_on -
>>>>>          r.start_point.gathered_on < datetime.timedelta(days=1))]
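For reference, the two formulations of the constraint agree: 86400 seconds is exactly one day, so the db-side epoch difference and the python-side timedelta comparison select the same rows. A minimal check (the sample timestamps are made up):

```python
import datetime

start = datetime.datetime(2013, 3, 24, 10, 0, 0)
end = datetime.datetime(2013, 3, 25, 9, 59, 59)

# Q1's db-side form: difference of epoch values, in seconds
epoch_diff = (end - start).total_seconds()
in_window_sql = epoch_diff < 86400

# Q2's python-side form: direct timedelta comparison
in_window_py = (end - start) < datetime.timedelta(days=1)
```

Both forms accept this pair, which is one second short of a full day.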
>>>>>
>>>>> From the timing results I got that the second query is always faster,
>>>>> with or without cache:
>>>>> Q_1: 0.273243904114
>>>>> Q_1 with cache: 0.0182011127472
>>>>> Q_2: 0.250607967377
>>>>> Q_2 with cache: 0.0158171653748
>>>>>
>>>>> Besides the fact that there are just a few milliseconds of difference
>>>>> and that all the rows satisfy the constraint, what is not clear to me
>>>>> is why, even when the cache is enabled, the first query is taking
>>>>> longer. The question that came to my mind is about computed columns:
>>>>> are they cached?
>>>>>
>>>>> Paolo
>>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "web2py-users" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/web2py/9RfFmNQc-tE/unsubscribe?hl=en.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>>
>>>>
>>>
>>>
>>