Thanks for the very clear explanation.
I ran the following:
t0 = time.time()
rows = cache.memcache('q1', lambda: q1(query))
t1 = time.time()
rows2 = cache.memcache('q2', lambda: q2(query))
rows2 = [r for r in rows2 if (r.end_point.gathered_on -
                              r.start_point.gathered_on < datetime.timedelta(days=1))]
t2 = time.time()
print 'q1:', t1 - t0
print 'q2:', t2 - t1
and I got from three different requests:
q1: 0.0104899406433
q2: 0.0106620788574
q1: 0.011244058609
q2: 0.0156059265137
q1: 0.010370016098
q2: 0.0133299827576
Beside memcache, I tried with cache.ram getting:
q1: 9.77516174316e-06
q2: 0.00217413902283
q1: 1.62124633789e-05
q2: 0.00206780433655
q1: 8.10623168945e-06
q2: 0.00120091438293
Given that, in a local environment cache.ram is faster than memcache.
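The cache.ram result makes sense: a RAM cache is a dictionary lookup in the same process, with no network round trip and no unpickling. A minimal sketch of that idea (a hypothetical stand-in for web2py's cache.ram, not its real implementation):

```python
import time

class RamCache:
    """Minimal in-process cache sketch; a hypothetical stand-in for
    web2py's cache.ram, not its real implementation."""
    def __init__(self):
        self._store = {}  # key -> (stored_at, value)

    def __call__(self, key, func, time_expire=300):
        now = time.time()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < time_expire:
            return hit[1]               # hit: no round trip, no recompute
        value = func()                  # miss: compute and store
        self._store[key] = (now, value)
        return value

cache_ram = RamCache()
calls = []

def q1():
    calls.append(1)                     # count how often the "query" runs
    return [1, 2, 3]

first = cache_ram('q1', q1)             # miss: executes q1
second = cache_ram('q1', q1)            # hit: served from process memory
```

The second call never touches q1, which is why the cache.ram timings above drop to microseconds.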
Paolo
On Monday, March 25, 2013 11:54:07 AM UTC+1, Niphlod wrote:
>
> well, when you plan to use cache you do it because:
> a) the additional roundtrip is negligible
> b) using the cache means your database is not used
> I'll explain it further....
> A query that takes 0.18ms to get "computed" on the database side can very
> well end up in 18sec as soon as your database gets "hit" by several
> requests.
> A cached query that takes 0.18ms to get stored and 0.18ms to get retrieved
> (unpacked, decompressed, unpickled, choose your own "term" that suits the
> job) will take, if the cache backend is "proper", 0.36ms the first time and
> 0.18ms from the 2nd to the nth.
> Even so, some complicated queries (or uncomplicated for you, but
> complicated for the db backend) can be sped up just by requesting fewer
> fields or computing in raw Python something that can usually be done on
> the backend.
> This kind of optimization (i.e. relying on Python's datetime operations vs
> an extract(epoch from datefield) on the db) will be less and less useful as
> the result set's cardinality grows (i.e. the more rows you have, the
> better the chance for the db to get the results faster than having a basic
> query and processing the results in Python).
> Assuming your "python filter" cuts out 5% of the total rows, as
> soon as the number of rows grows you'll just waste time unpacking that
> additional 5%, and the time difference will become more pronounced.
> On a totally unrelated note, remember that Python is fast, but dbs are fast
> too, especially when leveraged by all their features (e.g. you could set an
> index on the "computed" epoch and that query will probably gain a lot of
> speedup).
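The index-on-the-computed-epoch suggestion can be sketched with SQLite's expression indexes (the thread never names the actual backend, so the SQL dialect here is an assumption, and the sample dates are made up; SQLite >= 3.9 is required for indexes on expressions):

```python
import sqlite3

# Illustrative table mirroring the thread's "record" table with a
# gathered_on timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, gathered_on TEXT)")
conn.executemany(
    "INSERT INTO record (gathered_on) VALUES (?)",
    [("2013-03-24 10:00:00",),
     ("2013-03-25 09:00:00",),
     ("2013-03-25 11:00:00",)],
)

# Expression index on the computed epoch value, so the db can filter
# on it without recomputing strftime for a full scan.
conn.execute(
    "CREATE INDEX ix_record_epoch "
    "ON record (CAST(strftime('%s', gathered_on) AS INTEGER))"
)

# A filter written with the exact same expression can use that index.
rows = conn.execute(
    "SELECT id FROM record "
    "WHERE CAST(strftime('%s', gathered_on) AS INTEGER) >= "
    "      CAST(strftime('%s', '2013-03-25 00:00:00') AS INTEGER) "
    "ORDER BY id"
).fetchall()
```

The filter expression must match the indexed expression exactly for the index to be usable.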
> This is meant to explain why Q2 is faster than Q1.
> On the reason why Q1 with cache isn't faster than Q2 with cache: since the
> time difference is negligible, it's possible that the time difference gets
> "spent" on calculating the automatic key for the cache of the query (the
> first implementation needs to include the "elapsed time" part). Just to be
> sure, try timing
>
> def Q1():
>     your first implementation, without cache
> def Q2():
>     your second implementation, without cache, without the check in python
>
> with
> rows1 = cache('q1', lambda : Q1())
>
>
> ..... and, separately
> rows2 = cache('q2', lambda: Q2())
> rows2 = [r for r in rows2 ...]
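Outside web2py, the suggested experiment could look like this, with a plain dictionary standing in for the cache and placeholder query bodies (all names here are hypothetical, not the real implementations):

```python
import time

store = {}

def cache(key, func):
    # Simplistic stand-in for web2py's cache call: no expiry,
    # just key -> result.
    if key not in store:
        store[key] = func()
    return store[key]

def Q1():
    # placeholder for the first implementation, without cache
    return list(range(1000))

def Q2():
    # placeholder for the second implementation, without cache and
    # without the check in python
    return list(range(1000))

t0 = time.time()
rows1 = cache('q1', lambda: Q1())
t1 = time.time()
rows2 = cache('q2', lambda: Q2())
rows2 = [r for r in rows2 if r < 500]   # the python-side filter
t2 = time.time()
print('q1:', t1 - t0)
print('q2:', t2 - t1)
```

Timing the two halves separately, as suggested, isolates the cache-key overhead from the filter overhead.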
>
>
>
>
> On Monday, March 25, 2013 11:12:50 AM UTC+1, Paolo valleri wrote:
>>
>> Hi Niphlod, thanks for the answer. I am not blaming web2py, I am just
>> sharing these results with you all.
>> In both cases I have the time constraint; in one case it is in the
>> query (Q1) while in the second it is explicit (Q2). With cache enabled, Q1
>> is no longer checking the constraint while Q2 is still checking the
>> constraint for all rows. Moreover, computing .epoch() should take a well
>> defined amount of time only for queries without cache; with cache, the
>> computed result should be stored.
>>
>> To sum up, when cache is enabled, with the same amount of rows (nearly
>> 300):
>> - Q2 is taking less time than Q1 even if it has to compute the time
>> constraint explicitly
>> - Q1 is taking more time than Q2, but it has to retrieve from cache only
>> one field more than Q2 among a dozen fields
>>
>> Given that, retrieving more data from memcached can be slower than
>> re-computing it for each request.
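Paolo's point can be made concrete: caching a precomputed field makes the pickled payload larger, so there are more bytes to transfer and unpickle on every request. A small sketch (field names and row shape are made up; the real rows are web2py Row objects, not dicts):

```python
import pickle

# ~300 rows, roughly matching the thread's result set size.
rows_plain = [{"start": i, "end": i + 86000} for i in range(300)]

# The same rows plus a precomputed "elapsed" field, as a cached Q1
# would store it.
rows_extra = [dict(r, elapsed=r["end"] - r["start"]) for r in rows_plain]

size_plain = len(pickle.dumps(rows_plain))
size_extra = len(pickle.dumps(rows_extra))
```

The extra field costs bytes on every retrieval, while recomputing it is a single subtraction per row.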
>>
>>
>> Paolo
>>
>>
>> 2013/3/25 [email protected] <[email protected]>
>>
>>>
>>>
>>> Paolo
>>>
>>>
>>> 2013/3/25 Niphlod <[email protected]>
>>>
>>>> I didn't get the point...
>>>> Q1 is slower (obviously) because of the epoch() function and the
>>>> additional filtering.
>>>> Q1 with cache shows more or less the same duration as Q2 with cache.
>>>> The key point is "even when the cache is enabled the first query is
>>>> taking longer" ... longer than what?
>>>> Q1 with cache is speedier than:
>>>> - Q1 without cache, and that's ok
>>>> - Q2 without cache, and that's ok
>>>> but it's slower than
>>>> - Q2 with cache
>>>> However, 3 ms (assuming those timings are in seconds) with all the
>>>> moving parts of a memcache instance seems unfair to blame entirely on
>>>> web2py.
>>>>
>>>>
>>>>
>>>> On Monday, March 25, 2013 9:48:14 AM UTC+1, Paolo valleri wrote:
>>>>>
>>>>> Dear all, I want to share with you some results. I implemented a query
>>>>> in two different ways. Given the following common code:
>>>>> start = db.record.with_alias('start_point')
>>>>> end = db.record.with_alias('end_point')
>>>>> elapsed_time = end.gathered_on.epoch() - start.gathered_on.epoch()
>>>>>
>>>>> The first query is (the constraint is in the query):
>>>>> rows = db( query & (elapsed_time < 86400) ).select(
>>>>>     start.ALL,
>>>>>     end.ALL,
>>>>>     start.gathered_on.epoch(),
>>>>>     end.gathered_on.epoch(),
>>>>>     elapsed_time,
>>>>>     orderby=start.gathered_on.epoch(),
>>>>>     left=start.on((start.mac == end.mac) & (start.gathered_on <
>>>>>                   end.gathered_on)),
>>>>>     cache=(cache.memcache, 3600),
>>>>>     cacheable=True
>>>>> )
>>>>> The second one is (the constraint is explicitly tested later):
>>>>> rows = db( query ).select(
>>>>>     start.ALL,
>>>>>     end.ALL,
>>>>>     start.gathered_on.epoch(),
>>>>>     end.gathered_on.epoch(),
>>>>>     elapsed_time,
>>>>>     orderby=start.gathered_on.epoch(),
>>>>>     left=start.on((start.mac == end.mac) & (start.gathered_on <
>>>>>                   end.gathered_on)),
>>>>>     cache=(cache.memcache, 3600),
>>>>>     cacheable=True
>>>>> )
>>>>> rows2 = [r for r in rows if (r.end_point.gathered_on -
>>>>>          r.start_point.gathered_on < datetime.timedelta(days=1))]
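For reference, the two formulations of the constraint agree: 86400 seconds is exactly one day, so the db-side epoch difference and the python-side timedelta comparison select the same rows. A minimal check (the sample timestamps are made up):

```python
import datetime

start = datetime.datetime(2013, 3, 24, 10, 0, 0)
end = datetime.datetime(2013, 3, 25, 9, 59, 59)

# Q1's db-side form: difference of epoch values, in seconds
epoch_diff = (end - start).total_seconds()
in_window_sql = epoch_diff < 86400

# Q2's python-side form: direct timedelta comparison
in_window_py = (end - start) < datetime.timedelta(days=1)
```

Both forms accept this pair, which is one second short of a full day.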
>>>>>
>>>>> From the timing results I got that the second query is always faster,
>>>>> with or without cache:
>>>>> Q_1: 0.273243904114
>>>>> Q_1 with cache: 0.0182011127472
>>>>> Q_2: 0.250607967377
>>>>> Q_2 with cache: 0.0158171653748
>>>>>
>>>>> Besides the fact that there are just a few milliseconds of difference
>>>>> and that all the rows satisfy the constraint, what is not clear to me
>>>>> is why, even when the cache is enabled, the first query is taking
>>>>> longer. The question that came to my mind is about computed columns:
>>>>> are they cached?
>>>>>
>>>>> Paolo
>>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "web2py-users" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/web2py/9RfFmNQc-tE/unsubscribe?hl=en.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>>
>>>>
>>>
>>>
>>