Hi, thanks for the reply.

Yes, you are right that users who can reach the search page should ideally be 
able to search and see all the records.
However, you are not right that there is no performance impact between 
fetching the first 50 records and the next 50, the next 1000, etc.

I did some benchmarking, so please have a look at my results as reported by 
the web2py toolbar:

1) First page load --> ~2 sec (not great, but not that bad either)

SELECT  cdr.calldate, cdr.dstchannel, cdr.clid, cdr.src, cdr.dst, cdr.billsec, 
cdr.disposition, cdr.userfield, cdr.lastapp, cdr.uniqueid FROM cdr WHERE 
((((((((cdr.lastapp <> 'BackGround') AND (cdr.lastapp <> 'VoiceMail')) AND 
(cdr.lastapp <> 'VoiceMailMain')) AND (cdr.lastapp <> 'Hangup')) AND 
(cdr.lastapp <> 'Playback')) AND (cdr.lastapp <> 'Wait')) AND (cdr.dst REGEXP 
'[^s]{1}')) AND (cdr.calldate < '2014-10-04 20:31:11')) ORDER BY cdr.calldate 
DESC LIMIT 50 OFFSET 0;

2377.98ms

2) Getting the next 50 records (page 2 in the paginator breadcrumbs) --> 
again ~2 sec (not great, but I can live with it)

SELECT  cdr.calldate, cdr.dstchannel, cdr.clid, cdr.src, cdr.dst, cdr.billsec, 
cdr.disposition, cdr.userfield, cdr.lastapp, cdr.uniqueid FROM cdr WHERE 
((((((((cdr.lastapp <> 'BackGround') AND (cdr.lastapp <> 'VoiceMail')) AND 
(cdr.lastapp <> 'VoiceMailMain')) AND (cdr.lastapp <> 'Hangup')) AND 
(cdr.lastapp <> 'Playback')) AND (cdr.lastapp <> 'Wait')) AND (cdr.dst REGEXP 
'[^s]{1}')) AND (cdr.calldate < '2014-10-04 20:55:23')) ORDER BY cdr.calldate 
DESC LIMIT 50 OFFSET 50;

2408.69ms
3) Page #3 --> 2444.45ms, #4 --> 2498.09ms, #5 --> 2525.10ms. I stopped 
here, as the pattern seemed clear: every next page loads in roughly 2-3 sec.
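
The slow creep in those numbers is consistent with how LIMIT/OFFSET works: to 
serve page p the server has to materialize and throw away the first (p-1)*50 
rows before returning the 50 it keeps, so the work grows roughly linearly with 
the page number. A back-of-the-envelope sketch (my own arithmetic, not 
anything from the web2py source):

```python
PER_PAGE = 50

def rows_examined(page):
    """Rows the engine must walk for LIMIT 50 OFFSET (page - 1) * 50."""
    return (page - 1) * PER_PAGE + PER_PAGE

assert rows_examined(1) == 50        # page 1: just the page itself
assert rows_examined(5) == 250       # pages 2-5 only creep up a little
assert rows_examined(400) == 20000   # page 400: 19950 discarded rows + 50 kept
```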

4) But see what happens if I jump from here to page 400 --> *38 sec !!! o_O*

SELECT  cdr.calldate, cdr.dstchannel, cdr.clid, cdr.src, cdr.dst, cdr.billsec, 
cdr.disposition, cdr.userfield, cdr.lastapp, cdr.uniqueid FROM cdr WHERE 
((((((((cdr.lastapp <> 'BackGround') AND (cdr.lastapp <> 'VoiceMail')) AND 
(cdr.lastapp <> 'VoiceMailMain')) AND (cdr.lastapp <> 'Hangup')) AND 
(cdr.lastapp <> 'Playback')) AND (cdr.lastapp <> 'Wait')) AND (cdr.dst REGEXP 
'[^s]{1}')) AND (cdr.calldate < '2014-10-04 21:01:10')) ORDER BY cdr.calldate 
DESC LIMIT 50 OFFSET 19950;

*38364.08ms*
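
For what it's worth, the standard workaround for deep OFFSETs is keyset (seek) 
pagination: instead of telling the server to skip 19950 rows, remember the 
calldate of the last row on the previous page and filter on it, so an index on 
calldate can jump straight to the page. A minimal sketch against an in-memory 
SQLite table (synthetic data and an invented schema, not your table and not 
web2py's DAL), showing that the seek query returns exactly the same page as 
the OFFSET query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (id INTEGER PRIMARY KEY, calldate TEXT)")
# 1000 distinct, sortable fake timestamps standing in for real CDR data
conn.executemany("INSERT INTO cdr (calldate) VALUES (?)",
                 [(f"2014-10-04 {i:06d}",) for i in range(1000)])
conn.execute("CREATE INDEX idx_calldate ON cdr (calldate)")

PAGE = 50

# Page 5 the OFFSET way: the engine walks and discards 200 rows first.
offset_page = conn.execute(
    "SELECT id, calldate FROM cdr ORDER BY calldate DESC LIMIT ? OFFSET ?",
    (PAGE, 4 * PAGE)).fetchall()

# The keyset way: in real use the client remembers the last calldate it saw
# on page 4; here we look it up once for the demo.
last_seen = conn.execute(
    "SELECT calldate FROM cdr ORDER BY calldate DESC LIMIT 1 OFFSET ?",
    (4 * PAGE - 1,)).fetchone()[0]
keyset_page = conn.execute(
    "SELECT id, calldate FROM cdr WHERE calldate < ? "
    "ORDER BY calldate DESC LIMIT ?",
    (last_seen, PAGE)).fetchall()

assert keyset_page == offset_page  # same 50 rows, no deep skip needed
```

The trade-off is that keyset pagination only gives next/previous navigation, 
not random jumps to page 400 - which, for browsing recent calls, might even 
be acceptable.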
5) I also tried page 4000, just to see how many minutes that would take, but 
my Nginx timed out with a 504...


I probably would never have discovered this issue if I had not accidentally 
clicked on the last-page URI (>>), which also timed out.

My problem is that, in general, a search would never yield 1m rows if users 
used sane queries. Sometimes they do need large datasets, and for those I take 
advantage of the grid's export features (csv, tsv, xml). But I can't tolerate 
that someone clicking on the last page, or changing the sort order by clicking 
on the table headers, can effectively "shut down" the system, because, as I 
mentioned, it requires a tremendous amount of resources.

I have the impression that, despite the fact that it slices the data set, all 
the intermediate results are also passed over the wire. In my example at 
stage 4, even though it sliced from 19950 to 20000, did it also send all 
those unwanted 19950 records over the wire?.. :/ 

Perhaps that is not the case, but you can see that it jumped from 2 sec at 
the beginning to almost 40 sec - which is a significant performance impact, 
I reckon.
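
Since what I really want is a hard cap on how deep anyone can paginate, one 
option I am considering is to precompute the calldate of the N-th newest row 
(one cheap indexed lookup) and fold `calldate >= bound` into the base query 
before handing it to the grid, so even the last page stays within a small 
OFFSET. A rough sketch with sqlite3 (synthetic table and invented names; the 
real query would be the DAL query I pass to the grid):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (id INTEGER PRIMARY KEY, calldate TEXT)")
# synthetic, sortable fake timestamps standing in for real CDR data
conn.executemany("INSERT INTO cdr (calldate) VALUES (?)",
                 [(f"2014-10-04 {i:08d}",) for i in range(10000)])
conn.execute("CREATE INDEX idx_calldate ON cdr (calldate)")

MAX_ROWS = 500  # rows_per_page * the deepest page I'm willing to serve

# One indexed lookup: the cut-off calldate of the MAX_ROWS-th newest record.
bound = conn.execute(
    "SELECT calldate FROM cdr ORDER BY calldate DESC LIMIT 1 OFFSET ?",
    (MAX_ROWS - 1,)).fetchone()[0]

# Folding the bound into WHERE caps every paginated query the grid can issue.
capped = conn.execute(
    "SELECT COUNT(*) FROM cdr WHERE calldate >= ?", (bound,)).fetchone()[0]
assert capped == MAX_ROWS  # the grid can never see more than MAX_ROWS rows
```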

On Saturday, 4 October 2014 at 20:08:50 UTC+1, Niphlod wrote:
>
> ok, let's reset the question a bit. 
> This "original" issue had a logic because the grid makes TWO queries by 
> default. One is to retrieve the data for a particular page (and there's 
> NOTHING we can do about it), the other is to count the total of the 
> records, disregarding the pagination.
> Now, if your backend takes a lot of time to retrieve the page-related data 
> you want to show to the user, that's a problem you need to fix in your 
> database (or your model, or whatever else).
> What web2py can fix (and already did) is avoiding the additional roundtrip 
> to count the total of records (the count can accommodate all kinds of 
> logic now: a fixed value, one of your choice, dynamic, caching it, etc.).
>
> The thing you posted 
> """
> Same story here. I like *grid*, I like pagination, however I'm missing 
> the feature of limiting the total number of rows in the output.
> In my case *grid* can easily generate over 1+m rows in the output, which 
> when paginating by 50 rows per page, would produce 20k+ pages...
> Who needs that many pages? Who would click 20+k times anyway? 
> """
> doesn't point towards something the grid can fix.
>
> If records are in your table, and users can see 1 million records because 
> there are no "permission" issues, why would you limit your presentation to 
> only a slice of it? There's no difference in performance between fetching 
> the first 50 records of a 1-million recordset and the next 50. Also, if 
> you don't want your users to reach page 10, then do some simple math 
> (no_of_records_per_page*max_page_they_reach) and set the corresponding 
> value as cache_count.
>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
