https://bugzilla.wikimedia.org/show_bug.cgi?id=32679
Roan Kattouw <[email protected]> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |[email protected] --- Comment #9 from Roan Kattouw <[email protected]> 2011-11-30 12:35:11 UTC --- (In reply to comment #7) > Roan: Can you post what you thought might reduce the number of queries here? > For the public record: (1) the cache timeout is 15 minutes for all fundraising campaigns, so we're recomputing the numbers for the 2007-2010 campaigns every 15 minutes. Since no one is gonna go back in time to donate to those campaigns, that's just a waste of resources. The cache timeout should be configurable per campaign and set to a high value (maybe not quite infinity, but 168h or something?) for the 2007-2010 campaigns and to 15 mins for the 2011 campaign (2) the current code uses one query ('dailyTotals') to get all the data that goes into the bars, then uses another 6 (!) queries ('dailyTotalMax', 'yearlyTotalMax', etc.) to find the maximum for each column. These maxima are needed to scale the graphs correctly, but they can be computed much more easily from the data that's already been fetched from the first query (use max() or, if the array structure is too weird, roll your own loop that goes through all data points and computes the maxima) (3) remove the CONVERT_TZ() stuff from the queries; the timezone stuff is broken anyway and can't easily be reenabled without breaking caching (3b) stop using the timezone query parameter. The form element is disabled, but the query parameter still allows cache pollution to occur In the current code, there are 7 queries per campaign (one to get the data, six to get the maxima) and 5 campaigns, so 7*5=35 queries are done per recache (i.e. every 15 minutes when the memcached entry expires). If #1 and #2 are both implemented, this would be down to one query. The number of queries matters here, because every single query that we're talking about here is profoundly evil. I've included two samples below for shock value. Unfortunately, using these queries is necessary right now because we don't have a summary table. What Domas suggested we use in the longer term (and I support this suggestion) is either a statistics table that's updated by a cron job that runs the slow query hourly or daily (not ideal but not hard to do), or introduce a summary table with counts that CiviCRM would update along with the main public_reporting table (this is harder, but provides cheap live data). The base query (to get the data) is: SELECT DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(received), '+00:00', '$timezone'), '%Y-%m-%d'), sum(converted_amount), count(*), avg(converted_amount), max(converted_amount) FROM public_reporting WHERE received >= 'Nov 16 2011' AND received <= 'Jan 15 2012' AND converted_amount >= 1 AND converted_amount <= 10000 ORDER BY received GROUP BY DATE_FORMAT(FROM_UNIXTIME(received), '+00:00', '$timezone'), '%Y-%m%d'); The queries to get the maxima look like (this one is for the highest daily average): SELECT avg(converted_amount) as sum FROM public_reporting WHERE received >= 'Nov 16 2011' AND received <= 'Jan 15 2012' AND converted_amount >= 1 AND converted_amount <= 10000 ORDER BY sum DESC GROUP BY DATE_FORMAT(FROM_UNIXTIME(received), '+00:00', '$timezone'), '%Y-%m%d'); LIMIT 1; (In these queries, $timezone is the timezone requested by the user. This feature is currently semi-disabled: there is no form control for setting the timezone, so $timezone is '+0:00' unless the user sets a query parameter by hand (and pollutes the cache that way, oops!).) -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug. You are on the CC list for the bug. _______________________________________________ Wikibugs-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
