https://bugzilla.wikimedia.org/show_bug.cgi?id=14237
--- Comment #10 from Philippe Verdy <verd...@wanadoo.fr> 2010-09-30 22:42:08 UTC --- note that multiple parameters for the syntax I propose may be reduced to just one: {{PAGESINCATEGORY:categoryname|restriction}} where restriction may be: - "" : no namespace id at all, useful to add namespaces - "*" : all namespace ids (the default), useful to remove namespaces followed by one or more of: - "+id" : add this namespace id to the current list - "-id" : remove this namespace id from the list if the restriction does not start by "*" or "+" or "-", then "+" is implied The namespace id could be either the numeric id, or a selector like "talk" to select all talk namespaces, and "subject" to select all subject namespaces. The namespace id can then take the forms: - an integer, the raw namespace number - a name, a namespace name (converted to a namespace id, should recognize the synonyms, notably localized names or English names, or site-specific names) - "odd": all odd namespace ids (i.e. "talk" namespaces associated to any subject namespace) - "even": all even namespace ids (i.e. "subject" namespaces) For example: - {{PAGESINCATEGORY:categoryname|*}} : equivalent to {{PAGESINCATEGORY:categoryname|}} and to {{PAGESINCATEGORY:categoryname}} (existing syntax) - {{PAGESINCATEGORY:categoryname|:}} : count only pages of the main namespace, that are members of the specified category name - {{PAGESINCATEGORY:categoryname|0}} : count only pages of the main namespace, that are members of the specified category name ; equivalent to {{PAGESINCATEGORY:categoryname|+0}} - {{PAGESINCATEGORY:categoryname|+project:+talk}} : count only pages of the "project:" or of any talk namespaces, that are members of the specified category name ; equivalent to {{PAGESINCATEGORY:categoryname|+0}} - {{PAGESINCATEGORY:categoryname|-talk}} : count all pages of any namespace excluding the talk namespaces (odd ids) that are members of the specified category name; equivalent to {{PAGESINCATEGORY:categoryname|*-talk}} The restriction can easily be implemented as WHERE clauses in the SQL select that will match the specified namespace ids, combined as a parenthetic list of 'OR id=value' (positive selections), followed by a list of exclusions with 'AND NOT id=value' (negative selections), and possibly with the "IN" operator if sets are available in the SQL syntax. Some ideas about the SQL server-side cost of counting members in a specific category: The SQL cost should with the restrictions above will be either the same (or better) as performing a select without the namespace restriction (because this is just a restriction of the existing syntax, and this should never reduce the selectivity of the SQL query, but may in fact help to improve it). However, this means that the existing restriction (for costly parser functions) should remain (because counting pages that are members of a category, independantly of which namespace they belong may be costly in very populated categories, depending on how members of categories are indexed). As this cost is effectively the cost of a: SELECT COUNT(*) from categorymembers WHERE category_pageid = $CATEGORYPAGEID AND member_namespaceid = $CATEGORYNAMESPACEID aggregate (note: I don't know the exact schema impelementation which varies across Mediawiki versions, so replace the table names and column names appropriately), one way to solve it would be to use: SELECT 1 from categorymembers WHERE category_pageid = $CATEGORYPAGEID AND member_namespaceid = $CATEGORYNAMESPACEID LIMIT 50 and then let the PHP code count the returned "1" rows: if there are 50 rows, then the category is too much populated, and COUNT(*) may take time, so the function can be considered costly. If the cost limit is reached, just return this limit value to the page calling the function, otherwise perform the same select, replacing "SELECT 1" by "SELECT COUNT(*)" (without the LIMIT clause) to return the exact value, or return the last known estimate from a separate caching aggregate table that will be updated separately (using a max timestamp of validity), to avoid reusing the same aggregate repetitively because of templated pages using this function and frequent accesses by many users viewing or editing various pages. The value specified in the "LIMIT" clause above (here "50") may be tuned; and this first check (for performance) may be removed completely, or removed if the SQL schema includes an index that precompute aggregates for counting members in each specific category (in which case there will not be any need to perform a SELECT COUNT(*) aggregate, given that the count will be retrieved directly from a precomputed aggregate caching table, that should be updated asynchronously, either as a batch, or when the selective SELECT in the cache detects that the stored value is out of date, in which case it will perform the SELECT COUNT(*) from the non-cached table, just to update the caching table and its timestamp). -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug. You are on the CC list for the bug. _______________________________________________ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l