https://bugzilla.wikimedia.org/show_bug.cgi?id=1211





--- Comment #41 from Aryeh Gregor <[email protected]>  2008-12-20 
23:13:38 UTC ---
(In reply to comment #40)
> I suggest that we add a cl_namespace column to the table and also add that
> column to the cl_sortkey index.  This will allow efficient independent
> selection (and thus paging) of a category's members based on the namespace,
> which would not only resolve this bug but would help with #450 as well.  It's
> really not feasible to resolve #450 unless the different namespaces can be
> paged independently.
>
> I'll be happy to provide a software patch for both of these bugs (I'm almost
> done with the one for #450) but want to be sure that the proposed schema 
> change
> is acceptable, first.

First of all, this implementation of a fix would depend on bug 16012.  All
schema changes to large tables are suspended until there's a reasonable way of
executing them, apparently, and categorylinks is certainly a large table.

Second of all, you're (further) denormalizing the database here.  This might be
justifiable -- certainly it seems like the only reasonable way to fix bug 450
-- but it's a cost.  You'd have to make sure that anything that moves pages
also updates cl_namespace.  This is not currently a cost we have anywhere else,
AFAIK: either things are associated with the page_id and don't have to be
updated, or are associated with the namespace/title combination rather than the
page itself (e.g., pl_namespace and pl_title) and therefore still don't have to
be updated.  We can expect that bugs will occur here as cl_namespace sometimes
isn't updated on moves, for whatever reason.

Thirdly, this would probably *require* bug 450 to be fixed to be reasonably
efficient.  In the current setup, we'd need something like 'SELECT ... WHERE
cl_from=xxx AND cl_namespace NOT IN (' . NS_FILE . ',' . NS_CATEGORY . ') ORDER
BY cl_sortkey', which MySQL (at least older versions like 4.0) is too stupid to
execute efficiently: it can't use the index for ordering if there's a range
condition like IN in the WHERE clause.  (I think it would be smart enough in
5.0, or at least 5.1.)

And as for the proposal in bug 450, I see two issues.  First of all, you'd
surely need one query per namespace on each category view, which could be a
couple dozen on a lot of wikis.  That seems kind of excessive, if it's
avoidable.  Second of all, the current structure of the category table reflects
the pages/subcats/files breakdown.  If counts are given separately for all
namespaces, you'd need to store all the counts in the category table.  This
would either require an ALTER TABLE for every namespace added or removed (not
happening), or else breaking off a new categorycount table like (cc_cat,
cc_namespace, cc_count) to store the counts and deleting the cat_pages,
cat_subcats, cat_files from the category table.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are the assignee for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to