https://bugzilla.wikimedia.org/show_bug.cgi?id=18078





--- Comment #2 from Brad Jorsch <[email protected]>  2009-03-23 
18:13:20 UTC ---
implicitGroupby would only fix the problem here accidentally. The real problem
is:

1. The behavior of "SELECT DISTINCT foo FROM baz ORDER BY bar" is not well
defined (except in the case that (foo,bar) could be a unique key for the result
set). MySQL ignores the undefinedness and arbitrarily chooses the value of bar
from the first row fetched, while PostgreSQL throws an error. The SQL standard
seems to call for PostgreSQL's behavior, BTW.
2. MySQL filesorts for any well-defined variation of the query that I can think
of, for example "SELECT foo, MAX(bar) as max_bar FROM baz GROUP BY foo ORDER BY
max_bar". Unless the rules are different for AbuseFilter than for API queries,
a query that filesorts will bring the wrath of domas upon us.

As you implied, one possible "fix" for the problem is to continue using the
accidentally-working query for MySQL and the correct query for PostgreSQL.
implicitGroupby() could act as a "is this MySQL?" flag when we consider only
MySQL versus PostgreSQL, but it makes as much sense as using
cascadingDeletes(), cleanupTriggers(), strictIPs(), implicitOrderby(),
realTimestamps(), searchableIPs(), or functionalIndexes() for the same purpose
since the problem has nothing to do with whether the database sorts the result
rows to implement GROUP BY. Better IMO would be to explicitly check $wgDBtype
== 'mysql', since then it's clearly marked as being a MySQL-specific hack.


BTW, as far as I can tell the reason the query works in MySQL is because it
uses the page_timestamp index to fetch the rows, and the "ORDER BY
rev_timestamp DESC LIMIT 10" somehow causes it to use the index in reverse
order. This makes its arbitrary choice of which rev_timestamp to keep be the
maximum timestamp, so the later application of the ORDER BY does what we
wanted. If anything changes to make the arbitrary choice not be the maximum
rev_timestamp, it will start giving incorrect results. In fact, I found in my
testing that simply leaving out the "LIMIT 10" seems to make MySQL use the
page_timestamp index in forward order, so the ORDER BY sorts by each user's
earliest edit rather than their most recent.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to