> I can't get the following statement to complete with reasonable time. Upgraded to 7.4.1, and realized that NOT IN is far more efficient than IN, EXISTS or NOT EXISTS, at least for the amount and distribution of data that I have. Here are some numbers from before and after performing the problematic clean up operation:
| Before | After ------------------------+-----------+----------- COUNT(*) | 6'104'075 | 6'104'075 COUNT(session) | 5'945'272 | 3'640'659 COUNT(DISTINCT session) | 2'865'570 | 560'957 The following query completes within less than three hours on a machine with a high load, versa many many hours for any of the alternatives: UPDATE requests SET session = NULL WHERE session NOT IN ( SELECT r.session FROM requests r WHERE r.session IS NOT NULL GROUP BY r.session HAVING COUNT(*) > 1 ); Note that in order to correctly reverse an IN subquery, IS NOT NULL needs to be added. Interestingly, the query planner believes that using EXISTS would be more efficient than NOT IN, and IN only slightly less efficient; I assume the query planner is not able to accurately estimate the number of rows returned by the subquery. EXISTS 351'511 NOT IN 376'577 IN 386'780 LEFT JOIN 18'263'826 NOT EXISTS 7'241'815'330 ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster