Re: [PERFORM] Using ctid column changes plan drastically

2012-08-01 Thread Kevin Grittner
Thomas Kellerer spam_ea...@gmx.net wrote: I finally found a solution that runs fine: DELETE FROM dupes a WHERE EXISTS (SELECT 1 FROM dupes b WHERE b.first_name = a.first_name AND b.last_name = a.last_name AND b.ctid

Re: [PERFORM] Using ctid column changes plan drastically

2012-07-25 Thread Thomas Kellerer
Tom Lane, 24.07.2012 19:12: Well, it would only help if you're running a PG version that's new enough to recognize the NOT EXISTS as an anti-join; and even then, it's possible that joining on a tid column forecloses enough plan types that you don't get any real benefit. But I'm just guessing.

[PERFORM] Using ctid column changes plan drastically

2012-07-24 Thread Thomas Kellerer
Hi, I was testing a query to delete duplicates to see how well using ctid works if the table doesn't have a unique identifier available. The table definition is: create table dupes ( id integer primary key, first_name text, last_name text ); My test table has 100.000 rows with

Re: [PERFORM] Using ctid column changes plan drastically

2012-07-24 Thread Tom Lane
Thomas Kellerer spam_ea...@gmx.net writes: DELETE FROM dupes WHERE id NOT IN (SELECT min(b.id) FROM dupes b GROUP BY first_name, last_Name HAVING count(*) 1); Doesn't that kill the non-duplicates too? Why does the usage of the CTID

Re: [PERFORM] Using ctid column changes plan drastically

2012-07-24 Thread Thomas Kellerer
Tom Lane, 24.07.2012 16:23: Thomas Kellerer spam_ea...@gmx.net writes: DELETE FROM dupes WHERE id NOT IN (SELECT min(b.id) FROM dupes b GROUP BY first_name, last_Name HAVING count(*) 1); Doesn't that kill the non-duplicates too? Ah

Re: [PERFORM] Using ctid column changes plan drastically

2012-07-24 Thread Tom Lane
Thomas Kellerer spam_ea...@gmx.net writes: Tom Lane, 24.07.2012 16:23: IIRC, type tid doesn't have any hash support. So the bad plan is expected? Joins on tid columns just aren't supported very well at the moment. Partly that's from lack of round tuits, and partly it's because it doesn't seem

Re: [PERFORM] Using ctid column changes plan drastically

2012-07-24 Thread Thomas Kellerer
Tom Lane wrote on 24.07.2012 17:55: Joins on tid columns just aren't supported very well at the moment. Partly that's from lack of round tuits, and partly it's because it doesn't seem all that wise to encourage people to use them. There are gotchas if any of the rows receive concurrent updates.

Re: [PERFORM] Using ctid column changes plan drastically

2012-07-24 Thread Tom Lane
Thomas Kellerer spam_ea...@gmx.net writes: Tom Lane wrote on 24.07.2012 17:55: FWIW, it might be helpful to cast this as a NOT EXISTS rather than NOT IN subquery. Hmm. How would you change that into an NOT EXISTS clause (so that one of the duplicates remains) Everything I come up with is