[PERFORM] ANTI-JOIN needs table, index scan not possible?
Hi,

I need an ANTI-JOIN (NOT EXISTS (SELECT something FROM table ...) / LEFT JOIN table ... WHERE table.id IS NULL) on the same table. Actually I have an index that could serve the NOT EXISTS question, but the query planner chooses to do a bitmap heap scan. The table has 100 million rows, so doing a heap scan is messed up... It would be really fast if Postgres could just compare the two indexes.

Does Postgres have to visit the table for this ANTI-JOIN? I know the table has to be visited at some point for MVCC, but why so early? Can NOT EXISTS only be answered from the table, because it could miss something otherwise?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
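For reference, the two standard anti-join spellings mentioned above look like this (a sketch; the table and column names `big_table`, `id`, `parent_id` are placeholders, not from the original post):

```sql
-- Anti-join via NOT EXISTS
SELECT t.id
FROM big_table t
WHERE NOT EXISTS (
    SELECT 1 FROM big_table p WHERE p.id = t.parent_id
);

-- The equivalent anti-join via LEFT JOIN ... IS NULL
SELECT t.id
FROM big_table t
LEFT JOIN big_table p ON p.id = t.parent_id
WHERE p.id IS NULL;
```

The planner recognizes both forms as an anti-join; the question in this thread is whether it can answer them from the indexes alone.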
Re: [PERFORM] big joins not converging
Dan Ancona <da at vizbang.com> writes:
> This is a choice between developing some in-house capacity for this and
> sending people to various vendors, so we'll probably lean on the vendors
> for now, at least while we work on it.

I would try to do the record matching in house and see how far you get, even if you are talking to vendors concurrently. You might get lucky, and you will learn a lot about your data and how much to expect from and pay for vendor solutions.

I would: try building multi-column indexes on both tables for what you think are the same rows, and match deterministically (if you have a key like a social security number, then do this again on full names). Examine your data to see what hits, what misses, and what hits multiple times. If you know there is a good table and an iffy table, you can use a left outer join; otherwise you need a full outer join. Then put all your leftovers from each into new tables, and try again with something fuzzy.

If you build the indexes and use = and it is still slow, ask again here -- that shouldn't happen.

> And you're right fork, Record Linkage is in fact an entire academic
> discipline!

Indeed. Look into blocking and editing with your data first, I think.

I find this problem pretty interesting, so I would love to hear your results. I am right now matching building permits to assessor parcels... I wish I was using PG...
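The deterministic first pass described above might be sketched like this (all names -- `people_a`, `people_b`, `ssn`, `full_name` -- are illustrative assumptions, not from the thread):

```sql
-- Multi-column indexes on the candidate match keys in both tables.
CREATE INDEX ON people_a (ssn, full_name);
CREATE INDEX ON people_b (ssn, full_name);

-- Deterministic match: a left outer join, assuming people_a is the
-- "good" table. Rows with b.id IS NULL are misses; duplicate a.id
-- values in the output are multiple hits worth examining.
SELECT a.id AS a_id, b.id AS b_id
FROM people_a a
LEFT JOIN people_b b
  ON a.ssn = b.ssn AND a.full_name = b.full_name;

-- Leftovers go into a new table for the fuzzy second pass.
CREATE TABLE a_unmatched AS
SELECT a.*
FROM people_a a
LEFT JOIN people_b b
  ON a.ssn = b.ssn AND a.full_name = b.full_name
WHERE b.id IS NULL;
```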
Re: [PERFORM] ANTI-JOIN needs table, index scan not possible?
> I know the table has to be visited at some point for MVCC, but why so
> early? Can NOT EXISTS only be answered from the table, because it could
> miss something otherwise?

Possibly because the index entries you're anti-joining against may point to deleted tuples, so you would erroneously omit rows from the join result if you skipped the visibility check?

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
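The hazard can be made concrete with a hypothetical two-session sequence (table and id are placeholders):

```sql
-- Session 1 deletes a row but has not yet committed; the index entry
-- for id = 42 still exists and still points at the heap tuple.
BEGIN;
DELETE FROM big_table WHERE id = 42;

-- Session 2 now runs the anti-join. Evaluated purely from the index,
-- id = 42 looks present; evaluated after the delete commits, it is
-- absent. Only the heap tuple's MVCC header says which answer is
-- correct for session 2's snapshot, hence the heap visit.
SELECT t.id
FROM big_table t
WHERE NOT EXISTS (SELECT 1 FROM big_table p WHERE p.id = t.parent_id);
```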
Re: [PERFORM] Table partitioning problem
On Thu, Mar 10, 2011 at 3:25 AM, Samba GUEYE <samba.gu...@intesens.com> wrote:
> The measure insertion is successful, but problems arise when inserting
> into the simpleMeasure table, because it can't find the foreign key in
> the measure table and does not look at the partitioned tables.

Yes, that's how it works.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
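In other words, with inheritance-based partitioning a foreign key that references the parent table only checks the parent's own rows, never the children's. A minimal illustration (names are hypothetical, not from the thread):

```sql
-- Parent and one child partition.
CREATE TABLE measure (id integer PRIMARY KEY);
CREATE TABLE measure_2011 (CHECK (id >= 0)) INHERITS (measure);

-- Referencing table with an FK to the parent.
CREATE TABLE simple_measure (
    id integer PRIMARY KEY,
    measure_id integer REFERENCES measure (id)
);

-- Assume an INSERT trigger routes new rows into measure_2011, so:
--   INSERT INTO measure VALUES (1);        -- row lands in measure_2011
--   INSERT INTO simple_measure VALUES (1, 1);
-- fails with a foreign key violation: the FK check scans only the
-- parent table, which is empty.
```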
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
Marti Raudsepp <marti at juffo.org> writes:
> If you don't mind long recovery times in case of a crash, set
> checkpoint_segments to ~100 and checkpoint_completion_target=0.9; this
> will improve write throughput significantly.

Sounds good.

> Also, if you don't mind CORRUPTing your database after a crash, setting
> fsync=off and full_page_writes=off gives another significant boost.

I probably won't do this... ;)

> UPDATE on a table with many indexes will probably be slower. If you want
> to speed up this part, use INSERT INTO x SELECT and take this chance to
> partition your table,

Like the following? Will it rebuild the indexes in a sensible way? (Assuming the new column is numeric.)

BEGIN;
CREATE TABLE tempfoo AS SELECT *, foo + bar AS newcol FROM bar;
TRUNCATE foo;
ALTER TABLE foo ADD COLUMN newcol numeric;
INSERT INTO foo SELECT * FROM tempfoo;
DROP TABLE tempfoo;
END;

> such that each individual partition and most indexes will fit in your
> cache.

Is there a rule of thumb on the tradeoffs in a partitioned table? About half the time, I will want to do GROUP BYs that use the partition column, but about half the time I won't. (I would use as the partition column whatever I am most likely to cluster by in a single big table, right?)

For example, I might intuitively partition by age5 (into 20 tables like tab00, tab05, tab10, etc.). Often a query would be SELECT ... FROM PARENTTABLE GROUP BY age5, race, etc., but often it would be GROUP BY state or whatever, with no age5 component.

I know I can experiment ;), but it takes a while to load anything, and I would rather stand on the shoulders of others. Thanks so much for all your help!
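The age5 partitioning idea from the question can be sketched with inheritance-style partitioning (the table names come from the example above; the CHECK constraints and settings are assumptions):

```sql
-- Parent plus one child per 5-year bucket.
CREATE TABLE parenttable (age5 integer, race text, state text);

CREATE TABLE tab00 (CHECK (age5 = 0))  INHERITS (parenttable);
CREATE TABLE tab05 (CHECK (age5 = 5))  INHERITS (parenttable);
CREATE TABLE tab10 (CHECK (age5 = 10)) INHERITS (parenttable);
-- ... and so on through the remaining buckets ...

-- With constraint exclusion on, a query filtering on age5 can skip
-- every partition whose CHECK constraint rules it out:
SET constraint_exclusion = partition;
SELECT age5, race, count(*)
FROM parenttable
WHERE age5 = 10
GROUP BY age5, race;

-- A query grouping only on state gets no such pruning and must scan
-- all partitions -- the tradeoff the question is asking about:
SELECT state, count(*) FROM parenttable GROUP BY state;
```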
Re: [PERFORM] ANTI-JOIN needs table, index scan not possible?
Thanks for the answer. So there's no way around this problem? A nice index bitmap merge thing would be super fast.

Big-table ANTI JOIN queries with only a few expected results are totally broken, if this is true. This query breaks my neck. This is a massive downside of Postgres which makes this kind of query impossible. MySQL gives you the answer in a few seconds :-(

> Possibly because the index entries you're anti-joining against may point
> to deleted tuples, so you would erroneously omit rows from the join
> result if you skipped the visibility check?
Re: [PERFORM] ANTI-JOIN needs table, index scan not possible?
On Fri, Mar 11, 2011 at 06:54:39PM +0100, hans wulf wrote:
> Thanks for the answer. So there's no way around this problem? A nice
> index bitmap merge thing would be super fast.
>
> Big-table ANTI JOIN queries with only a few expected results are totally
> broken, if this is true. This is a massive downside of Postgres which
> makes this kind of query impossible. MySQL gives you the answer in a few
> seconds :-(

Super! I am glad that MySQL can meet your needs. No software is perfect, and you should definitely choose based on your use case.

Regards,
Ken
Re: [PERFORM] ANTI-JOIN needs table, index scan not possible?
Kenneth Marshall <k...@rice.edu> wrote:
> On Fri, Mar 11, 2011 at 06:54:39PM +0100, hans wulf wrote:
>> so there's no way around this problem? A nice index bitmap merge
>> thing would be super fast. Big-table ANTI JOIN queries with only a
>> few expected results are totally broken, if this is true. This is a
>> massive downside of Postgres which makes this kind of query
>> impossible. MySQL gives you the answer in a few seconds :-(
>
> Super! I am glad that MySQL can meet your needs. No software is
> perfect, and you should definitely choose based on your use case.

Well, as far as I can see we haven't yet seen nearly enough information to diagnose the issue, suggest alternative ways to write the query which might perform better, or determine whether there's an opportunity to improve the optimizer here.

Hans, please read this page and provide more detail:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
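The kind of detail that wiki page asks for can be gathered in one session (a sketch; the anti-join query shown is a placeholder standing in for the real one):

```sql
-- Server version and the settings most relevant to plan choice.
SELECT version();
SHOW work_mem;
SHOW shared_buffers;
SHOW random_page_cost;

-- The full plan with actual row counts, timings, and buffer usage,
-- which is what makes a bitmap-heap-scan-vs-index question diagnosable.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.id
FROM big_table t
WHERE NOT EXISTS (SELECT 1 FROM big_table p WHERE p.id = t.parent_id);
```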