Re: [PERFORM] experiments in query optimization
On Wed, 31 Mar 2010, Faheem Mitha wrote:

> On Tue, 30 Mar 2010, Robert Haas wrote:
>
>> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>>
>>>> You might need to create some indices, too.
>>>
>>> Ok. To what purpose? This query picks up everything from the tables and
>>> the planner does table scans, so conventional wisdom, and indeed my
>>> experience, says that indexes are not going to be so useful.
>>
>> There are situations where scanning the entire table to build up a hash
>> table is more expensive than using an index. Why not test it?
>>
>>> Certainly, but I don't know what you and Robert have in mind, and I'm
>>> not experienced enough to make an educated guess. I'm open to specific
>>> suggestions.
>>
>> Try creating an index on geno on the columns that are being used for the
>> join.
>
> Ok, I'll try that. I guess the cols in question on geno are idlink_id and
> anno_id. I thought that I already had indexes on them, but no. Maybe I had
> indexes, but removed them.

Looking at this more closely, idlink_id and anno_id are primary keys, so
they already have indexes on them, so my understanding (from the docs) is
that there is no purpose in creating them. That's why I removed the indexes
that were there (back last August, actually, according to my logs). Anyway,
it doesn't look like there is anything I can do here. Does anyone have
additions or corrections to this?

Regards, Faheem.
Re: [PERFORM] experiments in query optimization
Hi Eliot,

Thanks for the comment.

On Thu, 1 Apr 2010, Eliot Gable wrote:

> On Thu, Apr 1, 2010 at 7:46 AM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
>> Looking at this more closely, idlink_id and anno_id are primary keys, so
>> they already have indexes on them, so my understanding (from the docs) is
>> that there is no purpose in creating them. That's why I removed the
>> indexes that were there (back last August, actually, according to my
>> logs). Anyway, it doesn't look like there is anything I can do here. Does
>> anyone have additions or corrections to this?
>
> When you do a join, you typically have a foreign key in one table
> referencing a primary key in another table. While designating a foreign
> key does put a constraint on the key to ensure referential integrity, it
> does not put an index on the column that is being designated as a foreign
> key. If I understand correctly, the scan done as the inner loop of the
> nested loop scan for the join is going to be on your foreign key column,
> not your primary key column. Thus, if you have no index on the foreign
> key column, you will be forced to do a sequential table scan to do the
> join. In that case the hash-based join will almost certainly be faster
> (especially for such a large number of rows). If you put an index on the
> foreign key, then the inner scan can be an index scan, and that might
> turn out to be faster than building the hash indexes on all the table
> rows. Somebody can correct me if I'm wrong.

I had set the foreign keys in question (on the geno table) to be primary
keys. This is because this setup is basically a glorified spreadsheet, and
I don't want more than one cell corresponding to a particular tuple of
idlink.id and anno.id (the conceptual rows and cols). Since a primary key
defines an index, I thought putting indexes on idlink_id and anno_id was
redundant. However, it looks like (unsurprisingly) the index corresponding
to the primary key is across both columns, which may not be what is wanted
for the aforesaid join. I.e.

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id);

(As a side comment, with respect to the indexes on the other side of the
joins: in one case, we have idlink.id = geno.idlink_id, and idlink.id is a
primary key too. In the other, namely geno.anno_id = dedup_patient_anno.id,
dedup_patient_anno is a CTE, so there is no index on dedup_patient_anno.id.
But maybe indexes aren't needed there.)

Here is the join:

SELECT decode_genotype(geno.snpval_id, %(allelea)s, %(alleleb)s) AS g,
       geno.idlink_id, geno.anno_id
FROM geno
INNER JOIN dedup_patient_anno ON geno.anno_id = dedup_patient_anno.id
INNER JOIN idlink ON geno.idlink_id = idlink.id
ORDER BY idlink_id, anno_id

Here is the table dump.

--
-- Name: geno; Type: TABLE; Schema: hapmap; Owner: snp; Tablespace:
--

CREATE TABLE geno (
    idlink_id integer NOT NULL,
    anno_id integer NOT NULL,
    snpval_id integer NOT NULL
)
WITH (autovacuum_enabled=true);

ALTER TABLE hapmap.geno OWNER TO snp;

--
-- Name: geno_pkey; Type: CONSTRAINT; Schema: hapmap; Owner: snp; Tablespace:
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id);

--
-- Name: geno_anno_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_anno_id_fkey FOREIGN KEY (anno_id)
    REFERENCES anno(id) ON UPDATE CASCADE ON DELETE CASCADE;

--
-- Name: geno_idlink_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_idlink_id_fkey FOREIGN KEY (idlink_id)
    REFERENCES idlink(id) ON UPDATE CASCADE ON DELETE CASCADE;

--
-- Name: geno_snpval_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_snpval_id_fkey FOREIGN KEY (snpval_id)
    REFERENCES snpval(val) ON UPDATE CASCADE ON DELETE CASCADE;

So, should I add indexes on the individual foreign key cols idlink_id and
anno_id after all?

> --
> Eliot Gable
>
> "We do not inherit the Earth from our ancestors: we borrow it from our
> children." ~David Brower
>
> "I decided the words were too conservative for me. We're not borrowing
> from our children, we're stealing from them--and it's not even considered
> to be a crime." ~David Brower

Nice quotes.

Regards, Faheem.

Esse oportet ut vivas, non vivere ut edas. (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero
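For concreteness, a minimal sketch of the single-column indexes under
discussion (the index names are illustrative, not from the thread). Since
the primary key already leads with idlink_id, the anno_id index is the one
most likely to matter:

CREATE INDEX geno_anno_id_idx ON geno (anno_id);
    -- anno_id is only the trailing column of the composite PK, so it has
    -- no usable standalone index today
CREATE INDEX geno_idlink_id_idx ON geno (idlink_id);
    -- likely redundant: the PK btree already leads with idlink_id
ANALYZE geno;  -- refresh planner statistics after the change

Re-running EXPLAIN ANALYZE on the join before and after would show whether
the planner actually picks the new indexes up; on a table this size the
index builds themselves will take a while.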
Re: [PERFORM] experiments in query optimization
On Thu, 1 Apr 2010, Robert Haas wrote:

> On Thu, Apr 1, 2010 at 2:15 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
>> I had set the foreign keys in question (on the geno table) to be primary
>> keys. This is because this setup is basically a glorified spreadsheet,
>> and I don't want more than one cell corresponding to a particular tuple
>> of idlink.id and anno.id (the conceptual rows and cols). Since a primary
>> key defines an index, I thought putting indexes on idlink_id and anno_id
>> was redundant. However, it looks like (unsurprisingly) the index
>> corresponding to the primary key is across both columns, which may not
>> be what is wanted for the aforesaid join.
>
> Actually it is what is wanted - that is good.

I see.

>> So, should I add indexes on the individual foreign key cols idlink_id
>> and anno_id after all?
>
> I doubt that would help.

You're sure of this?

> The bottom line may be that you're dealing with hundreds of millions of
> rows here, so things are going to take a long time. Of course you can
> always get more/faster memory, a bigger I/O subsystem, faster
> processors... and it could be that with detailed study there are
> optimizations that could be done even without spending money, but I think
> I'm about tapped out on what I can do over an Internet mailing list.

Thanks for your assistance, Robert. It's been educational.

Regards, Faheem.
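A hedged aside on why the composite index can be what is wanted here: a
btree on (idlink_id, anno_id) can serve equality on its leading column, and
it can also return rows already ordered by (idlink_id, anno_id), which
matches the query's ORDER BY. One way to probe that, for testing only:

SET enable_sort = off;  -- testing only: discourage explicit sort nodes
EXPLAIN SELECT idlink_id, anno_id FROM geno ORDER BY idlink_id, anno_id;
-- an Index Scan on geno_pkey in the resulting plan would confirm the index
-- can supply the ordering without a separate sort step
RESET enable_sort;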
Re: [PERFORM] experiments in query optimization
On Thu, 1 Apr 2010, Eliot Gable wrote:

> On Thu, Apr 1, 2010 at 3:01 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
>>>> So, should I add indexes on the individual foreign key cols idlink_id
>>>> and anno_id after all?
>>>
>>> I doubt that would help.
>>
>> You're sure of this?
>
> It is always best to test and be certain.

Fair enough. I may also try disabling hash joins and see what happens...

Regards, Faheem.
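For reference, a sketch of that test, using a simplified form of the join
from earlier in the thread; done per session, so the global configuration
is untouched:

SET enable_hashjoin = off;  -- testing only; the planner falls back to
                            -- merge joins or nested loops
EXPLAIN ANALYZE
SELECT geno.idlink_id, geno.anno_id, geno.snpval_id
FROM geno
INNER JOIN idlink ON geno.idlink_id = idlink.id
ORDER BY idlink_id, anno_id;
RESET enable_hashjoin;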
Re: [PERFORM] experiments in query optimization
On Wed, 31 Mar 2010, Matthew Wakeling wrote:

> On Tue, 30 Mar 2010, Faheem Mitha wrote:
>
>> work_mem = 1 GB (see diag.{tex/pdf}).
>>
>> Sure, but define "sane setting", please. I guess part of the point is
>> that I'm trying to keep memory low
>
> You're trying to keep memory usage low, but you have work_mem set to 1GB?

I'm trying to keep both runtime and memory usage low. I assume that with
lower levels of memory, the runtime would be longer, other things being
equal.

Regards, Faheem.
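A sketch of how one might measure that trade-off without touching the
global setting; the value below is only an example:

SET work_mem = '256MB';  -- example value, well below the global 1GB
SHOW work_mem;           -- confirm the per-session value took effect
-- then re-run the query under EXPLAIN ANALYZE: sorts that no longer fit in
-- work_mem will report "external merge" (on disk) instead of "quicksort"
-- in the Sort Method line, and runtime can be compared directly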
Re: [PERFORM] experiments in query optimization
On Tue, 30 Mar 2010, Kevin Grittner wrote:

> Faheem Mitha fah...@email.unc.edu wrote:
>
>>>> If you're concerned about memory usage, try reducing work_mem; you've
>>>> probably got it set to something huge.
>>>
>>> work_mem = 1 GB (see diag.{tex/pdf}).
>>>
>>> The point isn't that I'm using so much memory. Again, my question is,
>>> why are these changes affecting memory usage so drastically?
>
> Because the planner looks at a very wide variety of plans, some of which
> may use many allocations of work_mem size, and some of which don't. The
> costs are compared and the lowest cost one is chosen. If you are close to
> the tipping point then even a very small change might affect which is
> chosen. It pays to keep the work_mem setting sane so that unexpected plan
> changes don't cause problems.

Sure, but define "sane setting", please. I guess part of the point is that
I'm trying to keep memory low, and it seems this is not part of the
planner's priorities. That is, it does not take memory usage into
consideration when choosing a plan. If that is wrong, let me know, but that
is my understanding.

> Look at the plans and their costs to get a feel for what's being chosen
> and why. Although it's a very bad idea to use these in production, you
> can often shift the plan to something you *think* would be better using
> the enable_* settings, to see what the planner thinks such a plan will
> cost and where it thinks the cost would be; that can help in tuning the
> settings.

Right. You mean to close off certain options to the planner using 'Planner
Method Configuration'. I suppose one can also use 'Planner Cost Constants'
to alter plan behaviour. I haven't tried changing these.

>>>> You might need to create some indices, too.
>>>
>>> Ok. To what purpose? This query picks up everything from the tables and
>>> the planner does table scans, so conventional wisdom, and indeed my
>>> experience, says that indexes are not going to be so useful.
>
> There are situations where scanning the entire table to build up a hash
> table is more expensive than using an index. Why not test it?

Certainly, but I don't know what you and Robert have in mind, and I'm not
experienced enough to make an educated guess. I'm open to specific
suggestions.

Regards, Faheem.
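For concreteness, the 'Planner Cost Constants' mentioned above are ordinary
settings that can also be changed per session; the values below are
examples only, not recommendations:

SET random_page_cost = 2.0;        -- default is 4.0; lower values make
                                   -- index scans look relatively cheaper
SET effective_cache_size = '8GB';  -- a hint about available OS cache,
                                   -- not an allocation; example value only
SET cpu_tuple_cost = 0.01;         -- the default; per-row processing cost
-- re-run EXPLAIN after each change and watch how the chosen plan and its
-- estimated cost shift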
Re: [PERFORM] experiments in query optimization
One thing which I haven't really mentioned in this thread or in my writeup
is that the planner's value for the number of rows in geno is way off base
some of the time. It is around 800 million, but the planner thinks it is
100 million. I don't know if this is significant or not, or what to do
about it.

E.g. in the ped_bigjoin EXPLAIN ANALYZE VERBOSE:

-> Sort (cost=56855882.72..57144683.54 rows=115520330 width=42)
        (actual time=23027732.092..37113627.380 rows=823086774 loops=1)
     Output: (CASE WHEN (hapmap.geno.snpval_id = (-1)) THEN '0 0'::text
              WHEN (hapmap.geno.snpval_id = 0) THEN
                (((dedup_patient_anno.allelea_id)::text || ' '::text) ||
                 (dedup_patient_anno.allelea_id)::text)
              WHEN (hapmap.geno.snpval_id = 1) THEN
                (((dedup_patient_anno.allelea_id)::text || ' '::text) ||
                 (dedup_patient_anno.alleleb_id)::text)
              WHEN (hapmap.geno.snpval_id = 2) THEN
                (((dedup_patient_anno.alleleb_id)::text || ' '::text) ||
                 (dedup_patient_anno.alleleb_id)::text)
              ELSE NULL::text END),
             hapmap.geno.idlink_id, hapmap.geno.anno_id, pheno.patientid,
             pheno.phenotype, sex.code

Faheem.

On Tue, 30 Mar 2010, Kevin Grittner wrote:

> Faheem Mitha fah...@email.unc.edu wrote:
>
>>>> If you're concerned about memory usage, try reducing work_mem; you've
>>>> probably got it set to something huge.
>>>
>>> work_mem = 1 GB (see diag.{tex/pdf}).
>>>
>>> The point isn't that I'm using so much memory. Again, my question is,
>>> why are these changes affecting memory usage so drastically?
>
> Because the planner looks at a very wide variety of plans, some of which
> may use many allocations of work_mem size, and some of which don't. The
> costs are compared and the lowest cost one is chosen. If you are close to
> the tipping point then even a very small change might affect which is
> chosen. It pays to keep the work_mem setting sane so that unexpected plan
> changes don't cause problems.
>
> Look at the plans and their costs to get a feel for what's being chosen
> and why. Although it's a very bad idea to use these in production, you
> can often shift the plan to something you *think* would be better using
> the enable_* settings, to see what the planner thinks such a plan will
> cost and where it thinks the cost would be; that can help in tuning the
> settings.
>
>>>> You might need to create some indices, too.
>>>
>>> Ok. To what purpose? This query picks up everything from the tables and
>>> the planner does table scans, so conventional wisdom, and indeed my
>>> experience, says that indexes are not going to be so useful.
>
> There are situations where scanning the entire table to build up a hash
> table is more expensive than using an index. Why not test it?
>
> -Kevin
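If the planner's idea of the table size really is stale, refreshing the
statistics is the cheap first thing to try. A sketch; whether it helps
depends on where the estimate goes wrong, since the rows=115520330 above is
an estimate for a join result, not for geno itself:

ANALYZE VERBOSE geno;  -- resample the table; updates reltuples/relpages

SELECT relname, reltuples::bigint, relpages
FROM pg_class
WHERE relname = 'geno';  -- what the planner now believes about the size

-- if per-column estimates remain poor, a larger statistics sample can be
-- taken for the relevant columns (default target is 100):
ALTER TABLE geno ALTER COLUMN anno_id SET STATISTICS 1000;  -- example value
ANALYZE geno;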
Re: [PERFORM] experiments in query optimization
On Mon, 29 Mar 2010, Robert Haas wrote:

> On Thu, Mar 25, 2010 at 3:57 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
>> Hi everyone,
>>
>> I've been trying to reduce both memory usage and runtime for a query.
>> Comments/suggestions gratefully received. Details are at
>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
>> See particularly Section 1 - Background and Discussion.
>> If you want a text version, see
>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex
>> For background see
>> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version
>> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
>> http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf
>> Please CC any replies to me at the above email address. Thanks.
>
> Didn't you (or someone) post about these queries before?

I did write to the list about an earlier version of these queries, yes. In
fact you replied to that message.

> It's not really too clear to me from reading this what specific questions
> you're trying to answer.

Quote from opt.{tex/pdf}, Section 1:

"If I have to I can use Section~\ref{ped_hybrid} and
Section~\ref{tped_hybrid}, but I am left wondering why I get the
performance I do out of the earlier versions. Specifically, why is
Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and
why does the memory usage in Section~\ref{ped_phenoout} blow up relative to
Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"

> One random thought: WHERE row_number() = 1 is not too efficient. Try
> using LIMIT or DISTINCT ON instead.

Possibly. However, the CTE that uses WHERE row_number() = 1 doesn't
dominate the runtime or memory usage, so I'm not too concerned about it.

> If you're concerned about memory usage, try reducing work_mem; you've
> probably got it set to something huge.

work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so
much memory. Again, my question is, why are these changes affecting memory
usage so drastically?

> You might need to create some indices, too.

Ok. To what purpose? This query picks up everything from the tables and the
planner does table scans, so conventional wisdom, and indeed my experience,
says that indexes are not going to be so useful.

Regards, Faheem.
Re: [PERFORM] experiments in query optimization
On Mon, 29 Mar 2010, Robert Haas wrote:

> On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
>>> It's not really too clear to me from reading this what specific
>>> questions you're trying to answer.
>>
>> Quote from opt.{tex/pdf}, Section 1:
>>
>> "If I have to I can use Section~\ref{ped_hybrid} and
>> Section~\ref{tped_hybrid}, but I am left wondering why I get the
>> performance I do out of the earlier versions. Specifically, why is
>> Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc},
>> and why does the memory usage in Section~\ref{ped_phenoout} blow up
>> relative to Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"
>
> Here and in the document, you refer to section numbers for the hybrid
> version but I don't see where you define what the hybrid version actually
> is.

It is defined later in the file. I don't know if you are looking at the
pdf, but if so, it is Section 2.4 (for the hybrid PED query). In the text
file, I guess the easiest way would be to grep for the label ped_hybrid.

> And the differences between your queries are not real clear either -
> first you say you took out pheno and sex because they weren't necessary,
> but then you decide to put them back. I don't know what that means. If
> they're not necessary, leave them out.

I don't see where I say that pheno and sex weren't necessary. In fact, the
word 'necessary' does not appear in the opt document. I took them out to
see how it would affect performance. Which it does, dramatically. I say

"So, I decided to remove the joins to tables corresponding to the patient
data, namely pheno and sex, and the runtime dropped to 150 min, while the
memory stayed around 5G."

Maybe I wasn't being sufficiently explicit here. Perhaps

"So, I decided to remove the joins to tables corresponding to the patient
data, namely pheno and sex, to see how it would affect performance..."

would have been better.

>>> One random thought: WHERE row_number() = 1 is not too efficient. Try
>>> using LIMIT or DISTINCT ON instead.
>>
>> Possibly. However, the CTE that uses WHERE row_number() = 1 doesn't
>> dominate the runtime or memory usage, so I'm not too concerned about it.
>
> Hmm, you might be right.
>
>>> If you're concerned about memory usage, try reducing work_mem; you've
>>> probably got it set to something huge.
>>
>> work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so
>> much memory. Again, my question is, why are these changes affecting
>> memory usage so drastically?
>
> Well, each sort or hash can use an amount of memory that is limited from
> above by work_mem. So if you write the query in a way that involves more
> sorts or hashes, each one can add up to 1GB to your memory usage, plus
> overhead. However, it doesn't look like any of your queries include 30
> sorts or hashes, so I'm thinking that the RSS number probably also
> includes some of the shared memory that has been mapped into each
> backend's address space. RSS is not a terribly reliable number when
> dealing with shared memory; it's hard to say what that really means.
>
>>> You might need to create some indices, too.
>>
>> Ok. To what purpose? This query picks up everything from the tables and
>> the planner does table scans, so conventional wisdom, and indeed my
>> experience, says that indexes are not going to be so useful.
>
> Well, a hash join is not usually the first thing that pops to mind when
> dealing with a table that has 825 million rows (geno). I don't know if a
> nested loop with inner-indexscan would be faster, but it would almost
> certainly use less memory.

Can you provide an illustration of what you mean? I don't know what a
nested loop with inner-indexscan is in this context.

Regards, Faheem.
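Roughly, a nested loop with inner index scan reads one relation (the outer)
and, for each of its rows, probes an index on the other relation (the
inner) for matches, instead of building an in-memory hash of a whole table.
A sketch of how one might look for such a plan, using the base anno table
rather than the CTE purely for illustration, and assuming an index on
geno (anno_id) like the one sketched earlier exists:

SET enable_hashjoin = off;   -- testing only: push the planner
SET enable_mergejoin = off;  -- toward nested loops
EXPLAIN
SELECT geno.idlink_id, geno.anno_id, geno.snpval_id
FROM geno
INNER JOIN anno ON geno.anno_id = anno.id;
-- a "Nested Loop" node with an "Index Scan ... on geno" inner side is the
-- plan shape in question: one index probe into geno per anno row, so
-- memory use stays flat at the cost of one probe per outer row
RESET enable_hashjoin;
RESET enable_mergejoin;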
[PERFORM] experiments in query optimization
Hi everyone,

I've been trying to reduce both memory usage and runtime for a query.
Comments/suggestions gratefully received. Details are at

http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf

See particularly Section 1 - Background and Discussion.

If you want a text version, see
http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex

For background see
http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf
(text version http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf

Please CC any replies to me at the above email address. Thanks.

Regards, Faheem.
Re: [PERFORM] query optimization
On Wed, 25 Nov 2009, Robert Haas wrote:

> On Wed, Nov 25, 2009 at 5:54 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
> Well, any method of DISTINCT-ifying is likely to be somewhat slow, but
> I've had good luck with SELECT DISTINCT ON (...) in the past, as compared
> with other methods. YMMV - the only way to find out is to benchmark it. I
> don't think it's non-deterministic if you order by the DISTINCT-ON
> columns and enough extras to break any ties - you should get the first
> one of each set.

Right, but adding enough extras to break ties is up to the user, and the
language doesn't guarantee anything, so it feels more fragile.

>>> If not, you might want to look at some way of pre-marking the
>>> non-duplicate rows so that you don't have to recompute that each time.
>>
>> What are the options re pre-marking?
>
> Well, what I usually do is - if I'm going to do the same
> distinct-ification frequently, I add an extra column (say, a boolean) and
> set it to true for all and only those rows which will pass the
> distinct-ification filter. Then I can just say WHERE <that column name>.

Yes, I see. The problem with premarking is that the selection is somewhat
dynamic, in the sense that this depends on the idlink table, which depends
on patient data, which can change.

>>> Then you might be able to use the underlying table directly in the next
>>> CTE, which will usually permit better optimization, more use of
>>> indices, etc. It seems pretty unfortunate that dedup_patient_anno joins
>>> against geno and then patient_geno does what appears to be the same
>>> join again. Is there some way to eliminate that? If so it will probably
>>> help.
>>
>> You don't say whether you are looking at the PED or TPED query, so I'll
>> assume PED. They are similar anyway. I see your point re the joins. You
>> mean anno INNER JOIN geno followed by geno INNER JOIN
>> dedup_patient_anno? I think the point of the first join is to reduce the
>> anno table based on information from the geno table. The result is
>> basically a subset of the anno table with some potential duplication
>> removed, which is then re-joined to the geno table. I agree this seems a
>> bit suboptimal, and there might be a better way to do this.
>
> Yeah, I didn't think about it in detail, but it looks like it should be
> possible. Eliminating joins can sometimes have *dramatic* effects on
> query performance, and it never hurts.

Failing all else, couldn't I smoosh together the two queries and do a
triple join? For reference, the two CTEs in question, from the PED query,
are as follows.

dedup_patient_anno AS (
    SELECT *
    FROM (SELECT *,
                 row_number() OVER (PARTITION BY anno.rsid ORDER BY anno.id)
          FROM anno
          INNER JOIN geno ON anno.id = geno.anno_id
          WHERE idlink_id = (SELECT MIN(id) FROM idlink)
         ) AS s
    WHERE row_number = '1'
),
patient_geno AS (
    SELECT geno.idlink_id AS idlink_id, geno.anno_id AS anno_id,
           geno.snpval_id AS snpval_id, allelea_id, alleleb_id
    FROM geno
    INNER JOIN dedup_patient_anno ON geno.anno_id = dedup_patient_anno.id
),

Regards, Faheem.
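For what the suggested rewrite might look like: a sketch of
dedup_patient_anno using DISTINCT ON, on the assumption that the window
function is only there to keep the row with the lowest anno.id per rsid.
The ORDER BY names the DISTINCT ON column first and then anno.id as the
tie-breaker, which is what makes the choice deterministic; note that the
row_number column itself disappears from the output.

dedup_patient_anno AS (
    SELECT DISTINCT ON (anno.rsid) *
    FROM anno
    INNER JOIN geno ON anno.id = geno.anno_id
    WHERE idlink_id = (SELECT MIN(id) FROM idlink)
    ORDER BY anno.rsid, anno.id  -- per rsid, the lowest anno.id wins
),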
Re: [PERFORM] query optimization
Hi Robert,

Thanks very much for your suggestions.

On Wed, 25 Nov 2009, Robert Haas wrote:

> On Mon, Nov 23, 2009 at 5:47 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
>
>> Hi everybody,
>>
>> I've got two queries that need optimizing. Actually, there are others,
>> but these are pretty representative. You can see the queries and the
>> corresponding plans at
>> http://bulldog.duhs.duke.edu/~faheem/snpdb/opt.pdf
>> or
>> http://bulldog.duhs.duke.edu/~faheem/snpdb/opt.tex
>> if you prefer text (latex file, effectively text in this case). The
>> background to this is at
>> http://bulldog.duhs.duke.edu/~faheem/snpdb/diag.pdf
>> If more details are required, let me know and I can add them. I'd
>> appreciate suggestions about how to make these queries go faster.
>> Please CC this email address on any replies.
>
> I've found that a good way to approach optimizing queries of this type is
> to look at the EXPLAIN ANALYZE results and figure out which parts of the
> query are slow. Then simplify the rest of the query as much as possible
> without eliminating the slowness. Then try to figure out how to optimize
> the simplified query: rewrite the logic, add indices, change the schema,
> etc. Lastly start adding the other bits back in.

Good strategy. Now I just have to understand EXPLAIN ANALYZE well enough to
figure out which bits are slow. :-)

> It looks like the dedup_patient_anno CTE is part of your problem. Try
> pulling that piece out and optimizing it separately. I wonder if that
> could be rewritten to use SELECT DISTINCT ON (...) and whether that would
> be any faster.

Isn't SELECT DISTINCT supposed to be evil, since in general the result is
not deterministic? I think I had SELECT DISTINCT earlier, and removed it
because of that, with the help of Andrew (RhodiumToad on #postgresql). I
didn't compare the corresponding subqueries separately, so don't know what
speed difference this made.

> If not, you might want to look at some way of pre-marking the
> non-duplicate rows so that you don't have to recompute that each time.

What are the options re pre-marking?

> Then you might be able to use the underlying table directly in the next
> CTE, which will usually permit better optimization, more use of indices,
> etc. It seems pretty unfortunate that dedup_patient_anno joins against
> geno and then patient_geno does what appears to be the same join again.
> Is there some way to eliminate that? If so it will probably help.

You don't say whether you are looking at the PED or TPED query, so I'll
assume PED. They are similar anyway. I see your point re the joins. You
mean anno INNER JOIN geno followed by geno INNER JOIN dedup_patient_anno?
I think the point of the first join is to reduce the anno table based on
information from the geno table. The result is basically a subset of the
anno table with some potential duplication removed, which is then re-joined
to the geno table. I agree this seems a bit suboptimal, and there might be
a better way to do this.

> Once you've got those parts of the query as well-optimized as you can,
> add the next pieces in and start hacking on those.

Regards, Faheem.
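For reference, a simplified sketch of the boolean-column pre-marking
approach Robert describes earlier in this digest, marking the lowest
anno.id per rsid. The column name is made up, and, as noted in the thread,
the flag would have to be refreshed whenever the underlying patient data
changes:

ALTER TABLE anno ADD COLUMN dedup_keep boolean NOT NULL DEFAULT false;

UPDATE anno a
SET dedup_keep = true
WHERE a.id = (SELECT MIN(a2.id) FROM anno a2 WHERE a2.rsid = a.rsid);

-- subsequent queries can then filter with WHERE dedup_keep instead of
-- recomputing the row_number() window each time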
[PERFORM] query optimization
Hi everybody,

I've got two queries that need optimizing. Actually, there are others, but
these are pretty representative. You can see the queries and the
corresponding plans at

http://bulldog.duhs.duke.edu/~faheem/snpdb/opt.pdf

or

http://bulldog.duhs.duke.edu/~faheem/snpdb/opt.tex

if you prefer text (latex file, effectively text in this case). The
background to this is at
http://bulldog.duhs.duke.edu/~faheem/snpdb/diag.pdf

If more details are required, let me know and I can add them. I'd
appreciate suggestions about how to make these queries go faster.

Please CC this email address on any replies.

Regards, Faheem.
Re: [PERFORM] query optimization
On Mon, 23 Nov 2009, Thom Brown wrote:

> Hi Faheem,
>
> There appears to be a discrepancy between the 2 PDFs you provided. One
> says you're using PostgreSQL 8.3, and the other shows you using common
> table expressions, which are only available in 8.4+.

Yes, sorry. I'm using PostgreSQL 8.4. I guess I should go through diag.pdf
and make sure all the information is current. Thanks for pointing out my
error.

Regards, Faheem.
Re: [PERFORM] query optimization
On Tue, 24 Nov 2009, Sebastian Jörgensen wrote:

> How often are the tables you query from updated?

Quite rarely. Once in a while. The large tables, e.g. geno, are basically
static.

Regards, Faheem.

> Rgds
> Sebastian
>
> On Tue, Nov 24, 2009 at 12:52 AM, marcin mank <marcin.m...@gmail.com> wrote:
>
>> On Tue, Nov 24, 2009 at 12:49 AM, Faheem Mitha <fah...@email.unc.edu> wrote:
>>
>>> Yes, sorry. I'm using PostgreSQL 8.4. I guess I should go through
>>> diag.pdf and make sure all the information is current. Thanks for
>>> pointing out my error.
>>
>> Excellent report!
>>
>> About the copy problem: you seem to have created the primary key before
>> doing the copy (at least that's what the dump before copy says). This is
>> bad. Create it after the copy.
>>
>> Greetings
>> Marcin
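A sketch of the load order Marcin is recommending, using the geno table
from this thread; the file path is a placeholder:

CREATE TABLE geno (
    idlink_id integer NOT NULL,
    anno_id integer NOT NULL,
    snpval_id integer NOT NULL
);

COPY geno FROM '/path/to/geno.dat';  -- bulk load into the index-less table

-- build the index once, afterwards, instead of maintaining it row by row
-- during the load:
ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id);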