Re: [PERFORM] Why Wal_buffer is 64KB
Hi All,

Thank you for all the support. I have noticed one more thing here: if you turn off fsync and run a transaction, it breaks the current filenode and generates another filenode. Is it true that whenever you turn fsync off or on, the filenode on that table will break and a new one will be created?

Regards
Raghavendra

On Fri, Mar 26, 2010 at 7:30 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> On Fri, Mar 26, 2010 at 7:43 AM, Pierre C <li...@peufeu.com> wrote:
>>> After fsync/synchronous_commit off
>>
>> Do not use fsync off, it is not safe. Who cares about the performance
>> of fsync=off, when in practice you'd never use it with real data.
>> synchronous_commit=off is fine for some applications, though.
>
> There are situations where it's ok, when all the data are reproducible
> from other sources, etc. For instance, I have a reporting server that
> is a slony slave that runs with fsync off. If it does crash, I can
> recreate the node in an hour or so and be back online. With fsync on
> the machine is too slow to do its job, and it's not the primary repo
> of the real data, so it's ok there.
Re: [PERFORM] Why Wal_buffer is 64KB
On Mon, Mar 29, 2010 at 12:00 AM, Tadipathri Raghu <traghu@gmail.com> wrote:
> Hi All,
> Thank you for all the support. I have noticed one more thing here: if
> you turn off fsync and run a transaction, it breaks the current
> filenode and generates another filenode. Is it true that whenever you
> turn fsync off or on, the filenode on that table will break and a new
> one will be created?

From what I understand, with fsync on or off the same stuff gets written. It's just not guaranteed to go out in the right order or right now, but eventually.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
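Scott's point is that fsync changes *when* writes are forced to disk, not *what* gets written. A minimal, generic OS-level sketch in Python (not PostgreSQL code) of what an fsync'd write adds over an ordinary buffered write:

```python
import os

def durable_write(path, data):
    # Write the bytes, flush the user-space buffer, then fsync() so the
    # OS pushes the blocks to stable storage before we return. This is
    # roughly the guarantee fsync=on buys PostgreSQL for its WAL; with
    # fsync=off the same bytes are still written, but only whenever the
    # OS gets around to it, in whatever order it likes.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

On a crash, the un-fsync'd variant may leave the file partially written or stale, which is exactly why fsync=off is unsafe for data you cannot recreate.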
Re: [PERFORM] Why Wal_buffer is 64KB
Hi Scott,

Yes. May I know the particular reason for this behaviour? Is it looking for some kind of consistency? I haven't got a clear picture here. Could you please explain?

Thanks and Regards
Raghavendra

On Mon, Mar 29, 2010 at 12:15 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> From what I understand, with fsync on or off the same stuff gets
> written. It's just not guaranteed to go out in the right order or
> right now, but eventually.
Re: [PERFORM] Optimizer showing wrong rows in plan
On Mon, 29 Mar 2010, Tadipathri Raghu wrote:
> As per the documentation, one page is 8kb, and when I create a table
> with one int column, that column is 4 bytes. If I insert 2000 rows,
> they should fit in one 8kb page, but the table is extending vastly,
> which is not what I expected. Example shown below, taking the previous
> example table test with one column.

There is more to a row than just the single int column. The space used by a row will include a column start marker (data length), transaction ids, hint bits, an oid, a description of the types of the columns, and finally your data columns. That takes a bit more space.

Matthew

--
If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks
Re: [PERFORM] Optimizer showing wrong rows in plan
Hi Matthew,

Thank you for the information. Once again, I would like to thank everyone in this thread for their support.

Regards
Raghavendra

On Mon, Mar 29, 2010 at 4:47 PM, Matthew Wakeling <matt...@flymine.org> wrote:
> There is more to a row than just the single int column. The space used
> by a row will include a column start marker (data length), transaction
> ids, hint bits, an oid, a description of the types of the columns, and
> finally your data columns. That takes a bit more space.
Re: [PERFORM] Optimizer showing wrong rows in plan
See http://www.postgresql.org/docs/current/static/storage-page-layout.html for all of what is taking up the space.

Short version:
- Per-block overhead is 24 bytes.
- Per-row overhead is 23 bytes + some alignment loss + the null bitmap if you have nullable columns.

On Mon, Mar 29, 2010 at 8:24 AM, raghavendra t <raagavendra@gmail.com> wrote:
> Thank you for the information. Once again, I would like to thank
> everyone in this thread for their support.
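Putting those numbers together, here is a back-of-the-envelope sketch of why 2000 single-int rows need more than one page. It assumes the defaults above (8 KB page, 24-byte page header, 23-byte tuple header, 4-byte line pointer) plus 8-byte MAXALIGN, and ignores fillfactor and special space:

```python
PAGE_SIZE = 8192      # default BLCKSZ
PAGE_HEADER = 24      # per-block overhead
LINE_POINTER = 4      # item pointer stored per tuple
TUPLE_HEADER = 23     # per-row header overhead

def maxalign(n, align=8):
    # round up to the MAXALIGN boundary (8 on typical 64-bit builds)
    return -(-n // align) * align

def rows_per_page(data_bytes):
    # the tuple header is padded before the data starts, and the whole
    # tuple is padded again at the end -- this is the "alignment loss"
    tuple_size = maxalign(maxalign(TUPLE_HEADER) + data_bytes)
    return (PAGE_SIZE - PAGE_HEADER) // (tuple_size + LINE_POINTER)

def pages_needed(rows, data_bytes):
    return -(-rows // rows_per_page(data_bytes))
```

Under these assumptions a 4-byte int row actually occupies 36 bytes on disk, so only about 226 rows fit per page, and 2000 rows need 9 pages rather than 1.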
[PERFORM] How much memory is PostgreSQL using
PostgreSQL 8.4.3
OS: Linux Red Hat 4.x

I changed my strategy with PostgreSQL recently to use a large segment of memory for shared buffers, with the idea of caching disk blocks. How can I see how much memory PostgreSQL is using for this?

I tried:

ps aux | grep post | sort -k4

This lists the processes using memory at the bottom. Are they sharing memory or using their own individual blocks of memory?

When I do top, I see that VIRT is the value of my shared buffers plus a tiny bit. I see %MEM is only 2.4%, 2.6%, 1.0%, 1.5%, and 1.1% for all of the running processes. Do I add these percentages up to see what amount of VIRT I am really using? Or is there some way to ask PostgreSQL "how much memory are you using to cache disk blocks currently?"

When you do a pg_dump, does PostgreSQL put the disk blocks into shared buffers as it runs?

Thanks,

Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382
[PERFORM] Performance regarding LIKE searches
Hi,

I am querying a PostgreSQL 8.3 database table that has approximately 22 million records. The EXPLAIN ANALYZE output is listed below:

gdr_gbrowse_live=# explain analyze SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand FROM feature as f, name as n WHERE (n.id=f.id AND lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' AND n.display_name <> 0);

                                   QUERY PLAN
 Nested Loop  (cost=0.01..5899.93 rows=734 width=884) (actual time=0.033..0.033 rows=0 loops=1)
   ->  Index Scan using name_name_lower_pattern_ops_idx on name n  (cost=0.01..9.53 rows=734 width=4) (actual time=0.032..0.032 rows=0 loops=1)
         Index Cond: ((lower((name)::text) ~>=~ 'Scaffold:scaffold'::text) AND (lower((name)::text) ~<~ 'Scaffold:scaffole'::text))
         Filter: ((display_name <> 0) AND (lower((name)::text) ~~ 'Scaffold:scaffold_163:1000..1199%'::text))
   ->  Index Scan using feature_pkey on feature f  (cost=0.00..8.01 rows=1 width=884) (never executed)
         Index Cond: (f.id = n.id)
 Total runtime: 0.119 ms
(7 rows)

I can see I am hitting an index that I created using the varchar_pattern_ops setting. This is very fast and performs like I would expect. However, when my application, GBrowse, accesses the database, I see this in my slow query log:

2010-03-29 09:34:38.083 PDT,gdr_gbrowse_live,gdr_gbrowse_live,11649,10.0.0.235:59043,4bb0399d.2d81,8,SELECT,2010-03-28 22:24:45 PDT,4/118607,0,LOG,0,duration: 21467.467 ms  execute dbdpg_p25965_9: SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand FROM feature as f, name as n WHERE (n.id=f.id AND lower(n.name) LIKE $1),parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%',,,

GBrowse is a Perl-based application. The duration for this query is around 21 seconds. That is a bit long. Does anyone have any ideas why the query duration is so different?
Randall Svancara
Systems Administrator/DBA/Developer
Main Bioinformatics Laboratory
Re: [PERFORM] How much memory is PostgreSQL using
Campbell, Lance wrote:
> Or is there some way to ask PostgreSQL "how much memory are you using
> to cache disk blocks currently?"

You can install contrib/pg_buffercache into each database and count how many used blocks are there. Note that running queries with that diagnostic tool is really intensive due to the locks it takes, so be careful not to do it often on a production system.

> When you do a pg_dump, does PostgreSQL put the disk blocks into shared
> buffers as it runs?

To some extent. Most pg_dump activity involves sequential scans that read an entire table. Those are no different from any other process that will put disk blocks into shared_buffers. However, that usage pattern makes pg_dump particularly likely to run into an optimization in 8.3 and later that limits how much of shared_buffers is used when sequentially scanning a large table. See p. 10 of http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf for the exact implementation. Basically, anything bigger than shared_buffers / 4 uses a 256K ring to limit its cache use, but it's a little more complicated than that.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.us
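Greg's "shared_buffers / 4" rule of thumb can be sketched as a small decision function. This is a deliberately simplified model of the 8.3+ behaviour he describes, not the actual server logic, which has more special cases:

```python
RING_BUFFER = 256 * 1024  # ~256KB ring used for large sequential scans

def seqscan_cache_budget(table_bytes, shared_buffers_bytes):
    # Simplified 8.3+ heuristic: a sequential scan of a table bigger
    # than shared_buffers/4 recycles a small ring of buffers instead
    # of evicting the whole cache; smaller tables can be cached fully.
    if table_bytes > shared_buffers_bytes // 4:
        return RING_BUFFER
    return min(table_bytes, shared_buffers_bytes)
```

So a pg_dump of a 10GB table against 1GB of shared_buffers only churns through about 256KB of the cache, which is why a dump does not necessarily flush your hot working set.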
Re: [PERFORM] Performance regarding LIKE searches
randa...@bioinfo.wsu.edu writes:
> I can see I am hitting an index that I created using the
> varchar_pattern_ops setting. This is very fast and performs like I
> would expect. However, when my application, GBrowse, accesses the
> database, I see a duration of around 21 seconds for the same query in
> my slow query log. Does anyone have any ideas why the query duration
> is so different?

You're not going to get an index optimization when the LIKE pattern isn't a constant (and left-anchored, but this one is). It is possible to get the planner to treat a query parameter as a constant (implying a re-plan on each execution instead of having a cached plan). I believe what you have to do at the moment is use unnamed rather than named prepared statements. The practicality of this would depend a lot on your client-side software stack, which you didn't mention.

regards, tom lane
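The reason the constant pattern is fast is visible in the EXPLAIN output above: the planner turns the constant prefix of the LIKE pattern into a half-open index range. A simplified Python sketch of that transformation (the real planner handles escapes, collations, and multibyte characters, which this ignores):

```python
def like_index_bounds(pattern):
    # Take the constant prefix of the pattern, i.e. everything up to
    # the first % or _ wildcard.
    prefix = ""
    for ch in pattern:
        if ch in "%_":
            break
        prefix += ch
    if not prefix:
        return None  # no constant prefix: the index cannot help at all
    # Upper bound: bump the last character, giving the range
    # prefix <= x < upper that a varchar_pattern_ops index can scan.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper
```

For the pattern in this thread, the `_` is a wildcard, so the scan range is exactly the `~>=~ 'Scaffold:scaffold' AND ~<~ 'Scaffold:scaffole'` pair in the plan. With a `$1` parameter the planner cannot compute any such prefix at plan time, hence the 21-second sequential-scan plan.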
Re: [PERFORM] Performance regarding LIKE searches
Tom,

We are using Perl 5.10 with the PostgreSQL DBD. Can you point me in the right direction in terms of unnamed and named prepared statements?

Thanks,

Randall Svancara
Systems Administrator/DBA/Developer
Main Bioinformatics Laboratory

----- Original Message -----
From: Tom Lane <t...@sss.pgh.pa.us>
To: randa...@bioinfo.wsu.edu
Cc: pgsql-performance@postgresql.org
Sent: Monday, March 29, 2010 10:00:03 AM
Subject: Re: [PERFORM] Performance regarding LIKE searches

> You're not going to get an index optimization when the LIKE pattern
> isn't a constant (and left-anchored, but this one is). I believe what
> you have to do at the moment is use unnamed rather than named prepared
> statements.
Re: [PERFORM] Performance regarding LIKE searches
On 3/29/2010 12:23 PM, randa...@bioinfo.wsu.edu wrote:
> We are using Perl 5.10 with the PostgreSQL DBD. Can you point me in
> the right direction in terms of unnamed and named prepared statements?

I'm just going to guess, but DBD::Pg can do real prepare or fake prepare. It does real by default.
Try setting:

$dbh->{pg_server_prepare} = 0;

before you prepare/run that statement and see if it makes a difference.

http://search.cpan.org/dist/DBD-Pg/Pg.pm#prepare

-Andy
Re: [PERFORM] experiments in query optimization
On Thu, Mar 25, 2010 at 3:57 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
> Hi everyone,
> I've been trying to reduce both memory usage and runtime for a query.
> Comments/suggestions gratefully received. Details are at
> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
> See particularly Section 1 - Background and Discussion. If you want a
> text version, see http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex
> For background see http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf
> (text version http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
> http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf
> Please CC any replies to me at the above email address. Thanks.

Didn't you (or someone) post about these queries before? It's not really too clear to me from reading this what specific questions you're trying to answer.

One random thought: WHERE row_number() = 1 is not too efficient. Try using LIMIT or DISTINCT ON instead.

If you're concerned about memory usage, try reducing work_mem; you've probably got it set to something huge.

You might need to create some indices, too.

...Robert
Re: [PERFORM] experiments in query optimization
On Mon, 29 Mar 2010, Robert Haas wrote:

> Didn't you (or someone) post about these queries before?

I did write to the list about an earlier version of these queries, yes. In fact you replied to that message.

> It's not really too clear to me from reading this what specific
> questions you're trying to answer.

Quote from opt.{tex/pdf}, Section 1: "If I have to I can use Section~\ref{ped_hybrid} and Section~\ref{tped_hybrid}, but I am left wondering why I get the performance I do out of the earlier versions. Specifically, why is Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and why does the memory usage in Section~\ref{ped_phenoout} blow up relative to Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"

> One random thought: WHERE row_number() = 1 is not too efficient. Try
> using LIMIT or DISTINCT ON instead.

Possibly. However, the CTE that uses WHERE row_number() = 1 doesn't dominate the runtime or memory usage, so I'm not too concerned about it.

> If you're concerned about memory usage, try reducing work_mem; you've
> probably got it set to something huge.

work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so much memory. Again, my question is: why are these changes affecting memory usage so drastically?

> You might need to create some indices, too.

Ok. To what purpose? This query picks up everything from the tables and the planner does table scans, so conventional wisdom, and indeed my experience, says that indexes are not going to be so useful.

Regards, Faheem.
Re: [PERFORM] experiments in query optimization
On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <fah...@email.unc.edu> wrote:
> Quote from opt.{tex/pdf}, Section 1: "If I have to I can use
> Section~\ref{ped_hybrid} and Section~\ref{tped_hybrid}, but I am left
> wondering why I get the performance I do out of the earlier versions.
> Specifically, why is Section~\ref{ped_bigjoin} so much slower than
> Section~\ref{ped_trunc}, and why does the memory usage in
> Section~\ref{ped_phenoout} blow up relative to
> Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"

Here and in the document, you refer to section numbers for the hybrid version, but I don't see where you define what the hybrid version actually is. And the differences between your queries are not real clear either - first you say you took out pheno and sex because they weren't necessary, but then you decide to put them back. I don't know what that means. If they're not necessary, leave them out.

> Possibly. However, the CTE that uses WHERE row_number() = 1 doesn't
> dominate the runtime or memory usage, so I'm not too concerned about it.

Hmm, you might be right.

> work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using
> so much memory. Again, my question is: why are these changes affecting
> memory usage so drastically?

Well, each sort or hash can use an amount of memory that is limited from above by work_mem. So if you write the query in a way that involves more sorts or hashes, each one can add up to 1GB to your memory usage, plus overhead. However, it doesn't look like any of your queries include 30 sorts or hashes, so I'm thinking that the RSS number probably also includes some of the shared memory that has been mapped into each backend's address space. RSS is not a terribly reliable number when dealing with shared memory; it's hard to say what that really means.

>> You might need to create some indices, too.
>
> Ok. To what purpose? This query picks up everything from the tables
> and the planner does table scans, so conventional wisdom, and indeed
> my experience, says that indexes are not going to be so useful.

Well, a hash join is not usually the first thing that pops to mind when dealing with a table that has 825 million rows (geno). I don't know if a nested loop with inner-indexscan would be faster, but it would almost certainly use less memory.

...Robert
Re: [PERFORM] experiments in query optimization
On Mon, 29 Mar 2010, Robert Haas wrote:

> Here and in the document, you refer to section numbers for the hybrid
> version, but I don't see where you define what the hybrid version
> actually is.

It is defined later in the file. I don't know if you are looking at the pdf, but if so, it is Section 2.4 (for the hybrid PED query). In the text file, I guess the easiest way would be to grep for the label ped_hybrid.

> And the differences between your queries are not real clear either -
> first you say you took out pheno and sex because they weren't
> necessary, but then you decide to put them back. I don't know what
> that means. If they're not necessary, leave them out.

I don't see where I say that pheno and sex weren't necessary. In fact, the word 'necessary' does not appear in the opt document. I took them out to see how it would affect performance. Which it does, dramatically. I say "So, I decided to remove the joins to tables corresponding to the patient data, namely pheno and sex, and the runtime dropped to 150 min, while the memory stayed around 5G." Maybe I wasn't being sufficiently explicit here. Perhaps "So, I decided to remove the joins to tables corresponding to the patient data, namely pheno and sex, to see how it would affect performance..." would have been better.

> Well, a hash join is not usually the first thing that pops to mind
> when dealing with a table that has 825 million rows (geno). I don't
> know if a nested loop with inner-indexscan would be faster, but it
> would almost certainly use less memory.

Can you provide an illustration of what you mean? I don't know what a nested loop with inner-indexscan is in this context.

Regards, Faheem.
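For readers following along, the two join strategies being compared can be sketched in Python. This is an illustrative model, not PostgreSQL's executor: the point is only where the memory goes in each case.

```python
def hash_join(outer, inner, key):
    # Build an in-memory hash table over the entire inner side, then
    # probe it once per outer row. Memory scales with the inner
    # relation; PostgreSQL caps each such hash at work_mem and spills
    # to batches beyond that, which is why many hash nodes at
    # work_mem = 1GB can add up.
    table = {}
    for row in inner:
        table.setdefault(row[key], []).append(row)
    for o in outer:
        for i in table.get(o[key], []):
            yield {**o, **i}

def nested_loop_indexscan(outer, inner_index, key):
    # For each outer row, probe a prebuilt index on the inner table
    # (a btree index scan in PostgreSQL). There is no build phase and
    # essentially constant memory, at the cost of one index lookup per
    # outer row -- the shape suggested for the 825M-row geno table.
    for o in outer:
        for i in inner_index.get(o[key], []):
            yield {**o, **i}
```

Both produce the same joined rows; the hash join pays in memory for its build table, the nested loop pays in repeated index probes.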