Re: [PERFORM] Pgsql - Red Hat Linux - VS MySQL VS MSSQL
On Sun, Jul 13, 2003 at 12:51:02PM -0700, Balazs Wellisch wrote:
> > Alternatively, you simply compile 7.3.3 from source. I've upgraded
> > most of my machines that way.
>
> Unfortunately, compiling from source is not really an option for us. We
> use RPMs only, to ease the installation and upgrade process. We have
> over a hundred servers to maintain, and having to compile and recompile
> software every time a new release comes out would be way too much work.

If you aren't settled on an OS yet, take a look at FreeBSD, or one of the
Linuxes that have better application management. Keeping pgsql up-to-date
using ports on FreeBSD is pretty painless (for that matter, so is keeping
the OS itself up-to-date).
--
Jim C. Nasby, Database Consultant                [EMAIL PROTECTED]
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Odd explain estimate
On Thu, Jul 31, 2003 at 04:59:21PM -0400, Andrew Sullivan wrote:
> On Thu, Jul 31, 2003 at 02:51:45PM -0500, Jim C. Nasby wrote:
>
> If you really needed to set enable_seqscan=false (did you really? Are
> you sure that's not the cheapest way?), you might want to investigate
> expanding the statistics on the indexed column, increasing the
> correlation through clustering, and other such tricks.

Well, if I don't do this it wants to seqscan a table that occupies 350k
pages, instead of pulling a couple thousand rows. I started running it
with the seqscan and it's already taken way longer than it does if I
disable seqscan. I guess I'll try expanding the statistics.
Re: [PERFORM] Odd explain estimate
On Fri, Aug 01, 2003 at 08:16:12AM -0400, Andrew Sullivan wrote:
> On Thu, Jul 31, 2003 at 05:59:59PM -0500, Jim C. Nasby wrote:
> > Well, if I don't do this it wants to seqscan a table that occupies
> > 350k pages, instead of pulling a couple thousand rows. I started
> > running it with the seqscan and it's already taken way longer than it
> > does if I disable seqscan.
>
> That was indeed the question. If it uses a seqscan when it ought not to,
> then there's something wrong with the statistics, or you haven't vacuum
> analysed correctly, or your table needs a vacuum full (is it really 350k
> pages, or is that mostly dead space?), etc. -- all the usual bad-seqscan
> candidates. enable_seqscan=off is probably not a good strategy for any
> moderately complicated query. If the planner were perfect, of course,
> you'd never need it at all.

Set statistics on the ID column to 1000, vacuum analyze, and it's good to
go now. Thanks for your help!
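For the archives, the statistics change described above is a per-column
setting; it looks roughly like this (table and column names here are
illustrative, not the originals from this thread):

```sql
-- Widen the histogram/MCV list for the column the planner misjudges,
-- then re-gather stats so the new target takes effect.
ALTER TABLE mytable ALTER COLUMN id SET STATISTICS 1000;
ANALYZE mytable;

-- Sanity check what the planner now sees:
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'id';
```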
[PERFORM] Poor pg_dump performance
First, sorry if this has been answered before; the list search seems to
be down...

This is on a quad Xeon-PII 450 machine running FBSD 4.8.

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
84386 pgsql     64    0  104M    99M RUN    1  78:20 61.87% 61.87% postgres
84385 decibel   64    0 3748K  2268K CPU1   3  49:49 37.79% 37.79% pg_dump

(note that the CPU percents are per-CPU, so 100% would be 100% of one CPU)

According to vmstat, there's very little disk I/O, so that's not a
bottleneck. The command I used was:

    pg_dump -vFc -f pgsql-20030906.cdb stats

It should be compressing, but if that was the bottleneck, shouldn't the
pg_dump process be at 100% CPU? It does seem a bit coincidental that the
two procs seem to be taking 100% of one CPU (top shows them running on
different CPUs though). This is version 7.3.4.
Re: [PERFORM] possible improvement between G4 and G5
On Tue, Apr 06, 2004 at 01:47:22AM -0400, Tom Lane wrote:
> Qing Zhao <[EMAIL PROTECTED]> writes:
> > We have got a G5 64-bit processor to replace an old G4 32-bit
> > processor. Given everything else equal, should we see a big
> > improvement on PG's performance?
>
> Nope. Database performance typically depends on disk performance first,
> and RAM size second. A 64-bit processor might help by allowing you to
> install more RAM, but you didn't say that you had.

Memory bandwidth is a consideration too, so you might see some
performance improvements on a G5. We recently debated between Xeons and
Opterons in a new PGSQL server, and a little poking around on the lists
indicated that the Opterons did perform better, presumably due to the
increased memory bandwidth. Incidentally, this is why you need about 2x
the CPUs on Sun hardware vs RS/6000 hardware for database work (and that
gets expensive if you're paying per CPU!).
[PERFORM] Poor performance of group by query
Anyone have any ideas why this query would be so slow?

stats=# explain analyze SELECT work_units, min(raw_rank) AS rank
stats-#   FROM Trank_work_overall GROUP BY work_units;
                                QUERY PLAN
---------------------------------------------------------------------------
 HashAggregate  (cost=1050.12..1085.98 rows=14347 width=16)
   (actual time=163149.981..163227.758 rows=17849 loops=1)
   ->  Seq Scan on trank_work_overall  (cost=0.00..804.41 rows=49141
       width=16) (actual time=0.071..328.682 rows=49091 loops=1)
 Total runtime: 163296.212 ms
(3 rows)

stats=# \d Trank_work_overall
  Table "pg_temp_1.trank_work_overall"
   Column   |  Type  | Modifiers
------------+--------+-----------
 raw_rank   | bigint |
 work_units | bigint |

stats=#

FreeBSD fritz.distributed.net 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #1:
Wed Apr 7 18:42:52 CDT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/FRITZ amd64

The machine is a dual Opteron with 4G of memory. The query in question
was not hitting the disk at all. PostgreSQL 7.4.2 compiled with -O3.

Also, if I set enable_hashagg = false, it runs in less than a second.
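The workaround mentioned in the last line looks like this in a session (a
sketch; disabling hashagg globally is not recommended):

```sql
SET enable_hashagg = false;  -- session-local planner toggle
EXPLAIN ANALYZE
SELECT work_units, min(raw_rank) AS rank
FROM Trank_work_overall
GROUP BY work_units;
RESET enable_hashagg;        -- restore the default afterwards
```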
[PERFORM] Horribly slow hash join
Note the time for the hash join step:

                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=357.62..26677.99 rows=93668 width=62)
   (actual time=741.159..443381.011 rows=49091 loops=1)
   Hash Cond: ("outer".work_today = "inner".work_units)
   ->  Hash Join  (cost=337.11..24784.11 rows=93668 width=54)
         (actual time=731.374..417188.519 rows=49091 loops=1)
         Hash Cond: ("outer".work_total = "inner".work_units)
         ->  Seq Scan on email_rank  (cost=0.00..22240.04 rows=254056
               width=46) (actual time=582.145..1627.759 rows=49091 loops=1)
               Filter: (project_id = 8)
         ->  Hash  (cost=292.49..292.49 rows=17849 width=16)
               (actual time=148.944..148.944 rows=0 loops=1)
               ->  Seq Scan on rank_tie_overall o  (cost=0.00..292.49
                     rows=17849 width=16)
                     (actual time=0.059..75.984 rows=17849 loops=1)
   ->  Hash  (cost=17.81..17.81 rows=1081 width=16)
         (actual time=8.996..8.996 rows=0 loops=1)
         ->  Seq Scan on rank_tie_today d  (cost=0.00..17.81 rows=1081
               width=16) (actual time=0.080..4.635 rows=1081 loops=1)
 Total runtime: 619047.032 ms

By comparison:

stats=# set enable_hashjoin=false;
SET
stats=# explain analyze select * from email_rank, rank_tie_overall o,
stats-#   rank_tie_today d
stats-#   WHERE email_rank.work_today = d.work_units
stats-#     AND email_rank.work_total = o.work_units
stats-#     AND email_rank.project_id = :ProjectID;
                                QUERY PLAN
---------------------------------------------------------------------------
 Merge Join  (cost=55391.69..56823.23 rows=93668 width=80)
   (actual time=2705.344..3349.318 rows=49091 loops=1)
   Merge Cond: ("outer".work_units = "inner".work_today)
   ->  Index Scan using work_units_today on rank_tie_today d
         (cost=0.00..23.89 rows=1081 width=16)
         (actual time=0.150..4.874 rows=1081 loops=1)
   ->  Sort  (cost=55391.69..55625.86 rows=93668 width=64)
         (actual time=2705.153..2888.039 rows=49091 loops=1)
         Sort Key: email_rank.work_today
         ->  Merge Join  (cost=45047.64..47656.93 rows=93668 width=64)
               (actual time=1685.414..2494.342 rows=49091 loops=1)
               Merge Cond: ("outer".work_units = "inner".work_total)
               ->  Index Scan using work_units_overall on
                     rank_tie_overall o  (cost=0.00..361.34 rows=17849
                     width=16) (actual time=0.122..79.383 rows=17849 loops=1)
               ->  Sort  (cost=45047.64..45682.78 rows=254056 width=48)
                     (actual time=1685.228..1866.215 rows=49091 loops=1)
                     Sort Key: email_rank.work_total
                     ->  Seq Scan on email_rank  (cost=0.00..22240.04
                           rows=254056 width=48)
                           (actual time=786.515..1289.101 rows=49091 loops=1)
                           Filter: (project_id = 8)
 Total runtime: 3548.087 ms

Even though the second case is only a select, it seems clear that
something's wrong...
Re: [PERFORM] Horribly slow hash join
Dammit, I somehow deleted a bunch of replies to this. Did a TODO ever
come out of this?
Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
On Thu, Aug 26, 2004 at 12:04:48PM -0700, J. Andrew Rogers wrote:
> The major caveat to having tables of this type is that you can only
> have a primary key index. No other indexes are possible, because the
> heap constantly undergoes local reorganizations if you have a lot of
> write traffic; the same kind of reorganization you would normally
> expect in a BTree index.

This isn't true, at least in Oracle 9i. You can create whatever indexes
you want on an index-organized table. I believe the secondary index
stores the PK value instead of the ROWID.
Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
On Thu, Aug 26, 2004 at 11:39:42PM -0400, Greg Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Updated TODO item:
> >
> >   o Automatically maintain clustering on a table
> >
> >     This would require some background daemon to maintain clustering
> >     during periods of low usage. It might also require tables to be
> >     only partially filled for easier reorganization. It also might
> >     require creating a merged heap/index data file so an index lookup
> >     would automatically access the heap data too.
>
> Fwiw, I would say the first "would" is also a "might". None of the
> previous discussions here presumed a maintenance daemon. The
> discussions before talked about a mechanism to try to place new tuples
> as close as possible to the proper index position.
>
> I would also suggest making some distinction between a cluster system
> similar to what we have now, but improved to maintain the clustering
> continuously, and an actual index-organized table where the tuples are
> stored only in a btree structure. They're two different approaches to
> similar problems. But they might both be useful to have, and they have
> markedly different implementation details.

There's a third approach that I think is worth considering. Half the
benefit of clustered tables is limiting the number of pages you need to
access when scanning the primary key. The order of tuples within a page
isn't nearly as important as the ordering of the pages themselves. This
means you can get most of the benefit of an index-organized table just by
being careful about what page you place a tuple on. What I'm thinking of
is some means to ensure all the tuples on a page are within some PK
range, without worrying about the exact order within the page, since it's
relatively cheap to scan through the page in memory.

Some pros:
- This would probably mean less change to the code that inserts tuples.
- No need for a background daemon.
- No need to create a new B-Tree table structure.
- Ideally there would be no need to move tuples around, which should
  mean the current indexing code doesn't need to change.

Cons:
- Need some way to deal with pages that fill up.
- To gain full benefit, some means of indicating what range of PK values
  are on a page might be needed.
- It's not as beneficial as a true IOT, since you don't get the benefit
  of storing your tuples inline with your B-Tree.

I'm sure there's a ton of things I'm missing, especially since I'm not
familiar with the PostgreSQL code, but hopefully others can explore this
further.
Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote:
> PostgreSQL's functionality is in many ways similar to Oracle
> Partitioning. Load your data into many similar tables, then create a
> view like:
>
> CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS
> SELECT 200409130800, col1, col2, col3... FROM table200409130800
> UNION ALL
> SELECT 200409131000, col1, col2, col3... FROM table200409131000
> UNION ALL
> SELECT 200409131200, col1, col2, col3... FROM table200409131200
> ...etc...
>
> This will allow the PostgreSQL optimizer to eliminate partitions from
> the query when you run queries which include a predicate on the
> partitioning_col, e.g.
>
>   select count(*) from bigtable where idate >= 200409131000
>
> will scan the last two partitions only...
>
> There are a few other ways of creating the view that return the same
> answer, but only using constants in that way allows the partitions to
> be eliminated from the query; the other forms will run for much longer.

Is there by any chance a set of functions to manage adding and removing
partitions? Certainly this can be done by hand, but having a set of
tools would make life much easier. I just looked, but didn't see
anything on GBorg.
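Lacking ready-made tools, adding a partition by hand looks roughly like
this (a sketch with made-up table and column types; note that CREATE OR
REPLACE VIEW requires the column list to stay the same, which it does
here):

```sql
-- Create and load the new partition, then fold it into the view:
CREATE TABLE table200409131400 (col1 int, col2 text, col3 numeric);

CREATE OR REPLACE VIEW bigtable (idate, col1, col2, col3) AS
SELECT 200409130800, col1, col2, col3 FROM table200409130800 UNION ALL
SELECT 200409131000, col1, col2, col3 FROM table200409131000 UNION ALL
SELECT 200409131200, col1, col2, col3 FROM table200409131200 UNION ALL
SELECT 200409131400, col1, col2, col3 FROM table200409131400;
```

Dropping a partition is the reverse: rebuild the view without the
member, then DROP TABLE it.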
Re: [PERFORM] Partitioning
On Wed, Sep 15, 2004 at 02:09:31PM -0700, J. Andrew Rogers wrote:
> On Tue, 2004-09-14 at 21:30, Joe Conway wrote:
> > That's exactly what we're doing, but using inherited tables instead
> > of a union view. With inheritance, there is no need to rebuild the
> > view each time a table is added or removed. Basically, in our
> > application, tables are partitioned by either month or week,
> > depending on the type of data involved, and queries are normally
> > date qualified.
>
> We do something very similar, also using table inheritance and a lot of
> triggers to automatically generate partitions and so forth. It works
> pretty well, but it is a custom job every time I want to implement a
> partitioned table. You can save a lot on speed and space if you use it
> to break up large tables with composite indexes, since you can drop
> columns from the table depending on how you use it. A big part of

Forgive my ignorance, but I didn't think you could have a table that
inherits from a parent and not have all the columns. Or is that not what
you mean by 'you can drop columns from the table...'?

This is one advantage I see to a big UNION ALL view: if you're doing
partitioning based on unique values, you don't actually have to store
that value in the partition tables. For example,
http://stats.distributed.net has a table that details how much work each
participant did each day for each project. Storing project_id in that
table is an extra 4 bytes... doesn't sound like much until you consider
that the table has over 130M rows right now. So it would be nice to have
an easy way to partition the table based on unique project_ids and not
waste space in the partition tables on a field that will be the same for
every row (in each partition).
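The inheritance approach under discussion, in minimal form (illustrative
table and column names; this also shows the limitation questioned above,
since a child table always carries all of the parent's columns):

```sql
CREATE TABLE email_contrib (
    project_id int    NOT NULL,
    id         int    NOT NULL,
    date       date   NOT NULL,
    work_units bigint NOT NULL
);

-- One child per project; it inherits every parent column, including
-- project_id, even though project_id is constant within the partition.
CREATE TABLE email_contrib_p8 () INHERITS (email_contrib);

-- Queries against the parent automatically scan the children too:
SELECT sum(work_units) FROM email_contrib WHERE date = '2004-09-15';
```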
Re: [PERFORM] Article about PostgreSQL and RAID in Brazil
On Thu, Sep 16, 2004 at 10:50:33AM -0700, Josh Berkus wrote:
> The second step is to have lots of disks; 5 drives is a minimum for
> really good performance. 3-drive RAID5, in particular, is a poor
> performer for PostgreSQL, often resulting in I/O that is 40% or less as
> efficient as a single disk, due to extremely slow random seeks and
> little parallelization.
>
> Once you have 6 drives or more, opinions are divided on whether RAID 10
> or RAID 5 is better. I think it partly depends on your access pattern.

What about benefits from putting WAL and pg_temp on separate drives?
Specifically, we have a box with 8 drives: 2 in a mirror with the OS,
WAL, and pg_temp; the rest in a RAID10 with the database on it. Do you
think it would have been better to make one big RAID10? What if it was
RAID5? And what if it was only 6 drives total?
Re: [PERFORM] Article about PostgreSQL and RAID in Brazil
On Thu, Sep 16, 2004 at 02:07:37PM -0700, Josh Berkus wrote:
> Jim,
>
> > What about benefits from putting WAL and pg_temp on separate drives?
> > Specifically, we have a box with 8 drives: 2 in a mirror with the
> > OS, WAL, and pg_temp; the rest in a RAID10 with the database on it.
> > Do you think it would have been better to make one big RAID10? What
> > if it was RAID5? And what if it was only 6 drives total?
>
> OSDL's finding was that even with a large RAID array, it still benefits
> you to have WAL on a separate disk resource ... substantially, like 10%
> of total performance. However, your setup doesn't get the full possible
> benefit, since WAL is sharing the array with other resources.

Yes, but if a 3-drive RAID array is 40% slower than a single disk, it
seems like the 10% benefit of having WAL on a separate drive would still
be a losing proposition. BTW, my experience with our setup is that the
RAID10 is almost always the I/O bottleneck, not the mirror with
everything else on it.
Re: [PERFORM] Caching of Queries
On Thu, Sep 23, 2004 at 08:29:25AM -0700, Mr Pink wrote:
> Not knowing anything about the internals of pg, I don't know how this
> relates, but in theory, query plan caching is not just about saving
> time re-planning queries, it's about scalability. Optimizing queries
> requires shared locks on the database metadata, which, as I understand
> it, causes contention and serialization, which kills scalability.

One of the gurus can correct me if I'm wrong here, but AFAIK metadata
lookups use essentially the same access methods as normal queries. This
means MVCC is used and no locking is required. Even if locks were
required, they would be shared read locks, which wouldn't block each
other.
Re: [PERFORM] Caching of Queries
On Mon, Sep 27, 2004 at 09:30:31PM +0100, Matt Clark wrote:
> It's certainly the case that the typical web app (which, along with
> warehouses, seems to be one half of the needy apps) could probably do
> worse than use pooling as well. I'm not well up enough on pooling to
> know how bulletproof it is though, which is why I included it in my
> list of things that make me go 'hmm'. It would be really nice not to
> have to take both things together.

If you're not using a connection pool of some kind, then you might as
well forget query plan caching, because your connect overhead will swamp
the planning cost. This does not mean you have to use something like
pgpool (which makes some rather questionable claims, IMO); any decent
web application language/environment will support connection pooling.
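The pooling idea, stripped to its essentials, is just "pay the connect
cost once and hand connections out per request". A toy sketch, with
stand-in objects instead of real database connections:

```python
import queue

class ConnectionPool:
    """Hand out pre-built connections instead of reconnecting per request."""
    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())   # pay the connect cost once, up front

    def acquire(self):
        return self._pool.get()         # blocks if every connection is busy

    def release(self, conn):
        self._pool.put(conn)

# Stand-in for an expensive connect:
connects = 0
def fake_connect():
    global connects
    connects += 1
    return object()

pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):                    # a hundred "requests"...
    conn = pool.acquire()
    pool.release(conn)
print(connects)                         # ...only 2 connects ever happen
```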
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 06:43:42AM +0100, Matt Clark wrote:
> > If you're not using a connection pool of some kind, then you might
> > as well forget query plan caching, because your connect overhead
> > will swamp the planning cost. This does not mean you have to use
> > something like pgpool (which makes some rather questionable claims,
> > IMO); any decent web application language/environment will support
> > connection pooling.
>
> Hmm, a question of definition: there's a difference between a pool and
> a persistent connection. Pretty much all web apps have one connection
> per process, which is persistent (i.e. not dropped and remade for each
> request), but not shared between processes, and therefore not pooled.

OK, that'd work too... the point is, if you're re-connecting all the
time, it doesn't really matter what else you do for performance.
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote:
> Transparent query caching is the industry standard for how these things
> are handled. However, Postgres' lack of this feature has made me
> consider other approaches, and I'm starting to wonder if the standard
> query caching -- where a materialized query result, or some reduction
> thereof, is cached in database memory -- isn't the best way to cache
> things. I'm going to abbreviate it SQC for the rest of this e-mail.

Not to quibble, but are you sure that's the standard? Oracle and DB2
don't do this, and I didn't think MSSQL did either. What they do cache
is query *plans*. This is a *huge* deal in Oracle; search
http://asktom.oracle.com for 'soft parse'.

In any case, I think a means of marking some specific queries as
cacheable is an excellent idea; perfect for 'static data' scenarios.
What I don't know is how much will be saved.
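Plan caching (as opposed to result caching) is already available at the
session level via PREPARE; the table name here is borrowed from earlier
in this digest purely as an example:

```sql
-- Parse and plan once...
PREPARE rank_lookup (int) AS
    SELECT * FROM email_rank WHERE project_id = $1;

-- ...then execute many times against the stored plan:
EXECUTE rank_lookup(8);
EXECUTE rank_lookup(25);

DEALLOCATE rank_lookup;
```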
Re: [PERFORM] Performance suggestions for an update-mostly database?
And obviously make sure you're vacuuming frequently.

On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Steve,
>
> > I'm used to performance tuning on a select-heavy database, but this
> > will have a very different impact on the system. Does anyone have
> > any experience with an update-heavy system, and have any performance
> > hints or hardware suggestions?
>
> Minimal/no indexes on the table(s). Raise checkpoint_segments and
> consider using commit_siblings/commit_delay if it's a multi-stream
> application. Figure out ways to do inserts instead of updates where
> possible, and COPY instead of INSERT, where possible. Put your WAL on
> its own disk resource.
>
> I'm a little curious as to what kind of app would be 95% writes. A log?
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
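Josh's checkpoint/commit suggestions translate to postgresql.conf
settings along these lines (values are illustrative starting points, not
recommendations; measure before and after):

```
checkpoint_segments = 32   # fewer, larger checkpoints for heavy write load
commit_delay = 10          # microseconds to wait for piggy-backed commits
commit_siblings = 5        # only delay if this many other xacts are active
```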
Re: [PERFORM] Does PostgreSQL run with Oracle?
On Fri, Oct 15, 2004 at 10:19:48AM -0700, Steve Atkins wrote:
> On Fri, Oct 15, 2004 at 11:54:44AM -0500, [EMAIL PROTECTED] wrote:
> > My basic question to the community is "is PostgreSQL approximately
> > as fast as Oracle?" I'm currently running single-processor
> > UltraSPARC workstations, and intend to use Intel-arch laptops and
> > Linux. The application is a big turnkey workstation app. I know the
> > hardware switch alone will enhance performance, and may do so to the
> > point where even a slower database will still be adequate.
>
> I have found that PostgreSQL seems to perform poorly on Solaris/SPARC
> (less so after recent improvements, but still...) compared to x86
> systems -- more so than the delta between Oracle on the two platforms.
> Just a gut impression, but it might mean that comparing the two
> databases on SPARC may not be that useful a comparison if you're
> planning to move to x86.

As a point of reference, an IBM hardware sales rep I worked with a few
years ago told me that he got a lot of sales from Oracle shops that were
running Sun and switched to RS/6000. Basically, for a given workload, it
would take 2x the number of Sun CPUs as RS/6000 CPUs. The difference in
Oracle licensing costs was usually enough to pay for the new hardware in
one year.
Re: [PERFORM] OS desicion
On Wed, Oct 20, 2004 at 09:38:51AM -0700, Josh Berkus wrote:
> Tom,
>
> > You are asking the wrong question. The best OS is the OS you (and/or
> > the customer) knows and can administer competently.
>
> I'll have to 2nd this.

I'll 3rd, but add one tidbit: FreeBSD will schedule disk I/O based on
process priority, while Linux won't. This can be very handy for things
like vacuum.
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Note that most people are now moving away from raw devices for databases
in most applications. The relatively small performance gain isn't worth
the hassles.

On Thu, Oct 21, 2004 at 12:27:27PM +0200, Steinar H. Gunderson wrote:
> On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:
> > I suppose I'm just idly wondering really. Clearly it's against PG
> > philosophy to build an FS or direct IO management into PG, but now
> > that it's so relatively easy to plug filesystems into the main
> > open-source OSes, it struck me that there might be some useful
> > changes to, say, XFS or ext3, that could be made that would help PG
> > out.
>
> This really sounds like a poor replacement for just making PostgreSQL
> use raw devices to me. (I have no idea why that isn't done already,
> but presumably it isn't all that easy to get right. :-) )
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
Re: [PERFORM] Sequential Scan with LIMIT
On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> But the LIMIT will cut the cost of the seqscan case too. Given the
> numbers you posit above, about one row in five will have 'myval', so a
> seqscan can reasonably expect to hit the first matching row in the
> first page of the table. This is still cheaper than doing an index
> scan (which must require reading at least one index page plus at least
> one table page).
>
> The test case you are showing is probably suffering from nonrandom
> placement of this particular data value; which is something that the
> statistics we keep are too crude to detect.

Isn't that exactly what pg_stats.correlation is?
Re: [PERFORM] Sequential Scan with LIMIT
On Thu, Oct 28, 2004 at 07:49:28PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> > > The test case you are showing is probably suffering from nonrandom
> > > placement of this particular data value; which is something that
> > > the statistics we keep are too crude to detect.
> >
> > Isn't that exactly what pg_stats.correlation is?
>
> No. A far-from-zero correlation gives you a clue that on average,
> *all* the data values are placed nonrandomly ... but it doesn't really
> tell you much one way or the other about a single data value.

Maybe I'm confused about what the original issue was, then... it
appeared that you were suggesting PGSQL was doing a seq scan instead of
an index scan because it thought it would find the value on the first
page if the data was randomly distributed. If the correlation is highly
non-zero, though, shouldn't it 'play it safe' and assume that unless
it's picking the min or max value stored in the statistics, it will be
better to do an index scan, since the value it's looking for is probably
in the middle of the table somewhere? IE: if the values in the field are
between 1 and 5 and the table is clustered on that field, then clearly
an index scan would be better to find a row with field=3 than a seq
scan.
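For reference, the statistics being debated here are directly visible;
this is a sketch with made-up table and column names:

```sql
-- correlation near +1 or -1 means the table is physically ordered by
-- the column; near 0 means placement is effectively random.
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'field';

-- Clustering rewrites the table in index order, pushing correlation
-- toward 1 (7.x-era syntax):
CLUSTER mytable_field_idx ON mytable;
ANALYZE mytable;
```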
[PERFORM] seqscan strikes again
) (actual time=0.018..0.038 rows=10 loops=1) Total runtime: 218.644 ms (15 rows) opensims=# I'd really like to avoid putting a 'set enable_seqscan=false' in my code, especially since this query only has a problem if it's run on a large date/time window, which normally doesn't happen.
Re: [PERFORM] seqscan strikes again
Which column would you recommend? Did something stick out at you? On Tue, Nov 09, 2004 at 03:14:36PM -0800, Joshua D. Drake wrote: opensims=# I'd really like to avoid putting a 'set enable_seqscan=false' in my code, especially since this query only has a problem if it's run on a large date/time window, which normally doesn't happen. Try increasing your statistics target for the column and then rerunning analyze. Sincerely, Joshua D. Drake
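Joshua's suggestion amounts to something like the following (the table and column names are placeholders for the ones in the slow query):

```sql
-- The default per-column statistics target is 10 in this era; raise it
-- for the column the planner is mis-estimating, then rebuild its stats.
ALTER TABLE sometable ALTER COLUMN log_time SET STATISTICS 100;
ANALYZE sometable;
-- Then re-check the plan for the original date/time-window query
-- with EXPLAIN ANALYZE.
```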
Re: [PERFORM] Analyzer is clueless
On Wed, Nov 17, 2004 at 10:32:48PM +0000, Simon Riggs wrote: The main issue is that PostgreSQL's default histogram statistics setting is lower than other RDBMS. This means that it is less able to discriminate between cases such as yours that are close to the edge. This is a trade-off between run-time of the ANALYZE command and the benefit it produces. As Joshua suggests, increasing the statistics target for this table will likely allow the optimizer to correctly determine the selectivity of the index and take the right path. Is there still a good reason to have the histogram stats so low? Should the default be changed to more like 100 at this point? Also, how extensively does the planner use n_distinct, null_frac, reltuples and the histogram to see what the odds are of finding a unique value or a low number of values? I've seen cases where it seems the planner doesn't think it'll be getting a unique value or a small set of values even though stats indicates that it should be. One final question... would there be interest in a process that would dynamically update the histogram settings for tables based on how distinct/unique each field was?
Re: [PERFORM] memcached and PostgreSQL
If instead of a select you do a select for update I think this would be transaction safe. Nothing would be able to modify the data in the database between when you do the SELECT and when you commit. If the transaction fails the value in memcached will be correct. Also, it's not clear if you're doing an update here or not... If you're doing an update then this wouldn't work. You'd want to do your update, then re-insert the value into memcached outside of the update transaction. On Tue, Nov 23, 2004 at 02:20:34PM -0800, Sean Chittenden wrote: My point was that there are two failure cases --- one where the cache is slightly out of date compared to the db server --- these are cases where the cache update is slightly before/after the commit. I was thinking about this and ways to minimize this even further. Have memcache clients add data and have a policy to have the database only delete data. This sets the database up as the bottleneck again, but then you have a degree of transactionality that couldn't be previously achieved with the database issuing replace commands. For example: 1) client checks the cache for data and gets a cache lookup failure 2) client begins transaction 3) client SELECTs data from the database 4) client adds the key to the cache 5) client commits transaction This assumes that the client won't rollback or have a transaction failure. Again, in 50M transactions, I doubt one of them would fail (sure, it's possible, but that's a symptom of bigger problems: memcached isn't an RDBMS). The update case being: 1) client begins transaction 2) client updates data 3) database deletes record from memcache 4) client commits transaction 5) client adds data to memcache The second is where the cache update happens and the commit later fails, or the commit happens and the cache update never happens. Having pgmemcache delete, not replace data addresses this second issue. 
-sc -- Sean Chittenden
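The update-case ordering Sean describes can be sketched in SQL. This assumes the pgmemcache extension is loaded so the server can issue the cache delete inside the transaction; the table, key naming, and exact function spelling are illustrative, not taken from his setup:

```sql
BEGIN;
-- 2) client updates data
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
-- 3) database deletes the record from memcache (pgmemcache call; doing
--    this from a trigger on the table has the same effect)
SELECT memcache_delete('accounts:42');
COMMIT;  -- 4) client commits transaction
-- 5) happens client-side, outside the transaction: the next reader
--    misses the cache, SELECTs the committed row, and re-adds it.
```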
Re: [PERFORM] Postgres vs. DSpam
FWIW, those queries won't be able to use an index. A better WHERE clause would be: AND last_hit < CURRENT_DATE - 60 On Fri, Nov 26, 2004 at 02:37:12PM +1300, Andrew McMillan wrote: On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote: Hi, I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with a quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular. I am using DSpam with PostgreSQL here. I have a daily job that cleans the DSpam database up, as follows: DELETE FROM dspam_token_data WHERE (innocent_hits*2) + spam_hits < 5 AND CURRENT_DATE - last_hit > 60; DELETE FROM dspam_token_data WHERE innocent_hits = 1 AND CURRENT_DATE - last_hit > 30; DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit > 180; DELETE FROM dspam_signature_data WHERE CURRENT_DATE - created_on > 14; VACUUM dspam_token_data; VACUUM dspam_signature_data; I also occasionally do a VACUUM FULL ANALYZE; on the database as well. In all honesty though, I think that MySQL is better suited to DSpam than PostgreSQL is. Please, could anyone explain this difference to me? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? I believe that what DSpam does that is not well-catered for in the way PostgreSQL operates, is that it does very frequent updates to rows in (eventually) quite large tables. In PostgreSQL the UPDATE will result internally in a new record being written, with the old record being marked as deleted. That old record won't be re-used until after a VACUUM has run, and this means that the on-disk tables will have a lot of dead rows in them quite quickly. The reason that PostgreSQL operates this way, is a direct result of the way transactional support is implemented, and it may well change in a version or two. 
It's got better over the last few versions, with things like pg_autovacuum, but that approach still doesn't suit some types of database updating. Cheers, Andrew. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!! -
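Jim's one-line fix at the top deserves unpacking: in `CURRENT_DATE - last_hit > 60` the column is buried inside an expression, so a plain index on last_hit cannot be used; rewriting the predicate so the bare column is compared to a constant makes it indexable. A sketch (the index name is invented):

```sql
CREATE INDEX dspam_token_data_last_hit ON dspam_token_data (last_hit);

-- Not indexable: the planner sees an expression applied to last_hit.
-- DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit > 60;

-- Indexable: matches the same rows, but as a range scan on the index.
DELETE FROM dspam_token_data WHERE last_hit < CURRENT_DATE - 60;
```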
Re: [PERFORM] Normalization or Performance
On Thu, Dec 02, 2004 at 03:05:55PM -0200, Alvaro Nunes Melo wrote: Hi, Before writing this mail, I'd researched a little about this topic, and got some opinions from guys like Fabien Pascal, who argues that logical design should be separated from physical design, and other sources. As this is not settled fact, I'm writing to you guys who make things work in the real world. We started our first big (for our company standards) project always thinking in normalization. But once we imported legacy data into the DB, things got harder. One example is the client's status. A client might be active, inactive or pending (for many reasons). We store all the statuses a client has had since it entered the system. To check the actual status of a client, we get the last status from this historical status table. This takes a considerable time, so our best results were achieved by building a function that checks the status and indexing this function. The problem is that indexed functions must be immutable, so as you can figure, if the status changes after the creation of the index, the return of the function is still the same. What do you suggest for situations like this? Should I add a field to the clients table and store its actual status, keep storing data in the historical table, and control its changes with a trigger? It seems you shouldn't have to resort to this. SELECT status FROM client_status WHERE client_id = blah ORDER BY status_date DESC LIMIT 1 should be pretty fast given an index on client_id, status_date (which should be able to be unique). There are other situations that are making things difficult for us. For example, one query must return the total amount a client bought in the last 90 days. It's taking too long, when we must know it for many clients, many times. So should I create summarization tables to store this kind of stuff, update it with a trigger on a daily basis (for example), and solve this problem with one join? 
This sounds like a more likely candidate for a summary table, though you might not want to use a trigger. Unless you need absolutely up-to-date information it seems like a nightly process to update the totals would be better and more efficient. Our database is not that big. The largest table has about 7,000,000 rows. About 50,000 clients, half of them active. All that I'd pointed out above uses indexes for queries, etc. But even with this it's not been fast enough. We have a Dell server for this (I know, the Dell issue), a Dual Xeon 2.8, SCSI HD, 1 GB mem. Do we need better hardware for our system? Is all this on a single HD? That's going to be a huge bottleneck. You'll be much better off with a mirrored partition for your WAL files and either raid5 or raid10 for the database itself. You'd probably be better off with more memory as well. If you're going to buy a new box instead of upgrading your existing one, I'd recommend going with an Opteron because of its much better memory bandwidth. For reference, stats.distributed.net is a dual Opteron 244 1.8GHz with 4G ram, a 200G mirror for WAL and the system files and a 6x200G RAID10 for the database (all SATA drives). The largest table has 120M rows and 825,000 8k pages. I can scan 1/5th of that table via an index scan in about a minute. (The schema can be found at http://minilink.org/cvs.distributed.net/l3.sql.)
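A minimal version of the nightly summary-table approach suggested above might look like this (all object names are hypothetical):

```sql
CREATE TABLE client_purchases_90d (
    client_id    integer PRIMARY KEY,
    total_amount numeric NOT NULL
);

-- Run nightly from cron instead of maintaining totals with a trigger:
BEGIN;
DELETE FROM client_purchases_90d;
INSERT INTO client_purchases_90d (client_id, total_amount)
SELECT client_id, sum(amount)
FROM purchases
WHERE purchase_date > CURRENT_DATE - 90
GROUP BY client_id;
COMMIT;
-- Reports then join against the small summary table instead of
-- re-aggregating the large detail table per client.
```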
Re: [PERFORM] Partitioned table performance
Sorry for the late reply, so I included the whole thread. Should this be a TODO? On Wed, Dec 15, 2004 at 08:30:08PM -0500, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: But I'm a bit puzzled. Why would Append have any significant cost? It's just taking the tuples from one plan node and returning them until they run out, then taking the tuples from another plan node. It should have no i/o cost and hardly any cpu cost. Where is the time going? As best I can tell by profiling, the cost of the Append node per se is indeed negligible --- no more than a couple percent of the runtime in CVS tip for a test case similar to Stacy White's example. It looks bad in EXPLAIN ANALYZE, but you have to realize that passing the tuples up through the Append node doubles the instrumentation overhead of EXPLAIN ANALYZE, which is pretty sizable already. (If you turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE, the actual elapsed time is about double, at least for me.) The other effect, which I hadn't expected, is that the seqscans themselves actually slow down. 
I get regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo ; QUERY PLAN Aggregate (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1) -> Append (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1) -> Seq Scan on super_foo (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1) -> Seq Scan on sub_foo1 super_foo (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1) -> Seq Scan on sub_foo2 super_foo (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1) Total runtime: 32314.993 ms (6 rows) regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1; QUERY PLAN Aggregate (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1) -> Seq Scan on sub_foo1 (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1) Total runtime: 9851.423 ms (3 rows) Notice the actual times for the sub_foo1 seqscans. That increase (when counted for both input tables) almost exactly accounts for the difference in non-EXPLAIN ANALYZE runtime. After digging around, I find that the reason for the difference is that the optimization to avoid a projection step (ExecProject) isn't applied for scans of inheritance unions: /* * Can't do it with inheritance cases either (mainly because Append * doesn't project). */ if (rel->reloptkind != RELOPT_BASEREL) return false; So if you were to try the example in a pre-7.4 PG, which didn't have that optimization, you'd probably find that the speeds were just about the same. (I'm too lazy to verify this though.) I looked briefly at what it would take to cover this case, and decided that it's a nontrivial change, so it's too late to do something about it for 8.0. I think it's probably possible to fix it though, at least for cases where the child tables have rowtypes identical to the parent. 
regards, tom lane
Re: [PERFORM] Partitioned table performance
On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote: Greg, Well Oracle has lots of partitioning intelligence pushed up to the planner to avoid overhead. If you have a query with something like WHERE date = '2004-01-01' and date is your partition key (even if it's a range) then Oracle will figure out which partition it will need at planning time. Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning. Maybe you could help? This is something I've been thinking about doing for http://stats.distributed.net; is there a formal project for this somewhere? On a different note, has anyone looked at the savings you get by omitting the partition field from the child tables? ISTM that the savings would be substantial for narrow tables. Of course that most likely means doing a union view instead of inheritance, but I'm guessing here. The table I'm thinking of partitioning is quite narrow (see below), so I suspect that dropping project_id out would result in a substantial savings (there's basically nothing that ever queries across the whole table). With the data distribution, I suspect just breaking project ID's 205, 5, and 25 into partitioned tables that didn't contain project_id would save about 450M (4 bytes * 95% * 130M). 
(the table has ~130M rows)

            Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# select * from pg_stats where tablename='email_contrib' and attname='project_id';
 schemaname |   tablename   |  attname   | null_frac | avg_width | n_distinct | most_common_vals  |                    most_common_freqs                    | histogram_bounds | correlation
------------+---------------+------------+-----------+-----------+------------+-------------------+---------------------------------------------------------+------------------+-------------
 public     | email_contrib | project_id |         0 |         4 |          6 | {205,5,25,8,24,3} | {0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667} |                  |    0.703936
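The union view Jim is guessing at, with project_id dropped from the children, might look like the following sketch (the child table names are made up; the choice of hot projects comes from his stats above):

```sql
-- One narrow child per heavily-hit project; project_id is not stored.
CREATE TABLE email_contrib_p205 (
    id         integer NOT NULL,
    date       date    NOT NULL,
    team_id    integer,
    work_units bigint  NOT NULL
);
CREATE TABLE email_contrib_p5 (LIKE email_contrib_p205);

-- The view re-synthesizes the partition key as a constant per branch,
-- saving the 4 bytes per row the column would otherwise occupy.
CREATE VIEW email_contrib_all AS
    SELECT 205 AS project_id, * FROM email_contrib_p205
    UNION ALL
    SELECT 5 AS project_id, * FROM email_contrib_p5;
```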
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
On Mon, Jan 10, 2005 at 12:46:01PM -0500, Alex Turner wrote: You sir are correct! You can't use perl in MS-SQL or Oracle ;). On the other hand, PL/SQL is incredibly powerful, especially combined with all the tools/utilities that come with Oracle. I think you'd be hard-pressed to find too many real-world examples where you could do something with a PostgreSQL procedural language that you couldn't do with PL/SQL.
Re: [PERFORM] Performance delay
On Thu, Jan 13, 2005 at 07:14:10PM +0800, Hasnul Fadhly bin Hasan wrote: Hi Richard, Thanks for the reply.. is that the case? i thought it would comply to the where condition first.. and after that it will format the output to what we want.. That is in fact exactly what it's doing. The second query is faster not because of the where clause, but because of the limit clause. The first query builds a list of id, long2ip(srcip), long2ip(dstip) for the timestamp range, then it orders that list and gives you the first 30. The second query builds a list of everything from sometable for the timestamp range, orders it, keeps the first 30, THEN it calculates long2ip based on that list of 30 items. Hasnul Richard Huxton wrote: Hasnul Fadhly bin Hasan wrote: Hi, just want to share with all of you a weird thing that i found when i tested it. i was doing a query that will call a function long2ip to convert bigint to ips. so the query looks something like this. select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30; for your info, there are about 300k rows for that timeframe. it cost me about 57+ secs to get the list. which is about the same if i query select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' it will cost me about 57+ secs also. Now if i did this select id,long2ip(srcip), long2ip(dstip) from (select * from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30) as t; it will cost me about 3+ secs The difference will be that in the final case you only make 30 calls to long2ip() whereas in the first two you call it 300,000 times and then throw away most of them. Try running EXPLAIN ANALYSE ... for both - that will show how PG is planning the query. 
-- Richard Huxton Archonet Ltd
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, Jan 20, 2005 at 10:08:47AM -0500, Stephen Frost wrote: * Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote: PostgreSQL has replication, but not partitioning (which is what you want). It doesn't have multi-server partitioning.. It's got partitioning within a single server (doesn't it? I thought it did, I know it was discussed w/ the guy from Cox Communications and I thought he was using it :). No, PostgreSQL doesn't support any kind of partitioning, unless you write it yourself. I think there's some work being done in this area, though.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote: I was thinking the same! I'd like to know how other databases such as Oracle do it. In a nutshell, in a clustered environment (which iirc in oracle means shared disks), they use a set of files for locking and consistency across machines. So you better have fast access to the drive array, and the array better have caching of some kind.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, Jan 20, 2005 at 07:12:42AM -0800, Joshua D. Drake wrote: then I was thinking. Couldn't he use multiple databases over multiple servers with dblink? It is not exactly how I would want to do it, but it would provide what he needs I think??? Yes seems to be the only solution ... but I'm a little disappointed about this ... could you explain to me why there is not this kind of functionality ... it seems to be a real need for big applications no ? Because it is really, really hard to do correctly and hard equals expensive. To expand on what Josh said, the expense in this case is development resources. If you look on the developer site you'll see a huge TODO list and a relatively small list of PostgreSQL developers. To develop a cluster solution similar to RAC would probably take the efforts of the entire development team for a year or more, during which time very little else would be done. I'm glad to see your persistence in wanting to use PostgreSQL, and there might be some kind of limited clustering scheme that could be implemented without a great amount of effort by the core developers. In that case I think there's a good chance you could find people willing to work on it.
[PERFORM] Odd number of rows expected
I have a query that thinks it's going to generate a huge number of rows, when in fact it won't: INSERT INTO page_log.rrs ( bucket_id, page_id,project_id,other, hits,min_hits,max_hits,total_duration,min_duration,max_duration ) SELECT a.rrs_bucket_id, page_id,project_id,other , count(*),count(*),count(*),sum(duration),min(duration),max(duration) FROM (SELECT b.bucket_id AS rrs_bucket_id, s.* FROM rrs.bucket b JOIN page_log.log s ON ( b.prev_end_time < log_time AND b.end_time >= log_time ) WHERE b.rrs_id = '1' AND b.end_time <= '2005-01-21 20:23:00+00' AND b.end_time > '1970-01-01 00:00:00+00' ) a GROUP BY rrs_bucket_id, page_id,project_id,other; QUERY PLAN --- Subquery Scan *SELECT* (cost=170461360504.98..183419912556.69 rows=91175544 width=77) -> GroupAggregate (cost=170461360504.98..183418316984.67 rows=91175544 width=29) -> Sort (cost=170461360504.98..171639141309.21 rows=471112321692 width=29) Sort Key: b.bucket_id, s.page_id, s.project_id, s.other -> Nested Loop (cost=0.00..17287707964.10 rows=471112321692 width=29) -> Seq Scan on bucket b (cost=0.00..9275.84 rows=281406 width=20) Filter: ((rrs_id = 1) AND (end_time <= '2005-01-21 20:23:00+00'::timestamp with time zone) AND (end_time > '1970-01-01 00:00:00+00'::timestamp with time zone)) -> Index Scan using log__log_time on log s (cost=0.00..36321.24 rows=1674137 width=33) Index Cond: ((outer.prev_end_time < s.log_time) AND (outer.end_time >= s.log_time)) The final rowcount after the aggregate will actually be about 1.9M rows: QUERY PLAN --- Subquery Scan *SELECT* (cost=170461360504.98..183419912556.69 rows=91175544 width=77) (actual time=156777.374..234613.843 rows=1945123 loops=1) -> GroupAggregate (cost=170461360504.98..183418316984.67 rows=91175544 width=29) (actual time=156777.345..214246.751 rows=1945123 loops=1) -> Sort (cost=170461360504.98..171639141309.21 rows=471112321692 width=29) (actual time=156777.296..177517.663 rows=4915567 loops=1) Sort Key: b.bucket_id, s.page_id, s.project_id, s.other -> Nested Loop 
(cost=0.00..17287707964.10 rows=471112321692 width=29) (actual time=0.662..90702.755 rows=4915567 loops=1) -> Seq Scan on bucket b (cost=0.00..9275.84 rows=281406 width=20) (actual time=0.063..1591.591 rows=265122 loops=1) Filter: ((rrs_id = 1) AND (end_time <= '2005-01-21 20:23:00+00'::timestamp with time zone) AND (end_time > '1970-01-01 00:00:00+00'::timestamp with time zone)) -> Index Scan using log__log_time on log s (cost=0.00..36321.24 rows=1674137 width=33) (actual time=0.014..0.174 rows=19 loops=265122) Index Cond: ((outer.prev_end_time < s.log_time) AND (outer.end_time >= s.log_time)) Total runtime: 299623.954 ms Everything is analyzed, and the statistics target is set to 1000. Basically, it seems that it doesn't understand that each row in log will match up with at most one row in bucket. There is a unique index on bucket(rrs_id, end_time), so it should be able to tell this.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
From http://developer.postgresql.org/todo.php: Maintain a map of recently-expired rows This allows vacuum to reclaim free space without requiring a sequential scan On Sat, Jan 22, 2005 at 12:20:53PM -0500, Greg Stark wrote: Dawid Kuroczko [EMAIL PROTECTED] writes: Quick thought -- would it be possible to implement a 'partial VACUUM' per analogiam to partial indexes? No. But it gave me another idea. Perhaps equally infeasible, but I don't see why. What if there were a map of modified pages. So every time any tuple was marked deleted it could be marked in the map as modified. VACUUM would only have to look at these pages. And if it could mark as free every tuple that was marked as deleted then it could unmark the page. The only downside I see is that this could be a source of contention on multi-processor machines running lots of concurrent update/deletes. -- greg
Re: [PERFORM] Cheaper VACUUMing
For reference, here's the discussion about this that took place on hackers: http://lnk.nu/archives.postgresql.org/142.php On Sun, Jan 23, 2005 at 01:16:20AM -0500, Christopher Browne wrote: A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Greg Stark) wrote: Dawid Kuroczko [EMAIL PROTECTED] writes: Quick thought -- would it be possible to implement a 'partial VACUUM' per analogiam to partial indexes? No. But it gave me another idea. Perhaps equally infeasible, but I don't see why. What if there were a map of modified pages. So every time any tuple was marked deleted it could be marked in the map as modified. VACUUM would only have to look at these pages. And if it could mark as free every tuple that was marked as deleted then it could unmark the page. The only downside I see is that this could be a source of contention on multi-processor machines running lots of concurrent update/deletes. I was thinking the same thing after hearing fairly extensive pooh-poohing of the notion of vacuuming based on all the pages in the shared cache. This hot list page table would probably need to be a hash table. It rather parallels the FSM, including the way that it would need to be limited in size. -- wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com'). http://cbbrowne.com/info/lsf.html Rules of the Evil Overlord #57. Before employing any captured artifacts or machinery, I will carefully read the owner's manual. http://www.eviloverlord.com/
Re: [PERFORM] Odd number of rows expected
On Sat, Jan 22, 2005 at 10:18:00PM -0500, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: (SELECT b.bucket_id AS rrs_bucket_id, s.* FROM rrs.bucket b JOIN page_log.log s ON ( b.prev_end_time < log_time AND b.end_time >= log_time ) WHERE b.rrs_id = '1' AND b.end_time <= '2005-01-21 20:23:00+00' AND b.end_time > '1970-01-01 00:00:00+00' ) a Basically, it seems that it doesn't understand that each row in log will match up with at most one row in bucket. There is a unique index on bucket(rrs_id, end_time), so it should be able to tell this. Why should it be able to tell that? Indexes: rrs_bucket__rrs_id__end_time unique, btree (rrs_id, end_time) Err, crap, I guess that wouldn't work, because of prev_end_time not being in there... In English, each bucket defines a specific time period, and no two buckets can overlap (though there's no constraints defined to actually prevent that). So reality is that each row in page_log.log will in fact only match one row in bucket (at least for each value of rrs_id). Given that, would the optimizer make a better choice if it knew that (since it means a much smaller result set)? Is there any way to tell the optimizer this is the case? Maybe what I ultimately need is a timestamp with interval datatype, that specifies an interval that's fixed in time.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Sun, Jan 23, 2005 at 03:40:03PM -0500, Tom Lane wrote: There was some discussion in Toronto this week about storing bitmaps that would tell VACUUM whether or not there was any need to visit individual pages of each table. Getting rid of useless scans through not-recently-changed areas of large tables would make for a significant reduction in the cost of VACUUM. FWIW, that's already on the TODO. See also http://lnk.nu/archives.postgresql.org/142.php. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Sun, Jan 23, 2005 at 03:40:03PM -0500, Tom Lane wrote: The real issue with any such scheme is that you are putting maintenance costs into the critical paths of foreground processes that are executing user queries. I think that one of the primary advantages of the Postgres storage design is that we keep that work outside the critical path and delegate it to maintenance processes that can run in the background. We shouldn't lightly toss away that advantage. To pull out the oft-used "show me the numbers" card... has anyone done a study to see if keeping this stuff out of the 'critical path' actually helps overall system performance? While the current scheme initially speeds up transactions, eventually you have to run vacuum, which puts a big load on the system. If you can put off vacuuming until off-hours (assuming your system has off-hours), then this doesn't matter, but more and more we're seeing systems where vacuum is a big performance issue (hence the recent work on vacuum delay, so as not to swamp the I/O system). If you vacuum as part of the transaction it's going to be a more efficient use of resources, because you have more of what you need right there (ie: odds are that you're on the same page as the old tuple). In cases like that it very likely makes a lot of sense to take a small hit in your transaction time up-front, instead of a larger hit doing a vacuum down the road. Of course, without numbers this is a bunch of hand-waving, but I don't think it's valid to assume that minimizing the amount of work you do in a transaction means better throughput without considering what it will cost to do the work you're putting off until later. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Automagic tuning
On Mon, Jan 31, 2005 at 03:26:12PM -0500, Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: I would like something that really measures values like random_page_cost or cpu_tuple_cost that are hardware dependent. I assume such a thing does not exist? Nope. You gotta whip out your calculator and run some queries. Preferably a whole lot of queries. All the measurement techniques I can think of are going to have a great deal of noise, so you shouldn't twiddle these cost settings based on just a few examples. Are there any examples of how you can take numbers from pg_stats_* or EXPLAIN ANALYZE and turn them into configuration settings (such as random_page_cost)? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
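For what it's worth, the calibration Tom describes in the reply below (L105's message) can be roughed out mechanically. All numbers here are invented, and a real index scan's estimated cost is only partly proportional to random_page_cost, so treat this strictly as a sketch of the ratio idea, not a tuning tool:

```python
# Sketch: scale random_page_cost so that estimated-cost-per-millisecond is
# the same for index scans as for seq scans, averaged over many EXPLAIN
# ANALYZE samples. Hypothetical numbers throughout.

def suggest_random_page_cost(current_rpc, samples):
    """samples: list of (kind, estimated_cost, actual_ms) tuples."""
    def cost_per_ms(kind):
        pairs = [(c, t) for k, c, t in samples if k == kind]
        return sum(c for c, _ in pairs) / sum(t for _, t in pairs)
    # If index scans look over-costed relative to reality, lower rpc;
    # if under-costed, raise it.
    return current_rpc * cost_per_ms('seq') / cost_per_ms('index')

samples = [
    ('seq',   1000.0, 100.0),   # seq scans: ~10 cost units per ms
    ('seq',   2000.0, 200.0),
    ('index',  500.0,  25.0),   # index scans: ~20 cost units per ms
    ('index',  300.0,  15.0),
]
print(suggest_random_page_cost(4.0, samples))  # -> 2.0
```

The "whole lot of queries" caveat applies directly: with only a few samples the per-ms ratios are dominated by noise (cache state, concurrent activity), which is exactly why the averaging matters.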
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
On Mon, Jan 31, 2005 at 09:41:32PM +0100, Cosimo Streppone wrote: 2) The goal is to make the db handle 100 tps (something like 100 users). What kind of server and storage should I provide? The actual servers our application runs on normally have 2 Intel Xeon processors, 2-4 Gb RAM, RAID 0/1/5 SCSI disk storage with hard drives @ 10,000 rpm You might look at Opterons, which theoretically have higher data bandwidth. If you're doing anything data intensive, like a sort in memory, this could make a difference. 4) Is it correct to suppose that multiple RAID 1 arrays can provide the fastest I/O ? I usually reserve one RAID1 array for the db data directory, one RAID1 array for the pg_xlog directory and one RAID1 array for os and application needs. RAID10 will be faster than RAID1. The key factor in a high performance database is a high performance I/O system. If you look in the archives you'll find people running postgresql on 30 and 40 drive arrays. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Automagic tuning
On Tue, Feb 01, 2005 at 12:06:27AM -0500, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Mon, Jan 31, 2005 at 03:26:12PM -0500, Tom Lane wrote: Preferably a whole lot of queries. All the measurement techniques I can think of are going to have a great deal of noise, so you shouldn't twiddle these cost settings based on just a few examples. Are there any examples of how you can take numbers from pg_stats_* or EXPLAIN ANALYZE and turn them into configuration settings (such as random_page_cost)? Well, the basic idea is to adjust random_page_cost so that the ratio of estimated cost to real elapsed time (as shown by EXPLAIN ANALYZE) is the same for seqscans and indexscans. What you have to watch out for is that the estimated cost model is oversimplified and doesn't take into account a lot of real-world factors, such as the activity of other concurrent processes. The reason for needing a whole lot of tests is essentially to try to average out the effects of those unmodeled factors, so that you have a number that makes sense within the planner's limited view of reality. Given that, I guess the next logical question is: what would it take to collect stats on queries so that such an estimate could be made? And would it be possible/make sense to gather stats useful for tuning the other parameters? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote: You might look at Opterons, which theoretically have higher data bandwidth. If you're doing anything data intensive, like a sort in memory, this could make a difference. Would Opteron systems need a 64-bit postgresql (and os, gcc, ...) build to have that advantage? Well, that would give you the most benefit, but the memory bandwidth is still greater than on a Xeon. There's really no issue with 64 bit if you're using open source software; it all compiles for 64 bits and you're good to go. http://stats.distributed.net runs on a dual opteron box running FreeBSD and I've had no issues. RAID10 will be faster than RAID1. Sorry Jim, by RAID10 you mean several raid1 arrays mounted on different linux partitions? Or several raid1 arrays that build up a raid0 array? In the latter case, who decides which data goes in which raid1 array? Raid Adapter? You should take a look around online for a description of raid types. There's technically RAID0+1 and RAID1+0; one is a stripe of mirrored drives (a RAID 0 built out of RAID 1s), the other is a mirror of two RAID 0s. The former is much better; if you're lucky you can lose half your drives without any data loss (if each dead drive is part of a different mirror). Recovery is also faster. You'll almost certainly be much happier with hardware raid instead of software raid. stats.distributed.net runs a 3ware controller and SATA drives. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
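The failure-tolerance claim above is easy to check combinatorially. A sketch with four hypothetical drives (A1/A2 and B1/B2), counting which two-drive failures lose data under each layout:

```python
# RAID1+0 (stripe of mirrors): mirrors (A1,A2) and (B1,B2); data is lost
# only if both halves of the same mirror die.
# RAID0+1 (mirror of stripes): stripe sets (A1,B1) and (A2,B2); data
# survives only while at least one whole stripe set is intact.
from itertools import combinations

drives = ['A1', 'A2', 'B1', 'B2']

def raid10_survives(dead):
    return not ({'A1', 'A2'} <= dead or {'B1', 'B2'} <= dead)

def raid01_survives(dead):
    return not ({'A1', 'B1'} & dead) or not ({'A2', 'B2'} & dead)

two_dead = [set(c) for c in combinations(drives, 2)]
print(sum(raid10_survives(d) for d in two_dead), 'of', len(two_dead))  # -> 4 of 6
print(sum(raid01_survives(d) for d in two_dead), 'of', len(two_dead))  # -> 2 of 6
```

So with two failures, RAID1+0 survives 4 of the 6 possible combinations while RAID0+1 survives only 2, which is the "former is much better" point; the gap widens with more drives.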
Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout
On Wed, Feb 23, 2005 at 01:37:28PM -0500, Tom Lane wrote: Bruno Almeida do Lago [EMAIL PROTECTED] writes: Is there a real limit for max_connections? Here we've an Oracle server with up to 1200 simultaneous connections over it! [ shrug... ] If your machine has the beef to run 1200 simultaneous queries, you can set max_connections to 1200. The point of what you were quoting is that if you want to service 1200 concurrent users but you only expect maybe 100 simultaneously active queries from them (and you have a database box that can only service that many) then you want to put a connection pooler in front of 100 backends, not try to start a backend for every user. Oracle may handle this sort of thing differently, I dunno. regards, tom lane Oracle has some form of built-in connection pooling. I don't remember the exact details of it off the top of my head, but I think it was a 'wedge' that clients would connect to as if it was the database, and the wedge would then find an available database process to use. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
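A toy version of that 'wedge' idea — many users funneled through a small fixed set of backend workers via a queue, instead of one backend per user. Names and counts are made up; a real pooler speaks the wire protocol in front of actual database processes:

```python
# Sketch of connection pooling: 100 "users" are serviced by 4 "backends"
# pulling requests off a shared queue, rather than 100 backends.
import queue
import threading

N_BACKENDS, N_USERS = 4, 100
requests, results = queue.Queue(), queue.Queue()

def backend():
    while True:
        user = requests.get()
        if user is None:          # sentinel: shut this backend down
            return
        # A real backend would run the user's query here.
        results.put((user, threading.current_thread().name))

workers = [threading.Thread(target=backend, name=f'backend-{i}')
           for i in range(N_BACKENDS)]
for w in workers:
    w.start()
for u in range(N_USERS):          # 1200 users in Bruno's case; 100 here
    requests.put(u)
for _ in workers:                 # one sentinel per backend
    requests.put(None)
for w in workers:
    w.join()

print(results.qsize(), 'requests served by', N_BACKENDS, 'backends')
```

The point Tom makes carries over: the pool size should match how many queries the box can actually run at once, not how many clients exist.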
Re: [PERFORM] multi billion row tables: possible or insane?
On Tue, Mar 01, 2005 at 10:34:29AM +0100, Ramon Bastiaans wrote: Hi all, I am doing research for a project of mine where I need to store several billion values for a monitoring and historical tracking system for a big computer system. My current estimate is that I have to store (somehow) around 1 billion values each month (possibly more). On a side-note, do you need to keep the actual row-level details for history? http://rrs.decibel.org might be of some use. Other than that, what others have said. Lots and lots of disks in RAID10, and Opterons (though I would choose Opterons not for memory size but because of memory *bandwidth*). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] What needs to be done for real Partitioning?
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote: Stacy, Luckily they had the chance to work with a truly fantastic DBA (the author of an Oracle Press performance tuning book even) before they could switch back. He convinced them to make some of their indexes global. Performance dramatically improved (compared with both the unpartitioned schema, and the partitioned-and-locally-indexed schema), and they've since stayed with partitioned tables and a mix of local and global indexes. Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information on what values are found in what partition also solve this? Without 1/2 of the overhead imposed by global indexes? I can actually see such a bitmap as being universally useful to the partitioning concept ... for one, it would resolve the whole partition on {value} issue. I suspect both will have their uses. I've read quite a bit about global v. local indexes in Oracle, and there are definitely cases where global is much better than local. Granted, there are some things with how Oracle handles their catalog, etc. that might make local indexes more expensive for them than they would be for PostgreSQL. It's also not clear how much a 'partition bitmap' index would help. As for the 'seqscan individual partitions' argument, that's not going to work well at all for a case where you need to hit a relatively small percentage of rows in a relatively large number of partitions. SELECT ... WHERE customer_id = 1 would be a good example of such a query (assuming the table is partitioned on something like invoice_date). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] What needs to be done for real Partitioning?
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: We probably also need multi-table indexes. As Josh says, that seems antithetical to the main point of partitioning, which is to be able to rapidly remove (and add) partitions of a table. If you have to do index cleaning before you can drop a partition, what's the point of partitioning? Why would you need to do index cleaning first? Presumably the code that goes to check a heap tuple that an index pointed at to ensure that it was visible in the current transaction would be able to recognize if the partition that tuple was in had been removed, and just ignore that index entry. Granted, you'd need to clean the index up at some point (presumably via vacuum), but it doesn't need to occur at partition drop time. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Preventing query from hogging server
On Thu, Mar 24, 2005 at 01:07:39PM -0600, Matthew Nuzum wrote: I've tried `nice psql` in the past and I don't think that had much impact, but I haven't tried it on this query. On linux, nice will only help if the query is CPU-bound. On FreeBSD, nice affects I/O scheduling, as well as CPU, so it's a more effective means of limiting the impact of large queries. I don't know how other OS's handle this. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Compressing WAL
Maybe better for -hackers, but here it goes anyway... Has anyone looked at compressing WALs before writing to disk? On a system generating a lot of WAL it seems there might be some gains to be had. WAL data could be compressed before going to disk, since today's machines are generally more I/O bound than CPU bound. And unlike the base tables, you generally don't need to read the WAL, so you don't really need to worry about being able to quickly scan through the data without decompressing it. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
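A quick feasibility check of the trade-off, using zlib on a fake 8 KB page image. Real WAL record contents will compress differently, and the byte string here is invented; this only illustrates the CPU-for-I/O exchange:

```python
# Compress a synthetic page-sized WAL chunk before it would hit disk.
import zlib

# Fabricated, repetitive 8 KB "full page image" — the kind of data written
# on first modification of a page after a checkpoint.
page = (b'INSERT INTO log VALUES (42, "hello")' * 250)[:8192]

# Level 1 = fastest: the point is trading a little CPU for less I/O.
compressed = zlib.compress(page, level=1)

print(len(page), '->', len(compressed), 'bytes')
assert zlib.decompress(compressed) == page   # lossless round trip
```

Since WAL is written sequentially and normally only read during recovery, the "can't seek into compressed data cheaply" objection matters much less here than it would for heap pages.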
Re: [PERFORM] Sustained inserts per sec ... ?
On Tue, Apr 05, 2005 at 12:16:27AM -0400, Christopher Petrilli wrote: My fear is that it's some bizarre situation interacting with both issues, and one that might not be solvable. Does anyone else have much experience with this sort of sustained COPY? You might ask the guy who just posted to -admin about a database that's doing 340M inserts a day in 300M transactions... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)
On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote: Can anyone suggest a more general rule? Do we need for example to consider whether the relation membership is the same in two clauses that might be opposite sides of a range restriction? It seems like a.x > b.y AND a.x < b.z In a case like this, you could actually look at the data in b and see what the average range size is. If you wanted to get really fancy, the optimizer could decide how best to access a based on each row of b. probably can be treated as a range restriction on a.x for this purpose, but I'm much less sure that the same is true of a.x > b.y AND a.x < c.z Well, this could end up being much trickier, since who knows how b and c are related. Though thinking about it, although I threw out the row-by-row analysis idea to be glib, that would actually work in this case; you could take a look at what b and c look like each time 'through the loop'. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to improve db performance with $7K?
Sorry if I'm pointing out the obvious here, but it seems worth mentioning. AFAIK all 3ware controllers are set up so that each SATA drive gets its own SATA bus. My understanding is that by and large, SATA still suffers from a general inability to have multiple outstanding commands on the bus at once, unlike SCSI. Therefore, to get good performance out of SATA you need to have a separate bus for each drive. Theoretically, it shouldn't really matter that it's SATA rather than ATA, other than I certainly wouldn't want to try and cram 8 ATA cables into a machine... Incidentally, when we were investigating storage options at a previous job we talked to someone who deals with RS/6000 storage. He had a bunch of info about their serial controller protocol (which I can't think of the name of) vs SCSI. SCSI had a lot more overhead, so you could end up saturating even a 160MB SCSI bus with only 2 or 3 drives. People are finally realizing how important bandwidth has become in modern machines. Memory bandwidth is why RS/6000 was (and maybe still is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of the water. Likewise it's why SCSI is so much better than IDE (unless you just give each drive its own dedicated bandwidth). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)
On Wed, Apr 06, 2005 at 06:35:10PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote: Can anyone suggest a more general rule? Do we need for example to consider whether the relation membership is the same in two clauses that might be opposite sides of a range restriction? It seems like a.x > b.y AND a.x < b.z In a case like this, you could actually look at the data in b and see what the average range size is. Not with the current statistics --- you'd need some kind of cross-column statistics involving both y and z. (That is, I doubt it would be helpful to estimate the average range width by taking the difference of independently-calculated mean values of y and z ...) But yeah, in principle it would be possible to make a non-default estimate. Actually, it might be possible to take a SWAG at it using the histogram and correlation stats. You know... since getting universally useful cross-column stats seems to be pretty pie-in-the-sky, would it be possible to generate more complex stats on the fly from a sampling of a table? If you're looking at a fairly sizeable table ISTM it would be worth sampling the rows on 10 or 20 random pages to see what you get. In this case, you'd want to know the average difference between two fields. Other queries might want something different. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
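The sample-and-estimate idea might look like this in miniature (toy data, not planner code): sample a couple hundred rows of b, measure the average width of (z - y), and turn that into a selectivity guess for a.x > b.y AND a.x < b.z.

```python
# On-the-fly cross-column stat: average range width from a small sample.
import random

random.seed(1)
# Fabricated table b: each row holds a range (y, z) of width ~5..15.
b = [(y, y + random.uniform(5, 15)) for y in range(10_000)]

sample = random.sample(b, 200)      # stands in for "10 or 20 random pages"
avg_width = sum(z - y for y, z in sample) / len(sample)

# Selectivity guess: fraction of a.x's domain a typical range covers.
domain = max(z for _, z in b) - min(y for y, _ in b)
selectivity = avg_width / domain
print(round(avg_width, 1), round(selectivity, 5))
```

This is exactly the estimate Tom notes you cannot get from independent means of y and z: the width has to be computed per row, which is why it needs either a cross-column statistic or a sample like this.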
Re: [PERFORM] Any way to speed this up?
2 things to point out from this last run: 50% of the time is taken scanning tblassociate -> Seq Scan on tblassociate a (cost=0.00..38388.79 rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1) Filter: ((clientnum)::text = 'SAKS'::text) If you had an index on clientnum and didn't cast it to text in the view, you might be able to use an indexscan, which could be faster (depends on how big the table actually is). This sort is taking about 25% of the time: -> Sort (cost=85542.59..86042.39 rows=199922 width=75) (actual time=19641.000..19955.000 rows=159960 loops=1) Sort Key: a.locationid -> Merge Right Join (cost=60850.40..62453.22 rows=199922 width=75) (actual time=13500.000..14734.000 rows=176431 loops=1) I suspect it shouldn't take 5 seconds to sort 160k rows in memory, and that this sort is spilling to disk. If you increase your working memory the sort might fit entirely in memory. As a quick test, you could set working memory to 80% of system memory and see how that changes the speed. But you wouldn't want to set it that high in production. On Thu, Apr 07, 2005 at 01:14:33PM -0400, Joel Fradkin wrote: Here is the result after putting it back to 4 the original value (I had done that prior to your suggestion of using 2 or 3) to see what might change. I also vacuumed and thought I saw records deleted in associate, which I found odd as this is a test site and no new records were added or deleted.
Merge Join  (cost=86788.09..87945.00 rows=10387 width=112) (actual time=19703.000..21154.000 rows=159959 loops=1)
  Merge Cond: ("outer".locationid = "inner".locationid)
  ->  Sort  (cost=1245.50..1246.33 rows=332 width=48) (actual time=62.000..62.000 rows=441 loops=1)
        Sort Key: l.locationid
        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..1231.60 rows=332 width=48) (actual time=15.000..62.000 rows=441 loops=1)
              Index Cond: ('SAKS'::text = (clientnum)::text)
  ->  Sort  (cost=85542.59..86042.39 rows=199922 width=75) (actual time=19641.000..19955.000 rows=159960 loops=1)
        Sort Key: a.locationid
        ->  Merge Right Join  (cost=60850.40..62453.22 rows=199922 width=75) (actual time=13500.000..14734.000 rows=176431 loops=1)
              Merge Cond: (("outer".id = "inner".jobtitleid) AND ("outer"."?column4?" = "inner"."?column10?"))
              ->  Sort  (cost=554.11..570.13 rows=6409 width=37) (actual time=94.000..94.000 rows=6391 loops=1)
                    Sort Key: jt.id, (jt.clientnum)::text
                    ->  Seq Scan on tbljobtitle jt  (cost=0.00..148.88 rows=6409 width=37) (actual time=0.000..63.000 rows=6391 loops=1)
                          Filter: (1 = presentationid)
              ->  Sort  (cost=60296.29..60796.09 rows=199922 width=53) (actual time=13406.000..13859.000 rows=176431 loops=1)
                    Sort Key: a.jobtitleid, (a.clientnum)::text
                    ->  Seq Scan on tblassociate a  (cost=0.00..38388.79 rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1)
                          Filter: ((clientnum)::text = 'SAKS'::text)
Total runtime: 22843.000 ms
Joel Fradkin -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Thursday, April 07, 2005 11:43 AM To: Joel Fradkin Cc: 'PostgreSQL Perform' Subject: Re: [PERFORM] Any way to speed this up? Joel Fradkin [EMAIL PROTECTED] writes: random_page_cost = 1.2#4# units are one sequential page fetch cost That is almost certainly overoptimistic; it's causing the planner to use indexscans when it shouldn't. Try 2 or 3 or thereabouts.
regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Functionscan estimates
On Sat, Apr 09, 2005 at 12:00:56AM -0400, Tom Lane wrote: Not too many releases ago, there were several columns in pg_proc that were intended to support estimation of the runtime cost and number of result rows of set-returning functions. I believe in fact that these were the remains of Joe Hellerstein's thesis on expensive-function evaluation, and are exactly what he was talking about here: http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php But with all due respect to Joe, I think the reason that stuff got trimmed is that it didn't work very well. In most cases it's *hard* to write an estimator for a SRF. Let's see you produce one for dblink() for instance ... Actually, if the remote database supported a way to get a row-count estimate for the query passed to dblink, it would be trivial, since you'd just pass that back. In fact, having such a function (estimate_rows_for_sql(text)) would probably be very useful to functions that wanted to support returning a rows estimate. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Compressing WAL
On Sun, Apr 10, 2005 at 09:12:41PM -0400, Bruce Momjian wrote: I have never heard anyone talk about it, but it seems useful. I think compressing the page images written on first page modification since checkpoint would be a big win. Could you clarify that? Maybe I'm being naive, but it seems like you could just put a compression routine between the log writer and the filesystem. Is this a TODO? ISTM it's at least worth hacking something together and doing some performance testing... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] How to improve db performance with $7K?
On Thu, Apr 14, 2005 at 10:51:46AM -0500, Matthew Nuzum wrote: So if you all were going to choose between two hard drives where: drive A has capacity C and spins at 15K rpms, and drive B has capacity 2 x C and spins at 10K rpms and all other features are the same, the price is the same and C is enough disk space which would you choose? I've noticed that on IDE drives, as the capacity increases the data density increases and there is a perceived (I've not measured it) performance increase. Would the increased data density of the higher capacity drive be of greater benefit than the faster spindle speed of drive A? The increased data density will help transfer speed off the platter, but that's it. It won't help rotational latency. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
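The rotational-latency half of that answer is simple arithmetic: on average the head waits half a revolution for the right sector, and that depends only on spindle speed, not platter density.

```python
# Average rotational latency: half a revolution, in milliseconds.
def avg_rotational_latency_ms(rpm):
    ms_per_rev = 60_000 / rpm   # 60,000 ms per minute
    return ms_per_rev / 2

print(avg_rotational_latency_ms(10_000))  # -> 3.0 (drive B)
print(avg_rotational_latency_ms(15_000))  # -> 2.0 (drive A)
```

So for the random I/O typical of OLTP, the 15K drive saves a millisecond per seek regardless of capacity, while the density advantage of drive B only shows up on sequential transfers.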
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 07:41:49PM +0200, Jacques Caron wrote: It would be interesting to actually compare this to real-world (or nearly-real-world) benchmarks to measure the effectiveness of features like TCQ/NCQ etc. I was just thinking that it would be very interesting to benchmark different RAID configurations using dbt2. I don't know if this is something that the lab is setup for or capable of, though. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 10:20:36AM -0500, Dave Held wrote: Hmm...so you're saying that at some point, quantity beats quality? That's an interesting point. However, it presumes that you can actually distribute your data over a larger number of drives. If you have a db with a bottleneck of one or two very large tables, the extra spindles won't help unless you break up the tables and glue them together with query magic. But it's still a point to consider. Huh? Do you know how RAID10 works? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 06:41:37PM -, Mohan, Ross wrote: Don't you think optimal stripe width would be a good question to research the binaries for? I'd think that drives the answer, largely. (uh oh, pun alert) EG, oracle issues IO requests (this may have changed _just_ recently) in 64KB chunks, regardless of what you ask for. So when I did my striping (many moons ago, when the Earth was young...) I did it in 128KB widths, and set the oracle multiblock read count accordingly. For oracle, any stripe size under 64KB=stupid, anything much over 128K/256K=wasteful. I am eager to find out how PG handles all this. AFAIK PostgreSQL requests data one database page at a time (normally 8k). Of course the OS might do something different. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to improve db performance with $7K?
On Tue, Apr 19, 2005 at 11:22:17AM -0500, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 04/19/2005 11:10:22 AM: What is 'multiple initiators' used for in the real world? I asked this same question and got an answer off list: Somebody said their SAN hardware used multiple initiators. I would try to check the archives for you, but this thread is becoming more of a rope. Multiple initiators means multiple sources on the bus issuing I/O instructions to the drives. In theory you can have two computers on the same SCSI bus issuing I/O requests to the same drive, or to anything else on the bus, but I've never seen this implemented. Others have noted this feature as being a big deal, so somebody is benefiting from it. It's a big deal for Oracle clustering, which relies on shared drives. Of course most people doing Oracle clustering are probably using a SAN and not raw SCSI... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] immutable functions vs. join for lookups ?
You should re-run the function test using SQL as the function language instead of plpgsql. There might be some performance to be had there. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
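A minimal sketch of what that comparison might look like (the function name get_team_name and the stats_team table here are hypothetical, used only to illustrate the two languages; single-quoted function bodies as required before 8.0's dollar quoting, and IMMUTABLE as in the thread's experiment):

```sql
-- plpgsql version: every call goes through the PL/pgSQL interpreter
CREATE FUNCTION get_team_name(integer) RETURNS text AS '
BEGIN
    RETURN (SELECT team_name FROM stats_team WHERE team = $1);
END;
' LANGUAGE plpgsql IMMUTABLE;

-- SQL version: same lookup, but cheaper to call and potentially
-- inlinable by the planner
CREATE FUNCTION get_team_name_sql(integer) RETURNS text AS '
    SELECT team_name FROM stats_team WHERE team = $1;
' LANGUAGE sql IMMUTABLE;
```

Timing the same query once with each variant should show whether the PL/pgSQL call overhead is what's being measured.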
Re: [PERFORM] How to improve postgres performace
On Mon, Apr 18, 2005 at 11:36:01AM -0300, Rodrigo Moreno wrote: I'm really worried about that, because it's not enough anymore, and users complain about performance. But after running vacuumdb full, everything starts to run better again, so I think the problem is not related to a specific query. Vacuum full will skew your results, unless you plan on running vacuum full all the time. This is because you will always have some amount of dead tuples in a table that has any update or delete activity. A regular vacuum doesn't remove these tuples, it just marks them as available. So over time, depending on how frequently a table is vacuumed, it will settle down to a steady-state size that is greater than its size after a vacuum full. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
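For the archives, the difference is easy to watch directly (table name hypothetical; the VERBOSE output reports how many dead tuples a plain vacuum found and made reusable):

```sql
-- Plain vacuum: marks dead tuples as free space for reuse,
-- but does not shrink the table file
VACUUM VERBOSE my_table;

-- Vacuum full: compacts the table to its minimum size,
-- at the cost of an exclusive lock for the duration
VACUUM FULL VERBOSE my_table;
```

If routine plain vacuums run often enough for the workload, the steady-state size stabilizes and the exclusive-locking full vacuum becomes unnecessary.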
Re: [PERFORM] Sort and index
On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote: Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of fetching rows in a way that minimizes head seeks. Since disk I/O is generally 10-100x slower than RAM, the in-memory sort can be surprisingly slow and still beat indexed disk access. Of course, this is only true if the table can fit and be sorted entirely in memory (which, with 1500 rows, it probably can). Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. I'm running a test right now to see if it's actually faster to use an index in this case. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] What to do with 6 disks?
http://stats.distributed.net is set up with the OS, WAL, and temp on a RAID1 and the database on a RAID10. The drives are 200G SATA with a 3ware raid card. I don't think the controller has battery-backed cache, but I'm not sure. In any case, it's almost never disk-bound on the mirror; when it's disk-bound it's usually the RAID10. But this is a read-mostly database. If it was write-heavy, that might not be the case. Also, in general, I see very little disk activity from the OS itself, so I don't think there's a large disadvantage to having it on the same drives as part of your database. I would recommend different filesystems for each, though. (ie: not one giant / partition) On Tue, Apr 19, 2005 at 06:00:42PM -0700, Jeff Frost wrote: Now that we've hashed out which drives are quicker and more money equals faster... Let's say you had a server with 6 separate 15k RPM SCSI disks, what raid option would you use for a standalone postgres server? a) 3xRAID1 - 1 for data, 1 for xlog, 1 for os? b) 1xRAID1 for OS/xlog, 1xRAID5 for data c) 1xRAID10 for OS/xlog/data d) 1xRAID1 for OS, 1xRAID10 for data e) . I was initially leaning towards b, but after talking to Josh a bit, I suspect that with only 4 disks the raid5 might be a performance detriment vs 3 raid 1s or some sort of split raid10 setup. -- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Slow copy with little CPU/disk usage
A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2 disk SATA mirror, running FBSD 4.10-stable. There's one index on the table. What's really odd is that neither the CPU nor the disk are being hammered. The box appears to be pretty idle; the postgresql process is using 4-5% CPU. I seem to recall others running into this before, but I can't remember what the issue was and I can't find it in the archives. This is version 8.0, btw. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Sort and index
On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. Really? Have you read cost_sort()? It's certainly possible that the calculation is all wet, but to claim that the issue is not considered is just wrong. To be fair, no, I haven't looked at the code. This is based strictly on anecdotal evidence on a 120M row table. I'm currently running a test to see how an index scan compares to a seqscan. I also got the same results when I added a where clause that would restrict it to about 7% of the table. Actually, after running some tests (below), the plan cost does change when I change sort_mem (it was originally 5).

stats=# \d email_contrib
     Table public.email_contrib
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    email_contrib_pkey primary key, btree (project_id, id, date)
    email_contrib__pk24 btree (id, date) WHERE (project_id = 24)
    email_contrib__pk25 btree (id, date) WHERE (project_id = 25)
    email_contrib__pk8 btree (id, date) WHERE (project_id = 8)
    email_contrib__project_date btree (project_id, date)
Foreign-key constraints:
    fk_email_contrib__id FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    fk_email_contrib__team_id FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
    ?column?
----------------
 0.067687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Slow copy with little CPU/disk usage
No, this is a single process. And there are known issues with context-switch storms on Xeons, so that might be what you're seeing. On Tue, Apr 19, 2005 at 09:37:21PM -0700, Mischa Sandberg wrote: Quoting Tom Lane [EMAIL PROTECTED]: Jim C. Nasby [EMAIL PROTECTED] writes: A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2 disk SATA mirror, running FBSD 4.10-stable. There's one index on the table. If there's no hidden costs such as foreign key checks, that does seem pretty dang slow. What's really odd is that neither the CPU or the disk are being hammered. The box appears to be pretty idle; the postgresql proces is using 4-5% CPU. -- This sounds EXACTLY like my problem, if you make the box to a Xeon 2.4GHz, 2GB RAM ... with two SCSI drives (xlog and base); loading 10K rows of about 200 bytes each; takes about 20 secs at the best, and much longer at the worst. By any chance does your friend have several client machines/processes trying to mass-load rows at the same time? Or at least some other processes updating that table in a bulkish way? What I get is low diskio, low cpu, even low context-switches ... and I'm betting he should take a look at pg_locks. For my own problem, I gather that an exclusive lock is necessary while updating indexes and heap, and the multiple processes doing the update can make that pathological. Anyway, have your friend check pg_locks. Dreams come true, not free. -- S.Sondheim, ITW -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Slow copy with little CPU/disk usage
No, he's using either COPY or \COPY. On Wed, Apr 20, 2005 at 12:34:27AM -0400, Greg Stark wrote: Jim C. Nasby [EMAIL PROTECTED] writes: What's really odd is that neither the CPU or the disk are being hammered. The box appears to be pretty idle; the postgresql proces is using 4-5% CPU. Is he committing every row? In that case you would see fairly low i/o bandwidth usage because most of the time is being spent seeking and waiting for rotational latency. -- greg -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Hrm... I was about to suggest that for timing just the query (and not output/data transfer time) using explain analyze, but then I remembered that explain analyze can incur some non-trivial overhead with the timing calls. Is there a way to run the query but have psql ignore the output? If so, you could use \timing. In any case, it's not valid to use pgadmin to time things. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Sort and index
I've run some performance tests. The actual test case is at http://stats.distributed.net/~decibel/timing.sql, and the results are at http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing an index scan appears to be about 2x faster than a sequential scan and a sort. Something else of interest is that going from 50M of sort memory to 3G sped the sort up by 900 seconds. If someone wants to record data about the effect of sort_mem on on-disk sorts somewhere (maybe in the docs?) I can run some more tests for that case. In any case, it's clear that the planner is making the wrong choice here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan. On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote: On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. Really? Have you read cost_sort()? It's certainly possible that the calculation is all wet, but to claim that the issue is not considered is just wrong. To be fair, no, I haven't looked at the code. This is based strictly on anecdotal evidence on a 120M row table. I'm currently running a test to see how an index scan compares to a seqscan. I also got the same results when I added a where clause that would restrict it to about 7% of the table. Actually, after running some tests (below), the plan cost does change when I change sort_mem (it was originally 5). 
stats=# \d email_contrib
     Table public.email_contrib
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    email_contrib_pkey primary key, btree (project_id, id, date)
    email_contrib__pk24 btree (id, date) WHERE (project_id = 24)
    email_contrib__pk25 btree (id, date) WHERE (project_id = 25)
    email_contrib__pk8 btree (id, date) WHERE (project_id = 8)
    email_contrib__project_date btree (project_id, date)
Foreign-key constraints:
    fk_email_contrib__id FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    fk_email_contrib__team_id FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
    ?column?
----------------
 0.067687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
On Fri, Apr 22, 2005 at 05:04:19PM -0400, Joel Fradkin wrote: And is the system always going to be used by only one user? No we have 400+ concurrent users I guess the question is if this big select is representative of the load you expect in production. Yes we see many time on the two processor box running MSSQL large return sets using 100%cpu for 5-30 seconds. What happens if you execute the query more times? Do the times stay the same as the second run? I will definitely have to do pressure testing prior to going live in production. I have not done concurrent tests as honestly single-user tests are failing, so multi-user testing is not something I need yet. I would very, very strongly encourage you to run multi-user tests before deciding on mysql. Mysql is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over. I can't speak from experience because I avoid mysql like the plague, though. :) Likewise, MSSQL will probably look better single-user than it will multi-user. Unless you're going to only access the database single-user, it's just not a valid test case (and by the way, this is true no matter what database you're looking at. Multiuser access is where you uncover your real bottlenecks.) -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Interesting numbers on a CREATE INDEX
Building a single-column index on a dual opteron with 4G of memory, data on a 4-disk SATA RAID10; OS, logs and tempspace on a SATA mirror, with sort_mem set to 2.5G, create index is actually CPU bound for large portions of time. The postgresql process and system time are accounting for an entire CPU, and systat (this is a FreeBSD5.2 box) is generally showing 80% utilization on the RAID10 and 40% on the mirror. Not a performance problem, but I thought some people might be interested. The RAID10 is doing about 28-32MB/s, I would think this wouldn't be enough to swamp the CPU but I guess I would be thinking wrong. BTW, the column I'm indexing is a bigint with a low correlation. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Sort and index
On Fri, Apr 22, 2005 at 10:08:06PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: I've run some performance tests. The actual test case is at http://stats.distributed.net/~decibel/timing.sql, and the results are at http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing an index scan appears to be about 2x faster than a sequential scan and a sort. ... for one test case, on one platform, with a pretty strong bias to the fully-cached state since you ran the test multiple times consecutively. The table is 6.5G and the box only has 4G, so I suspect it's not cached. Past experience has generally been that an explicit sort is quicker, so you'll have to pardon me for suspecting that this case may be atypical. Is the table nearly in order by pkey, by any chance? It might be, but there's no way I can check with a multi-key index, right? I'll re-run the tests with a single-column index on a column with a correlation of 16%. In any case, it's clear that the planner is making the wrong choice here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan. Feel free to propose better cost equations. Where would I look in code to see what's used now? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Index bloat problem?
You would be interested in http://archives.postgresql.org/pgsql-hackers/2005-04/msg00565.php On Thu, Apr 21, 2005 at 03:33:05PM -0400, Dave Chapeskie wrote: On Thu, Apr 21, 2005 at 11:28:43AM -0700, Josh Berkus wrote: Michael, Every five minutes, DBCC INDEXDEFRAG will report to the user an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and *any completed work is retained.* Keen. Sounds like something for our TODO list. -- Josh Berkus Aglio Database Solutions San Francisco See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php for my thoughts on a non-blocking alternative to REINDEX. I got no replies to that message. :-( I've almost got a working solution integrated in the backend that does correct WAL logging and everything. (Writing the code to write and replay WAL logs for complicated operations can be very annoying!) For now I've gone with a syntax of: REINDEX INDEX btree_index_name INCREMENTAL; (For now it's not a proper index AM (accessor method), instead the generic index code knows this is only supported for btrees and directly calls the btree_compress function.) It's not actually a REINDEX per-se in that it doesn't rebuild the whole index. It holds brief exclusive locks on the index while it shuffles items around to pack the leaf pages fuller. There were issues with the code I attached to the above message that have been resolved with the new code. With respect to the numbers provided in that e-mail the new code also recycles more pages than before. Once I've finished it up I'll prepare and post a patch. -- Dave Chapeskie OpenPGP Key ID: 0x3D2B6B34 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? 
---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Sort and index
On Sat, Apr 23, 2005 at 01:00:40AM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Feel free to propose better cost equations. Where would I look in code to see what's used now? All the gold is hidden in src/backend/optimizer/path/costsize.c. regards, tom lane After setting up a second test that orders the table by a highly non-correlated column, I think I've found part of the problem. The estimated index scan cost for (project_id, id, date) is 0.00..100117429.34 while the estimate for work_units is 0.00..103168408.62; almost no difference, even though project_id correlation is .657 while work_units correlation is .116. This is with random_page_cost set to 1.1; if I set it much higher I can't force the index scan (BTW, would it make more sense to set the cost of a disable seqscan to either pages or tuples * disable_cost?), but even with only a 10% overhead on random page fetches it seems logical that the two estimates should be much farther apart. If you look at the results of the initial run (http://stats.distributed.net/~decibel/timing.log), you'll see that the cost of the index scan is way overestimated. Looking at the code, the runcost is calculated as run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost); where csquared is indexCorrelation^2. Why is indexCorrelation squared? The comments say a linear interpolation between min_IO and max_IO is used, but ISTM that if it was linear then instead of csquared, indexCorrelation would just be used. By the way, I'm running a test for ordering by work_units right now, and I included code to allocate and zero 3.3G of memory (out of 4G) between steps to clear the kernel buffers. This brought the seqscan times up to ~6800 seconds, so it seems there was in fact buffering going on in the first test. The second test has been running an index scan for over 14 hours now, so clearly a seqscan+sort is the way to go for a highly uncorrelated index (at least one that won't fit in effective_cache_size). 
-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
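Spelling out the interpolation questioned above: with index correlation $\rho$, the run cost formula from costsize.c amounts to

```latex
\mathrm{run\_cost} \mathrel{+}= \mathrm{max\_IO\_cost}
    + \rho^{2}\,\bigl(\mathrm{min\_IO\_cost} - \mathrm{max\_IO\_cost}\bigr)
```

so a perfectly uncorrelated index ($\rho = 0$) is charged max_IO_cost and a perfectly correlated one ($\rho = 1$) is charged min_IO_cost. A genuinely linear interpolation, as the code comment describes, would use $\rho$ rather than $\rho^{2}$; squaring pulls every intermediate correlation toward the expensive end, which matches the overestimated index-scan costs observed in the timing log.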
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
BTW, http://stats.distributed.net/~decibel/base.log is a test I ran; select count(*) was ~6x faster than explain analyze select *. On Tue, Apr 26, 2005 at 07:46:52PM -0700, Kevin Brown wrote: Josh Berkus wrote: Jim, Kevin, Hrm... I was about to suggest that for timing just the query (and not output/data transfer time) using explain analyze, but then I remembered that explain analyze can incur some non-trivial overhead with the timing calls. Is there a way to run the query but have psql ignore the output? If so, you could use \timing. Would timing SELECT COUNT(*) FROM (query) work? Just \timing would work fine; PostgreSQL doesn't return anything until it has the whole result set. Hmm...does \timing show the amount of elapsed time between query start and the first results handed to it by the database (even if the database itself has prepared the entire result set for transmission by that time), or between query start and the last result handed to it by the database? Because if it's the latter, then things like server-client network bandwidth are going to affect the results that \timing shows, and it won't necessarily give you a good indicator of how well the database backend is performing. I would expect that timing SELECT COUNT(*) FROM (query) would give you an idea of how the backend is performing, because the amount of result set data that has to go over the wire is trivial. Each is, of course, useful in its own right, and you want to be able to measure both (so, for instance, you can get an idea of just how much your network affects the overall performance of your queries). That's why MSSQL vs. PostgreSQL timing comparisons are deceptive unless you're careful: MSSQL returns the results on block at a time, and reports execution time as the time required to return the *first* block, as opposed to Postgres which reports the time required to return the whole dataset. Interesting. I had no idea MSSQL did that, but I can't exactly say I'm surprised. 
:-) -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
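Putting the suggestions in this thread together, a psql session that times mostly the backend might look like this (\timing and the count(*) wrapper are from the discussion above; the table and filter are borrowed from the earlier email_contrib tests, and the subselect alias is required):

```sql
\timing
-- Collapse the result set to a single row so network transfer and
-- client-side rendering are negligible; the backend still does all
-- the scan/sort work for the inner query
SELECT count(*) FROM (
    SELECT * FROM email_contrib WHERE project_id = 8
) AS t;
```

One caveat: wrapping a query this way measures the backend's execution of the inner query, but as noted above it deliberately excludes the transfer cost that a real client would pay, so both measurements are worth taking.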
Re: [PERFORM] Kernel Resources and max_connections
On Wed, May 04, 2005 at 01:46:34PM +1200, Mark Kirkwood wrote: (This not-very-clear distinction between what is sysctl'able and what is a kernel tunable is a bit of a downer). I think this is documented somewhere, though I can't think of where right now. Also, note that some sysctl's can only be set in /boot/loader.conf. hw.ata.wc=0 is an example (which you want to set on any box with IDE drives if you want fsync to actually do what it thinks it's doing). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
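For reference, a sketch of the corresponding /boot/loader.conf entry (FreeBSD loader tunables conventionally take quoted values; takes effect at next boot):

```
# /boot/loader.conf -- read by the boot loader, before sysctl.conf
hw.ata.wc="0"   # disable IDE write-back caching so fsync() really hits the platter
```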
Re: [PERFORM] COPY vs INSERT
On Wed, May 04, 2005 at 10:22:56PM -0400, Tom Lane wrote: Also, there is a whole lot of one-time-per-statement overhead that can be amortized across many rows instead of only one. Stuff like opening the target table, looking up the per-column I/O conversion functions, identifying trigger functions if any, yadda yadda. It's not *that* expensive, but compared to an operation as small as inserting a single row, it's significant. Has thought been given to supporting inserting multiple rows in a single insert? DB2 supported: INSERT INTO table VALUES( (1,2,3), (4,5,6), (7,8,9) ); I'm not sure how standard that is or if other databases support it. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] COPY vs INSERT
On Fri, May 06, 2005 at 09:30:46AM +0200, Dennis Bjorklund wrote: The sql standard includes this, except that you can not have the outer (). So it should be INSERT INTO table VALUES (1,2,3), (4,5,6), (7,8,9); Does DB2 demand these extra ()? My recollection is that it does, but it's been a few years... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
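So the standard form Dennis describes, with the outer parentheses dropped, looks like this (table and column names hypothetical):

```sql
-- One statement, one round of per-statement setup, three rows
INSERT INTO t (a, b, c) VALUES
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9);
```

For the archives: PostgreSQL eventually did grow support for multi-row VALUES lists, in 8.2, well after this thread.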
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 02:38:41AM +1000, Neil Conway wrote: Jim C. Nasby wrote: Having indexes that people shouldn't be using does add confusion for users, and presents the opportunity for foot-shooting. Emitting a warning/notice on hash-index creation is something I've suggested in the past -- that would be fine with me. Probably not a bad idea. Even if there is some kind of advantage (would they possibly speed up hash joins?) No, hash joins and hash indexes are unrelated. I know they are now, but does that have to be the case? Like I said, I don't know the history, so I don't know why we even have them to begin with. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 10:14:11AM +1000, Neil Conway wrote: Jim C. Nasby wrote: No, hash joins and hash indexes are unrelated. I know they are now, but does that have to be the case? I mean, the algorithms are fundamentally unrelated. They share a bit of code such as the hash functions themselves, but they are really solving two different problems (disk based indexing with (hopefully) good concurrency and WAL logging vs. in-memory joins via hashing with spill to disk if needed). Well, in a hash-join right now you normally end up feeding at least one side of the join with a seqscan. Wouldn't it speed things up considerably if you could look up hashes in the hash index instead? That way you can eliminate going to the heap for any hashes that match. Of course, if limited tuple visibility info was added to hash indexes (similar to what I think is currently happening to B-trees), many of the heap scans could be eliminated as well. A similar method could also be used for hash aggregates, assuming they use the same hash. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 12:10:57AM -0400, Tom Lane wrote: be responsive to your search.) (This also brings up the thought that it might be interesting to support hash buckets smaller than a page ... but I don't know how to make that work in an adaptive fashion.) IIRC, other databases that support hash indexes also allow you to define the bucket size, so it might be a good start to allow for that. DBAs usually have a pretty good idea of what a table will look like in production, so if there's clear documentation on the effect of bucket size a good DBA should be able to make a good decision. What's the challenge to making it adaptive: coming up with an algorithm that gives you the optimal bucket size (which I would think there's research on...) or allowing the index to accommodate different bucket sizes existing in the index at once? (Presumably you don't want to re-write the entire index every time it looks like a different bucket size would help.) -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 11:49:50AM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: What's the challenge to making it adaptive: coming up with an algorithm that gives you the optimal bucket size (which I would think there's research on...) or allowing the index to accommodate different bucket sizes existing in the index at once? (Presumably you don't want to re-write the entire index every time it looks like a different bucket size would help.) Exactly. That's (a) expensive and (b) really hard to fit into the WAL paradigm --- I think we could only handle it as a REINDEX. So if it were adaptive at all I think we'd have to support multiple bucket sizes existing simultaneously in the index, and I do not see a good way to do that. I'm not really familiar enough with hash indexes to know if this would work, but if the maximum bucket size was known you could use that to determine a maximum range of buckets to look at. In some cases, that range would include only one bucket, otherwise it would be a set of buckets. If you found a set of buckets, I think you could then just go to the specific one you need. If we assume that the maximum bucket size is one page it becomes more realistic to take an existing large bucket and split it into several smaller ones. This could be done on an update to the index page, or a background process could handle it. In any case, should this go on the TODO list? Allowing a bucket size to be specified at CREATE INDEX doesn't seem out of line though. We'd have to think up a scheme for index-AM-specific index parameters ... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] PGSQL Capacity
http://stats.distributed.net has a table that's 130M rows. http://stats.distributed.net/participant/phistory.php?project_id=8&id=39622 is a page that hits that table, and as you can see it's quite fast. This is on a dual opteron with 4G of memory. Unless you're looking for sub-millisecond response times, 50k rows is nothing. On Mon, May 09, 2005 at 09:32:18PM +0200, Steinar H. Gunderson wrote: On Mon, May 09, 2005 at 09:22:40PM +0200, [EMAIL PROTECTED] wrote: How can I know the capacity of a pg database? How many records can my table have? I saw in a message that someone has 50,000 records -- is that possible in a table? (My table has 8 string fields (length 32 chars)). Thanks for your response. You can have several million records in a table easily -- I've done 10 million personally, but you can find people doing that many records a _day_. Hitting 1 billion records should probably not be impossible either -- it all depends on your hardware, and perhaps more importantly, what kind of queries you're running against it. 50,000 is absolutely no problem at all. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Configing 8 gig box.
On Mon, May 09, 2005 at 04:55:53PM -0400, Joel Fradkin wrote: Seems to be only using like 360 meg out of 7 gig free (odd thing is I did see some used swap 4k out of 1.9) with a bunch of users (this may be normal, but it is not going overly fast so thought I would ask). This is perfectly normal. Each postgresql backend will only report memory usage roughly equal to shared_buffers plus the size of the code (16M or so?). If it's in the middle of a sort or vacuum, it will use more memory. It's not uncommon for modern OS's to swap out stuff that's not being used. They would rather have the memory available for disk caching, which is normally a good trade-off. For reference, on a 4G box running FreeBSD, there's currently 18M of swap used. Postgresql processes typically show 53M of total VM, with ~22M resident. This is with shared buffers of 2000. Items I modified per commandprompt.com and watching this list etc. shared_buffers = 24576 work_mem = 32768 max_fsm_pages = 10 max_fsm_relations = 1500 fsync = true wal_sync_method = open_sync wal_buffers = 2048 checkpoint_segments = 100 effective_cache_size = 524288 default_statistics_target = 250 Any help is appreciated. Joel Fradkin Wazagua, Inc. 2520 Trailmate Dr Sarasota, Florida 34243 Tel. 941-753-7111 ext 305 [EMAIL PROTECTED] www.wazagua.com -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? 
Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
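A quick sanity check of how those page-denominated settings translate into bytes (a sketch assuming the default 8 kB block size; custom builds can differ) also explains why only a few hundred MB of the 7 GB shows as used -- shared_buffers here is only about 192 MB:

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes

# Page-denominated settings taken from the config listed above
settings = {
    "shared_buffers": 24576,
    "wal_buffers": 2048,
    "effective_cache_size": 524288,
}

# Convert pages -> megabytes
sizes_mb = {name: pages * BLCKSZ // (1024 * 1024) for name, pages in settings.items()}
print(sizes_mb)  # {'shared_buffers': 192, 'wal_buffers': 16, 'effective_cache_size': 4096}
```

Note that effective_cache_size doesn't allocate anything -- it only tells the planner how much cache (PostgreSQL plus OS) to assume.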
Re: [PERFORM] Partitioning / Clustering
On Tue, May 10, 2005 at 07:29:59PM +0200, PFC wrote: I wonder how Oracle does it ;) Oracle *clustering* demands shared storage. So you've shifted your money from big-iron CPUs to big-iron disk arrays. Oracle replication works similarly to Slony, though it supports a lot more modes (ie: synchronous). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Partitioning / Clustering
On Tue, May 10, 2005 at 02:55:55PM -0700, Mischa Sandberg wrote: just beyond belief, for both updates and queries. At Acxiom, the datasets are so large, even after partitioning, that they just constantly cycle them through memory, and commands are executed in convoys --- sort of like riding a paternoster. Speaking of which... what's the status of the patch that allows seqscans to piggyback on already running seqscans on the same table? So where should I go look, to see what's been done so far, on a Postgres that can treat another PG server as a new table type? To the best of my knowledge no such work has been done. There is a project (whose name escapes me) that lets you run queries against a remote postgresql server from a postgresql connection to a different server, which could serve as the basis for what you're proposing. BTW, given your experience, you might want to check out Bizgres. (http://pgfoundry.org/projects/bizgres/) I'm sure your insights would be most welcome. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Sort and index
First, I've got some updated numbers up at http://stats.distributed.net/~decibel/ timing2.log shows that the planner actually under-estimates an index scan by several orders of magnitude. Granted, random_page_cost is set to an unrealistic 1.1 (otherwise I can't force the index scan), but that alone isn't enough to explain the difference. On Wed, May 11, 2005 at 05:59:10PM +0200, Manfred Koizar wrote: On Sun, 24 Apr 2005 17:01:46 -0500, Jim C. Nasby [EMAIL PROTECTED] wrote: Feel free to propose better cost equations. I did. More than once. estimated index scan cost for (project_id, id, date) is 0.00..100117429.34 while the estimate for work_units is 0.00..103168408.62; almost no difference, ~3% even though project_id correlation is .657 This is divided by the number of index columns, so the index correlation is estimated to be 0.219. That seems like a pretty bad assumption to make. Is there any ETA on having statistics for multi-column indexes? you'll see that the cost of the index scan is way overestimated. Looking at the code, the runcost is calculated as run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost); where csquared is indexCorrelation^2. Why is indexCorrelation squared? The comments say a linear interpolation between min_IO and max_IO is used, but ISTM that if it was linear then instead of csquared, indexCorrelation would just be used. In my tests I got much more plausible results with 1 - (1 - abs(correlation))^2 What's the theory behind that? And I'd still like to know why correlation squared is used. Jim, are you willing to experiment with one or two small patches of mine? What version of Postgres are you running? It depends on the patches, since this is a production machine. Currently it's running 7.4.*mumble*, though I need to upgrade to 8, which I was intending to do via Slony. Perhaps the best thing would be for me to get that setup and we can experiment against version 8.0.3. -- Jim C. 
Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
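To see numerically how the two weightings discussed above differ, a small sketch (the min/max I/O costs are made-up numbers, not from a real plan):

```python
# Interpolates between best-case (perfectly correlated) and worst-case
# index scan I/O cost.  weight=1 -> min_io (best case), weight=0 -> max_io.
# The planner uses weight = correlation^2; the alternative proposed above
# is weight = 1 - (1 - |correlation|)^2.
def run_cost(min_io, max_io, weight):
    return max_io + weight * (min_io - max_io)

min_io, max_io = 1_000.0, 100_000.0
for corr in (0.0, 0.219, 0.657, 1.0):
    squared = corr ** 2
    alt = 1 - (1 - abs(corr)) ** 2
    print(f"corr={corr:5.3f}  csquared -> {run_cost(min_io, max_io, squared):9.0f}"
          f"  alternative -> {run_cost(min_io, max_io, alt):9.0f}")
```

Both formulas agree at the endpoints (correlation 0 and 1), but for mid-range correlations the squared weight stays near the worst-case cost while the alternative drops toward the best case much faster, which matches the over-estimation observed in timing2.log.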
Re: [PERFORM] Partitioning / Clustering
On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how livejournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf They have 2.6 Million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Neither of which is that horribly impressive. 200 TPM is less than 4 TPS. While I haven't run high transaction rate databases under PostgreSQL, I suspect others who have will say that 4 TPS isn't that big of a deal. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use mysql on dell hardware! And found single master replication did not scale. There's a section on multimaster replication, not sure if Probably didn't scale because they used to use MyISAM. they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the Which means they've got a huge amount of additional code complexity, not to mention how many times you can't post something because 'that cluster is down for maintenance'. application level to avoid hitting the db for rendered pageviews. Memcached is about the only good thing I've seen come out of livejournal. It's interesting that the solution livejournal have arrived at is quite similar in ways to the way google is set up. Except that unlike LJ, google stays up and it's fast. Though granted, LJ is quite a bit faster than it was 6 months ago. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Sort and index
On Thu, May 12, 2005 at 08:54:48PM +0200, Manfred Koizar wrote: On Wed, 11 May 2005 16:15:16 -0500, Jim C. Nasby [EMAIL PROTECTED] wrote: This is divided by the number of index columns, so the index correlation is estimated to be 0.219. That seems like a pretty bad assumption to make. Any assumption we make without looking at entire index tuples has to be bad. A new GUC variable secondary_correlation introduced by my patch at least gives you a chance to manually control the effects of additional index columns. It seems it would be much better to gather statistics on any multi-column indexes, but I know that's probably beyond what's reasonable for your patch. Also, my data (http://stats.distributed.net/~decibel) indicates that max_io isn't high enough. Look specifically at timing2.log compared to timing.log. Though, it is possible that this is because of having random_page_cost set to 1.1 (if I set it much higher I can't force the index scan because the index estimate actually exceeds the cost of the seqscan with the disable cost added in). It depends on the patches, since this is a production machine. Currently it's running 7.4.*mumble*, The patch referenced in http://archives.postgresql.org/pgsql-hackers/2003-08/msg00931.php is still available. It doesn't touch too many places and should be easy to review. I've been using it and its predecessors in production for more than two years. Let me know if the 74b1 version does not apply cleanly to your source tree. Looks reasonable; I'll give it a shot on 8.0 once I have replication happening. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[PERFORM] Tuning planner cost estimates
I've been doing some work to try and identify the actual costs associated with an index scan with some limited success. What's been run so far can be seen at http://stats.distributed.net/~decibel. But there are a couple of problems. First, I can't use the box exclusively for this testing, which results in some result inconsistencies. Second, I've been using a dataset that I can't make public, which means no one else can run these tests on different hardware. So what I think would be useful is some way to generate a known dataset, and then be able to run tests against it on different machines. In the case of testing index scans, we need to be able to vary correlation, which so far I've been doing by ordering by different columns. I suspect it will also be important to test with different tuple sizes. There's also the question of whether or not the cache should be flushed for each run or not. Does this sound like a good way to determine actual costs for index scans (and hopefully other access methods in the future)? If so, what would be a good way to implement this? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
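One way to generate a known dataset with tunable correlation is to start from sorted keys and swap a controllable fraction of them; a hedged sketch (the function name and the swap-fraction knob are made up for illustration -- real tests would load the resulting keys into a table and index them):

```python
import random

def make_keys(n, disorder):
    """Return n integer keys whose physical order has a tunable
    resemblance to sorted order.  disorder=0.0 -> perfectly correlated
    (fully sorted); larger values perform more random swaps, pushing
    the correlation toward zero."""
    keys = list(range(n))
    swaps = int(n * disorder)
    rng = random.Random(42)  # fixed seed so datasets are reproducible
    for _ in range(swaps):
        i, j = rng.randrange(n), rng.randrange(n)
        keys[i], keys[j] = keys[j], keys[i]
    return keys

# Fully correlated ordering; COPY-style rows could be generated from
# these keys and loaded into a test table.
sample = make_keys(10, 0.0)
print(sample)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the seed is fixed, the same dataset can be regenerated on different machines, which addresses the reproducibility problem above.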
Re: [PERFORM] Tuning planner cost estimates
On Thu, May 19, 2005 at 09:31:47AM -0700, Josh Berkus wrote: In the case of testing index scans, we need to be able to vary correlation, which so far I've been doing by ordering by different columns. I suspect it will also be important to test with different tuple sizes. There's also the question of whether or not the cache should be flushed for each run or not. Does this sound like a good way to determine actual costs for index scans (and hopefully other access methods in the future)? If so, what would be a good way to implement this? Well, the problem is that what we need for index scans is a formula, rather than a graph. The usefulness of benchmarking index scan cost is so that we True, but having a graphical representation of how different input variables (such as correlation) affect runtime is a good way to derive such a formula, or at least point you in the right direction. can test our formula for accuracy and precision. However, such a formula *does* need to take into account concurrent activity, updates, etc ... that is, it needs to approximately estimate the relative cost on a live database, not a test one. Well, that raises an interesting issue, because AFAIK none of the cost estimate functions currently do that. Heck, AFAIK even the piggyback seqscan code doesn't take other seqscans into account. Another issue is: what state should the buffers/disk cache be in? In the thread that kicked all this off Tom noted that my results were skewed because of caching, so I changed my tests to flush the disk cache as effectively as I could (by running a program that would consume enough available memory to just start the box swapping), but I don't think that's necessarily realistic. Though at least it should preclude the need to run tests multiple times on an otherwise idle box in order to 'pre-seed' the cache (not that that's any more realistic). 
If you don't use one of these techniques you end up with results that depend on what test was run before the current one... This is also going to be a moving target because Tom's in-memory-bitmapping changes relative cost equations. I thought those all had separate costing functions...? In any case, if we have a cost estimation tool it will make it much easier to derive cost estimation functions. I think a first step would be, in fact, to develop a tool that allows us to put EXPLAIN ANALYZE results in a database table. Without that, there is no possibility of statistical-scale analysis. Rather than trying to parse all possible output, ISTM it would be much better if there was a way to access the info directly. Would it be difficult to have an option that produces output that is a set of different fields? I'm thinking something like: Level (basically how far something's indented) Parent node (what node a child node is feeding) node_id (some kind of identifier for each step) operation (estimate|actual)_(startup|total|rows|width|loops) other (something to hold index condition, filter, etc) But ultimately, I'm not sure if this is really required or not, because I don't see that we need to use explain when running queries. In fact, it's possibly desirable that we don't, because of the overhead it incurs. We would want to log an explain (maybe analyze) just to make sure we knew what the optimizer was doing, but I think we shouldn't need the info to produce cost estimates. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
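Until something structured exists, a rough parser over the text output can approximate the fields listed above; a sketch assuming the 8.0-era plan line format (the sample plan line and node name are fabricated for the example):

```python
import re

# Matches plan lines of the form:
#   ->  Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4)
#                        (actual time=0.02..12.30 rows=10000 loops=1)
LINE = re.compile(
    r"(?P<indent>\s*)(?:->\s+)?(?P<op>.+?)\s+"
    r"\(cost=(?P<est_start>[\d.]+)\.\.(?P<est_total>[\d.]+) "
    r"rows=(?P<est_rows>\d+) width=(?P<width>\d+)\)"
    r"(?: \(actual time=(?P<act_start>[\d.]+)\.\.(?P<act_total>[\d.]+) "
    r"rows=(?P<act_rows>\d+) loops=(?P<loops>\d+)\))?"
)

def parse_plan(text):
    """Extract (level, operation, estimated, actual) fields from
    EXPLAIN ANALYZE text output, one dict per plan node."""
    nodes = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            d = m.groupdict()
            # indent depth serves as a crude nesting level
            d["level"] = len(d.pop("indent")) // 2
            nodes.append(d)
    return nodes

plan = "Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4) (actual time=0.02..12.30 rows=10000 loops=1)"
print(parse_plan(plan))
```

The parsed dicts could then be inserted straight into a results table for the statistical-scale analysis Josh is after.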
Re: [PERFORM] Tuning planner cost estimates
On Fri, May 20, 2005 at 04:47:38PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Thu, May 19, 2005 at 09:31:47AM -0700, Josh Berkus wrote: can test our formula for accuracy and precision. However, such a formula *does* need to take into account concurrent activity, updates, etc ... that is, it needs to approximately estimate the relative cost on a live database, not a test one. Well, that raises an interesting issue, because AFAIK none of the cost estimate functions currently do that. I'm unconvinced that it'd be a good idea, either. People already complain that the planner's choices change when they ANALYZE; if the current load factor or something like that were to be taken into account then you'd *really* have a problem with irreproducible behavior. It might make sense to have something a bit more static, perhaps a GUC variable that says plan on the assumption that there's X amount of concurrent activity. I'm not sure what scale to measure X on, nor exactly how this would factor into the estimates anyway --- but at least this approach would maintain reproducibility of behavior. Or allowing the load of the machine to affect query plans dynamically is something that could be disabled by default, so presumably if you turn it on it means you know what you're doing. Of course this is all academic until we have a means to actually measure how much system load affects the different things we estimate cost for, and I don't see that happening until we have a system for measuring how changing different input variables affects costs. Another issue is: what state should the buffers/disk cache be in? The current cost models are all based on the assumption that every query starts from ground zero: nothing in cache. Which is pretty bogus in most real-world scenarios. We need to think about ways to tune that assumption, too. 
Maybe this is actually the same discussion, because certainly one of the main impacts of a concurrent environment is on what you can expect to find in cache. Well, load doesn't directly affect cache efficiency; it's really a question of the ratios of how often different things in the database are accessed. If you wanted to get a crude idea of how likely pages from some relation are to be in cache, you could take periodic snapshots of io stats and see what percentage of the IO done in a given time period was on the relation you're interested in as compared to the rest of the database. But I think this is probably still a 2nd order effect. In terms of a testing system, here's what I'm thinking of. For each cost estimate, there will be a number of input variables we want to vary, and then check to see how changes in them affect run time. Using index scan as a simple example, 1st order variables will likely be index and table size (especially in relation to cache size), and correlation. So, we need some kind of a test harness that can vary these variables (preferably one at a time), and run a test case after each change. It would then need to store the timing info, possibly along with other information (such as explain output). If I'm the one to write this it'll end up in Perl, since that's the only language I know that would be able to accomplish this. DBT seems to be a reasonable test database to do this testing with, especially since it would provide a ready means to provide external load. Does this sound like a reasonable approach? Also, how important do people think it is to use explain analyze output instead of just doing SELECT count(*) FROM (query you actually want to test)? (The select count(*) wrapper is just a means to throw away the results since we don't really want to worry about data transfer times, etc). 
The testing I've done (http://stats.distributed.net/~decibel/base.log) shows explain analyze to be almost 5x slower than select count(*), so it seems a big gain if we can avoid that. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Tuning planner cost estimates
On Fri, May 20, 2005 at 03:23:16PM -0700, Josh Berkus wrote: Jim, Well, that raises an interesting issue, because AFAIK none of the cost estimate functions currently do that. Heck, AFAIK even the piggyback seqscan code doesn't take other seqscans into account. Sure. But you're striving for greater accuracy, no? Actually, all that's really needed in the way of concurrent activity is a calculated factor that lets us know how likely a particular object is to be cached, either in the fs cache or the pg cache (with different factors for each presumably) based on history. Right now, that's based on estimated_cache_size, which is rather inaccurate: a table which is queried once a month has the exact same cost factors as one which is queried every 2.1 seconds. This would mean an extra column in pg_stats I suppose. True, though that's a somewhat different issue than what the load on the box is (see the reply I just posted). Load on the box (particularly IO load) will also play a factor for things; for example, it probably means seqscans end up costing a lot more than random IO does, because the disk heads are being sent all over the place anyway. But ultimately, I'm not sure if this is really required or not, because I don't see that we need to use explain when running queries. In fact, it's possibly desirable that we don't, because of the overhead it incurs. We would want to log an explain (maybe analyze) just to make sure we knew what the optimizer was doing, but I think we shouldn't need the info to produce cost estimates. Well, the problem is that you need to know how much time the index scan took vs. other query steps. I don't see a way to do this other than an analyze. True, but that can be done by a separate seqscan step. I would argue that doing it that way is actually more accurate, because the overhead of explain analyze is huge and tends to swamp other factors out. 
As I mentioned in my other email, my tests show explain analyze select * from table is 5x slower than select count(*) from table. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Select performance vs. mssql
On Wed, May 25, 2005 at 09:29:36AM +0800, Christopher Kings-Lynne wrote: --MSSQL's ability to hit the index only and not having to go to the table itself results in a _big_ performance/efficiency gain. If someone who's in development wants to pass this along, it would be a nice addition to PostgreSQL sometime in the future. I'd suspect that as well as making one query faster, it would make everything else faster/more scalable as the server load is so much less. This is well-known and many databases do it. However, due to MVCC considerations in PostgreSQL, it's not feasible for us to implement it... Wasn't there a plan to store some visibility info in indexes? IIRC the idea was that a bit would be set in the index tuple indicating that all transactions that wouldn't be able to see that index value were complete, meaning that there was no reason to hit the heap for that tuple. I looked on the TODO but didn't see this, maybe it fell through the cracks? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster