Re: [PERFORM] [SQL] two queries in a single table scan
Hi, Stefano,

Stefano Dal Pra wrote:
> Suppose you have a large table tab and two (or more) queries like this:
>
>   SELECT count(*), A FROM tab WHERE C GROUP BY A;
>   SELECT count(*), B FROM tab WHERE C GROUP BY B;
>
> Is there any way to get both results in a single query, possibly through
> a stored procedure? The retrieved [count(*), A] and [count(*), B] data
> couldn't fit in a single table, of course. The main goal would be to get
> multiple results while scanning the table[s] only once, thus getting the
> results faster.

PostgreSQL 8.3 contains great improvements in this area: you can simply start the SELECTs from concurrent connections, and the backend will synchronize the scans.

Regards,
Markus
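PS: On releases before 8.3, one way to approximate a single scan is to materialize the filtered rows once and aggregate them twice - a minimal sketch, keeping the A/B/C placeholders from the original query:

    -- one sequential scan of tab, reused for both aggregates
    CREATE TEMP TABLE tab_filtered AS
        SELECT a, b FROM tab WHERE c;

    SELECT count(*), a FROM tab_filtered GROUP BY a;
    SELECT count(*), b FROM tab_filtered GROUP BY b;

    DROP TABLE tab_filtered;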
[PERFORM] PostgreSQL underestimates sorting
Hi,

PostgreSQL 8.1 (and, back then, 7.4) has a tendency to underestimate the costs of sort operations, compared to index scans. The backend allocates gigs of memory (we've set sort_mem to 1 gig), and then starts spilling more gigs of temporary data to disk. So the execution ends up - in the end - much slower than an index scan, and wastes lots of disk space. We did not manage to tune the config values appropriately, at least not without making other query plans suffer badly.

Are there some good ideas on how to shift the planner's preferences slightly towards index scans, without affecting other queries?

There's one thing that most of those queries have in common: they include TOAST data (large strings, PostGIS geometries etc.), and I remember that there are known problems with estimating the TOAST costs. This may be part of the problem, or may be irrelevant.

Thanks,
Markus
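PS: The planner's preferences can be shifted per session, without touching the global config - a sketch with illustrative values and a hypothetical query:

    -- make random page access look cheaper, favouring index scans
    SET random_page_cost = 2;      -- default is 4
    EXPLAIN ANALYZE SELECT * FROM tab ORDER BY a;   -- hypothetical query
    RESET random_page_cost;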
Re: [PERFORM] PostgreSQL underestimates sorting
Hi, Steinar,

Steinar H. Gunderson wrote:
> On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
>> The backend allocates gigs of memory (we've set sort_mem to 1 gig),
>> and then starts spilling out more gigs of temporary data to the disk.
>
> How much RAM is in the server? Remember that sort_mem is _per sort_, so
> if you have multiple sorts, it might allocate several multiples of the
> amount you set up.

That machine has 16 gigs of RAM, and about 10 gigs tend to be free / part of the Linux block-layer cache. The temporary data is not swapping, it's the Postgres on-disk sort algorithm.

HTH,
Markus
Re: [PERFORM] PostgreSQL underestimates sorting
Hi, Frank,

Frank Wiles wrote:
>> The temporary data is not swapping, it's the Postgres on-disk sort
>> algorithm.
>
> Are you actually running a query where you have a GB of data you need
> to sort? If not I fear you may be causing the system to swap by setting
> it this high.

Yes, the table itself is about 40 gigs in size, thus much larger than the memory. The machine has 16 gigs of RAM, and 10-12 gigs are available for PostgreSQL + disk cache. There's no swapping; only 23 MB of swap is used (40 gigs are available).

That's one example configuration. There are others on different machines where it turns out that forcing index usage leads to faster queries and less overall resource consumption (or, at least, faster delivery of the first part of the result, so the application can begin to process it asynchronously).

Thanks,
Markus
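PS: For testing whether an index plan actually wins, the enable_* switches are handy levers (for experiments, not production) - a sketch with a hypothetical query:

    SET enable_sort = off;    -- make sort-based plans very expensive
    EXPLAIN ANALYZE SELECT * FROM tab ORDER BY a;
    RESET enable_sort;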
Re: [PERFORM] Slow SELECT on three or more clients
Hi, Amir,

AMIR FRANCO D. JOVEN wrote:
> My current project uses PostgreSQL 7.3.4.

By all means, please upgrade. The newest 7.3-series version is 7.3.16, which fixes lots of critical bugs, and can be used as a drop-in replacement for 7.3.4 (see the release notes at http://www.postgresql.org/docs/7.3/interactive/release.html ). The newest stable release is 8.1.5, and 8.2 is just around the corner...

> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.

With only one resultset? You mean with only one returned row, I presume. Each SELECT has exactly one resultset, which can contain zero to many rows.

Please check the following (see the sketch below):

- Did you create the appropriate indices?
- Version 7.3.x may suffer from index bloat, so REINDEX might help.
- Did you VACUUM and ANALYZE the table properly?
- Are your free space map setting, the statistics targets, and other config options tuned to fit your environment?
- Maybe a VACUUM FULL or a CLUSTER command may help you.

> For a single select with one field in one resultset, is 0.86 seconds
> normal?

That depends on the circumstances.

Markus
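PS: A minimal maintenance pass along those lines, with hypothetical names:

    VACUUM ANALYZE mytable;   -- reclaim dead rows, refresh statistics
    REINDEX TABLE mytable;    -- rebuild possibly bloated indices (7.3.x)

    -- then inspect what the slow query actually does:
    EXPLAIN ANALYZE SELECT myfield FROM mytable WHERE id = 42;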
Re: [PERFORM] Easy read-heavy benchmark kicking around?
Hi, Brian,

Brian Hurt wrote:
> So the question is: is there an easy-to-install-and-run, read-heavy
> benchmark out there that I can wave at them to get them to fix the
> problem?

For sequential read performance, use dd. Most variants of dd I've seen output some timing information, and if not, do a

    time dd if=/your/device of=/dev/null bs=1M

on the partition.

Markus
Re: [PERFORM] Best COPY Performance
Hi, Worky,

Worky Workerson wrote:
>>> $ psql -c "COPY my_table TO STDOUT" > my_data
>>> $ ls my_data
>>> 2018792 edgescape_pg_load
>>> $ time cat my_data | psql -c "COPY mytable FROM STDIN"
>>> real    5m43.194s
>>> user    0m35.412s
>>> sys     0m9.567s
>>
>> That's via psql, and you get about 5 MB/sec. On a table with no
>> indices, triggers and constraints, we managed to COPY about 7-8
>> megabytes/second with psql over our 100 MBit network, so here the
>> network was the bottleneck.
>
> Hmm, this makes me think that either my PG config is really lacking,
> or that the SAN is badly misconfigured, as I would expect it to
> outperform a 100 Mb network. As it is, with a straight pipe to psql
> COPY, I'm only working with a little over 5.5 MB/s. Could this be due
> to the primary key index updates?

Yes, index updates cause both CPU load and random disk access (which is slow by nature).

HTH,
Markus
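PS: If the primary key maintenance turns out to be the bottleneck, loading first and indexing afterwards is usually much faster - a sketch with hypothetical names (COPY FROM a server-side file needs superuser rights; psql's \copy works from the client):

    ALTER TABLE mytable DROP CONSTRAINT mytable_pkey;
    COPY mytable FROM '/path/to/my_data';
    ALTER TABLE mytable ADD PRIMARY KEY (id);   -- one bulk index build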
Re: [PERFORM] Best COPY Performance
Hi, Craig,

Craig A. James wrote:
>> Would dblink() help in any way?
>
> It might if perl wasn't so damned good at this. ;-)

You know that you can use Perl inside PostgreSQL via plperl?

HTH,
Markus
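PS: A trivial sketch of what that looks like, assuming the plperl language is installed in the database:

    CREATE FUNCTION perl_upper(text) RETURNS text AS $$
        return uc($_[0]);    # plain Perl, running inside the backend
    $$ LANGUAGE plperl;

    SELECT perl_upper('hello');   -- returns HELLO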
Re: [PERFORM] Index on two columns not used
Hi, Tom,

Tom Lane wrote:
> You're wrong. An UPDATE always writes a new version of the row (if it
> overwrote the row in-place, it wouldn't be rollback-able). The new
> version has a different TID and therefore the index entry must change.
> To support MVCC, our approach is to always insert a new index entry
> pointing at the new TID --- the old one remains in place so that the
> old version can still be found by transactions that need it.

OK, good that you corrected me. I had the mistaken impression that both row versions have the same tuple ID (as they are different versions of the same tuple), and so an index change would not be necessary when both versions fit on the same page.

Thanks,
Markus
Re: [PERFORM] Best COPY Performance
Hi, Worky,

Worky Workerson wrote:
> I am currently getting between 10K and 15K inserts/second. I ran a
> trivial little insert into a table with a single integer column and
> came close to 250K inserts/second using psql's \copy, so I'm thinking
> that my code could be optimized a bit more, but wanted to check around
> to see if that was the case.

Could you COPY one of your tables out to disk via psql, and then COPY it back into the database, to reproduce this measurement with your real data? Also, how high are the disk load and the CPU usage? As long as psql is a factor of 20 better than your Perl script, I think the Perl interface is what should be optimized.

On a table with no indices, triggers and constraints, we managed to COPY about 7-8 megabytes/second with psql over our 100 MBit network, so here the network was the bottleneck.

You should think about making your Perl program write the COPY statement as text, and piping it into psql.

HTH,
Markus
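PS: For reference, the stream such a program would emit is just a COPY statement followed by rows whose columns are separated by literal tab characters, ending with a backslash-period line - hypothetical columns:

    COPY mytable (id, name) FROM STDIN;
    1	foo
    2	bar
    \.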
Re: [PERFORM] Index on two columns not used
Hi, Peter,

Péter Kovács wrote:
> Sorry for the amateurish question, but what are heap tuples?
>
> Also, my understanding is that the following statement applies only
> for composite indexes: "PostgreSQL can't use the values stored in the
> index to check the join condition." I assume that PostgreSQL will be
> able to use single-column indexes for join conditions. Is this
> correct?

Both questions are tightly related.

First, the heap is the part of the table where the actual tuples are stored. PostgreSQL uses an MVCC system, which means that multiple versions of a single row (along with their transaction information) can coexist in the heap. This allows for higher concurrency in the backend.

Now, the index basically stores pointers like "pages 23 and 42 contain rows with value 'foo'", but version information is not replicated to the index pages; this keeps the index's size requirements low. Additionally, in most UPDATE cases, the new row version will fit into the same page as the old version. In this case, the index does not have to be changed, which is an additional speed improvement.

But when accessing the data via the index, it can only give a preselection of pages that contain interesting data, and PostgreSQL has to look into the actual heap pages to check whether there really are row versions that are visible in the current transaction.

A further problem is that some GiST index types are lossy, which means the index does not retain the full information, but only an approximation, for efficiency reasons. A prominent example is the PostGIS geometry index: it only stores the bounding box (4 float values) instead of the whole geometry (possibly millions of double precision coordinates). So it may be necessary to re-check the condition against the real data, using the lossy index for preselection only.

HTH,
Markus
Re: [PERFORM] Index on two columns not used
Hi, Alvaro,

Alvaro Herrera wrote:
>> Additionally, in most UPDATE cases, the new row version will fit into
>> the same page as the old version. In this case, the index does not
>> have to be changed, which is an additional speed improvement.
>
> Actually, when the UPDATE puts the new row version in the same heap
> page, the index must be updated anyway.

AFAICS only when the index covers (directly or via a function) a column that actually changed. Changing columns the index does not depend on should not need any write access to that index. Correct me if I'm wrong.

Thanks,
Markus
Re: [PERFORM] VACUUM FULL ANALYZE on 8.1.4 is slower than on 8.0
Hi, Andrzej,

Andrzej Zawadzki wrote:
> After upgrading PostgreSQL from 8.0 to 8.1.4, a VACUUM FULL ANALYZE
> process is much slower. From the logs:
>
> 8.0:
> [13666][postgres][2006-10-06 01:13:38 CEST][1340121452] LOG: statement: VACUUM FULL ANALYZE;
> [13666][postgres][2006-10-06 01:39:15 CEST][0] LOG: duration: 1536862.425 ms
>
> 8.1:
> [4535][postgres][2006-10-10 01:08:51 CEST][6144112] LOG: statement: VACUUM FULL ANALYZE;
> [4535][postgres][2006-10-10 02:04:23 CEST][0] LOG: duration: 3332128.332 ms
>
> The databases are equal.

Are they on equal disks? And in the same areas of those disks? Some current disks tend to drop their speed drastically at the end of the LBA address space.

HTH,
Markus
Re: [PERFORM] Scrub one large table against another (vmstat output)
Hi, Brendan,

Brendan Curran wrote:
>> What prevents you from using an aggregate function?
>
> I guess I could actually obtain the results in an aggregate function
> and use those to maintain a summary table. There is a web view that
> requires 'as accurate as possible' numbers to be queried per group
> (all 40 groups are displayed on the same page), and so constant
> aggregates over the entire table would be a nightmare.

That sounds just like a case for GROUP BY and a materialized view.

Markus
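PS: There are no built-in materialized views in current releases, but a manually refreshed summary table does the job - a sketch with hypothetical names:

    CREATE TABLE group_summary AS
        SELECT group_id, count(*) AS cnt
        FROM big_table GROUP BY group_id;

    -- periodic refresh (e. g. from cron), atomic in one transaction:
    BEGIN;
    DELETE FROM group_summary;
    INSERT INTO group_summary
        SELECT group_id, count(*) FROM big_table GROUP BY group_id;
    COMMIT;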
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
Hi, Alex,

Alex Stapleton wrote:
>> EXPLAIN ANALYZE is more helpful because it prints the times.
>
> You can always use the \timing flag in psql ;)

Have you ever tried EXPLAIN ANALYZE? \timing gives you one total timing, but EXPLAIN ANALYZE gives you timings for sub-plans, including real row counts etc.

HTH,
Markus
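PS: Side by side, with a hypothetical query:

    \timing
    SELECT count(*) FROM mytable;    -- one overall "Time: ..." line

    EXPLAIN ANALYZE
    SELECT count(*) FROM mytable;    -- actual time and rows per plan node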
Re: [PERFORM] Poor performance on very simple query ?
Hi, Tobias,

Tobias Brox wrote:
> How can you have a default value on a primary key?

Just declare the column with both a default value and a primary key constraint. It makes sense when the default value is calculated instead of constant, by calling a function that generates the key. In fact, the SERIAL type does nothing but define a sequence, and then use nextval('sequencename') as the default.

HTH,
Markus
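PS: Spelled out, "id SERIAL PRIMARY KEY" expands to roughly this (the sequence name follows the usual table_column_seq convention):

    CREATE SEQUENCE mytable_id_seq;
    CREATE TABLE mytable (
        id integer PRIMARY KEY DEFAULT nextval('mytable_id_seq'),
        payload text
    );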
Re: [PERFORM] Performance Optimization for Dummies
Hi, Carlo,

Carlo Stonebanks wrote:
>> Did you think about putting the whole data into PostgreSQL using COPY
>> in a nearly unprocessed manner, index it properly, and then use SQL
>> and stored functions to transform the data inside the database to the
>> desired result?
>
> This is actually what we are doing. The slowness is in the row-by-row
> transformation. Every row requires that all the inserts and updates of
> the previous row be committed - that's why we have problems figuring
> out how to do this using SQL set logic.

Maybe GROUP BY, ORDER BY, DISTINCT ON and hand-written functions and aggregates (like first() or best()) may help. You could combine all relevant columns into a user-defined compound type, then group by entity, and have a self-defined aggregate generate the accumulated tuple for each entity - see the sketch below.

Markus
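PS: A minimal version of such a hand-written aggregate - a sketch; the STRICT marker makes PostgreSQL seed the state with the first non-NULL input automatically:

    CREATE FUNCTION first_agg(anyelement, anyelement) RETURNS anyelement
        AS 'SELECT $1' LANGUAGE sql IMMUTABLE STRICT;

    CREATE AGGREGATE first (
        basetype = anyelement,
        sfunc    = first_agg,
        stype    = anyelement
    );

    -- one accumulated row per entity (hypothetical names):
    SELECT entity_id, first(name), first(address)
    FROM staging GROUP BY entity_id;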
Re: [PERFORM] Performance Optimization for Dummies
Hi, Carlo,

Carlo Stonebanks wrote:
> Trying to achieve a high level of data quality in one large project is
> not often possible. Focus on the most critical areas of checking and
> get that working first with acceptable performance, then layer on
> additional checks while tuning. The complexity of the load programs
> you have also means they are susceptible to introducing data quality
> problems rather than removing them, so an incremental approach will
> also aid debugging of the load suite.

I couldn't agree more. I still think that using a PL in the backend might be more performant than an external client, if only because the SPI interface is more efficient than the network serialization used by external applications.

HTH,
Markus
Re: [PERFORM] Unsubscribe
Hi, Uwcssa,

uwcssa wrote:
> Please unsubscribe me! Thank you!

Sorry, but we (the list members) are unable to do that; we have no administrative power over the list. :-(

> Also, it would be better to have a message footer saying how to
> unsubscribe.

List unsubscribe information is contained in the headers of every mail that's sent over the list:

    List-Archive: http://archives.postgresql.org/pgsql-performance
    List-Help: mailto:[EMAIL PROTECTED]
    List-ID: pgsql-performance.postgresql.org
    List-Owner: mailto:[EMAIL PROTECTED]
    List-Post: mailto:pgsql-performance@postgresql.org
    List-Subscribe: mailto:[EMAIL PROTECTED]
    List-Unsubscribe: mailto:[EMAIL PROTECTED]

Additionally, there is a link to the unsubscribe web form on the list archive page: http://archives.postgresql.org/pgsql-performance/

HTH,
Markus
Re: [PERFORM] how to optimize postgres 8.1
Hi, Gurkan,

[EMAIL PROTECTED] wrote:
> If I run the query below with Informix, it gives cost=107. With
> Postgres with additional indexes it gives cost=407; before the
> additional indexes it was even much slower.

What are your real timing measurements, in a production-like setup under a production-like load? That's the only kind of benchmarking that will give you a useful comparison. You cannot compare anything else.

Especially, you cannot compare those artificial cost estimator values, as they are likely to be defined differently for PostgreSQL and Informix. For PostgreSQL, they are values relative to the cost of reading a page as part of a sequential scan. And those values are tunable - fiddle with the random_page_cost and cpu_*_cost values in postgresql.conf, and you will see very different values compared to the 407 you see now, even if the query plan stays the same.

Did you look up the definition of "cost" for Informix? Have you made sure that they're comparable?

HTH,
Markus
Re: [PERFORM] Performance Optimization for Dummies
Hi, Carlo,

Carlo Stonebanks wrote:
> We urgently need a major performance improvement.

Did you think about putting the whole data into PostgreSQL using COPY in a nearly unprocessed manner, indexing it properly, and then using SQL and stored functions to transform the data inside the database to the desired result?

We're using this approach for some 3rd-party databases we have to process in-house.

HTH,
Markus
Re: [PERFORM] Performance Optimization for Dummies
Hi, Carlo,

Carlo Stonebanks wrote:
> From what I can see, autovacuum is hitting the DBs in question about
> once every five minutes. Does this imply that an ANALYZE is being done
> automatically that would meet the requirements we are talking about
> here? Is there any benefit to explicitly performing an ANALYZE?

Autovacuum looks at the modification statistics (they count how many modifications happened on the table), and decides whether it's time to VACUUM (reclaim empty space) and ANALYZE (update column value distributions) the table.

The exact thresholds for autovacuum to kick in are configurable, see the docs.

HTH,
Markus
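PS: The relevant postgresql.conf knobs - the values shown are roughly the 8.1 defaults, please double-check against your version's documentation:

    autovacuum = on
    # vacuum when dead rows exceed threshold + scale_factor * reltuples:
    autovacuum_vacuum_threshold = 1000
    autovacuum_vacuum_scale_factor = 0.4
    autovacuum_analyze_threshold = 500
    autovacuum_analyze_scale_factor = 0.2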
Re: [PERFORM] Decreasing BLKSZ
Hi, Marc,

Marc Morin wrote:
>> I wonder whether there is a way to use table partitioning to make the
>> insert pattern more localized? We'd need to know a lot more about your
>> insertion patterns to guess how, though.
>
> We're doing partitioning as well.

And is constraint exclusion set up properly, and have you verified that it works?

HTH,
Markus
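PS: For reference, a working setup looks roughly like this (hypothetical names; the CHECK constraints must not overlap):

    CREATE TABLE log (key integer, logtime timestamp, msg text);
    CREATE TABLE log_2006_10 (CHECK (logtime >= '2006-10-01'
                                 AND logtime <  '2006-11-01'))
        INHERITS (log);

    SET constraint_exclusion = on;   -- also settable in postgresql.conf

    -- EXPLAIN should now show only the matching partition being scanned:
    EXPLAIN SELECT * FROM log WHERE logtime >= '2006-10-15';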
Re: [PERFORM] Multi-processor question
Hi, Kjell Tore,

Kjell Tore Fossbakk wrote:
> I got two AMD Opteron 885 processors (2.6 GHz) and 8 gigs of memory.
> Hard drives are 4 SCSI disks in RAID 10. I'm running Gentoo, and the
> kernel finds and uses all of my 2 (4) CPUs. How can I actually verify
> that my PostgreSQL (or my OS) actually gives each new query a fresh
> idle CPU?

On unixoid systems, use top to display the individual CPU loads; they should be balanced if you issue the same type of queries in parallel.[1] On Windows, the Task Manager can display individual CPU load graphs.

Note, however, that if you issue different kinds of queries in parallel, it is well possible that some CPUs have 100% load (on CPU-intensive queries), while the other CPUs have low load (processing the other, I/O-intensive queries).

Btw, if your queries take a long time but CPU load is low, then it is very likely that you're I/O bound, either at the disks in the server or at the network connections to the clients.

HTH,
Markus

[1] You might need some command line options or keys, e. g. on some Debian boxes over here, one has to press "1" to switch a running top to multi-CPU mode; pressing "1" again switches back to the accumulated view.
Re: [PERFORM] Decreasing BLKSZ
Hi, Marc,

Marc Morin wrote:
> The problem is, the insert pattern has low correlation with the
> (key, logtime) index. In this case, we would need 1M blocks in our
> shared_buffer space to prevent a read-modify-write type of pattern
> happening during the inserts (given a large enough database).

Would it be possible to change the primary key to (logtime, key)? This could help keep the working window small.

Secondly, the real working set is smaller, as the rows are all inserted at the end of the table, filling each page until it's full, so only the last pages are accessed. There's no relation between the index order and the order of the data on disk, unless you CLUSTER.

> Any comment on other effects or gotchas with lowering the size of
> BLKSZ? Currently, our database is thrashing its cache of blocks and
> we're getting only ~100 inserts/second; every insert results in an
> evict-read-modify operation.

I'm not sure that's the correct diagnosis. Do you have one transaction per insert? Every transaction means a forced sync to the disk, so you won't get more than about 100-200 commits per second, depending on your actual disk rotation speed.

To improve concurrency among the inserters running in parallel, try to tweak the config variables commit_delay and commit_siblings, so you get a higher overall throughput at the cost of an increased delay per connection, and increase the number of inserters. With sensible tweaking, the throughput should scale nearly linearly with the number of backends. :-)

If feasible for your application, you can also bundle several log entries into a single transaction. If you're CPU bound, you can use COPY instead of INSERT or (if you can wait for 8.2) the new multi-row INSERT to further improve performance, but I doubt that you're CPU bound.

The only way to really get over the sync limit is to have (at least) the WAL on a battery-backed RAM / SSD medium that has no spinning-disk physical limit, or to abandon crash safety by turning fsync off.

Thanks,
Markus
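PS: The two suggestions, spelled out. Grouped commits via postgresql.conf (illustrative values):

    commit_delay = 100     # microseconds to wait for companion commits
    commit_siblings = 5    # only wait if >= 5 other xacts are active

And bundling several entries per transaction, so many rows share one sync (hypothetical columns):

    BEGIN;
    INSERT INTO log (key, logtime, msg) VALUES (1, now(), 'foo');
    -- ... many more inserts ...
    COMMIT;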
Re: [PERFORM] Confusion and Questions about blocks read
Hi, Tom,

Tom Lane wrote:
> Alex Turner [EMAIL PROTECTED] writes:
>> How come the query statistics showed that 229066 blocks were read,
>> given that all the blocks in all the tables put together only total
>> 122968?
>
> You forgot to count the indexes. Also, the use of indexscans in the
> mergejoins probably causes multiple re-reads of some table blocks,
> depending on just what the physical ordering of the rows is.

As far as I understand, bitmap index scans improve this behaviour by ensuring that every table block is read only once.

Btw, would it be feasible to enhance normal index scans by looking at all rows in the current table block to see whether they meet the query criteria, fetching them all, and blacklisting the block against further revisits during the same index scan? I think that, for non-sorted cases, this could improve index scans a little, but I don't know whether it's worth the effort, given that bitmap index scans exist.

Thanks,
Markus
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
Hi, Guy,

Guy Thornley wrote:
> Of course you could argue the OS should be able to detect this, and
> prevent it occurring anyway. I don't know anything about Linux's
> behaviour in this area.

Yes, one can argue that way. But a generic algorithm in the OS can never be as smart as the application, which has more information about semantics and algorithms. Everything else would need a crystal ball device. :-)

HTH,
Markus
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
Hi, Luke,

Luke Lonergan wrote:
>> I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly
>> meant for this purpose?
>
> This is a good idea - I wasn't aware that this was possible.

This possibility was the reason for me to propose it. :-)

> We'll do some testing and see if it works as advertised on Linux and
> Solaris.

Fine, I'm looking forward to the results. According to my small test, it works at least on Linux 2.6.17.4.

Btw, posix_fadvise() could even give a small improvement for multi-threaded backends, given that the I/O subsystem is smart enough to cope intelligently with large bunches of outstanding requests.

HTH,
Markus
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
Hi, Bucky,

Bucky Jordan wrote:
>> We can implement multiple scanners (already present in MPP), or we
>> could implement AIO and fire off a number of simultaneous I/O requests
>> for fulfillment.
>
> So this might be a dumb question, but the above statements apply to
> the cluster (e.g. postmaster) as a whole, not per postgres
> process/transaction, correct? So each transaction is blocked waiting
> for the main postmaster to retrieve the data in the order it was
> requested (i.e. not multiple scanners/aio)?

No, that's a wrong assumption. It applies per active backend. When a client connects, the postmaster forks a new backend process. Each backend process has its own scanner and executor. The main postmaster is only for coordination (forking, config reload etc.); all the work is done in the forked per-connection backends.

Furthermore, the PostgreSQL MVCC system ensures that readers are never blocked and never block other backends. Writers can block each other due to the ACID transaction semantics, but MVCC limits that to a minimum.

> In this case, the only way to take full advantage of larger hardware
> using normal postgres would be to run multiple instances? (Which might
> not be a bad idea since it would set your application up to be able to
> deal with databases distributed on multiple servers...)

Typical OLTP applications (web UIs, booking systems, etc.) have multiple connections, and those run fully in parallel. So if your application is of this type, it will take full advantage of larger hardware. In the list archive, you should find some links to benchmarks that prove this statement; PostgreSQL scales linearly, up to 8 CPUs and 32 hyperthreads in those benchmarks.

Our discussion is about a different type of application, where a single application issues a single query at a time, dealing with a large amount (several gigs up to teras) of data.

Now, when such a query generates sequential disk access, the I/O scheduler of the underlying OS can easily recognize that pattern and prefetch the data, thus giving the full speed benefit of the underlying RAID.

The discussed problem arises when such large queries generate random (non-contiguous) disk access (e. g. index scans). Here, the underlying RAID cannot effectively prefetch data, as it does not know what the application will need next. This effectively limits the speed to that of a single disk, regardless of the details of the underlying RAID, as it can only process one request at a time and has to wait for the application for the next one.

Now, Bizgres MPP goes the way of having multiple threads per backend, each one processing a fraction of the data, so there are always several outstanding read requests that can be scheduled to the disks. My proposal was to use posix_fadvise() in the single-threaded scanner, so it can tell the OS "I will need those blocks in the near future". The OS can then prefetch those blocks into the cache while PostgreSQL still processes the previous block of data.

Another proposal would be to use so-called asynchronous I/O. This is definitely an interesting and promising idea, but it needs many more changes to the code, compared to posix_fadvise().

I hope that this lengthy mail is enlightening; if not, don't hesitate to ask.

Thanks for your patience,
Markus
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
Hi, Luke,

Luke Lonergan wrote:
> Since PG's heap scan is single threaded, the seek rate is equivalent
> to a single disk (even though RAID arrays may have many spindles); the
> typical random seek rates are around 100-200 seeks per second from
> within the backend. That means that as sequential scan performance
> increases, such as happens when using large RAID arrays, the
> random_page_cost will range from 50 to 300, increasing linearly with
> the size of the RAID array.

Do you think that adding some posix_fadvise() calls to the backend, to prefetch some blocks into the OS cache asynchronously, could improve that situation?

I could imagine that e. g. index bitmap scans could profit in the heap fetching stage by fadvise()ing the next few blocks.

Maybe asynchronous I/O could be used for the same benefit, but posix_fadvise() is less intrusive, and can easily be #define'd out on platforms that don't support it.

Combine this with the Linux kernel I/O scheduler patches (readahead improvements) that were discussed here in summer...

Regards,
Markus
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
Hi, Luke,

Luke Lonergan wrote:
>> Do you think that adding some posix_fadvise() calls to the backend,
>> to prefetch some blocks into the OS cache asynchronously, could
>> improve that situation?
>
> Nope - this requires true multi-threading of the I/O; there need to be
> multiple seek operations running simultaneously. The current executor
> blocks on each page request, waiting for the I/O to happen before
> requesting the next page. The OS can't predict what random page is to
> be requested next.

I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly meant for this purpose? My idea was that the executor could posix_fadvise() the blocks it will need in the near future, and later, when it actually issues the blocking read, the block is already there. This could even give speedups in the single-spindle case, as the I/O scheduler could already fetch the next blocks while the executor processes the current one. But there may be some details in the executor that prevent this.

> We can implement multiple scanners (already present in MPP), or we
> could implement AIO and fire off a number of simultaneous I/O requests
> for fulfillment.

AIO is much more intrusive to implement, so I'd preferably look at whether posix_fadvise() could improve the situation.

Thanks,
Markus
Re: [PERFORM] High CPU Load
Hi, Jerome,

Jérôme BENOIS wrote:
> Now I have 335 concurrent connections, I decreased the work_mem
> parameter to 32768 and disabled Hyper-Threading in the BIOS. But my
> CPU load is still very high.

What are your settings for commit_siblings and commit_delay?

> Tomorrow morning I plan to add 2 gigs of RAM ...

I don't think it's the RAM any more, as you told us you don't go into swap. It has to be something else. Could you try logging which queries are the problematic ones? Maybe they have bad plans, for whatever reason.

> I already use a database pool in my application, and when I decrease
> the number of connections my application gets slower ;-(

Could you just make sure that the pool really uses persistent connections, and is not broken or misconfigured to always reconnect?

HTH,
Markus
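PS: A simple way to find the problematic queries, with an illustrative threshold:

    # postgresql.conf: log every statement running longer than 200 ms
    log_min_duration_statement = 200    # milliseconds, -1 disables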
Re: [PERFORM] High CPU Load
Hi, Jerome,

Jérôme BENOIS wrote:
>>> Now I have 335 concurrent connections, I decreased the work_mem
>>> parameter to 32768 and disabled Hyper-Threading in the BIOS. But my
>>> CPU load is still very high.
>>
>> What are your settings for commit_siblings and commit_delay?
>
> They are at the defaults:
>
> #commit_delay = 0          # range 0-100000, in microseconds
> #commit_siblings = 5       # range 1-1000

You should uncomment them and play with different settings. I'd try a commit_delay of 100 and commit_siblings of 5 to start with.

> I plan to return to the previous version, 7.4.6, and I will reinstall
> everything on a dedicated server in order to reproduce and solve the
> problem.

You should use at least 7.4.13, as it fixes some critical bugs that were in 7.4.6. They use the same on-disk format and query planner logic, so they should not show any difference.

I don't have many more ideas what the problem could be. Can you try to do some profiling (e. g. with statement logging) to see which specific statements cause the high CPU load?

Are there other differences (besides the PostgreSQL version) between the two installations? (Kernel, libraries, other software...)

HTH,
Markus
Re: [PERFORM] Poor performance on seq scan
Hi, Piotr,

Piotr Kołaczkowski wrote:
> Why match rows from the heap if ALL required data are in the index
> itself? Why look at the heap at all?

Because the index does not contain any transaction information, so it has to look at the heap to find out which of the rows are current. This is one of the more debated points in the PostgreSQL way of implementing MVCC.

Markus
Re: [PERFORM] High CPU Load
Hi, Jérôme,

Jérôme BENOIS wrote:
> max_connections = 512

Do you really have that many concurrent connections? Then you should probably think about getting a larger machine. You will definitely want to play with the commit_delay and commit_siblings settings in that case, especially if you have write access.

> work_mem = 65536
> effective_cache_size = 131072

Hmm, 131072*8*1024 + 512*65536*1024 = 35433480192 - that's 33 gigs of memory you assume here (1 gig of cache plus 512 backends at 64 MB each), not counting OS usage, and the fact that certain queries can use a multiple of work_mem. Even on a machine that big, I'd be inclined to dedicate more memory to caching and less to the backends, unless specific needs dictate otherwise.

You could try to use SQLRelay or pgpool to cut down the number of backends you need.

> My server is a dual Xeon 3.06 GHz

For Xeons, there were rumours about context switch storms which kill performance.

> with 2 gigs of RAM and good SCSI disks.

For 2 gigs of RAM, you should cut down the number of concurrent backends. Does your machine go into swap?

Markus
Re: [PERFORM] Vacuums on large busy databases
Hi, Francisco,

Francisco Reyes wrote:
> I am looking to either improve the time of the vacuum or decrease its
> impact on the loads. Are the variables:
>
> #vacuum_cost_delay = 0        # 0-1000 milliseconds
> #vacuum_cost_page_hit = 1     # 0-10000 credits
> #vacuum_cost_page_miss = 10   # 0-10000 credits
> #vacuum_cost_page_dirty = 20  # 0-10000 credits
> #vacuum_cost_limit = 200      # 0-10000 credits

Just to avoid a silly mistake: you pasted those settings with the "#" sign, which means that PostgreSQL treats them as comments and uses the defaults instead. You should make sure that you use real settings in your config.

HTH,
Markus
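PS: Uncommented, a gentler vacuum could look like this (illustrative values):

    vacuum_cost_delay = 20     # sleep 20 ms whenever the limit is reached
    vacuum_cost_limit = 200    # work credits between sleeps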
Re: [PERFORM] sql-bench
Hi, Yoav X,

yoav x wrote:
> You can use the test with InnoDB by giving the
> --create-options=engine=innodb option on the command line. Even with
> InnoDB, in some specific tests PG looks very bad compared to InnoDB.

As far as I've seen, they include the CREATE TABLE command in their benchmarks. Realistic production workloads don't contain that many CREATE TABLE commands, I think.

Wondering,
Markus
Re: [PERFORM] database bloat, non-removable rows, slow query etc...
Hi, Matteo,

Matteo Sgalaberni wrote:
> A colleague's JDBC application stays "idle in transaction" 24h/24h...

Just a little note: for most applications, this can be fixed by updating the JDBC driver. Old versions had the behaviour of auto-opening a new backend transaction on commit/rollback, whereas new versions delay that until the first statement of the new transaction is sent.

This won't fix applications that do a SELECT and then sit idle for days before committing or rolling back, however. Those should be fixed, or use autocommit mode.

> Good to know this... but why this behaviour? It is lovely... :)
> Tom, can you explain why?...

It is because the transaction IDs are global per cluster.

Markus
Re: [PERFORM] database bloat, non-removable rows, slow query etc...
Hi, Tom and Matteo,

Tom Lane wrote:
> Matteo Sgalaberni [EMAIL PROTECTED] writes:
>> 22 daemons that have a persistent connection to this database (all
>> connections are idle, no transaction opened).
>
> You may think that, but you are wrong.
>
>> INFO: "cliente": found 0 removable, 29931 nonremovable row versions
>> in 559 pages
>> DETAIL: 29398 dead row versions cannot be removed yet.
>
> The only way the above can happen is if there are some fairly old open
> transactions. Looking in pg_stat_activity might help you identify the
> culprit(s).

Another possibility might be an outstanding two-phase-commit transaction.

HTH,
Markus
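PS: Both can be checked from SQL - a sketch (pg_stat_activity shows queries only with stats_command_string enabled; pg_prepared_xacts exists since 8.1):

    -- connections sitting in an open transaction:
    SELECT procpid, usename, current_query, query_start
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction';

    -- forgotten two-phase-commit transactions:
    SELECT gid, prepared, owner, database FROM pg_prepared_xacts;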
Re: [PERFORM] perf pb solved only after pg_dump and restore
Hi, Guillaume,

Guillaume Cottenceau wrote:
> We noticed a slowdown on our application while traffic was kinda
> heavy. The logic after reading the docs commanded us to trim the
> enlarged tables, run VACUUM ANALYZE and then expect fast performance
> again; but it wasn't the case[1].

What exactly do you mean by "trim the enlarged tables"?

> Out of the blue, we dumped the database, removed it, recreated it from
> the restore, and now the performance is lightning fast again. Does it
> look familiar to anyone? I thought running VACUUM ANALYZE after a trim
> should be enough so that pg has assembled the data and has good
> statistical knowledge of the table contents...

This looks like either your free_space_map setting is way too low, or you have index bloat. Maybe a VACUUM FULL followed by a REINDEX would have solved your problem. It might also make sense to issue a CLUSTER instead (which combines the effects of VACUUM FULL, REINDEX and physically reordering the data).

When the free_space_map is too low, VACUUM ANALYZE should have told you via a warning (at least if your logging is set appropriately).

HTH,
Markus
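PS: The commands in question, on a hypothetical table - note that all of them take exclusive locks:

    VACUUM FULL mytable;      -- compact the table
    REINDEX TABLE mytable;    -- rebuild its indices

    -- or rewrite the table in index order in one go (8.x syntax):
    CLUSTER mytable_pkey ON mytable;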
Re: [PERFORM] perf pb solved only after pg_dump and restore
Hi, Guillaume,

Guillaume Cottenceau wrote:
> We have a couple of log files which get larger over time (millions of
> rows). As they are log files, they can be trimmed of older values.

Ah, OK, you DELETEd the old rows. So I assume that you never UPDATE, but only INSERT new entries and sometimes DELETE a big bunch of entries from the beginning. This is a special usage pattern for which the normal VACUUM is not well suited.

DELETEing rows itself does not free any space. Only after your transaction is committed can a following VACUUM FULL or CLUSTER actually free the space. VACUUM and VACUUM ANALYZE only remove obsolete rows from the pages and mark them free (by entering them into the free space map, as long as that one is large enough). That means that your table will actually stay as large as before, having 90% free pages at the beginning and 10% used pages at the very end. New INSERTs and UPDATEs will prefer to use pages from the free space map before allocating new pages, but the existing rows will stay where they are forever.

Now, VACUUM FULL actively moves rows to the beginning of the table, allowing it to cut off the end of the table, while CLUSTER recreates the table from scratch, in index order. Both lead to compact storage, having all used rows at the beginning and no free pages. So, I think, in your case VACUUM FULL and CLUSTER would both have solved your problem.

> max_fsm_pages is 20000. Does that look low? Notice: the table data is
> only 600M after the trim (without indexes), while it was probably 3x
> to 10x this size before the trim.

10x the size means 6G, so 5.4G of data were freed by the trim. Each page is 8k in size, so the fsm needs about 675000 pages. So, yes, for your usage, that setting is low and gives very suboptimal results.

>> ... or you have index bloat.
>
> Can you elaborate? I have created a couple of indexes (according to
> multiple models of use in our application) and they do take up quite
> some disk space (the table dump is 600M but after restore it takes up
> 1.5G on disk), but I thought they could only do good or never be used,
> not impair performance...

Like tables, indices may suffer from getting bloated by old, unused entries. Especially the GiST-based indices in 7.4 (used by PostGIS and other plugins) suffered from that problem[1], but recent PostgreSQL versions have improved in this area. When the query planner then decides to use such an index, the index access is extremely slow because the index scan has to skip all the deleted entries. However, from the additional information you gave above, I doubt it was index bloat.

>> Maybe a VACUUM FULL followed by a REINDEX would have solved your
>> problem.
>
> So these would have reordered the data for faster sequential access,
> which is not the case for VACUUM ANALYZE?

A VACUUM FULL would have compacted the table data, and a REINDEX would have optimized the indices.

>> It might also make sense to issue a CLUSTER instead (which combines
>> the effects of VACUUM FULL, REINDEX and physically reordering the
>> data).
>
> I was reluctant to use CLUSTER because you have to choose an index,
> and there are multiple indexes on the large tables...

Usually, CLUSTERing on one index does not slow down accesses via other indices, compared to the non-clustered (= random) table order before. If you have some indices that are somehow related (e. g. a timestamp and a serial number), CLUSTERing on one index automatically helps the other index as well, especially as the query planner uses correlation statistics.

Btw, if your queries often include 2 or 3 columns, a multi-column index (and clustering on that index) might be best.
>> When the free_space_map is too low, VACUUM ANALYZE should have told
>> you via a warning (at least if your logging is set appropriately).
>
> Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I can't
> be sure :/

AFAIK, the warning is also output on the psql command line.

HTH,
Markus

[1] We once had an index that was about 100 times larger before REINDEX.
Re: [PERFORM] perf pb solved only after pg_dump and restore
Hi, Guillaume,

Guillaume Cottenceau wrote:
> About REINDEX: is it OK to consider that REINDEX is to indexes what
> VACUUM FULL is to table data, because it cleans up unused index pages?

Yes, roughly speaking.

>> And AFAICS you're not running it on a regular basis, so your database
>> was probably completely bloated, which means:
>> - bloated indexes,
>> - bloated tables (i. e. a lot of fragmentation in the pages, which
>>   means that you need far more pages to store the same data).
>
> I suppose that table fragmentation occurs when DELETEs are interleaved
> with INSERTs?

Yes, and it gets ugly as soon as the fsm setting or the VACUUM frequency is too low, so it cannot keep up. Big bunches of UPDATE/DELETE that hit more than, say, 20% of the table between VACUUM runs justify a VACUUM FULL in most cases.

> VACUUM ANALYZE is normally run overnight (each night). Is that not
> regular enough? There can be hundreds of thousands of statements a
> day.

Which PostgreSQL version are you using? Maybe you should consider autovacuum (which has been a contrib module at least since 7.4, and is included in the server since 8.1). If you think that vacuuming during working hours puts too much load on your server, there are options to tweak that, at least in 8.1.

Markus
Re: [PERFORM] Identifying bloated tables
Hi, Michal,

Michal Taborsky - Internet Mall wrote:
> When using this view, you are interested in tables which have the
> bloat column higher than, say, 2.0 (in a freshly
> dumped/restored/analyzed database they should all be around 1.0).

I just noticed some entries in pg_catalog with a bloat value < 1 and negative wasted space - is this due to their pseudo nature?

Markus
Re: [PERFORM] VACUUM FULL needed sometimes to prevent transaction
Hi, Jeff and all,

Jeff Davis wrote:
> (2) You have a long-running transaction that never completed for some
> strange reason.

I just asked myself whether a 2-phase-commit transaction that was prepared, but never committed, can block vacuuming and TID recycling.

Markus
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
Hi, Peter,

Peter Hardman wrote:
>> BTW, are you *sure* you are testing PG 8.1? The "Subquery Scan f2"
>> plan node looks unnecessary to me, and I'd have expected 8.1 to drop
>> it out. 8.0 and before would have left it in the plan though. This
>> doesn't make all that much difference performance-wise in itself, but
>> it does make me wonder what you are testing.
>
> Yes, the executables all say version 8.1.3.6044.

Would you mind looking at the output of "select version();", too? I ask because I stumbled over this myself: I had installed the correct postgresql and psql versions, but accidentally connected to a different database installation, due to strange environment and script settings...

Thanks,
Markus
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
Hi, Peter,

Peter Hardman wrote:
> select version() returns:
> PostgreSQL 8.1.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
> 3.4.2 (mingw-special)

That looks correct. I also presume that your environment is not as fragile with respect to connecting to the wrong database, compared to Debian with its multi-cluster multi-version script wrapper magic. Never mind.

Markus
Re: [PERFORM] Postgresql Performance on an HP DL385 and
Hi, Jim,

Jim C. Nasby wrote:
> Well, if the controller is caching with a BBU, I'm not sure that order
> matters anymore, because the controller should be able to re-order at
> will. Theoretically. :) But this is why having some actual data posted
> somewhere would be great.

Well, actually, the controller should not reorder across write barriers.

Markus
Re: [PERFORM] [BUGS] BUG #2567: High IOWAIT
Hi, Jim,

Jim Nasby wrote:
> Your biggest win would be to batch those inserts together into
> transactions, if possible.

Using COPY instead of INSERT might give even bigger wins, and AFAIK some client libs use COPY internally (e. g. tablewriter from libpqxx).

> If not, the commit_delay settings might help you out.

As far as I understand, this will only help for concurrent inserts by different clients, trading latency for throughput. Please correct me if I'm wrong.

HTH,
Markus
Re: [PERFORM] Migrating data from DB2 to SQL Server
Hi, Eldhose,

contact1981 wrote:
> I am trying to migrate data from a DB2 database to SQL Server 2005
> database. Does anyone know about any migration tool that does that? I
> have heard about DB2 Migration Tool kit, but I think you can only migrate
> data to a DB2 database with that. Thank you.

It seems that you hit the wrong list with your question by accident. But, as you're here, why don't you migrate to PostgreSQL instead?

Have a nice day, Markus
Re: [PERFORM] Beginner optimization questions, esp. regarding Tsearch2
Hi, Richard and Carl,

Richard Huxton wrote:
> Carl Youngblood wrote:
> > - I noticed that there are six different postmaster daemons running.
> > Only one of them is taking up a lot of RAM (1076m virtual and 584m
> > resident). The second one is using 181m resident while the others are
> > less than 20m each. Is it normal to have multiple postmaster processes?
> You should have one master backend process and one per connection. PG is
> a classic multi-process designed server.

There may be some additional background processes, such as the background writer, stats collector or autovacuum, depending on your version and configuration.

HTH, Markus
Re: [PERFORM] Disk writes
Hi, Reimer,

carlosreimer wrote:
> There is some performance problem with the server and I discovered with
> the vmstat tool that there is some process writing a lot of information
> to the disk subsystem. [..] How could I discover who is sending so much
> data to the disks?

It could be something triggered by your crontab (updatedb comes to mind, or texpire from leafnode etc.).

Another idea would be that you have statement logging on, or something else that produces lots of kernel or syslog messages[1], and your syslogd is configured to sync() after every line...

HTH, Markus

[1] We once had such a problem because of an ill-compiled kernel with USB verbose logging on...
Re: [PERFORM] Performance with 2 AMD/Opteron 2.6Ghz and 8gig
Hi, Charles,

Charles Sprickman wrote:
> I've also got a 1U with a 9500SX-4 and 4 drives. I like how the 3Ware
> card scales there - started with 2 drives and got drive speed mirroring.
> Added two more and most of the bonnie numbers doubled. This is not what
> I'm used to with the Adaptec SCSI junk.

Well, for sequential reading, you should be able to get double drive speed on a 2-disk mirror with a good controller, as it can balance the reads among the drives.

Markus
Re: [PERFORM] PostgreSQL scalability on Sun UltraSparc T1
Hi, Arjen,

Arjen van der Meijden wrote:
> It was the 8core version with 16GB memory... but actually that's just
> overkill, the active portions of the database easily fit in 8GB and a
> test on another machine with just 2GB didn't even show that much
> improvement when going to 7GB (6x1G, 2x512M), it was mostly in the range
> of 10% improvement or less.

I'd be interested in the commit_siblings and commit_delay settings; tuning them can give a large increase in throughput for highly concurrent insert/update workloads, at the cost of latency (and thus worse results for low-concurrency situations).

Different fsync method settings can also make a difference (I presume that syncing was enabled).

HTH, Markus
Re: [PERFORM] sub select performance due to seq scans
Hi, Scott and Hale,

Scott Marlowe wrote:
> Make sure analyze has been run and that the statistics are fairly
> accurate.

It might also help to increase the statistics_target on the column in question.

HTH, Markus
Re: [PERFORM] loading increase into huge table with 50.000.000 records
Hi, Larry, hi, Sven,

Sven Geisler wrote:
> You can increase wal_buffers, checkpoint_segments and checkpoint_timeout
> much higher.

You should also increase the free space map settings; it must be large enough to cope with your weekly bulk load.

Markus
Re: [PERFORM] RAID stripe size question
Hi, Mikael,

Mikael Carneholm wrote:
> An 0+1 array of 4 disks *could* be enough, but I'm still unsure how WAL
> activity correlates to normal data activity (is it 1:1, 1:2, 1:4, ...?)

I think the main difference is that the WAL activity is mostly linear, whereas the normal data activity is rather random access.

Thus, a mirror of a few disks (or, with good controller hardware, RAID 6 on 4 disks or so) for WAL should be enough to cope with a large set of data and index disks, which spend a lot more of their time seeking.

Btw, it may make sense to spread different tables, or tables and indices, onto different RAID sets, as you seem to have enough spindles.

And look into the commit_delay/commit_siblings settings; they allow you to trade latency for throughput (a little more latency per transaction, but many more transactions per second of throughput for the whole system).

HTH, Markus
Re: [PERFORM] RAID stripe size question
Hi, Mikael,

Mikael Carneholm wrote:
> This is something I'd also like to test, as a common best practice these
> days is to go for a SAME (stripe all, mirror everything) setup. From a
> development perspective it's easier to use SAME as the developers won't
> have to think about physical location for new tables/indices, so if
> there's no performance penalty with SAME I'll gladly keep it that way.

Usually, it's not the developer's task to care about that, but the DBA's responsibility.

> > And look into the commit_delay/commit_siblings settings, they allow you
> > to trade latency for throughput.
> In a previous test, using cd=5000 and cs=20 increased transaction
> throughput by ~20% so I'll definitely fiddle with that in the coming
> tests as well.

How many parallel transactions do you have?

Markus
Re: [PERFORM] Kill a session
Hi, Tom,

Tom Lane wrote:
> Our expectation is that all or at least most queries should respond to
> SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second
> timescale. However there are various loops in the backend that fail to
> execute CHECK_FOR_INTERRUPTS sufficiently often :-(.

The same is true for user-defined C functions. The PostGIS GEOS geometry functions come to mind; for complex geometries, they can take hours to complete. And as GEOS is a 3rd-party library, I don't see an easy way to make them call CHECK_FOR_INTERRUPTS.

Does anybody know how this is handled in plpgsql, pljava and plpython?

HTH, Markus
Re: [PERFORM] getting better performance
Hi, Eugeny,

Eugeny N Dzhurinsky wrote:
> > Do you add / remove tables a lot? Could be you've got system catalog
> > bloat.
> Yes, almost each table is dropped and re-created in 3-5 days.

If you really recreate the same table, TRUNCATE may be a better solution than dropping and recreating it.

HTH, Markus
Re: [PERFORM] need vacuum after insert/truncate/insert?
Hi, Craig,

Craig A. James wrote:
> If I insert a bunch of rows, then truncate, then insert a bunch more
> rows, do I need to vacuum? I've been assuming that TRUNCATE TABLE is a
> brute-force technique that more-or-less tosses the old table and starts
> fresh so that no vacuum is necessary.
> Second question: Same scenario as above, but now the table has indexes.
> Is a reindex needed, or are the indexes truncated too?

AFAIK, both table and indices are cut down nicely. But you will need an ANALYZE after refilling the table, to have current statistics.

HTH, Markus
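A minimal sketch of that cycle ("tab" is a placeholder name):

  TRUNCATE TABLE tab;   -- discards old heap and index contents, no VACUUM needed
  -- ... bulk refill, e.g. via COPY or INSERT ...
  ANALYZE tab;          -- refresh the planner statistics for the new contents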
Re: [PERFORM] Update INSERT RULE while running for Partitioning
Hi, Gene,

Gene wrote:
> I'm using PostgreSQL 8.1.4 in a Hibernate application and I am attempting
> to use partitioning via inherited tables. At first I was going to create
> a rule per sub-table based on a date range, but found out that with
> multiple rules postgres will only return the affected-row count of the
> last rule, which gives Hibernate problems.

This could be considered a PostgreSQL bug - maybe you should discuss it on the appropriate list (general, hackers)?

HTH, Markus
Re: [PERFORM] Can postgresql do the job for software deployed in
Hi, Mikael,

Just my 2 cents:

Mikael Carneholm wrote:
> Do you really need to create one *DB* per client - that is, is one schema
> (in the same DB) per client out of the question?

Sometimes, schemas would work _technically_, but not politically, as a postgresql user cannot be prevented from listing all schemas (or even all databases in the same cluster), regardless of whether he/she has access rights. But it is not always acceptable that a customer knows which other customers one has. This forces the use of the one-cluster-per-customer paradigm.

Thanks, Markus
Re: [PERFORM] Can postgresql do the job for software deployed in
Hi, Gregory,

Gregory S. Williamson wrote:
> A sodden late night idea ... schemas don't need to have names that are
> meaningful to outsiders.

Yes, but having schema names like A34FZ37 not only qualifies for thedailywtf.com, it also tends to produce maintenance nightmares. And it still allows the customer to estimate the number of customers.

Still, the point about political aspects is an important one.

> OTOH, schemas provide an elegant way of segregating data.

Yes, they do, and in the ASP case (where we have control over all software that connects to PostgreSQL) we use the one-schema-per-customer paradigm quite successfully.

> My $0.02 (not worth what it was)

Oh, I think they're at least $0.03 worth. :-)

Markus
Re: [PERFORM] SAN performance mystery
Hi, Tim,

Tim Allen wrote:
> One thing that has been apparent is that autovacuum has not been able to
> keep the database sufficiently tamed. A pg_dump/pg_restore cycle reduced
> the total database size from 81G to 36G.

Two first shots:

- Increase your free_space_map settings, until (auto)vacuum does not warn about a too-small FSM setting any more.
- Tune autovacuum to run more often, possibly with a higher delay setting to lower the load.

If you still have the original database around,

> Performing the restore took about 23 hours.

Try to put the WAL on another spindle, and increase the WAL size / checkpoint segments. If most of the restore time was spent in index creation, increase the sort_mem / maintenance_work_mem settings.

HTH, Markus
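The FSM check mentioned above can be done by hand; VACUUM VERBOSE prints a summary at the end (a sketch - the exact wording of the hint varies by version):

  -- run as superuser; the final lines report how many page slots are needed
  VACUUM VERBOSE;

If the reported need exceeds max_fsm_pages, raise max_fsm_pages (and, with many relations, max_fsm_relations) in postgresql.conf and restart.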
Re: [PERFORM] SAN performance mystery
Hi, Tim,

It seems I sent my message too fast, cut in the middle of a sentence:

Markus Schaber wrote:
> > A pg_dump/pg_restore cycle reduced the total database size from 81G to
> > 36G.
> If you still have the original database around, ...

... can you check whether VACUUM FULL and REINDEX achieve the same effect?

Thanks, Markus
Re: [PERFORM] Help tuning autovacuum - seeing lots of relation bloat
Hi, Csaba,

Csaba Nagy wrote:
> Well, your application might be completely well behaved and still your
> DBA (or your favorite DB access tool for that matter) can leave open
> transactions in an interactive session. It never hurts to check if you
> actually have idle-in-transaction sessions. It happened a few times to
> us; some of those were bad coding in ad-hoc tools written by us, others
> were badly behaved DB access tools opening a transaction immediately
> after connect and after each successful command, effectively leaving a
> transaction open while the user was having lunch...

Some older JDBC driver versions had the bug that they always had an open transaction, so an application server with some pooled connections lingering around could block vacuum forever.

Markus
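A sketch for spotting such sessions (8.x with stats_command_string enabled; the marker text is what the stats system reports for idle-in-transaction backends):

  SELECT procpid, usename, current_query, query_start
  FROM pg_stat_activity
  WHERE current_query = '<IDLE> in transaction';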
Re: [PERFORM] Help tuning autovacuum - seeing lots of relation bloat
Jim C. Nasby wrote:
> On Wed, Jun 21, 2006 at 01:21:05PM -0300, jody brownell wrote:
> > Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG: target: removed 5645231 row versions in 106508 pages
> > Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL: CPU 3.37s/1.23u sec elapsed 40.63 sec.
> > Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG: target: found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
> > Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL: 0 dead row versions cannot be removed yet.
> So the table contained 5.6M dead rows and 1.3M live rows. I think you
> should forget about having autovacuum keep this table in check and add
> manual vacuum commands to your code. Autovac is intended to deal with 99%
> of use cases; this is pretty clearly in the 1% it can't handle.

Maybe your free space map is configured too small; can you watch for log messages telling you to increase it?

HTH, Markus
Re: [PERFORM] scaling up postgres
Hi, Fzied,

[EMAIL PROTECTED] wrote:
> I'm using httperf/autobench for measurements and the best result I can
> get is that my system can handle traffic of almost 1600 new con/sec.

Are you using connection pooling or persistent connections between PostgreSQL and the Apaches?

Maybe it simply is the network latency between the two machines - as the database is read-only, did you think about having both PostgreSQL and Apache on both machines, and then load-balancing incoming HTTP requests between them?

> I cannot scale beyond that value and the funny thing is that none of the
> servers is swapping or heavily loaded, and neither postgres nor apache
> are refusing connections.

And for measuring, are you really throwing parallel HTTP connections at the server? This sounds like you measure request latency, but the maximum throughput might be much higher.

> my database is only 58M it's a read only DB and will last only for a
> month.

I guess it is a simple table with a single PK (some subscription number) - no joins or other things. For such cases, a special non-RDBMS like MySQL, SQLite, or even some handcrafted thingy may give you better results.

Markus
Re: [PERFORM] scaling up postgres
Hi, Zydoon,

Zydoon wrote:
> Now I'm trying to make my tests, and I'm not that sure I will make the
> switch to the pSeries, since my dual Xeon with 4 GB RAM can handle 3500
> concurrent postmasters consuming 3.7 GB of the RAM. I cannot reach this
> number on the pSeries with 2 GB.

This sounds like you want to have one PostgreSQL backend per Apache frontend. Did you try running pgpool on the Apache machine, and have only a few (hundred) connections to the backend?

Maybe http://en.wikipedia.org/wiki/Memcached could be helpful, too.

Markus
Re: [PERFORM] Some queries starting to hang
Hi, Chris,

Chris Beecroft wrote:
> Query is now returning with results on our replicated database. Will
> vacuum analyze production now. So it seems to have done the trick. Now
> the question is has our auto vacuum failed or was not set up properly...
> A question for my IT people.

In most cases where we had database bloat despite running autovacuum, it was due to a low free_space_map setting.

Markus
Re: [PERFORM] How can I make this query faster (resend)
Hi, Cstdenis,

Cstdenis wrote:
> Server is a dual Xeon with a gig of ram dedicated mostly to postgresql.
> Here are the changed lines in my postgresql.conf: http://pastebin.ca/57222

3M is really low for a production server.

Try using pgpool and limiting it to about 30 or so backend connections, and then give them at least 30 megs of RAM each. This should also cut down the connection creation overhead.

HTH, Markus
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Hi, PFC,

PFC wrote:
> The problem is that you need a set-returning function to retrieve the
> values. SRFs don't have rowcount estimates, so the plans suck.

What about adding some way of rowcount estimation to SRFs, along the lines of:

  CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
  $$ ... function code ... $$ LANGUAGE plpgsql
  ROWCOUNT_ESTIMATOR
  $$ ... estimation code ... $$ ;

Internally, this could create two functions, foo(para, meters) and estimate_foo(para, meters), that are in the same language and coupled together (just like a SERIAL column and its sequence). The estimator functions have an implicit return type of int8. Parameters may be NULL when they are not known at query planning time.

What do you think about this idea?

The same scheme could be used to add a CPUCOST_ESTIMATOR to expensive functions.

HTH, Markus
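PostgreSQL 8.3 later added static COST and ROWS clauses to CREATE FUNCTION - a much simpler cousin of the estimator-function idea above, with constant values instead of estimation code; a minimal sketch:

  CREATE FUNCTION foo(int, int) RETURNS SETOF int AS
  $$ SELECT generate_series($1, $2); $$
  LANGUAGE sql
  COST 10     -- planner hint: per-call CPU cost
  ROWS 100;   -- planner hint: static row count estimate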
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Hi, Scott and all,

Scott Lamb wrote:
> I don't know the answer to this question, but have you seen this tool?
> http://brad.livejournal.com/2116715.html

We had a simpler tool in-house, which wrote a file byte for byte, and called fsync() after every byte.

If the number of fsyncs/min is higher than the rotations-per-minute value of your disks, they must be lying.

It does not find as many liars as the script above, but it is less intrusive (it can be run on every low-I/O machine without crashing it), and it found some liars in-house (some notebook disks, one external USB/FireWire-to-IDE case, and an older Linux cryptoloop implementation, IIRC).

If you're interested, I can dig for the C source...

HTH, Markus
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Hi, Bruce,

Bruce Momjian wrote:
> > It does not find as many liars as the script above, but it is less
> Why does it find fewer liars?

It won't find liars that have a small lie-queue length, so their internal buffers get full and they have to block. After a small burst at the start, which usually hides in other latencies, they don't get more throughput than spindle turns.

It won't find liars that first acknowledge to the host, and then immediately write the block before accepting other commands. This improves latency (which is what some benchmarks measure), but not the syncs/write rate.

Both of them can be caught by the other script, but not by my tool.

HTH, Markus
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Hi, Bruce,

Markus Schaber wrote:
> > > It does not find as many liars as the script above, but it is less
> > Why does it find fewer liars?
> It won't find liars that have a small lie-queue length, so their internal
> buffers get full and they have to block. After a small burst at the
> start, which usually hides in other latencies, they don't get more
> throughput than spindle turns.

I just reread my mail, and must admit that I would not understand what I wrote above myself, so I'll explain a little more:

My test program writes byte for byte. Let's say our FS/OS has 4k page and block size; that means 4096 writes that all hit the same disk block.

Intelligent liars will see that the 2nd and all further writes obsolete the former writes that still reside in the internal cache, and will drop those former writes from the cache, effectively going up to 4k writes per spindle turn. Dumb liars will keep the obsoleted writes in the write cache / queue, and so won't be caught by my program.

(Note that I have no proof that such disks actually exist, but I have enough experience with hardware that I won't be surprised.)

HTH, Markus
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Hi, PFC,

PFC wrote:
> You need to do some processing to know how many rows the function would
> return. Often, this processing will be repeated in the function itself.
> Sometimes it's very simple (i.e. the function will RETURN NEXT each
> element in an array, you know the array length...) Sometimes, for
> functions returning few rows, it might be faster to compute the entire
> result set in the cost estimator.

I know, but we only have to estimate the number of rows to give a hint to the query planner, so we can use lots of simplifications. E.g. for generate_series we return ($2 - $1) / $3, and for some functions even constant estimates will be good enough.

> - please execute me and store my results in a temporary storage, count
> the rows returned, and plan the outer query accordingly

That's an interesting idea.

Markus
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Hi, Nis,

Nis Jorgensen wrote:
> It will probably be quite common for the number to depend on the number
> of rows in other tables. Even if this is fairly constant within one db
> (some assumption), it is likely to be different in others using the same
> function definition. Perhaps a better solution would be to cache the
> result of the estimator function.

Sophisticated estimator functions are free to use the pg_statistic data (or the pg_stats view) for their row count estimation.

HTH, Markus
Re: [PERFORM] extremely slow when executing select/delete for certain
Hi, Kah,

[EMAIL PROTECTED] wrote:
> I have already vacuumed those tables with the full option but it is still
> the same. What could be the possible causes of this problem? How can I
> solve it?
> CPU - Intel Xeon 2.40 GHz
> Memory - 1.5G
> Postgresql version: 7.2.2

First, you should consider upgrading your PostgreSQL server to a newer version, at least to 7.2.8, which fixes some critical bugs. But it will be much better to upgrade to the current 8.1 version, as I think that your problem is caused by index bloat, and indices are handled much better in 8.1.

Try recreating your indices using the REINDEX command.

HTH, Markus
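A sketch ("tab" and the index name are placeholders):

  REINDEX TABLE tab;        -- rebuilds all indexes on the table
  REINDEX INDEX tab_pkey;   -- or rebuild one bloated index selectively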
Re: [PERFORM] Why so slow?
Hi, Bill,

Bill Moran wrote:
> My understanding is basically that if you vacuum with the correct
> frequency, you'll never need to vacuum full. This is why the autovacuum
> system is so nice: it adjusts the frequency of vacuum according to how
> much use the DB is getting.

Additionally, the free_space_map settings have to be high enough: they must cover all pages that get dead rows between two vacuum runs.

HTH, Markus
Re: [PERFORM] how unsafe (or worst scenarios) when setting fsync
Hi, Guoping,

Guoping Zhang wrote:
> a) The tests consist of ten thousand very small transactions, which are
> not grouped; that is why it is so slow compared to setting fsync off.

If those transactions are submitted by concurrent applications over several simultaneous connections, playing with commit_delay and commit_siblings may improve your situation.

HTH, Markus
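A minimal sketch (the values are illustrative, not a recommendation; both parameters can be set per session or in postgresql.conf):

  -- wait up to 5000 microseconds before flushing WAL at commit,
  -- but only if at least 5 other transactions are currently active
  SET commit_delay = 5000;
  SET commit_siblings = 5;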
[PERFORM] Arrays and index scan
Hello,

I'm searching for a comfortable way to get a variable-size bunch of user-specified objects via a single prepared statement, so I wanted to submit an ARRAY. However, the query planner seems to refuse to make index scans even with 8.1:

testdb=# EXPLAIN SELECT * from streets WHERE link_id = ANY(ARRAY[1,2,3]);
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on streets  (cost=0.00..288681.74 rows=1713754 width=393)
   Filter: (link_id = ANY ('{1,2,3}'::integer[]))
(2 rows)

Via IN, it works fine, but has the disadvantage that we cannot use prepared statements effectively:

testdb=# explain select * from streets where link_id in (1,2,3);
                            QUERY PLAN
---------------------------------------------------------------------------
 Bitmap Heap Scan on streets  (cost=6.02..16.08 rows=5 width=393)
   Recheck Cond: ((link_id = 1) OR (link_id = 2) OR (link_id = 3))
   ->  BitmapOr  (cost=6.02..6.02 rows=5 width=0)
         ->  Bitmap Index Scan on streets_link_id_idx  (cost=0.00..2.01 rows=2 width=0)
               Index Cond: (link_id = 1)
         ->  Bitmap Index Scan on streets_link_id_idx  (cost=0.00..2.01 rows=2 width=0)
               Index Cond: (link_id = 2)
         ->  Bitmap Index Scan on streets_link_id_idx  (cost=0.00..2.01 rows=2 width=0)
               Index Cond: (link_id = 3)
(9 rows)

And on the net, I found a nice trick via an array-flattening function, which at least uses a nested loop of index scans instead of a bitmap index scan:

testdb=# CREATE FUNCTION flatten_array(anyarray) RETURNS SETOF anyelement AS
testdb-# 'SELECT ($1)[i] FROM (SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) as foo;'
testdb-# language SQL STRICT IMMUTABLE;
testdb=# EXPLAIN SELECT * from streets JOIN flatten_array(ARRAY[1,2,3]) on flatten_array=link_id;
                            QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..5882.15 rows=1566 width=397)
   ->  Function Scan on flatten_array  (cost=0.00..12.50 rows=1000 width=4)
   ->  Index Scan using streets_link_id_idx on streets  (cost=0.00..5.84 rows=2 width=393)
         Index Cond: ("outer".flatten_array = streets.link_id)
(4 rows)

Currently, we're planning to use the array-flattening approach, but are there any plans to enhance the query planner for the direct ARRAY approach?

Thanks, Markus
Re: [PERFORM] Recovery will take 10 hours
Hi, Brendan,

Brendan Duddridge wrote:
> So how do you overlap the restore process with the retrieving of files?

You need a shell script as restore_command that both uncompresses the current file and starts a background decompression of the next file(s). It also has to check whether the current file is already being decompressed by a previous run, and in that case wait until that is finished instead of decompressing it itself. So it's a little more complicated than it sounds at first.

> restore_command = 'gunzip < /wal_archive/%f.gz > %p &'

Warning: Don't do it this way! It will break things, because PostgreSQL will try to access a not-completely-restored WAL file.

HTH, Markus
Re: [PERFORM] Introducing a new linux readahead framework
Hi, Wu,

Wu Fengguang wrote:
> In adaptive readahead, the context-based method may be of particular
> interest to postgresql users. It works by peeking into the file cache and
> checking if there are any history pages present or accessed. In this way
> it can detect almost all forms of sequential / semi-sequential read
> patterns, e.g.
> - parallel / interleaved sequential scans on one file
> - sequential reads across file open/close
> - mixed sequential / random accesses
> - sparse / skimming sequential read
> It also has methods to detect some less common cases:
> - reading backward
> - seeking all over reading N pages

Great news, thanks!

> This call will disable readahead totally for fd:
>   posix_fadvise(fd, any, any, POSIX_FADV_RANDOM);
> This one will reenable it:
>   posix_fadvise(fd, any, any, POSIX_FADV_NORMAL);
> This one will enable readahead _and_ set the max readahead window to
> 2*max_readahead_kb:
>   posix_fadvise(fd, any, any, POSIX_FADV_SEQUENTIAL);

I think that this is an easy, understandable and useful interpretation of posix_fadvise() hints.

Are there any rough estimates when this will get into the mainline kernel (if you intend to submit it)?

Thanks, Markus
Re: [PERFORM] Quick Performance Poll
Hi, Luke,

Luke Lonergan wrote:
> The current drawback to bitmap index is that it isn't very maintainable
> under insert/update, although it is safe for those operations. For now,
> you have to drop the index, do inserts/updates, and rebuild the index.

So they effectively turn the table into a read-only table for now.

Are they capable of indexing custom datatypes like the PostGIS geometries that use the GiST mechanism? This could probably speed up our geo databases for map rendering, which contain static data that is updated approx. 2 times per year.

Markus
Re: [PERFORM] Inserts optimization?
Hi, Magnus,

Magnus Hagander wrote:
> Bacula already serializes access to the database (they have to support
> mysql/myisam), so this shouldn't help.

Ouch, that hurts. To support MySQL, they break performance for _every other_ database system?

<cynism> Now I understand how the MySQL people manage to spread the legend of MySQL being fast: they convince software developers to thwart all others. </cynism>

Seriously: How can we convince developers to either fix MySQL or abandon and replace it with a different database, instead of crippling client software?

> Actually, it might well hurt by introducing extra delays.

Well, if you read the documentation, you will see that it will only wait if there are at least commit_siblings other transactions active. So when Bacula serializes access, there will be no delays, as there is only a single transaction alive.

HTH, Markus
Re: [PERFORM] Inserts optimization?
Hi, Magnus,

Magnus Hagander wrote:
> > To support MySQL, they break performance for _every other_ database
> > system?
> Actually, it probably helps on SQLite as well.

AFAICS from the FAQ, http://www.sqlite.org/faq.html#q7 and #q8, SQLite does serialize itself.

> And considering they only support postgresql, mysql and sqlite, there is
> some merit to it from their perspective.

Okay, I understand, but I hesitate to endorse it. IMHO, they should write their application in a normal way, and then have the serialization etc. encapsulated in the database driver interface (possibly a wrapper class or so).

> > <cynism> Now I understand how the MySQL people manage to spread the
> > legend of MySQL being fast: they convince software developers to thwart
> > all others. </cynism>
> Yes, same as the fact that most (at least FOSS) web projects-du-jour are
> dumbed down to the mysql featureset. (And not just mysql, but
> mysql-lowest-common-factors, which means myisam etc.)

Well, most of those projects don't need a database, they need a bunch of tables and a lock. Heck, they even use client-side SELECT loops in PHP instead of a JOIN, "because I always confuse left and right".

Markus
Re: [PERFORM] Inserts optimization?
Hi, Francisco,

Francisco Reyes wrote:
> I only wonder what is safer: using a second or two in commit_delay or
> using fsync = off... Anyone care to comment?

It might be that you misunderstood commit_delay. It will not only delay the disk write, but also block your connection until the write actually is performed.

It will raise the throughput in multi-client scenarios, but will also raise the latency, and it will bring absolutely no speedup in single-client scenarios.

It does not decrease safety (in contrast to fsync=off); data will be consistent, and any application that has successfully finished a commit can be sure its data is on the platters.[1]

HTH, Markus

[1] As long as the platters don't lie, but that's another subject.
Re: [PERFORM] mergehashloop
Hi, Tom,

Tom Lane wrote:
> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans. The
> current estimation for that is basically outer scan cost plus N times
> inner scan cost where N is the estimated number of outer tuples; in other
> words the repeated indexscan probes are each assumed to happen from a
> cold start. In reality, caching of the upper levels of the index means
> that the later index probes are much cheaper than this model thinks.
> We've known about this for some time but no one's yet proposed a more
> reasonable cost model.

My spontaneous guess would be to use log(N) * inner instead of N * inner. I don't have any backing for that; it's just what my intuition tells me as a first shot.

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero. In a real
> environment it's much cheaper to use heavily-used indexes than this cost
> model suggests, because they'll already be swapped in due to use by
> previous queries. But we haven't got any infrastructure to keep track of
> what's been heavily used, let alone a cost model that could make use of
> the info.

An easy first approach would be to add a user-tunable cache probability value between 0 and 1 to each index (and possibly table). Then simply multiply random_page_cost by (1 - that value) for each scan. Later, this value could be tuned automatically by stats analysis or other means.

> I think part of the reason that people commonly reduce random_page_cost
> to values much lower than physical reality would suggest is that it
> provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size relate to this? Shouldn't a high effective_cache_size have a similar effect?

Thanks, Markus
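Spelled out in plain text (my notation; the first formula is from Tom's description above, the second is just the guess):

  current model:  cost(nestloop) = cost(outer) + N * cost(inner index probe)
  guessed model:  cost(nestloop) = cost(outer) + log(N) * cost(inner index probe)

where N is the estimated number of outer tuples.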
Re: [PERFORM] Better index strategy for many fields with few values
Hi, Jim,

Jim C. Nasby wrote:
> > I was also thinking about using a functional index. If there's a logical
> > relation between those values so that they can easily be combined, that
> > may be a good alternative.
> How would that be any better than just doing a multi-column index?

10 different values per column and 20 columns make 10^20 value combinations. Partitioning on the first column gives 10^19 combinations, which is smaller than 2^64 and thus fits into a long value.

And I just guess that a 10-partition functional index on a long value could perform better than a multi-column index on 20 columns of character(10), if only because it is approx. 1/25th in size.

HTH, Markus
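A minimal sketch of the encoding idea, scaled down to 3 integer-coded columns (the table, column and function names are hypothetical; the function must be IMMUTABLE to be usable in an index):

  CREATE FUNCTION params_key(int, int, int) RETURNS bigint AS
    'SELECT ($1 * 100 + $2 * 10 + $3)::bigint;'
    LANGUAGE sql IMMUTABLE;

  CREATE INDEX tab_params_idx ON tab (params_key(param1, param2, param3));

  -- queries must use the very same expression to make the planner use it:
  SELECT * FROM tab
  WHERE params_key(param1, param2, param3) = params_key(4, 7, 2);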
Re: [PERFORM] Better index strategy for many fields with few values
Hi, Jim,

Jim Nasby wrote:
> Adding -performance back in
> > I would like to try it. However, in another post I added that, contrary
> > to what I stated initially, not all the paramXX columns are mandatory in
> > the query. So it seems that requirement makes the problem more complex.

Okay, this rules out my functional index over 19 columns.

> > Doesn't this new requirement rule out this solution?
> No, just group the columns logically.

Yes, that's the solution. If you have common groups of columns that appear and disappear synchronously, pack those together in a (possibly partitioned and/or functional) index. Then rely on the query planner to combine the appropriate indices via a bitmap index scan.

> > By the way, I have tested indexing each column individually and checked
> > what happens in relation to bitmap scans. My test table is 1 million
> > rows. The explain analyze command shows that a bitmap scan is sometimes
> > used, but I still end up with queries that can take up to 10s, which is
> > way too much.

Is it on the first query, or on repeated queries? It might be that you're I/O bound, and the backend has to fetch indices and rows from disk into RAM.

I currently don't know whether the order of indices in a multi-index bitmap scan is relevant, but I could imagine that it may be useful to have the most selective index scanned first.

And keep in mind that, assuming an equal distribution of your parameters, every index bitmap hits 1/10th of the whole table on average, so the selectivity generally is low. The selectivity of a partitioned 3-column index will be much better (about 1/1000th of the whole table), and fewer index scans and bitmaps have to be generated.

A functional index may also make sense to CLUSTER the table on, to optimize the locality of search results (and so reduce disk I/O). In case your table has low write activity but high read-only activity, the overhead that comes with the additional index is negligible compared to the performance improvement proper CLUSTERing can generate.

Markus
Re: [PERFORM] pgmemcache
Hi, Tom,

Tom Lane wrote:
> > Why are AFTER COMMIT triggers impossible?
> What happens if such a trigger gets an error? You can't un-commit.

Then it must be specified that those triggers run in their own transaction, and cannot abort the already-committed transaction. Or use the 2-phase-commit infrastructure for them.

Markus
Re: [PERFORM] Better index strategy for many fields with few values
Hi, Oscar,

Oscar Picasso wrote:
> [all the 20 paramXX columns are used in the query]
> How can I optimize this kind of query?

PostgreSQL 8.1 has so-called bitmap index scans, which can combine several index scans before actually accessing the data.

So I think it's best to create an index on each of the paramXX columns, and see with EXPLAIN ANALYZE what it is doing.

> I was thinking about using a multi-column index, but I have read that we
> should limit multi-column indices to at most 2 or 3 columns.

Yes, that's true; the index overhead gets too high.

> If that's true then 22 columns for a multi-column index seems way too
> much. Or maybe it is workable as every column uses only a very limited
> set of values?

Yes, I think that a 22-column index is way too much, especially with the new bitmap index scans available.

> I was also thinking about using a functional index.

If there's a logical relation between those values so that they can easily be combined, that may be a good alternative.

I just had another weird idea: As your paramXX columns can have only 10 values each, it might also be feasible to use a bunch of 10 conditional indices, like:

  CREATE INDEX foo1 ON tab (param1, param2) WHERE param0='1st value';
  CREATE INDEX foo2 ON tab (param1, param2) WHERE param0='2nd value';
  CREATE INDEX foo3 ON tab (param1, param2) WHERE param0='3rd value';
  [...]

This way, you don't have the index bloat of a 3-column index, but 10 2-column indices that cover 1/10th of the table each. For 22 columns, you'd need a bunch of seven such indices plus a single-column one, or you can use some 3+1 and some 2+1 column indices.

I'd like to see the query plans from EXPLAIN ANALYZE.

Btw, I expect query planning time to get rather significant with so many columns, so GEQO tuning, tuning work_mem (for the bitmap scans) and prepared statements will pay off.

HTH, Markus
Re: [PERFORM] Indexes with descending date columns
Hi, Bruce,

Bruce Momjian wrote:
> > Ahh. There's a hack to do that by defining a new opclass that reverses <
> > and >, and then doing ORDER BY project_id, id, date USING new_opclass.
> > I think there's a TODO about this, but I'm not sure...
> Yes, and updated:
> * Allow the creation of indexes with mixed ascending/descending
>   specifiers
>   This is possible now by creating an operator class with reversed sort
>   operators. One complexity is that NULLs would then appear at the start
>   of the result set, and this might affect certain sort types, like merge
>   join.

I think it would be better to allow index zig-zag scans for multi-column indexes.[1] So the index is traversed in the given order on the higher-order column, and the subtree for each specific higher-order value is traversed in reversed order. From my knowledge of B-trees, and given correct commutator definitions, this should not be so complicated to implement.[2]

This would allow the query planner to use the same index for arbitrary ASC/DESC combinations of the given columns.

Just a thought, Markus

[1] It may make sense to implement the mixed specifiers on indices as well, to allow CLUSTERing on mixed search order.
[2] But I admit that I currently don't have enough knowledge of the PostgreSQL index scan internals to know whether it really is easy to implement.
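A sketch of the reversed-opclass hack for a date column (the SQL-level wrapper around the built-in date_cmp comparison function is my assumption here; the classic example implements the reversed comparator in C):

  CREATE FUNCTION date_revcmp(date, date) RETURNS int AS
    'SELECT date_cmp($2, $1);'
    LANGUAGE sql IMMUTABLE STRICT;

  CREATE OPERATOR CLASS date_desc_ops FOR TYPE date USING btree AS
    OPERATOR 1 >, OPERATOR 2 >=, OPERATOR 3 =,
    OPERATOR 4 <=, OPERATOR 5 <,
    FUNCTION 1 date_revcmp(date, date);

  -- mixed-order index: project_id and id ascending, date descending
  CREATE INDEX tab_mixed_idx ON tab (project_id, id, date date_desc_ops);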
Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
Hi, Juan,

Juan Casero (FL FLC) wrote:
> Ok, that is beginning to become clear to me. Now I need to determine if
> this server is worth the investment for us. Maybe it is not a speed
> demon, but to be honest the licensing costs of an SMP-aware RDBMS are
> outside our budget. When postgresql starts, does it start up a super
> server process and then fork copies of itself to handle incoming
> requests?

It starts a super-server process (the postmaster) and some background processes (background writer, stats collector). For each incoming connection, the postmaster forks a single-threaded backend process, which handles all queries and transactions on this connection and terminates when the connection terminates.

So as a rule of thumb, each connection can utilize only a single CPU. You can utilize a few more CPUs than you have simultaneous connections, due to the background processes and the OS needing CPU for I/O, but that's rather marginal.

AFAIK, Bizgres MPP has extended the backend processes to be multi-threaded, so a single connection can utilize several CPUs for some types of queries (large data sets, sorting/joining/hashing etc.). Btw, I presume that they might offer you a free test license, and I also presume their license fee is much lower than Oracle's or DB2's.

> Or do I have to specify how many server processes should be started up?

You can limit the number of server processes by setting the maximum connection limit.

> I figured maybe I can take advantage of the multiple CPUs on this system
> by starting up enough postgres server processes to handle large numbers
> of incoming connections. I have this server available for sixty days, so
> I may as well explore the performance of postgresql on it.

Yes, you can take advantage if you have multiple clients (e.g. a web app - that's what the T2000 / Niagara was made for). You have a Tomcat or JBoss sitting on it, and each HTTP connection gets its own thread. Each customer has their own CPU. :-)

Then use a connection pool to PostgreSQL, and you're fine. The more customers, the more CPUs are utilized.

But beware, if you have floating-point maths, it will be very slow. All 8 CPUs / 32 threads share a single FPU. So if you need floating point (e.g. MapServer, PostGIS geoprocessing, Java2D chart drawing or something), the T2000 is not the right thing for you.

HTH, Markus
Re: [PERFORM] Decide between Postgresql and Mysql (help of
Hi, Craig,

Craig A. James wrote:
> I hope this was just a joke. You should be sure to clarify - there might
> be some newbie out there who thinks you are seriously suggesting coding
> major web sites in some old-fashioned compiled language.

No, but perhaps with a CMS that pregenerates static content, or with http://www.tntnet.org/

Markus
Re: [PERFORM] Index scan startup time
Hi, Peter,

Peter Eisentraut wrote:
> > > The table has seen VACUUM FULL and REINDEX before this.
> > But no analyze?
> ANALYZE as well, but the plan choice is not the point anyway.

Maybe you could add a combined index on activity_id and state, or (if you use this kind of query more often) a conditional index on activity_id WHERE state IN (1, 10001).

Btw, PostgreSQL 8.1 could AND two bitmap index scans on the activity and state indices, and get the result faster (I presume).

Markus
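Both variants as a sketch (the table name "activity" is hypothetical; the column names are from the thread):

  -- combined two-column index:
  CREATE INDEX activity_id_state_idx ON activity (activity_id, state);

  -- conditional (partial) index, if the query always filters on these states:
  CREATE INDEX activity_hot_idx ON activity (activity_id)
    WHERE state IN (1, 10001);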
Re: [PERFORM] Slow performance on Windows .NET and OleDb
Hi, Greg,

Greg Quinn wrote:
> > > I populate 3000 records into the table to test PostgreSQL's speed. It
> > > takes about 3-4 seconds.
> > When you do the population, is it via inserts or copy?
> Via insert

Are those inserts encapsulated into a single transaction? If not, that's the reason why it's so slow: every transaction syncs through to the disk.

Markus
Re: [PERFORM] count(*) performance
Hi, Gábriel,

Gábriel Ákos wrote:
> I thought that too. Autovacuum is running on our system but it didn't do
> the trick. Anyway the issue is solved, thank you all for helping. :)

It may be that your free space map (FSM) setting is way too low. Try increasing it.

Btw, VACUUM outputs a warning if the FSM is not high enough; maybe you can find useful hints in the log file.

HTH, Markus
Re: [PERFORM] simple join uses indexes, very slow
Hi, George,

george young wrote:
> > Looks like a hash join might be faster. What is your work_mem set to?
> work_mem = 1024

This is 1 megabyte. By all means, increase it, if possible.

HTH, Markus
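For example (in 8.x the value is in kilobytes; a per-session SET is handy for testing before touching postgresql.conf):

  -- give sorts and hash joins 16 MB for this session only
  SET work_mem = 16384;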
Re: [PERFORM] Massive Inserts Strategies
Hi, ashah,

ashah wrote:
> I tried this solution, but ran into the following problem. The temp_table
> has columns (col1, col2, col3). The original_table has columns (col0,
> col1, col2, col3), and the extra col0 on the original_table is the unique
> ID generated by the database.

INSERT INTO original_table (col1, col2, col3)
SELECT col1, col2, col3 FROM temp_table WHERE ...;

HTH, Markus