Re: [PERFORM] Question about CLUSTER
On Mon, Feb 11, 2008 at 03:33:37PM -0600, Scott Marlowe wrote: On Feb 11, 2008 2:03 PM, salman [EMAIL PROTECTED] wrote: I'm planning to cluster a few large tables in our database but I'm unable to find any recommendations/documentation on best practices -- Mainly, whether it's better to use an index which has a higher idx_scan value, a higher idx_tup_read value, or the highest idx_tup_fetch value. I'm assuming that idx_tup_read would probably be the best choice, but want to get other opinions before proceeding.

If you've got two indexes that are both being hit a lot, it might be worth looking into their correlation, and if they get used a lot together, look at creating an index on both. But I'd guess that idx_tup_read would be a good bet.

You might also consider the ratio idx_tup_read::float8 / idx_scan to see which indexes access a lot of rows per scan.

-- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
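The ratio Michael suggests can be pulled straight from the statistics views; a sketch against pg_stat_user_indexes (the NULLIF guards against never-scanned indexes, and the NULLS LAST clause needs 8.3 or later -- drop it on older servers):

```sql
-- Rows read per index scan, widest-scanning indexes first
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read,
       idx_tup_read::float8 / NULLIF(idx_scan, 0) AS tup_read_per_scan
FROM pg_stat_user_indexes
ORDER BY tup_read_per_scan DESC NULLS LAST;
```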
Re: [PERFORM] Questions about enabling SSL
On Mon, Feb 11, 2008 at 05:37:51PM -0700, Michael Fuhr wrote: On Mon, Feb 11, 2008 at 04:58:35PM -0700, fabrix peñuelas wrote: If ssl is enable in postgresql decreanse the performance of the database? How much? The performance impact of an encrypted connection depends on how expensive the queries are and how much data they return. Another consideration is how much time you spend using each connection vs. how much time it takes to establish each connection. A thousand simple queries over the same encrypted connection might be significantly faster than running each query over a separate unencrypted connection, which in turn will probably be significantly faster than using separate encrypted connections that must each carry out a relatively expensive key establishment. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Questions about enabling SSL
On Mon, Feb 11, 2008 at 04:58:35PM -0700, fabrix peñuelas wrote: If ssl is enable in postgresql decreanse the performance of the database? How much? The performance impact of an encrypted connection depends on how expensive the queries are and how much data they return. A query that joins several tables and aggregates millions of rows might take several seconds or minutes to run and return only a few rows; for such a query the impact of an encrypted connection is insignificant. But if you make many queries that run quickly and return large result sets then you might indeed notice the impact of an encrypted connection vs. a non-encrypted connection. The most reliable way to assess the impact would be to run representative queries over your data and measure the difference yourself. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
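A rough way to run that comparison yourself, assuming libpq's sslmode connection parameter and a server configured to accept both SSL and non-SSL connections (host and database names here are placeholders):

```shell
# Time 100 separate connections each running a trivial query,
# first unencrypted, then encrypted; the gap is mostly SSL key setup
time for i in $(seq 1 100); do
  psql "host=db.example.com dbname=test sslmode=disable" -c "SELECT 1" >/dev/null
done
time for i in $(seq 1 100); do
  psql "host=db.example.com dbname=test sslmode=require" -c "SELECT 1" >/dev/null
done
```

Repeating the comparison over a single connection (one psql session with \timing) would separate per-connection overhead from per-query overhead.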
Re: [PERFORM] Query taking too long. Problem reading explain output.
On Wed, Oct 03, 2007 at 10:03:53AM +0200, Henrik wrote: I have a little query that takes too long and what I can see in the explain output is a seq scan on my biggest table (tbl_file_structure) which I can't explain why.

Here's where almost all of the time is taken:

Hash Join  (cost=8605.68..410913.87 rows=19028 width=40) (actual time=22.810..16196.414 rows=17926 loops=1)
  Hash Cond: (tbl_file_structure.fk_file_id = tbl_file.pk_file_id)
  ->  Seq Scan on tbl_file_structure  (cost=0.00..319157.94 rows=16591994 width=16) (actual time=0.016..7979.083 rows=16591994 loops=1)
  ->  Hash  (cost=8573.62..8573.62 rows=2565 width=40) (actual time=22.529..22.529 rows=2221 loops=1)
        ->  Bitmap Heap Scan on tbl_file  (cost=74.93..8573.62 rows=2565 width=40) (actual time=1.597..20.691 rows=2221 loops=1)
              Filter: (lower((file_name)::text) ~~ 'index.php%'::text)
              ->  Bitmap Index Scan on tbl_file_idx  (cost=0.00..74.28 rows=2565 width=0) (actual time=1.118..1.118 rows=2221 loops=1)
                    Index Cond: ((lower((file_name)::text) ~>=~ 'index.php'::character varying) AND (lower((file_name)::text) ~<~ 'index.phq'::character varying))

Does tbl_file_structure have an index on fk_file_id? If so then what's the EXPLAIN ANALYZE output if you set enable_seqscan to off? I don't recommend disabling sequential scans permanently but doing so can be useful when investigating why the planner thinks one plan will be faster than another. What are your settings for random_page_cost, effective_cache_size, work_mem, and shared_buffers? If you're using the default random_page_cost of 4 then what's the EXPLAIN ANALYZE output if you reduce it to 3 or 2 (after setting enable_seqscan back to on)?

-- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
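The experiment described above fits in one psql session; the SELECT here is a placeholder for Henrik's join, and the SET statements only affect the current session:

```sql
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;   -- does the planner now pick the fk_file_id index?
SET enable_seqscan = on;

SET random_page_cost = 2;     -- cheaper estimate for random I/O
EXPLAIN ANALYZE SELECT ...;
```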
Re: [PERFORM] one column from huge view
On Thu, Jul 12, 2007 at 09:50:42AM +0100, Heikki Linnakangas wrote: Marcin Stępnicki wrote: Let's say I've got a view with 100 columns and 1mln rows; some of them are calculated on the fly. For some reason I want only one column from this view: select col1 from huge_view; Now, does PostgreSQL skip all the calculations from other columns and execute this query faster than select * from huge_view? In simple cases, yes. But for example, if you have a LEFT OUTER JOIN in the view, the join is performed even if your query doesn't return any columns from the outer relation. Also, if the calculation contains immutable functions, it's not skipped. Don't you mean if the calculation contains VOLATILE functions, it's not skipped? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Query Analyser
On Tue, Jul 10, 2007 at 08:17:05PM +0530, Gauri Kanekar wrote: Is there any way to indicate to the query planner not to use the plan which it is using regularly, and use a new plan?

You can't dictate the query plan but you can influence the planner's decisions with various configuration settings.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html

Disabling planner methods (enable_seqscan, etc.) should be a last resort -- before doing so make sure that settings like shared_buffers and effective_cache_size are appropriately sized for your system, that you're gathering enough statistics (see below), and that the statistics are current (run ANALYZE or VACUUM ANALYZE). After all that, if you still think you need to disable a planner method then consider posting the query and the EXPLAIN ANALYZE output to pgsql-performance to see if anybody has other suggestions.

From where does the query planner get all the info to prepare a plan? Is it only from the pg_statistics table or are there any other tables which have this info stored?

The planner also uses pg_class.{reltuples,relpages}.

http://www.postgresql.org/docs/8.2/interactive/planner-stats.html
http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html

And can we change the statistics?

You can increase the amount of statistics gathered for a specific column with ALTER TABLE SET STATISTICS or system-wide by adjusting default_statistics_target.

http://www.postgresql.org/docs/8.2/interactive/sql-altertable.html
http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER

-- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
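For example, the two ways of raising the statistics target look like this (table and column names are illustrative; ANALYZE must run afterwards for the new target to take effect):

```sql
-- Per-column: keep more detailed statistics for a skewed column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;

-- Session- or server-wide default for all columns
SET default_statistics_target = 100;
```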
Re: [PERFORM] Delete Cascade FK speed issue
On Tue, Jul 03, 2007 at 08:05:27AM +0200, Patric de Waha wrote: Is there a way I can find out what postgres does, and where it hangs around, so I know where the FK might not be indexed. (The dbase is too big to analyze it by hand.)

You could query the system catalogs to look for foreign key constraints that don't have an index on the referencing column(s). Something like the following should work for single-column foreign keys:

select n1.nspname, c1.relname, a1.attname, t.conname,
       n2.nspname as fnspname, c2.relname as frelname, a2.attname as fattname
from pg_constraint t
join pg_attribute a1 on a1.attrelid = t.conrelid and a1.attnum = t.conkey[1]
join pg_class c1 on c1.oid = t.conrelid
join pg_namespace n1 on n1.oid = c1.relnamespace
join pg_class c2 on c2.oid = t.confrelid
join pg_namespace n2 on n2.oid = c2.relnamespace
join pg_attribute a2 on a2.attrelid = t.confrelid and a2.attnum = t.confkey[1]
where t.contype = 'f'
  and not exists (
    select 1
    from pg_index i
    where i.indrelid = t.conrelid
      and i.indkey[0] = t.conkey[1]
  )
order by n1.nspname, c1.relname, a1.attname;

-- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pg_statistic doesnt contain details for specific table
On Mon, Jun 11, 2007 at 02:28:32PM +0530, Nimesh Satam wrote: We have already used analyze command on the table. We have also ran the vacuum analyze command. But they are not helping. Is there any data in the table? What does ANALYZE VERBOSE or VACUUM ANALYZE VERBOSE show for this table? Is there any chance that somebody set all of the columns' statistics targets to zero? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] pg_statistic doesnt contain details for specific table
On Mon, Jun 11, 2007 at 07:22:24PM +0530, Nimesh Satam wrote:

INFO:  analyzing public.cam_attr
INFO:  cam_attr: scanned 103 of 103 pages, containing 11829 live rows and 0 dead rows; 6000 rows in sample, 11829 estimated total rows

Looks reasonable.

Also how do we check if the statistics are set to Zero for the table?

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'public.cam_attr'::regclass
  AND attnum > 0
  AND NOT attisdropped;

If nobody has changed the statistics targets then they're all probably -1. Negative attstattarget values mean to use the system default, which you can see with:

SHOW default_statistics_target;

How exactly are you determining that no statistics are showing up for this table? Are you running a query like the following?

SELECT * FROM pg_stats WHERE schemaname = 'public' AND tablename = 'cam_attr';

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse performance then 7.4.x
On Sat, Jun 02, 2007 at 09:13:32AM -0400, Douglas J Hunley wrote: Our 'esteemed' Engr group recently informed a customer that in their testing, upgrading to 8.2.x improved the performance of our J2EE application approximately 20%, so of course, the customer then tasked me with upgrading them. We dumped their db, removed pgsql, installed the 8.2.4 rpms from postgresql.org, did an initdb, and then pg_restored their data. It's been about a week now, and the customer is complaining that in their testing, they are seeing a 30% /decrease/ in general performance.

After the restore, did you ANALYZE the entire database to update the planner's statistics? Have you enabled autovacuum or are you otherwise vacuuming and analyzing regularly? What kind of queries are slower than desired? If you post an example query and the EXPLAIN ANALYZE output then we might be able to see if the slowness is due to query plans.

A few differences between the configuration files stand out. The 7.4 file has the following settings:

shared_buffers = 25000
sort_mem = 15000
effective_cache_size = 196608

The 8.2 config has:

#shared_buffers = 32MB
#work_mem = 1MB
#effective_cache_size = 128MB

To be equivalent to the 7.4 config the 8.2 config would need:

shared_buffers = 195MB
work_mem = 15000kB
effective_cache_size = 1536MB

With 8GB of RAM you might try increasing shared_buffers to 400MB - 800MB (less if the entire database isn't that big) and effective_cache_size to 5GB - 6GB. You might have to increase the kernel's shared memory settings before increasing shared_buffers.

Some of the other settings are the same between the configurations but deserve discussion:

fsync = off

Disabling fsync is dangerous -- are all parties aware of the risk and willing to accept it? Has the risk been weighed against the cost of upgrading to a faster I/O subsystem? How much performance benefit are you realizing by disabling fsync? What kind of activity led to the decision to disable fsync?
Are applications doing anything like executing large numbers of insert/update/delete statements outside of a transaction block when they could be done in a single transaction?

commit_delay = 2
commit_siblings = 3

What kind of activity led to the above settings? Are they a guess or were they determined empirically? How much benefit are they providing and how did you measure that?

enable_mergejoin = off
geqo = off

I've occasionally had to tweak planner settings but I prefer to do so for specific queries instead of changing them server-wide.

-- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
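On the transaction-block point: outside an explicit transaction every statement commits (and, with fsync on, forces a WAL flush) individually, so batching related statements can help substantially. A sketch with a hypothetical table:

```sql
-- One WAL flush at COMMIT instead of one per statement
BEGIN;
INSERT INTO audit_log (msg) VALUES ('first event');
INSERT INTO audit_log (msg) VALUES ('second event');
-- ... thousands more statements ...
COMMIT;
```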
Re: [PERFORM] Help with TOAST Compression
On Tue, Apr 17, 2007 at 04:13:36PM -0500, David Hinkle wrote: I have a table where I store email, the bodies are mostly kept in a toast table. The toast table is 940 Meg in size. The whole database is about 1.2 Gig in size. When I back the database up using pg_dump in custom output mode, I pipe the output into gzip. My backups are only about 600 meg in size. From this, I assume that the toast table isn't getting compressed.

How are you measuring the toast table and database sizes? Have you taken indexes and uncompressible data and metadata into account? The database compresses only certain data, whereas when you pipe a dump into gzip you get compression on the entire dump. Some of the space might be taken up by dead rows and unused item pointers. How often do you vacuum? What does VACUUM VERBOSE tablename show?

Is there any way I can tell for sure if the messages from this column are being stored compressed?

You could look at a hex/ascii dump of the base and toast tables -- you might see runs of legible text but it should be obvious where the data is compressed. See the TOAST section in the documentation for more information about how and when data is compressed:

http://www.postgresql.org/docs/8.2/interactive/storage-toast.html

Note that "The TOAST code is triggered only when a row value to be stored in a table is wider than BLCKSZ/4 bytes (normally 2 kB)." And I'm no expert at compression algorithms but it's possible that the "fairly simple and very fast member of the LZ family of compression techniques" isn't as space-efficient as the algorithm that gzip uses (LZ77 according to its manual page). Maybe one of the developers can comment.

I know I can set the compression settings using the ALTER TABLE ALTER SET STORAGE syntax, but is there a way I can see what this value is currently set to?
You could query pg_attribute.attstorage: http://www.postgresql.org/docs/8.2/interactive/catalog-pg-attribute.html -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
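A query along those lines, with the table name as a placeholder (attstorage is 'p' = plain, 'e' = external, 'm' = main, 'x' = extended):

```sql
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```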
Re: [PERFORM] postgres 8.2 seems to prefer Seq Scan
On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans for the first query while the ordering in the second query seems to perform worse on 8.2. I ran analyze. I've tried with the encoding set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to improve this?

Are you sure the data sets are identical? The 7.4 query returned 0 rows; the 8.2 query returned 1 row. If you're running the same query against the same data in both versions then at least one of them appears to be returning the wrong result. Exactly which versions of 7.4 and 8.2 are you running? Have you analyzed all tables in both versions? The row count estimate in 7.4 is much closer to reality than in 8.2:

7.4
->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19 width=172) (actual time=0.063..0.063 rows=0 loops=1)
      Index Cond: ((pnum)::text = 'AB5819188'::text)

8.2
->  Index Scan using pnum_idx on event  (cost=0.00..3147.63 rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
      Index Cond: ((pnum)::text = 'AB5819188'::text)

If analyzing the event table doesn't improve the row count estimate then try increasing the statistics target for event.pnum and analyzing again. Example:

ALTER TABLE event ALTER pnum SET STATISTICS 100;
ANALYZE event;

You can set the statistics target as high as 1000 to get more accurate results at the cost of longer ANALYZE times.

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Wrong plan sequential scan instead of an index one
On Fri, Mar 30, 2007 at 12:08:26PM +0200, Gaetano Mendola wrote: Claus Guttesen wrote: Try changing random_page_cost from the default 4 to 2 in postgresql.conf: random_page_cost = 2 I have tuned that number already at 2.5, lowering it to 2 doesn't change the plan.

The following 19-fold overestimate is influencing the rest of the plan:

->  Seq Scan on l_pvcp  (cost=0.00..2.17 rows=19 width=4) (actual time=0.066..0.081 rows=1 loops=1)
      Filter: (value ~~* '%pi%'::text)

Have you tried increasing the statistics target on l_pvcp.value? I ran your queries against canned data in 8.2.3 and better statistics resulted in more accurate row count estimates for this and other parts of the plan. I don't recall if estimates for non-leading-character matches in earlier versions can benefit from better statistics.

-- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Wrong plan sequential scan instead of an index one
On Fri, Mar 30, 2007 at 04:46:11AM -0600, Michael Fuhr wrote: Have you tried increasing the statistics target on l_pvcp.value? I ran your queries against canned data in 8.2.3 and better statistics resulted in more accurate row count estimates for this and other parts of the plan. I don't recall if estimates for non-leading-character matches in earlier versions can benefit from better statistics. This might work only in 8.2. I see the following in the Release Notes: * Improve the optimizer's selectivity estimates for LIKE, ILIKE, and regular expression operations (Tom) -- Michael Fuhr ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Nested Loop
On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote: how to speedup nested loop queries and by which parameters. Please post a query you're trying to tune and the EXPLAIN ANALYZE output, as well as any changes you've already made in postgresql.conf or configuration variables you've set in a particular session. Without more information we can't give much advice other than to make sure you're vacuuming and analyzing the tables often enough to keep them from becoming bloated with dead rows and to keep the statistics current, and to review a configuration checklist such as this one: http://www.powerpostgresql.com/PerfList -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance of count(*)
On Thu, Mar 22, 2007 at 01:29:46PM +0100, Andreas Kostyrka wrote: * Andreas Tille [EMAIL PROTECTED] [070322 13:24]: Well, to be honest I'm not really interested in the performance of count(*). I was just discussing general performance issues on the phone line and when my colleague asked me about the size of the database he just wonderd why this takes so long for a job his MS-SQL server is much faster. So in principle I was just asking a first question that is easy to ask. Perhaps I come up with more difficult optimisation questions. Simple. MSSQL is optimized for this case, and uses older datastructures. PG uses a MVCC storage, which is not optimized for this usecase. It's quite fast for different kinds of queries. Ask about performing concurrent selects, inserts, updates, and deletes in SQL Server and about the implications on ACID of locking hints such as NOLOCK. Then consider how MVCC handles concurrency without blocking or the need for dirty reads. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3
On Tue, Mar 13, 2007 at 09:19:47AM +0100, [EMAIL PROTECTED] wrote: Is there an option in the 8.2.3 to change in order to have the same execution plan than before ? Let's see if we can figure out why 8.2.3 is choosing a bad plan. Have you run ANALYZE on the tables in 8.2.3? Could you post the query and the complete output of EXPLAIN ANALYZE (preferably without wrapping) for both versions? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Not Picking Index
On Fri, Feb 16, 2007 at 06:26:51PM +0530, Gauri Kanekar wrote: I want to run a Select Query on a table. But i dont want the query to pick a index defined on that table. So can i instruct the planner not to pick that index. Why don't you want the planner to use the index? Is there a specific index you want to ignore or do you want the planner to ignore all indexes? What problem are you trying to solve? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Moving a tablespace
On Tue, Aug 22, 2006 at 06:16:54PM -0700, Craig A. James wrote: Is there a way to move a tablespace to a new location without a dump/restore? I, er, this hypothetical guy, knows he can move it and put a symbolic link in for /disk2, but this is somewhat unsatisfactory since /disk2 would have to exist forever. The last paragraph of the Tablespaces documentation might be helpful: http://www.postgresql.org/docs/8.1/interactive/manage-ag-tablespaces.html The directory $PGDATA/pg_tblspc contains symbolic links that point to each of the non-built-in tablespaces defined in the cluster. Although not recommended, it is possible to adjust the tablespace layout by hand by redefining these links. Two warnings: do not do so while the postmaster is running; and after you restart the postmaster, update the pg_tablespace catalog to show the new locations. (If you do not, pg_dump will continue to show the old tablespace locations.) I just tested this and it appeared to work, but this hypothetical DBA might want to wait for others to comment before proceeding. He might also want to initdb and populate a test cluster and practice the procedure before doing it for real. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
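A sketch of that procedure for 8.1-era servers, where pg_tablespace still carries a spclocation column (paths, tablespace name, and OID are hypothetical; do this only with the postmaster stopped and a fresh backup in hand):

```sql
-- With the postmaster stopped, at the shell:
--   mv /disk2/pgspace /disk3/pgspace
--   ln -sfn /disk3/pgspace $PGDATA/pg_tblspc/<tablespace oid>
-- Then restart the postmaster and fix the catalog so pg_dump
-- reports the new location:
UPDATE pg_tablespace SET spclocation = '/disk3/pgspace'
WHERE spcname = 'myspace';
```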
Re: [PERFORM] Big differences in plans between 8.0 and 8.1
On Sat, Jul 15, 2006 at 04:14:11PM +0200, Gabriele Turchi wrote: Hi all. I have a strange (and serious) problem with an application ported from postgres 8.0 to 8.1. The old installation is postgresql-8.0.4-2.FC4.1 running on a Fedora 4, the new one is postgresql-8.1.4-1.FC5.1 running on a Fedora 5. Some queries are now _very_ slow. I've found some deep differences between query plans.

Have you run ANALYZE in 8.1? Some of the row count estimates in the 8.1 plan differ significantly from the actual number of rows returned, while in the 8.0 plan the estimates are accurate. For example, in one case the 8.0 plan shows 349 rows estimated, 349 rows returned:

->  Seq Scan on registrazioni  (cost=0.00..11.98 rows=349 width=19) (actual time=0.029..2.042 rows=349 loops=1)
      Filter: (date((now() - '02:00:00'::interval)) = data)

but the 8.1 plan shows 2 rows estimated, 349 rows returned:

->  Seq Scan on registrazioni  (cost=0.00..11.98 rows=2 width=44) (actual time=0.025..2.315 rows=349 loops=1)
      Filter: (date((now() - '02:00:00'::interval)) = data)

This suggests that the 8.1 statistics are out of date, possibly because ANALYZE or VACUUM ANALYZE hasn't been run since the data was loaded. Try running ANALYZE in 8.1 and post the new plans if that doesn't help.

-- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] hyper slow after upgrade to 8.1.4
On Thu, Jul 13, 2006 at 08:22:46AM -0500, Medora Schauer wrote: Can it be that the connection delay is because first an IPv6 socket is trying to be established and when that fails an IPv4 socket is created? A sniffer like tcpdump or ethereal might reveal why connecting is so slow. The problem might be with DNS queries for AAAA (IPv6) records prior to queries for A (IPv4) records; see this thread from almost a year ago: http://archives.postgresql.org/pgsql-general/2005-08/msg00216.php -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Solaris shared_buffers anomaly?
On Tue, Jun 13, 2006 at 06:22:07PM -0400, Tom Lane wrote: Mischa Sandberg [EMAIL PROTECTED] writes: vmstat showed that it was swapping like crazy. Dropped shared_buffers back down again. Swapping stopped. Does Solaris have any call that allows locking a shmem segment in RAM? The Solaris 9 shmctl manpage mentions this token: SHM_LOCK Lock the shared memory segment specified by shmid in memory. This command can be executed only by a process that has an effective user ID equal to super-user. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Posrgres speed problem
On Mon, Jun 12, 2006 at 04:38:57PM +0200, Ruben Rubio Rey wrote: I have two similar servers, one in production and another for testing purposes. Databases are equal (with a difference of some hours) In the testing server, an sql sentence takes arround 1 sec. In production server (low server load) takes arround 50 secs, and uses too much resources. Explain analyze takes too much load, i had to cancel it! The EXPLAIN ANALYZE output would be helpful, but if you don't want to run it to completion then please post the output of EXPLAIN ANALYZE for the fast system and EXPLAIN (without ANALYZE) for the slow one. As someone else asked, are you running ANALYZE regularly? What about VACUUM? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] lowering priority automatically at connection
On Thu, May 25, 2006 at 06:16:24PM +0200, Chris Mair wrote: I find this very helpful: Lowering the priority of a PostgreSQL query http://weblog.bignerdranch.com/?p=11 Now I was wondering whether one could have a SELECT pg_setpriority(10); executed automatically each time a certain user connects (not necessarily using psql)? Beware that setting priorities can have unintended, adverse effects. Use a search engine to find information about priority inversion before deciding that query priorities are a good idea. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Adding and filling new column on big table
On Wed, May 17, 2006 at 03:19:26AM +0200, Jonathan Blitz wrote: I have a table of about 500,000 rows. I need to add a new column and populate it. So, I have tried to run the following command. The command never finishes (I gave up after about an hour and a half!).

If you install contrib/pgstattuple you can figure out how fast the update is running. Run SELECT * FROM pgstattuple('mytable') a few times and note the rate at which dead_tuple_count is increasing. If it's not increasing at all then query pg_locks and look for locks where granted is false.

I created a test table, populated it with 500,000 rows of random data, and ran the update you posted. On a 500MHz Pentium III with 512M RAM and a SCSI drive from the mid-to-late 90s, running PostgreSQL 8.1.3 on FreeBSD 6.1, the update finished in just over two minutes. The table had one index (the primary key).

Note that none of the columns have indexes.

Do you mean that no columns in the table have indexes? Or that the columns referenced in the update don't have indexes but that other columns do? What does \d mytable show? Do other tables have foreign key references to this table? What non-default settings do you have in postgresql.conf? What version of PostgreSQL are you running and on what platform? How busy is the system? What's the output of EXPLAIN UPDATE mytable ...?

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
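For the blocked-lock check mentioned above, something like this lists any lock requests that have not been granted:

```sql
SELECT locktype, relation::regclass AS rel, pid, mode, granted
FROM pg_locks
WHERE NOT granted;
```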
Re: [PERFORM] Array performance
On Fri, Mar 24, 2006 at 07:06:19AM -0600, Jim C. Nasby wrote: On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote: mmm ... i have read in postgres documentation that null values on arrays are not supported ... Damn, you're right. Another reason I tend to stay away from them...

8.2 will support NULL array elements.

http://archives.postgresql.org/pgsql-committers/2005-11/msg00385.php
http://developer.postgresql.org/docs/postgres/arrays.html

test=> SELECT '{1,2,NULL,3,4}'::integer[];
      int4
----------------
 {1,2,NULL,3,4}
(1 row)

-- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help optimizing a slow index scan
On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote: Dan Harris [EMAIL PROTECTED] writes: Furthermore, by doing so, I am tying my queries directly to postgres-isms. One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. Unfortunately, there isn't any portable or standard (not exactly the same thing ;-)) SQL functionality for dealing gracefully with two-dimensional searches, which is what your lat/long queries are.

The OpenGIS Simple Features Specification [1] is a step in that direction, no? PostGIS [2], MySQL [3], and Oracle Spatial [4] implement it to varying degrees. With PostGIS you do have to add non-standard operators to a query's predicate to benefit from GiST indexes on spatial columns, but the rest of the query can be straight out of the SQL and OGC standards.

[1] http://www.opengeospatial.org/docs/99-049.pdf
[2] http://www.postgis.org/
[3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
[4] http://www.oracle.com/technology/products/spatial/index.html

-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
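As an illustration of that split between standard SQL and the PostGIS-specific operator (table, column, and SRID are hypothetical, and ST_MakeEnvelope assumes a reasonably recent PostGIS; older releases spell the envelope construction differently):

```sql
-- A GiST index makes the bounding-box operator below fast
CREATE INDEX incidents_geom_idx ON incidents USING GIST (geom);

-- && (bounding-box overlap) is the only non-standard piece of the predicate
SELECT id, occurred_at
FROM incidents
WHERE geom && ST_MakeEnvelope(-105.5, 39.5, -104.5, 40.5, 4326);
```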
Re: [PERFORM] PG Statistics
On Mon, Mar 13, 2006 at 06:49:39PM -0500, mcelroy, tim wrote: Does anyone know how much of a performance hit turning stats_block_level and stats_row_level on will incur? Do both need to be on to gather cache related statistics? I know the annotated_conf_80 document states to only turn them on for debug but if they're not that performance intensive I cannot see the harm. I ran some tests a few months ago and found that stats_command_string had a significant impact, whereas stats_block_level and stats_row_level were almost negligible. Here are my test results: http://archives.postgresql.org/pgsql-performance/2005-12/msg00307.php Your results may vary. If you see substantially different results then please post the particulars. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Process Time X200
On Fri, Mar 10, 2006 at 08:11:44AM +0100, NbForYou wrote: As you can see the query isn't useful anymore because of the processtime. Please Also notice that both systems use a different query plan. Also on the webhost we have a loop of 162409 (403 rows * 403 rows). Both systems also use a different postgresql version. But I cannot believe that the performance difference between 1 version could be this big regarding self outer join queries! What versions are both servers? I'd guess that the webhost is using 7.3 or earlier and you're using 7.4 or later. I created a table like yours, populated it with test data, and ran your query on several versions of PostgreSQL. I saw the same horrible plan on 7.3 and the same good plan on later versions. The 7.4 Release Notes do mention improvements in query planning; apparently one of those improvements is making the difference. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query time
On Fri, Mar 10, 2006 at 11:05:57AM +0100, Ruben Rubio Rey wrote: Sometimes a big query takes between 5 and 15 seconds. It happens at various times throughout the day and does not depend on whether the database is busy. I have measured that statement at 15-70 ms under normal circumstances. Is it the *exact* same query, including the values you're querying for? The same query with different values can run with different plans depending on row count estimates. It might be useful to see the query string and the EXPLAIN ANALYZE output for a fast query and a slow one. How many tables are you querying? Might you be hitting geqo_threshold (default 12)? If so then the following thread might be helpful: http://archives.postgresql.org/pgsql-performance/2006-01/msg00132.php -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Can anyone explain this pgbench results?
On Mon, Mar 06, 2006 at 07:46:05PM +0100, Joost Kraaijeveld wrote: Michael Fuhr wrote: What's your setting? Default. Have you tweaked postgresql.conf at all? If so, what non-default settings are you using? Are your test results more consistent if you execute CHECKPOINT between them? Could you tell me how I could do that? Connect to the database as a superuser and execute a CHECKPOINT statement. http://www.postgresql.org/docs/8.1/interactive/sql-checkpoint.html From the command line you could do something like:

psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Can anyone explain this pgbench results?
[Please copy the mailing list on replies.] On Mon, Mar 06, 2006 at 09:38:20PM +0100, Joost Kraaijeveld wrote: Michael Fuhr wrote: Have you tweaked postgresql.conf at all? If so, what non-default settings are you using? Yes, I have tweaked the following settings:

shared_buffers = 4
work_mem = 512000
maintenance_work_mem = 512000
max_fsm_pages = 4
effective_cache_size = 131072

Are you sure you need work_mem that high? How did you decide on that value? Are all other settings at their defaults? No changes to the write-ahead log (WAL) or background writer (bgwriter) settings? What version of PostgreSQL are you running? The paths in your original message suggest 8.1.x. Are your test results more consistent if you execute CHECKPOINT between them?

psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test

OK, that leads to a consistent high score. I also noticed that psql -c checkpoint results in I/O on the database partition but not on the partition that has the logfiles (pg_xlog directory). Do you know if that's how it should be? A checkpoint updates the database files with the data from the write-ahead log; you're seeing those writes to the database partition. The postmaster does checkpoints every checkpoint_timeout seconds (default 300) or every checkpoint_segments log segments (default 3); it also uses a background writer to trickle pages to the database files between checkpoints so the checkpoints don't have as much work to do. I've been wondering if your pgbench runs are being affected by that background activity; the fact that you get consistently good performance after forcing a checkpoint suggests that that might be the case. If you run pgbench several times without intervening checkpoints, do your postmaster logs have any messages like checkpoints are occurring too frequently? It might be useful to increase checkpoint_warning up to the value of checkpoint_timeout and then see if you get any such messages during pgbench runs. 
If checkpoints are happening a lot more often than every checkpoint_timeout seconds then try increasing checkpoint_segments (assuming you have the disk space). After doing so, restart the database and run pgbench several times without intervening checkpoints and see if performance is more consistent. Note that tuning PostgreSQL for pgbench performance might be irrelevant for your actual needs unless your usage patterns happen to resemble what pgbench does. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
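The checkpoint spacing discussed above can be put into rough numbers. A back-of-envelope sketch assuming the default 16 MB WAL segment size; the WAL write rates are hypothetical workload figures, not measurements from this thread:

```python
# Back-of-envelope model of checkpoint spacing: a checkpoint fires either
# after checkpoint_segments WAL segments (16 MB each by default) are
# written, or after checkpoint_timeout seconds, whichever comes first.
WAL_SEGMENT_MB = 16

def checkpoint_interval_s(checkpoint_segments, wal_mb_per_s,
                          checkpoint_timeout_s=300):
    # Seconds between checkpoints for a given sustained WAL write rate.
    wal_driven = checkpoint_segments * WAL_SEGMENT_MB / wal_mb_per_s
    return min(wal_driven, checkpoint_timeout_s)

# Default checkpoint_segments = 3 with a hypothetical 2 MB/s of WAL:
# a checkpoint every 24 s, far more often than checkpoint_timeout's 300 s.
# Raising checkpoint_segments to 30 stretches that to 240 s.
print(checkpoint_interval_s(3, 2.0))   # 24.0
print(checkpoint_interval_s(30, 2.0))  # 240.0
```

This is only a model of the trigger logic, but it shows why a write-heavy pgbench run can checkpoint every few seconds at the default settings.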
Re: [PERFORM] SQL Function Performance
On Tue, Feb 14, 2006 at 11:33:57AM +0200, Adnan DURSUN wrote:

 ->  Nested Loop  (cost=5.90..267.19 rows=3 width=101) (actual time=76.240..30974.777 rows=63193 loops=1)
       ->  Nested Loop  (cost=5.90..123.48 rows=26 width=73) (actual time=32.082..4357.786 rows=14296 loops=1)

A prepared query is planned before the parameters' values are known, so the planner can't take full advantage of column statistics to estimate row counts. The planner must therefore decide on a plan that should be reasonable in most cases; apparently this isn't one of those cases, as the disparity between estimated and actual rows shows. Maybe Tom (one of the core developers) can comment on whether anything can be done to improve the plan in this case. Absent a better solution, you could write a PL/pgSQL function and build the query as a text string, then EXECUTE it. That would give you a new plan each time, one that can take better advantage of statistics, at the cost of having to plan the query each time you call the function (but you probably don't care about that cost as long as the overall results are better). Here's an example:

CREATE FUNCTION fooquery(qval text) RETURNS SETOF foo AS $$
DECLARE
    row   foo%ROWTYPE;
    query text;
BEGIN
    query := 'SELECT * FROM foo WHERE val = ' || quote_literal(qval);
    FOR row IN EXECUTE query LOOP
        RETURN NEXT row;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql STABLE STRICT;

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] SQL Function Performance
On Mon, Feb 13, 2006 at 07:57:07PM -0500, Tom Lane wrote: Adnan DURSUN [EMAIL PROTECTED] writes: EXPLAIN ANALYZE EXECUTE stmt (...); Here is the EXPLAIN ANALYZE output for prepared statement : This is exactly the same as the other plan --- you did not parameterize the query. To see what's going on, you need to insert PREPARE parameters in the places where the function uses plpgsql variables. Actually it was an SQL function, but that also does PREPARE/EXECUTE, right? Adnan, what Tom is saying is that I requested this (simplified):

PREPARE stmt (integer) AS SELECT * FROM foo WHERE id = $1;
EXPLAIN ANALYZE EXECUTE stmt (12345);

but instead you appear to have done this:

PREPARE stmt AS SELECT * FROM foo WHERE id = 12345;
EXPLAIN ANALYZE EXECUTE stmt;

We can tell because if you had done it the first way (parameterized) then the EXPLAIN ANALYZE output would have shown the parameters as $1, $2, $3, etc., which it didn't. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] SQL Function Performance
On Sun, Feb 12, 2006 at 10:25:28PM +0200, Adnan DURSUN wrote: My database has an SQL function. The result comes in 30-40 seconds when i use the SQL function. On the other hand; The result comes 300-400 milliseconds when i run the SQL statement. Any idea ?? Have you analyzed the tables? If that's not the problem then could you post the EXPLAIN ANALYZE output for the direct query and for a prepared query? For the prepared query do this: PREPARE stmt (varchar, date, int4, varchar, varchar) AS SELECT ... ; where ... is the same SQL as in the function body, including the numbered parameters ($1, $2, etc.). To execute the query do this: EXPLAIN ANALYZE EXECUTE stmt (...); Where ... is the same parameter list you'd pass to the function (the same values you used in the direct query). If you need to re-prepare the query then run DEALLOCATE stmt before doing so. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Sequential scan being used despite indexes
On Tue, Jan 31, 2006 at 07:29:51PM -0800, Joshua D. Drake wrote: Any ideas? What does explain analyze say? Also, have the tables been vacuumed and analyzed? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Sequential scan being used despite indexes
On Wed, Feb 01, 2006 at 01:33:08PM +0900, James Russell wrote: Reading about this issue further in the FAQ, it seems that I should ensure that Postgres has adequate and accurate information about the tables in question by regularly running VACUUM ANALYZE, something I don't do currently. Many people use a cron job (or the equivalent) to run VACUUM ANALYZE at regular intervals; some also use the pg_autovacuum daemon, which is a contrib module in 8.0 and earlier and part of the backend as of 8.1. How often to vacuum/analyze depends on usage. Once per day is commonly cited, but busy tables might need it more often than that. Just recently somebody had a table that could have used vacuuming every five minutes or less (all records were updated every 30 seconds); pg_autovacuum can be useful in such cases. I disabled SeqScan as per the FAQ, and it indeed was a lot slower so Postgres was making the right choice in this case. The planner might be making the right choice given the statistics it has, but it's possible that better statistics would lead to a different plan, perhaps one where an index scan would be faster. What happens if you run VACUUM ANALYZE on all the tables, then run the query again with EXPLAIN ANALYZE? -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Incorrect Total runtime Reported by Explain Analyze!?
On Thu, Jan 26, 2006 at 04:49:59PM +, Richard Huxton wrote: Jozsef Szalay wrote: I have seen it on occasion that the total runtime reported by explain analyze was much higher than the actual time the query needed to complete. The differences in my case ranged between 20-120 seconds. I'm just curious if anyone else has experienced this and whether there is something that I can do to convince explain analyze to report the execution time of the query itself rather than the time of its own execution. Engine version is 8.1.1. I think it's down to all the gettime() calls that have to be made to measure how long each stage of the query takes. In some cases these can take a substantial part of the overall query time. Another possibility is that the total query time was indeed that long because the query was blocked waiting for a lock. For example:

T1: BEGIN;
T2: BEGIN;
T1: SELECT * FROM foo WHERE id = 1 FOR UPDATE;
T2: EXPLAIN ANALYZE UPDATE foo SET x = x + 1 WHERE id = 1;
T1: (do something for a long time)
T1: COMMIT;

When T2's EXPLAIN ANALYZE finally returns it'll show something like this:

test=> EXPLAIN ANALYZE UPDATE foo SET x = x + 1 WHERE id = 1;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.92 rows=1 width=14) (actual time=0.123..0.138 rows=1 loops=1)
   Index Cond: (id = 1)
 Total runtime: 31926.304 ms
(3 rows)

SELECT queries can be blocked by operations that take an Access Exclusive lock, such as CLUSTER, VACUUM FULL, or REINDEX. Have you ever examined pg_locks during one of these queries to look for ungranted locks? If this weren't 8.1 I'd ask if you had any triggers (including foreign key constraints), whose execution time EXPLAIN ANALYZE doesn't show in earlier versions. 
For example:

8.1.2:

test=> EXPLAIN ANALYZE DELETE FROM foo WHERE id = 1;
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.92 rows=1 width=6) (actual time=0.136..0.154 rows=1 loops=1)
   Index Cond: (id = 1)
 Trigger for constraint bar_fooid_fkey: time=1538.054 calls=1
 Total runtime: 1539.732 ms
(4 rows)

8.0.6:

test=> EXPLAIN ANALYZE DELETE FROM foo WHERE id = 1;
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.92 rows=1 width=6) (actual time=0.124..0.147 rows=1 loops=1)
   Index Cond: (id = 1)
 Total runtime: 1746.173 ms
(3 rows)

-- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Inconsistant query plan
On Tue, Jan 24, 2006 at 04:15:57PM -0700, Daniel Gish wrote: We are running Postgresql 8.1, and getting dramatically inconsistent results after running VACUUM ANALYZE. Sometimes after analyzing the database, the query planner chooses a very efficient plan (15 rows, 4.744 ms), and sometimes a terrible one (24 rows, 3536.995 ms). Here's the abbreviated query: SELECT * FROM t1 INNER JOIN (t2 INNER JOIN (t3 INNER JOIN t4 ON t3.gid = t4.gid) ON t3.gid = t2.gid) ON t2.eid = t1.eid WHERE ... How abbreviated is that example? Are you actually joining more tables than that? In another recent thread varying plans were attributed to exceeding geqo_threshold: http://archives.postgresql.org/pgsql-performance/2006-01/msg00132.php Does your situation look similar? -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Use of Stored Procedures and
On Tue, Jan 17, 2006 at 09:04:53AM +, Marcos wrote: I already read the documentation for to use the SPI_PREPARE and SPI_EXEC... but sincerely I don't understand how I will use this resource in my statements. What statements? What problem are you trying to solve? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] insert without oids
On Fri, Jan 13, 2006 at 03:10:11PM -0500, Michael Stone wrote: Are there plans on updating the insert API for the post-OID world? Are you looking for this TODO item? * Allow INSERT/UPDATE ... RETURNING new.col or old.col This is useful for returning the auto-generated key for an INSERT. One complication is how to handle rules that run as part of the insert. http://www.postgresql.org/docs/faqs.TODO.html -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 12:56:55AM -0700, Michael Fuhr wrote:

WHERE ... AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
          AND doy <= EXTRACT(doy FROM now())

To work on 1 Jan this should be more like:

WHERE ... AND (doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
           OR  doy <= EXTRACT(doy FROM now()))

In any case the point is to add conditions to the WHERE clause that will use an index on the table for which you're currently getting a sequential scan. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
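The year-boundary problem that motivates the OR form can be illustrated outside SQL. A small Python sketch, with datetime standing in for EXTRACT(doy FROM ...); the helper names are mine, not from the thread:

```python
from datetime import datetime, timedelta

# Why a plain AND range on day-of-year breaks across a year boundary.
def doy(ts):
    return ts.timetuple().tm_yday

def matches_and(ts, now):
    lo, hi = doy(now - timedelta(hours=24)), doy(now)
    return lo <= doy(ts) <= hi             # fails when lo > hi (1 Jan)

def matches_or(ts, now):
    lo, hi = doy(now - timedelta(hours=24)), doy(now)
    return doy(ts) >= lo or doy(ts) <= hi  # handles the wraparound

now = datetime(2006, 1, 1, 12, 0)    # 1 Jan 2006: doy(now) = 1, lo = 365
recent = now - timedelta(hours=18)   # 31 Dec 2005 18:00: doy = 365
print(matches_and(recent, now))      # False -- the row is wrongly excluded
print(matches_or(recent, now))       # True
```

On any ordinary day lo <= hi and both forms agree; only when the 24-hour window straddles 31 Dec/1 Jan does the AND version return an empty range.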
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 08:02:37AM -0700, Robert Creager wrote: The query is wrong as stated, as it won't work when the interval crosses a year boundary, but it's a stop gap for now. Yeah, I realized that shortly after I posted the original and posted a correction. http://archives.postgresql.org/pgsql-performance/2006-01/msg00104.php -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Showing Column Statistics Number
On Wed, Jan 11, 2006 at 04:05:18PM -0600, Dave Dutcher wrote: I've looked around through the docs, but can't seem to find an answer to this. If I change a column's statistics with Alter table alter column set statistics n, is there a way I can later go back and see what the number is for that column? I want to be able to tell which columns I've changed the statistics on, and which ones I haven't. pg_attribute.attstattarget http://www.postgresql.org/docs/8.1/interactive/catalog-pg-attribute.html -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] indexes on primary and foreign keys
On Thu, Jan 12, 2006 at 10:26:58AM +0900, Michael Glaesemann wrote: On Jan 12, 2006, at 9:36 , K C Lau wrote: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Set[1] your log_min_messages to WARNING or higher[2]. Or client_min_messages, depending on where you don't want to see the notice. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Index isn't used during a join.
On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote: The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Have you tried adding restrictions on doy in the WHERE clause? Something like this, I think:

WHERE ... AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
          AND doy <= EXTRACT(doy FROM now())

Something else occurred to me: do you (or will you) have more than one year of data? If so then matching on doy could be problematic unless you also check for the year, or unless you want to match more than one year. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index isn't used during a join.
On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote: I'm working with a query to get more info out with a join. The base query works great speed wise because of index usage. When the join is tossed in, the index is no longer used, so the query performance tanks. The first query you posted returns 285 rows and the second returns over one million; index usage aside, that difference surely accounts for a performance penalty. And as is often pointed out, index scans aren't always faster than sequential scans: the more of a table a query has to fetch, the more likely a sequential scan will be faster. Have the tables been vacuumed and analyzed? The planner's estimates for windspeed are pretty far off, which could be affecting the query plan:

  ->  Sort  (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1)
        Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
        ->  Seq Scan on windspeed  (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1)
              Filter: (unmunge_time(time_group) > (now() - '24:00:00'::interval))

That's a small amount of the total query time, however, so although an index scan might help it probably won't provide the big gain you're looking for. Have you done any tests with enable_seqscan disabled? That'll show whether an index or bitmap scan would be faster. And have you verified that the join condition is correct? Should the query be returning over a million rows? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance problems with 8.1.1 compared to 7.4.7
On Tue, Dec 27, 2005 at 05:09:28PM +0100, Albert Cervera Areny wrote: However, now we have a query that is much slower with 8.1 compared to 7.4. The query takes 7 minutes (every time we try) with 8.1, keeping CPU usage at 93~97%, while it takes 25 seconds in 7.4 the first time, going down to 4 seconds on the following tries. We're not experts at all but we can't see anything strange in the differences between the EXPLAIN outputs. Below I paste the query and the EXPLAIN output. Could you post the EXPLAIN ANALYZE output of the query on both systems? That'll show how accurate the planner's estimates are. Have you run ANALYZE (or VACUUM ANALYZE) on the tables in both versions? The row count estimates in the 8.1.1 query differ from those in the 7.4.7 query. Are the two versions using the same data set? Are your configuration settings the same in both versions? You mentioned increasing work_mem, but what about others like effective_cache_size, random_page_cost, and shared_buffers? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Wrong index used when ORDER BY LIMIT 1
On Wed, Dec 21, 2005 at 07:03:00PM +0100, Szűcs Gábor wrote: Version: 7.4.6 [...] Query is: SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1. I expected the planner to choose the index on muvelet, then sort by idopont. Instead, it took the other index. I think the planner is guessing that since you're ordering on idopont, scanning the idopont index will find the first matching row faster than using the muvelet index would. In many cases that's a good bet, but in this case the guess is wrong and you end up with a suboptimal plan. I just ran some tests with 8.1.1 and it chose the better plan for a query similar to what you're doing. One of the developers could probably explain why; maybe it's because of the changes that allow better use of multicolumn indexes. Try 8.1.1 if you can and see if you get better results. -- workaround 2: quite ugly but seems to work (at least for this -- one test case):

# explain analyze
  select idopont from
    (select idopont from muvelet_vonalkod
      where muvelet=6859 order by idopont) foo
  order by idopont limit 1;

Another workaround is to use OFFSET 0 in the subquery. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Speed of different procedural language
On Wed, Dec 21, 2005 at 12:06:47PM +0100, Steinar H. Gunderson wrote: On Sun, Dec 18, 2005 at 01:10:21AM -, Ben Trewern wrote: I know I should be writing these in C but that's a bit beyond me. I was going to try PL/Python or PL/Perl or even PL/Ruby. Has anyone any idea which language is fastest, or is the data access going to swamp the overhead of small functions? I'm not sure if it's what you ask for, but there _is_ a clear difference between the procedural languages -- I've had a 10x speed increase from rewriting PL/PgSQL stuff into PL/Perl, for instance. The difference is clear only in specific cases; just because you saw a 10x increase in some cases doesn't mean you can expect that kind of increase, or indeed any increase, in others. I've seen PL/pgSQL beat all other PL/* challengers handily many times, especially when the function does a lot of querying and looping through large result sets. I tend to use PL/pgSQL except in cases where PL/pgSQL can't do what I want or the job would be much easier in another language (e.g., string manipulation, for which I'd use PL/Perl or PL/Ruby). Even then I might use the other language only to write small functions that a PL/pgSQL function could call. As Merlin suggested, maybe Ben could tell us what he wants to do that he thinks should be written in C or a language other than PL/pgSQL. Without knowing what problem is to be solved it's near impossible to recommend an appropriate tool. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speed of different procedural language
On Wed, Dec 21, 2005 at 10:38:10PM +0100, Steinar H. Gunderson wrote: On Wed, Dec 21, 2005 at 02:24:42PM -0700, Michael Fuhr wrote: The difference is clear only in specific cases; just because you saw a 10x increase in some cases doesn't mean you can expect that kind of increase, or indeed any increase, in others. I've seen PL/pgSQL beat all other PL/* challengers handily many times, especially when the function does a lot of querying and looping through large result sets. That's funny, my biggest problems with PL/PgSQL have been (among others) exactly with large result sets... Out of curiosity, do you have a simple test case? I'd be interested in seeing what you're doing in PL/pgSQL that's contradicting what I'm seeing. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Speed of different procedural language
On Thu, Dec 22, 2005 at 02:08:23AM +0100, Steinar H. Gunderson wrote: On Wed, Dec 21, 2005 at 03:10:28PM -0700, Michael Fuhr wrote: That's funny, my biggest problems with PL/PgSQL have been (among others) exactly with large result sets... Out of curiosity, do you have a simple test case? I'd be interested in seeing what you're doing in PL/pgSQL that's contradicting what I'm seeing. I'm not sure if I have the code anymore (it was under 7.4 or 8.0), but it was largely scanning through ~2 million rows once, noting differences from the previous rows as it went. In that case, I didn't benchmark against any of the other PL/* languages, but it was pretty clear that even on a pretty speedy Opteron, it was CPU bound, which it really shouldn't have been. Try looping through two million rows with PL/Perl or PL/Tcl and you'll probably see significantly worse performance than with PL/pgSQL -- so much worse that I'd be surprised to see those languages make up the difference with whatever processing they'd be doing for each row unless it was something they're particularly good at and PL/pgSQL is particularly bad at. In 8.1 PL/Perl has a couple of ways to fetch query results: spi_exec_query to fetch all the rows at once into a single data structure, and spi_query/spi_fetchrow to fetch the rows one at a time. In my tests with one million rows, spi_exec_query was around 8 times slower than a loop in PL/pgSQL, not to mention requiring a lot of memory. spi_query/spi_fetchrow was about 25 times slower but didn't require the amount of memory that spi_exec_query did. A PL/Tcl function that used spi_exec was about 10 times slower than PL/pgSQL, or only slightly slower than PL/Perl and spi_exec_query. If you didn't benchmark the two million row query, do you have an example that you did benchmark? I don't doubt that PL/Perl and other languages can do some things faster than PL/pgSQL, but looping through large result sets doesn't seem to be one of them. 
-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] make bulk deletes faster?
On Sat, Dec 17, 2005 at 09:10:40PM -0800, James Klo wrote: I'd like some suggestions on how to get the deletes to happen faster: while deleting individual rows appears to be extremely fast, when I go to delete lots of rows the operation takes an extremely long time to complete (5000 rows takes about 3 minutes, 100 rows takes close to 4 hours or more depending upon server load; wall time, btw). Those times do seem excessive -- do any other tables have foreign key references to the table you're deleting from? If so, do those tables have indexes on the referring columns? Does this table or any referring table have triggers? Also, are you regularly vacuuming and analyzing your tables? Have you examined pg_locks to see if an unacquired lock might be slowing things down? -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
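The first culprit asked about above, a referencing table with no index on its foreign-key column, can be modeled in miniature. A toy Python sketch (not PostgreSQL internals; the table and column names are invented): each deleted parent row forces either a full scan of the referencing rows or a single indexed lookup.

```python
from collections import Counter

# Toy model of why bulk deletes crawl when a referencing table lacks an
# index on its foreign-key column.
referencing = [{"id": i, "parent_id": i % 1000} for i in range(50_000)]

def has_children_seqscan(parent_id):
    # What each FK check degenerates to without an index: O(rows) per delete.
    return any(r["parent_id"] == parent_id for r in referencing)

# Build the "index" once: parent_id -> number of referencing rows.
index = Counter(r["parent_id"] for r in referencing)

def has_children_indexed(parent_id):
    # With an index: a single O(1) lookup per delete.
    return index[parent_id] > 0

# Parents 0..999 have children; 1000 and above do not.
print(has_children_seqscan(5), has_children_indexed(5))        # True True
print(has_children_seqscan(1005), has_children_indexed(1005))  # False False
```

Both functions give the same answer, but deleting n parent rows costs n full scans in the first case versus n constant-time lookups in the second, which is roughly the difference between minutes and hours in the numbers quoted above.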
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 10:20:45PM -0500, Tom Lane wrote: Given the rather lackadaisical way in which the stats collector makes the data available, it seems like the backends are being much too enthusiastic about posting their stats_command_string status immediately. Might be worth thinking about how to cut back the overhead by suppressing some of these messages. Would a GUC setting akin to log_min_duration_statement be feasible? Does the backend support, or could it be easily modified to support, a mechanism that would post the command string after a configurable amount of time had expired, and then continue processing the query? That way admins could avoid the overhead of posting messages for short-lived queries that nobody's likely to see in pg_stat_activity anyway. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 01:33:27PM -0500, Merlin Moncure wrote: The cost depends on your usage patterns. I did tests with one of my applications and saw no significant performance difference for simple selects, but a series of insert/update/delete operations ran about 30% slower when block- and row-level statistics were enabled versus when the statistics collector was disabled. That approximately confirms my results, except that the penalty may even be a little bit higher in the worst-case scenario. Row level stats hit the hardest if you are doing 1 row at a time operations over a persistent connection. That's basically how the application I tested works: it receives data from a stream and performs whatever insert/update/delete statements are necessary to update the database for each chunk of data. Repeat a few thousand times. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 10:23:42AM -0300, Alvaro Herrera wrote: Michael Fuhr wrote: The cost depends on your usage patterns. I did tests with one of my applications and saw no significant performance difference for simple selects, but a series of insert/update/delete operations ran about 30% slower when block- and row-level statistics were enabled versus when the statistics collector was disabled. This series of i/u/d operations ran with no sleep in between, right? I wouldn't expect a normal OLTP operation to be like this. (If it is you have a serious shortage of hardware ...) There's no sleeping but there is some client-side processing between groups of i/u/d operations. As I mentioned in another message, the application reads a chunk of data from a stream, does a few i/u/d operations to update the database, and repeats several thousand times. The hardware is old but it's adequate for this application. What kind of overhead would you expect? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 06:01:01PM -0500, Tom Lane wrote: IIRC, the only significant cost from enabling stats is the cost of transmitting the counts to the stats collector, which is a cost basically paid once at each transaction commit. So short transactions will definitely have more overhead than longer ones. Even for a really simple transaction, though, 30% seems high --- the stats code is designed deliberately to minimize the penalty. Now there goes Tom with his skeptical eye again, and here comes me saying oops again. Further tests show that for this application the killer is stats_command_string, not stats_block_level or stats_row_level. Here are timings for the same set of operations (thousands of insert, update, and delete statements in one transaction) run under various settings:

stats_command_string = off
stats_block_level = off
stats_row_level = off
time: 2:09.46

stats_command_string = off
stats_block_level = on
stats_row_level = off
time: 2:12.28

stats_command_string = off
stats_block_level = on
stats_row_level = on
time: 2:14.38

stats_command_string = on
stats_block_level = off
stats_row_level = off
time: 2:50.58

stats_command_string = on
stats_block_level = on
stats_row_level = on
time: 2:53.76

[Wanders off, swearing that he ran these tests before and saw higher penalties for block- and row-level statistics.] -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
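For reference, the relative overheads implied by those timings can be recomputed directly (a quick sketch; the percentages follow from the reported wall times):

```python
# Recompute the relative overheads from the m:ss.ss wall-clock times
# reported in the test runs above.
def seconds(t):
    m, s = t.split(":")
    return int(m) * 60 + float(s)

base      = seconds("2:09.46")  # all three settings off
block     = seconds("2:12.28")  # block-level stats only
block_row = seconds("2:14.38")  # block- and row-level stats
cmd       = seconds("2:50.58")  # stats_command_string only
all_on    = seconds("2:53.76")  # everything on

def pct(t):
    return round(100 * (t - base) / base, 1)

# stats_command_string accounts for nearly all of the slowdown:
print(pct(block), pct(block_row), pct(cmd), pct(all_on))
# 2.2 3.8 31.8 34.2
```

So block- and row-level statistics cost a few percent here, while stats_command_string alone adds roughly 32%, consistent with the "killer" conclusion above.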
Re: [PERFORM] How much expensive are row level statistics?
On Sun, Dec 11, 2005 at 11:53:36AM +, Carlos Benkendorf wrote: I would like to use autovacuum but is not too much expensive collecting row level statistics? The cost depends on your usage patterns. I did tests with one of my applications and saw no significant performance difference for simple selects, but a series of insert/update/delete operations ran about 30% slower when block- and row-level statistics were enabled versus when the statistics collector was disabled. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Executing a shell command from a PG function
On Sat, Dec 10, 2005 at 04:55:56PM +0100, Yves Vindevogel wrote: Is it possible to run a shell script, passing values of fields to it, in a Postgres function ? Not directly from SQL or PL/pgSQL functions, but you can execute shell commands with the untrusted versions of PL/Perl, PL/Tcl, PL/Python, etc. There's even a PL/sh: http://pgfoundry.org/projects/plsh/ -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
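As a rough sketch of the untrusted-language approach mentioned above (assuming a superuser has installed the language with CREATE LANGUAGE plperlu; the function name and shell command here are illustrative only, not from the thread):

```sql
-- Untrusted PL/Perl (plperlu) may run shell commands; trusted plperl may not.
-- The argument is interpolated into the command line, so real code would
-- need careful quoting/validation of the value.
CREATE FUNCTION log_value(v text) RETURNS text AS $$
    my ($v) = @_;
    my $out = `echo "got: $v"`;   # any shell command could go here
    return $out;
$$ LANGUAGE plperlu;

SELECT log_value('hello');
```

Note that the command runs as the operating-system account the PostgreSQL server runs under, which is why these languages are restricted to superusers.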
Re: [PERFORM] Stored Procedure
On Tue, Nov 22, 2005 at 07:29:37PM +0100, Yves Vindevogel wrote: Is there another way in PG to return a recordset from a function than to declare a type first ? In 8.1 some languages support OUT and INOUT parameters.

CREATE FUNCTION foo(IN x integer, INOUT y integer, OUT z integer) AS $$
BEGIN
    y := y * 10;
    z := x * 10;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT * FROM foo(1, 2);
 y  | z
----+----
 20 | 10
(1 row)

CREATE FUNCTION fooset(IN x integer, INOUT y integer, OUT z integer) RETURNS SETOF record AS $$
BEGIN
    y := y * 10;
    z := x * 10;
    RETURN NEXT;
    y := y + 1;
    z := z + 1;
    RETURN NEXT;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT * FROM fooset(1, 2);
 y  | z
----+----
 20 | 10
 21 | 11
(2 rows)

-- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Stored Procedure
On Tue, Nov 22, 2005 at 11:20:09PM +0100, Yves Vindevogel wrote: 8.1, hmm, that's brand new. Yes, but give it a try, at least in a test environment. The more people use it, the more we'll find out if it has any problems. But, still, it's quite some coding for a complete recordset, not ? How so? The examples I posted are almost identical to how you'd return a composite type created with CREATE TYPE or SETOF that type, except that you declare the return columns as INOUT or OUT parameters and you no longer have to create a separate type. If you're referring to how I wrote two sets of assignments and RETURN NEXT statements, you don't have to do it that way: you can use a loop, just as you would with any other set-returning function. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Perl DBD and an alarming problem
On Thu, Nov 17, 2005 at 01:04:21PM -0800, Craig A. James wrote: When I set statement_timeout in the config file, it just didn't do anything - it never timed out (PG 8.0.3). I finally found in the documentation that I can do set statement_timeout = xxx from PerlDBI on a per-client basis, and that works. You probably shouldn't set statement_timeout on a global basis anyway, but did you reload the server after you made the change? Setting statement_timeout in postgresql.conf and then reloading the server works here in 8.0.4. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Perl DBD and an alarming problem
[Please copy the mailing list on replies.] On Thu, Nov 17, 2005 at 05:38:13PM -0800, Craig A. James wrote: You probably shouldn't set statement_timeout on a global basis anyway The server is a one-trick pony, so setting a global timeout value is actually appropriate. Beware that statement_timeout also applies to maintenance commands like VACUUM; it might be more appropriate to set per-user timeouts with ALTER USER. If you do set a global timeout then you might want to set a per-user timeout of 0 for database superusers so maintenance activities don't get timed out. ... but did you reload the server after you made the change? Setting statement_timeout in postgresql.conf and then reloading the server works here in 8.0.4. Yes. By reload I assume you mean restarting it from scratch. Either a restart or a pg_ctl reload, which sends a SIGHUP to the server. You can effect some changes by sending a signal to a running server without having to restart it entirely. In this case, I use /etc/init.d/postgresql restart. It definitely had no effect at all. I tried values clear down to 1 millisecond, but the server never timed out for any query. Did you use SHOW statement_timeout to see if the value was set to what you wanted? Are you sure you edited the right file? As a database superuser execute SHOW config_file to see what file the server is using. What exactly did the line look like after you changed it? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
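A sketch of the per-user arrangement suggested above (role names are hypothetical; in 8.0 the value is in milliseconds):

```sql
-- The global default can stay in postgresql.conf, e.g.
--     statement_timeout = 30000
-- and is picked up after: pg_ctl reload

-- Per-role overrides take effect at the role's next login:
ALTER USER app_user SET statement_timeout = 30000;
ALTER USER postgres SET statement_timeout = 0;  -- 0 disables it, so VACUUM etc. never time out

-- In any session, check what value is actually in effect:
SHOW statement_timeout;
```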
Re: [PERFORM] Perl DBD and an alarming problem
On Wed, Nov 16, 2005 at 12:59:21PM -0800, Craig A. James wrote:

eval {
    local $SIG{ALRM} = sub { die "Timeout" };
    $time = gettimeofday;
    alarm 20;
    $sth = $dbh->prepare("a query that may take a long time...");
    $sth->execute();
    alarm 0;
};
if ($@ && $@ =~ /Timeout/) {
    my $elapsed = gettimeofday - $time;
    print "Timed out after $elapsed seconds";
}

Now the mystery: It works, but it hardly matters what time I use for the alarm call; the actual alarm event always happens at 26 seconds. I can set alarm 1 or alarm 20, and it almost always hits right at 26 seconds. High-level languages' signal handlers don't always work well with low-level libraries. I haven't dug into the Perl source code but I'd guess that since only certain things are safe to do in a signal handler, Perl's handler simply sets some kind of state that the interpreter will examine later during normal execution. If you're using only Perl facilities then that probably happens fairly timely, but if you're stuck in a low-level library (e.g., libpq) then you might have to wait until that library returns control to Perl before Perl recognizes that a signal occurred. As an example, if I run code such as yours with alarm(2) and a query that takes 5 seconds, I see the following in a process trace (from ktrace/kdump on FreeBSD):

55395 perl 0.000978 CALL  poll(0xbfbfe1b8,0x1,0x)
55395 perl 1.996629 RET   poll -1 errno 4 Interrupted system call
55395 perl 0.13     PSIG  SIGALRM caught handler=0x281be22c mask=0x0 code=0x0
55395 perl 0.50     CALL  sigprocmask(0x1,0,0x805411c)
55395 perl 0.05     RET   sigprocmask 0
55395 perl 0.20     CALL  sigreturn(0xbfbfde60)
55395 perl 0.07     RET   sigreturn JUSTRETURN
55395 perl 0.19     CALL  poll(0xbfbfe1b8,0x1,0x)
55395 perl 3.004065 RET   poll 1
55395 perl 0.24     CALL  recvfrom(0x3,0x81c6000,0x4000,0,0,0)
55395 perl 0.16     GIO   fd 3 read 60 bytes

The poll() call is interrupted by SIGALRM after 2 seconds but then it starts again and doesn't return until the query completes after the remaining 3 seconds. 
Only sometime later does Perl invoke the ALRM handler I installed, presumably because it can't do so until the low-level code returns control to Perl. Is there a reason you're using alarm() in the client instead of setting statement_timeout on the server? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Searching union views not using indices
On Fri, Nov 04, 2005 at 12:38:30PM +0100, Michal Taborsky wrote: SELECT object_id FROM commonview WHERE link_id=1234567 The result is sequential scan on all tables, append, sort and then filter scan on this whole thing. Which of course is slow as hell. We use version 8.0.2. I couldn't duplicate this in 8.0.4; I don't know if anything's changed since 8.0.2 that would affect the query plan. Could you post the EXPLAIN ANALYZE output? It might also be useful to see the output with enable_seqscan disabled. Have the tables been vacuumed and analyzed recently? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Effects of cascading references in foreign keys
Does an UPDATE of e.g. m_fld1 in t_master cause a 'lookup' in all tables which have a cascading update-rule or is this 'lookup' only triggered if the referenced column in t_master is explicitly updated? My tests suggest that a lookup on the referring key is done only if the referenced key is changed. Here's an example from 8.1beta4; I used this version because EXPLAIN ANALYZE shows triggers and the time spent in them, but I see similar performance characteristics in earlier versions. I've intentionally not put an index on the referring column to make lookups on it slow.

CREATE TABLE foo (id serial PRIMARY KEY, x integer NOT NULL);
CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON UPDATE CASCADE);
INSERT INTO foo (x) SELECT * FROM generate_series(1, 10);
INSERT INTO bar (fooid) SELECT * FROM generate_series(1, 10);
ANALYZE foo;
ANALYZE bar;

EXPLAIN ANALYZE UPDATE foo SET x = 1 WHERE id = 10;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=10) (actual time=0.059..0.070 rows=1 loops=1)
   Index Cond: (id = 10)
 Total runtime: 0.633 ms
(3 rows)

EXPLAIN ANALYZE UPDATE foo SET x = 1, id = 20 WHERE id = 10;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=6) (actual time=0.082..0.092 rows=1 loops=1)
   Index Cond: (id = 10)
 Trigger for constraint bar_fooid_fkey: time=232.612 calls=1
 Total runtime: 233.073 ms
(4 rows)

I'm not sure if this is the right place to look, but I see several places in src/backend/utils/adt/ri_triggers.c with code that looks like this:

    /*
     * No need to do anything if old and new keys are equal
     */
    if (ri_KeysEqual(pk_rel, old_row, new_row, qkey, RI_KEYPAIR_PK_IDX))
    {
        heap_close(fk_rel, RowExclusiveLock);
        return PointerGetDatum(NULL);
    }

After removing some detail tables which are no longer needed we see improved performance, so at the moment it _looks_ like each update in t_master triggers a 'lookup' in each referencing table even if the referenced column (m_id) is not changed. 
Do you have statistics enabled? You might be able to infer what happens by looking at pg_stat_user_tables or pg_statio_user_tables before and after an update, assuming that no concurrent activity is also affecting the statistics. I suppose there's overhead just from having a foreign key constraint, and possibly additional overhead for each constraint. If so then that might explain at least some of the performance improvement. Maybe one of the developers will comment. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Effects of cascading references in foreign keys
On Sat, Oct 29, 2005 at 09:49:47AM -0500, Bruno Wolff III wrote: On Sat, Oct 29, 2005 at 08:24:32 -0600, Michael Fuhr [EMAIL PROTECTED] wrote: My tests suggest that a lookup on the referring key is done only if the referenced key is changed. Here's an example from 8.1beta4; I used this version because EXPLAIN ANALYZE shows triggers and the time spent in them, but I see similar performance characteristics in earlier versions. I've intentionally not put an index on the referring column to make lookups on it slow. It looks like this feature was added last May, so I think it only applies to 8.1. Earlier versions appear to have at least some kind of optimization. Here's a test in 7.3.11 using the same tables I used in 8.1beta4, although on a slower box.

test=> UPDATE foo SET x = 1 WHERE id = 10;
UPDATE 1
Time: 32.18 ms
test=> UPDATE foo SET x = 1, id = 20 WHERE id = 10;
UPDATE 1
Time: 4144.95 ms
test=> DROP TABLE bar;
DROP TABLE
Time: 240.87 ms
test=> UPDATE foo SET x = 1, id = 10 WHERE id = 20;
UPDATE 1
Time: 63.52 ms

-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] impact of stats_command_string
On Thu, Oct 20, 2005 at 01:33:07PM -0700, [EMAIL PROTECTED] wrote: If I turn on stats_command_string, how much impact would it have on PostgreSQL server's performance during a period of massive data INSERTs? Do you really need to be doing massive data INSERTs? Can you use COPY, which is much more efficient for bulk loads? http://www.postgresql.org/docs/8.0/interactive/populate.html -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What gets cached?
On Fri, Oct 21, 2005 at 07:34:30AM -0500, Martin Nickel wrote: I don't know of any way to view what is actually cached at any point in time In 8.1 (currently in beta) you can use contrib/pg_buffercache. Code for older versions is available on PgFoundry: http://pgfoundry.org/projects/pgbuffercache/ Note that pg_buffercache shows only pages in PostgreSQL's buffer cache; it doesn't show your operating system's cache. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
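Assuming the contrib module is installed, a query along these lines (joining on relfilenode, as the pg_buffercache documentation shows) summarizes which relations occupy pages of PostgreSQL's buffer cache:

```sql
-- Top 10 relations by number of shared buffers they occupy.
-- Only shows PostgreSQL's own cache, not the OS page cache.
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```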
Re: [PERFORM] Inefficient escape codes.
On Tue, Oct 18, 2005 at 06:07:12PM +, Rodrigo Madera wrote: 1) Is there any way for me to send the binary field directly without needing escape codes? In 7.4 and later the client/server protocol supports binary data transfer. If you're programming with libpq you can use PQexecParams() to send and/or retrieve values in binary instead of text. http://www.postgresql.org/docs/8.0/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN APIs built on top of libpq or that implement the protocol themselves might provide hooks to this capability; check your documentation. What language and API are you using? See also COPY BINARY: http://www.postgresql.org/docs/8.0/interactive/sql-copy.html 2) Will this mean that the client actually wastes my network bandwidth converting binary data to text? Or does the client transparently manage this? Binary transfer sends data in binary, not by automatically converting to and from text. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Inefficient escape codes.
[Please copy the mailing list on replies so others can participate in and learn from the discussion.] On Tue, Oct 18, 2005 at 07:09:08PM +, Rodrigo Madera wrote: What language and API are you using? I'm using libpqxx. A nice STL-style library for C++ (I am 101% C++). I've only dabbled with libpqxx; I don't know if or how you can make it send data in binary instead of text. See the documentation or ask in a mailing list like libpqxx-general or pgsql-interfaces. Binary transfer sends data in binary, not by automatically converting to and from text. Uh, I'm sorry I didn't get that... If I send: insert into foo values('\\001\\002') will libpq send 0x01, 0x02 or 001002?? If you do it that way libpq will send the string as text with escape sequences; you can use a sniffer like tcpdump or ethereal to see this for yourself. To send the data in binary you'd call PQexecParams() with a query like INSERT INTO foo VALUES ($1). The $1 is a placeholder; the other arguments to PQexecParams() provide the data itself, the data type and length, and specify whether the data is in text format or binary. See the libpq documentation for details. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] functions and execution plan caching
On Thu, Oct 06, 2005 at 08:17:54AM -0500, Kelly Burkhart wrote: Given: - Processes which are connected to the database for long periods of time (transactions are always short). - These processes will use some functions to query data. - Lots of data is being inserted into tables that these functions query. - Vacuums are done frequently. Am I at risk of degrading performance after some time due to stale execution plans? Yes, because plans are chosen based on the statistics that were current when the function was first called. For example, if a sequential scan made sense when you first called the function, then subsequent calls will also use a sequential scan. You can see this for yourself with a simple test: create a table, populate it with a handful of records, and call a function that issues a query that can (but won't necessarily) use an index. Then add a lot of records to the table and call the function again. You'll probably notice that the function runs slower than the same query run from outside the function, and that the function runs fast if you recreate it or call it in a new session. If you set debug_print_plan to on and client_min_messages to debug1, then you'll see the plan that the function chose (but only on the first call to the function). If you have statistics enabled, then you can query pg_stat_user_tables and pg_stat_user_indexes to see whether subsequent calls use sequential or index scans (this should be done when nobody else is querying the table so the statistics represent only what you did). You can avoid cached plans by using EXECUTE. You'll have to run tests to see whether the potential gain is worth the overhead. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
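A minimal sketch of the two styles (the table and function names are hypothetical; the FOR ... IN EXECUTE form is used because it works in 8.0-era PL/pgSQL):

```sql
-- Static query: the plan is chosen on the function's first call in a
-- session and cached, even if the table grows a lot afterwards.
CREATE FUNCTION count_big_static() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    SELECT count(*) INTO n FROM events WHERE id > 1000;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- Dynamic query: EXECUTE plans the query afresh on every call, so it
-- always reflects current statistics, at some per-call planning cost.
CREATE FUNCTION count_big_dynamic() RETURNS bigint AS $$
DECLARE
    r record;
    n bigint;
BEGIN
    FOR r IN EXECUTE 'SELECT count(*) AS n FROM events WHERE id > 1000' LOOP
        n := r.n;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```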
Re: [PERFORM] Queries 15 times slower on 8.1 beta 2 than on 8.0
On Thu, Sep 22, 2005 at 03:19:05PM -0700, Josh Berkus wrote:
 -> Seq Scan on price p  (cost=0.00..11317.75 rows=581475 width=4) (actual time=0.004..1143.720 rows=581475 loops=1)
Well, this is your pain point. Can we see the index scan plan on 8.1? Given that it's *expecting* only one row, I can't understand why it's using a seq scan ... I've created a simplified, self-contained test case for this:

CREATE TABLE price (priceid integer PRIMARY KEY);
CREATE TABLE supplier (supplierid integer PRIMARY KEY);
CREATE TABLE content (
    contentid  integer PRIMARY KEY,
    supplierid integer NOT NULL REFERENCES supplier,
    priceid    integer NOT NULL REFERENCES price
);
INSERT INTO price (priceid) SELECT * FROM generate_series(1, 5);
INSERT INTO supplier (supplierid) SELECT * FROM generate_series(1, 1);
INSERT INTO content (contentid, supplierid, priceid) VALUES (1, 1, 5);
ANALYZE price;
ANALYZE supplier;
ANALYZE content;

EXPLAIN ANALYZE
SELECT 0
FROM content c
LEFT OUTER JOIN supplier s ON c.supplierid = s.supplierid
LEFT OUTER JOIN price p ON c.priceid = p.priceid;

Here's the EXPLAIN ANALYZE from 8.0.3:

 Nested Loop Left Join  (cost=0.00..7.06 rows=1 width=0) (actual time=0.180..0.232 rows=1 loops=1)
   -> Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=4) (actual time=0.105..0.133 rows=1 loops=1)
         -> Seq Scan on content c  (cost=0.00..1.01 rows=1 width=8) (actual time=0.021..0.029 rows=1 loops=1)
         -> Index Scan using supplier_pkey on supplier s  (cost=0.00..3.01 rows=1 width=4) (actual time=0.052..0.059 rows=1 loops=1)
               Index Cond: ("outer".supplierid = s.supplierid)
   -> Index Scan using price_pkey on price p  (cost=0.00..3.01 rows=1 width=4) (actual time=0.046..0.055 rows=1 loops=1)
         Index Cond: ("outer".priceid = p.priceid)
 Total runtime: 0.582 ms

Here it is from 8.1beta2:

 Merge Right Join  (cost=4.05..1054.06 rows=1 width=0) (actual time=676.863..676.895 rows=1 loops=1)
   Merge Cond: ("outer".priceid = "inner".priceid)
   -> Index Scan using price_pkey on price p  (cost=0.00..925.00 rows=5 width=4) (actual time=0.035..383.345 rows=5 loops=1)
   -> Sort  (cost=4.05..4.05 rows=1 width=4) (actual time=0.152..0.159 rows=1 loops=1)
         Sort Key: c.priceid
         -> Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=4) (actual time=0.082..0.111 rows=1 loops=1)
               -> Seq Scan on content c  (cost=0.00..1.01 rows=1 width=8) (actual time=0.016..0.024 rows=1 loops=1)
               -> Index Scan using supplier_pkey on supplier s  (cost=0.00..3.01 rows=1 width=4) (actual time=0.039..0.047 rows=1 loops=1)
                     Index Cond: ("outer".supplierid = s.supplierid)
 Total runtime: 677.563 ms

If we change content's priceid then we get the same plan but faster results:

UPDATE content SET priceid = 1;

 Merge Right Join  (cost=4.05..1054.06 rows=1 width=0) (actual time=0.268..0.303 rows=1 loops=1)
   Merge Cond: ("outer".priceid = "inner".priceid)
   -> Index Scan using price_pkey on price p  (cost=0.00..925.00 rows=5 width=4) (actual time=0.049..0.061 rows=2 loops=1)
   -> Sort  (cost=4.05..4.05 rows=1 width=4) (actual time=0.187..0.192 rows=1 loops=1)
         Sort Key: c.priceid
         -> Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=4) (actual time=0.099..0.128 rows=1 loops=1)
               -> Seq Scan on content c  (cost=0.00..1.01 rows=1 width=8) (actual time=0.025..0.033 rows=1 loops=1)
               -> Index Scan using supplier_pkey on supplier s  (cost=0.00..3.01 rows=1 width=4) (actual time=0.046..0.053 rows=1 loops=1)
                     Index Cond: ("outer".supplierid = s.supplierid)
 Total runtime: 0.703 ms

-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Queries 15 times slower on 8.1 beta 2 than on 8.0
On Thu, Sep 22, 2005 at 07:07:41PM -0400, Jean-Pierre Pelletier wrote: I don't know if it makes a difference but in my tables, content.supplierid and content.priceid were nullable. That makes no difference in the tests I've done. Tom Lane says he's found the problem; I expect he'll be committing a fix shortly. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] CHECK vs REFERENCES
On Sat, Sep 10, 2005 at 01:03:03AM -0300, Marc G. Fournier wrote: On Fri, 9 Sep 2005, Michael Fuhr wrote: INSERT INTO test_check SELECT 1 FROM generate_series(1, 10); INSERT 0 10 Time: 3492.344 ms INSERT INTO test_fk SELECT 1 FROM generate_series(1, 10); INSERT 0 10 Time: 23578.853 ms Yowch, I expected CHECK to be better ... but not so significantly ... I figured I'd be saving milliseconds, which, on a busy server, would add up fast ... but not 10k' of milliseconds ... Results will differ depending on the table structure: if you're indexing ten columns and have five triggers then the foreign key check will have less of an overall impact. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] CHECK vs REFERENCES
On Sat, Sep 10, 2005 at 12:23:19AM -0300, Marc G. Fournier wrote: Which is faster, where the list involved is fixed? My thought is that since it doesn't have to check a separate table, the CHECK itself should be the faster of the two, but I can't find anything that seems to validate that theory ... Why not just benchmark each method as you intend to use them? Here's a simplistic example:

CREATE TABLE test_none (val integer NOT NULL);

CREATE TABLE test_check (val integer NOT NULL CHECK (val IN (1, 2, 3, 4, 5)));

CREATE TABLE test_vals (id integer PRIMARY KEY);
INSERT INTO test_vals SELECT * FROM generate_series(1, 5);

CREATE TABLE test_fk (val integer NOT NULL REFERENCES test_vals);

\timing
INSERT INTO test_none SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 3109.089 ms
INSERT INTO test_check SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 3492.344 ms
INSERT INTO test_fk SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 23578.853 ms

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Advise about how to delete entries
On Fri, Sep 02, 2005 at 01:43:05PM +0200, Arnau wrote: statistic_id | numeric(10,0) | not null default nextval('STATISTICS_OPERATOR_ID_SEQ'::text) Any reason this column is numeric instead of integer or bigint? That table contains about 7,000,000 entries and I have to remove 33,000 entries. I have created an SQL file with all the delete statements, e.g.:

DELETE FROM statistics_sasme WHERE statistic_id = 9832;

then I do \i delete_items.sql. Removing a single entry takes more than 10 seconds. What would you do to speed it up? The referential integrity triggers might be slowing down the delete. Do you have indexes on all foreign key columns that refer to this table? Do all foreign key columns that refer to statistic_id have the same type as statistic_id (numeric)? What's the output of EXPLAIN ANALYZE DELETE ...? Do you vacuum and analyze the tables regularly? What version of PostgreSQL are you using? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
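For the common case where the slowness comes from the referential integrity check, the fix is an index on every referring column (the referencing table and index names below are hypothetical; only statistics_sasme and statistic_id come from the thread):

```sql
-- Suppose some table references statistics_sasme(statistic_id).
-- Without an index on its referring column, every DELETE on the parent
-- forces a sequential scan of the referencing table to find dependent rows.
CREATE INDEX some_detail_statistic_id_idx
    ON some_detail_table (statistic_id);

-- Then re-check the cost of a single delete:
EXPLAIN ANALYZE DELETE FROM statistics_sasme WHERE statistic_id = 9832;
```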
Re: [PERFORM] Massive performance issues
On Thu, Sep 01, 2005 at 06:42:31PM +0100, Matthew Sackman wrote: address_pc_top_index btree (postcode_top) address_pc_top_middle_bottom_index btree (postcode_top, postcode_middle, postcode_bottom) address_pc_top_middle_index btree (postcode_top, postcode_middle) This doesn't address the query performance problem, but isn't only one of these indexes necessary? The second one, on all three columns, because searches involving only postcode_top or only postcode_top and postcode_middle could use it, making the indexes on only those columns superfluous. Or am I missing something? -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Weird performance drop after VACUUM
On Mon, Aug 29, 2005 at 11:07:17AM -0700, asif ali wrote: The database is on the same system. What I am doing is only VACUUM ANALYZE conversion_table. I did the same thing on a newly created database and got the same result, so after VACUUM ANALYZE performance dropped. Please see this: runtime changes from 7755.115 to 14859.291 ms. As has been pointed out a couple of times, you're getting a different plan after VACUUM ANALYZE because the row count estimates are more accurate. Unfortunately the more accurate estimates result in a query plan that's slower than the plan for the less accurate estimates. PostgreSQL *thinks* the plan will be faster but your results show that it isn't, so you might need to adjust some of the planner's cost constants. I asked some questions that you didn't answer, so I'll ask them again: What's your effective_cache_size setting? What's your work_mem (8.x) or sort_mem (7.x) setting? What's your random_page_cost setting? How much available RAM does the machine have? What version of PostgreSQL are you running? Various tuning guides give advice on how to set the above and other configuration variables. Here's one such guide: http://www.powerpostgresql.com/PerfList/ -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] difference in plan between 8.0 and 8.1?
On Fri, Aug 26, 2005 at 10:45:07AM -0400, Alan Stange wrote:

 -> Bitmap Heap Scan on tbltimeseries  (cost=48714.09..1331000.42 rows=10854026 width=8)
      Recheck Cond: (timeseriesid > 0)
      -> Bitmap Index Scan on idx_timeseris  (cost=0.00..48714.09 rows=10854026 width=0)
           Index Cond: (timeseriesid > 0)

I'm hoping someone can explain the new query plan (as I'm not sure I understand what it is doing). Search for "bitmap" in the 8.1 Release Notes: http://developer.postgresql.org/docs/postgres/release.html#RELEASE-8-1 You could probably find more detailed discussion in the pgsql-hackers archives. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Weird performance drop after VACUUM
On Fri, Aug 26, 2005 at 03:52:24PM -0700, asif ali wrote: I have the same issue. After doing VACUUM ANALYZE performance of the query dropped. Your EXPLAIN output doesn't show the actual query times -- could you post the EXPLAIN ANALYZE output? That'll also show how accurate the planner's row count estimates are.

Before VACUUM ANALYZE:

 Index Scan using conversion_table_pk on keyword_conversion_table c  (cost=0.00..18599.25 rows=4986 width=95)
   Index Cond: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))

After VACUUM ANALYZE:

 Seq Scan on conversion_table c  (cost=0.00..29990.83 rows=1094820 width=66)
   Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))

I don't know why the system is doing a seq scan now. Notice the row count estimates: 4986 in the "before" query and 1094820 in the "after" query. In the latter, the planner thinks it has to fetch so much of the table that a sequential scan would be faster than an index scan. You can see whether that guess is correct by disabling enable_seqscan to force an index scan. It might be useful to see the output of the following:

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT ...;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT ...;

You might also experiment with planner variables like effective_cache_size and random_page_cost to see how changing them affects the query plan. However, be careful of tuning the system based on one query: make sure adjustments result in reasonable plans for many different queries. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Weird performance drop after VACUUM
On Fri, Aug 26, 2005 at 05:10:49PM -0700, asif ali wrote:

 GroupAggregate  (cost=195623.66..206672.52 rows=20132 width=16) (actual time=8205.283..10139.369 rows=55291 loops=1)
   -> Sort  (cost=195623.66..198360.71 rows=1094820 width=16) (actual time=8205.114..9029.501 rows=863883 loops=1)
         Sort Key: keyword_id
         -> Seq Scan on keyword_conversion_table c  (cost=0.00..29990.83 rows=1094820 width=16) (actual time=0.057..1422.319 rows=863883 loops=1)
               Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))
 Total runtime: 14683.617 ms

What are your effective_cache_size and work_mem (8.x) or sort_mem (7.x) settings? How much RAM does the machine have? If you have enough memory then raising those variables should result in better plans; you might also want to experiment with random_page_cost. Be careful not to set work_mem/sort_mem too high, though. See "Run-time Configuration" in the "Server Run-time Environment" chapter of the documentation for more information about these variables. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speedier count(*)
On Wed, Aug 10, 2005 at 05:37:49PM -0600, Dan Harris wrote: Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min() max() being slower. PostgreSQL 8.1 will have optimizations for certain MIN and MAX queries. http://archives.postgresql.org/pgsql-committers/2005-04/msg00163.php http://archives.postgresql.org/pgsql-committers/2005-04/msg00168.php -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
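The rewrite being alluded to looks roughly like this (table and column names are hypothetical); on pre-8.1 servers the ORDER BY ... LIMIT 1 form can use an index on the column, and 8.1's optimization plans the plain aggregates the same way:

```sql
-- Pre-8.1 idiom: an index on t(x) satisfies these directly
SELECT x FROM t ORDER BY x LIMIT 1;        -- equivalent of min(x)
SELECT x FROM t ORDER BY x DESC LIMIT 1;   -- equivalent of max(x), NULL handling aside

-- 8.1 and later optimize the aggregate form itself:
SELECT min(x), max(x) FROM t;
```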
Re: [PERFORM] Table locking problems?
On Tue, Aug 09, 2005 at 12:04:11PM -0600, Dan Harris wrote: These changes have definitely improved performance, but I am now finding some trouble with UPDATE or DELETE queries hanging and never releasing their locks. As this happens, other statements queue up behind it. Have you examined pg_locks to see if the UPDATE or DELETE is blocked because of a lock another session holds? Are you using foreign keys? When updating referencing rows, released versions of PostgreSQL acquire a lock on the referenced row that can hurt concurrency or cause deadlock (this will be improved in 8.1). -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Two queries are better than one?
On Thu, Jul 28, 2005 at 04:04:25PM -0700, Karim Nassar wrote: I ran into a situation today maintaining someone else's code where the sum time running 2 queries seems to be faster than 1. The original code was split into two queries. I thought about joining them, but considering the intelligence of my predecessor, I wanted to test it. The question is, which technique is really faster? Is there some hidden setup cost I don't see with explain analyze? To see which technique will be faster in your application, time the application code. The queries you show are taking fractions of a millisecond; the communications overhead of executing two queries might make that technique significantly slower than just the server execution time that EXPLAIN ANALYZE shows. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to revoke a password
On Fri, Jul 08, 2005 at 05:16:27PM -0700, Bailey, Larry wrote: Thanks but it is still prompting for a password. Let's back up a bit: what problem are you trying to solve? Do you want the user to be able to log in without entering a password? If so then see Client Authentication in the documentation: http://www.postgresql.org/docs/8.0/static/client-authentication.html If you're trying to do something else then please elaborate, as it's not clear what you mean by I want to ALTER that user to exclude the password. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
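For the common case of simply letting a user log in without a password, the change is in pg_hba.conf rather than ALTER USER. A sketch (the user name and address are examples only; see the Client Authentication chapter for the exact field semantics and the security implications of trust):

```
# TYPE  DATABASE  USER   CIDR-ADDRESS    METHOD
local   all       larry                  trust
host    all       larry  127.0.0.1/32    trust
```

Remember to reload the server (e.g. pg_ctl reload) after editing the file, or the change won't take effect.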
Re: [PERFORM] ALTER TABLE tabla ALTER COLUMN columna SET STATISTICS number
On Wed, Jul 06, 2005 at 04:49:21PM -0300, Dario wrote: where is stored the value set by ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS = [1-1000]? pg_attribute.attstattarget Example query: SELECT attrelid::regclass, attname, attstattarget FROM pg_attribute WHERE attstattarget > 0; See the System Catalogs chapter in the documentation for more information. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgresql7.4.5 running slow on plpgsql function
[Please copy the mailing list on replies so others can contribute to and learn from the discussion.] On Wed, Jun 29, 2005 at 12:29:42PM +0800, Chun Yit(Chronos) wrote: Yes, the function will delete records from the temporary table every time on each call. If it leaves a lot of dead tuples, then how can we solve it? If the function deletes all records from the temporary table then you could use TRUNCATE instead of DELETE. Otherwise you could VACUUM the table between calls to the function (you can't run VACUUM inside a function). -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
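A sketch of the TRUNCATE variant in 7.4-era PL/pgSQL (function and table names are hypothetical; TRUNCATE is transaction-safe as of 7.4):

```sql
-- Clear the temp table without leaving dead tuples behind.
-- EXECUTE is used because, on these versions, cached plans referencing
-- a temp table break once the table is dropped and recreated.
CREATE OR REPLACE FUNCTION clear_results() RETURNS void AS '
BEGIN
    EXECUTE ''TRUNCATE search_results'';  -- instead of DELETE FROM search_results
    RETURN;
END;
' LANGUAGE plpgsql;
```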
Re: [PERFORM] Postgresql7.4.5 running slow on plpgsql function
On Thu, Jun 23, 2005 at 05:56:52PM +0800, Chun Yit(Chronos) wrote: Currently we have a function that is used together with a temp table; it calls a search result function, and every time this function is called it goes through some filters before producing a result. Now we have a major problem: the first time the function executes it takes about 13 seconds, the second time about 17 seconds, and each further execution grows by about 4 seconds. May I know what is going on here? Since we use the function with a temp table, every statement related to the temp table uses the EXECUTE command. Could you post the function? Without knowing what the code is doing it's impossible to say what's happening. Is the temporary table growing on each function call? Does the function delete records from the table on each call, leaving a lot of dead tuples? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 01:54:08AM +, Karl O. Pinc wrote: On 06/27/2005 06:33:03 PM, Michael Fuhr wrote: See timeofday(). That only gives you the time at the start of the transaction, so you get no indication of how long anything in the transaction takes. Did you read the documentation or try it? Perhaps you're thinking of now(), current_timestamp, and friends, which don't advance during a transaction; but as the documentation states, timeofday() returns the wall-clock time and does advance during transactions. I just ran tests on versions of PostgreSQL going back to 7.2.8 and in all of them timeofday() advanced during a transaction. Does it not work on your system? If not then something's broken -- what OS and version of PostgreSQL are you using? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
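The difference is easy to verify in psql; a sketch (pg_sleep() appeared in 8.2, so on older versions substitute any statement that takes a while):

```sql
BEGIN;
SELECT now(), timeofday();   -- note both values
SELECT pg_sleep(2);          -- or any slow statement on pre-8.2 servers
SELECT now(), timeofday();   -- now() is frozen at transaction start;
                             -- timeofday() has advanced
COMMIT;
```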
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 03:03:06AM +, Karl O. Pinc wrote: For all your work a documentation patch is appended that I think is easier to read and might avoid this problem in the future. Patches should go to the pgsql-patches list -- the people who review and apply patches might not be following this thread. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Limit clause not using index
On Tue, Jun 21, 2005 at 09:46:39PM +0200, Tobias Brox wrote: [John A Meinel - Tue at 10:14:24AM -0500] I believe if you drop the indexes inside a transaction, they will still be there for other queries, and if you rollback instead of commit, you won't lose anything. Has anyone tested this? Observations from tests with 8.0.3: DROP INDEX acquires an AccessExclusiveLock on the table and on the index. This will cause the transaction executing the DROP INDEX to block until no other transaction holds any kind of lock on either, and once the locks are acquired, no other transaction will be able to access the table or the index until the transaction doing the DROP INDEX commits or rolls back. Rolling back leaves the index in place. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
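The pattern under discussion, for reference (index name and query are hypothetical):

```sql
BEGIN;
DROP INDEX foo_x_idx;   -- takes AccessExclusiveLock on table and index;
                        -- all other sessions touching the table now block
EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 42;  -- test the plan without the index
ROLLBACK;               -- the index is back, nothing lost
```

The rollback restores the index, but note that every other session needing the table waited for the duration, so this is only practical on a quiet or test system.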
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Mon, Jun 06, 2005 at 08:25:08PM +0300, Tobias Brox wrote: [Jeffrey Tenny - Mon at 11:51:22AM -0400] There are some pitfalls to pgsql though, especially for existing SQL code using MAX and some other things which can really be blindsided (performance-wise) by pgsql if you don't use the workarounds. Yes, I discovered that - select max(num_attr) does a full table scan even if the figure can be found easily through an index. PostgreSQL 8.1 will be able to use indexes for MIN and MAX. http://archives.postgresql.org/pgsql-committers/2005-04/msg00163.php http://archives.postgresql.org/pgsql-committers/2005-04/msg00168.php -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] timestamp indexing
On Mon, May 30, 2005 at 05:19:51PM +0800, Tobias Brox wrote: We have a production database with transaction-style data, in most of the tables we have a timestamp attribute created telling the creation time of the table row. Naturally, this attribute is always increasing. The message subject is timestamp indexing but you don't mention whether you have an index on the timestamp column. Do you? By now we are hitting the limit where the table data does not fit in caches anymore. We have a report section where there are constantly requests for things like sum up all transactions for the last two weeks, and those requests seem to do a full table scan, even though only the last parts of the table are needed - so by now those reports have started to cause lots of iowait. Could you post an example query and its EXPLAIN ANALYZE output? If the query uses a sequential scan then it might also be useful to see the EXPLAIN ANALYZE output with enable_seqscan turned off. Since caching can cause a query to be significantly faster after being run several times, it might be a good idea to run EXPLAIN ANALYZE three times and post the output of the last run -- that should put the queries under comparison on a somewhat equal footing (i.e., we don't want to be misled about how much faster one query is than another simply because one query happened to use more cached data on a particular run). How many records are in the tables you're querying? Are you regularly vacuuming and analyzing the database or the individual tables? Are any of the tables clustered? If so, on what indexes and how often are you re-clustering them? What version of PostgreSQL are you using? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
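As a sketch of the comparison being asked for (table, column, and index names are invented):

```sql
-- An index on the timestamp column is the usual first step:
CREATE INDEX transactions_created_idx ON transactions (created);
ANALYZE transactions;

-- Compare the chosen plan with and without sequential scans allowed;
-- run each EXPLAIN ANALYZE a few times so caching affects both equally.
EXPLAIN ANALYZE SELECT sum(amount) FROM transactions
WHERE created >= current_date - interval '14 days';

SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT sum(amount) FROM transactions
WHERE created >= current_date - interval '14 days';
SET enable_seqscan TO on;
```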
Re: [PERFORM] index on different types
On Fri, Apr 29, 2005 at 04:35:13AM +0200, Enrico Weigelt wrote: there's often some talk about indices cannot be used if datatypes don't match. PostgreSQL 8.0 is smarter than previous versions in this respect. It'll use an index if possible even when the types don't match. On a larger (and long-since grown) application I tend to use OID for references on new tables while old stuff is using integer. If you're using OIDs as primary keys then you might wish to reconsider. See the caveats in the documentation and in the FAQ: http://www.postgresql.org/docs/8.0/interactive/datatype-oid.html http://www.postgresql.org/docs/faqs.FAQ.html#4.12 Is the planner smart enough to see both as compatible datatype or is manual casting required? You can use EXPLAIN to see what the planner will do, but be aware that the planner won't always use an index even if it could: if it thinks a sequential scan would be faster then it won't use an index. To see if using an index is possible, you could set enable_seqscan to off before executing EXPLAIN. In any case, a foreign key column probably ought to have the same type as the column it references -- is there a reason for making them different? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
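A sketch of the enable_seqscan test described above (table, column, and values are hypothetical):

```sql
SET enable_seqscan TO off;  -- make an index scan the only attractive option

-- Cross-type comparison: older planners may refuse the index here
-- unless the constant is cast to the column's type.
EXPLAIN SELECT * FROM old_stuff WHERE ref = 12345;       -- bare integer literal
EXPLAIN SELECT * FROM old_stuff WHERE ref = 12345::oid;  -- explicit cast

SET enable_seqscan TO on;
```

If only the cast form shows an index scan, the type mismatch is the problem; matching the column types (or casting in the query) fixes it.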
Re: [PERFORM] Sort and index
On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote: I thought that an index can be used for sorting. I'm a little confused about the following result: create index OperationsName on Operations(cOperationName); explain SELECT * FROM Operations ORDER BY cOperationName; QUERY PLAN ---------------------------------------------------------------- Sort (cost=185.37..189.20 rows=1532 width=498) Sort Key: coperationname -> Seq Scan on operations (cost=0.00..104.32 rows=1532 width=498) (3 rows) Is this supposed to be so? Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of fetching rows in a way that minimizes head seeks. Since disk I/O is generally 10-100x slower than RAM, the in-memory sort can be surprisingly slow and still beat indexed disk access. Of course, this is only true if the table can fit and be sorted entirely in memory (which, with 1500 rows, it probably can). Out of curiosity, what are the results of the following queries? (Queries run twice to make sure time differences aren't due to caching.) SET enable_seqscan TO on; SET enable_indexscan TO off; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; SET enable_seqscan TO off; SET enable_indexscan TO on; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; SELECT version(); With 1500 rows of random data, I consistently see better performance with an index scan (about twice as fast as a sequential scan), and the planner uses an index scan if it has a choice (i.e., when enable_seqscan and enable_indexscan are both on). But my test case and postgresql.conf settings might be different enough from yours to account for different behavior.
-- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] refcurosr vs. setof
On Sun, Apr 17, 2005 at 10:05:29PM +0200, Rüdiger Herrmann wrote: I need to write several PL/pgSQL functions all returning a result set which can be obtained by a single SELECT statement. For now the functions are called by a Java application. Both REFCURSOR and SETOF serve my purpose, but I was wondering if there is a performance difference between the two. The result set can become quite large. Here's an excerpt from the Control Structures section of the PL/pgSQL documentation: The current implementation of RETURN NEXT for PL/pgSQL stores the entire result set before returning from the function, as discussed above. That means that if a PL/pgSQL function produces a very large result set, performance may be poor: data will be written to disk to avoid memory exhaustion, but the function itself will not return until the entire result set has been generated. ... Currently, the point at which data begins being written to disk is controlled by the work_mem configuration variable. You might want to test both ways in typical and worst-case scenarios and see how each performs. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
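Two minimal sketches of the alternatives (the table items and both function names are made up):

```sql
-- SETOF: RETURN NEXT materializes the whole result set before returning.
CREATE OR REPLACE FUNCTION get_items() RETURNS SETOF items AS '
DECLARE
    r items%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM items LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;

-- REFCURSOR: the caller can FETCH rows incrementally.
CREATE OR REPLACE FUNCTION get_items_cur() RETURNS refcursor AS '
DECLARE
    c refcursor;
BEGIN
    OPEN c FOR SELECT * FROM items;
    RETURN c;
END;
' LANGUAGE plpgsql;
```

The SETOF version can be queried like a table (SELECT * FROM get_items()); the refcursor version must be used inside a transaction, fetching with FETCH against the returned cursor name, which is what lets it avoid materializing a large result set all at once.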
Re: [PERFORM] profiling postgresql queries?
On Tue, Apr 12, 2005 at 12:46:43PM +0200, hubert lubaczewski wrote: the problem is that both the inserts and updates operate on heavily-triggered tables. And it made me wonder - is there a way to tell how much time of the backend was spent on triggers, index updates and so on? Like: total query time: 1 second, trigger a: 0.50 second, trigger b: 0.25 second, index update: 0.1 second. EXPLAIN ANALYZE in 8.1devel (CVS HEAD) prints a few statistics for triggers: EXPLAIN ANALYZE UPDATE foo SET x = 10 WHERE x = 20; QUERY PLAN -- Index Scan using foo_x_idx on foo (cost=0.00..14.44 rows=10 width=22) (actual time=0.184..0.551 rows=7 loops=1) Index Cond: (x = 20) Trigger row_trig1: time=1.625 calls=7 Trigger row_trig2: time=1.346 calls=7 Trigger stmt_trig1: time=1.436 calls=1 Total runtime: 9.659 ms (6 rows) 8.1devel changes frequently (sometimes requiring initdb) and isn't suitable for production, but if the trigger statistics would be helpful then you could set up a test server and load a copy of your database into it. Just beware that because it's bleeding edge, it might destroy your data and it might behave differently than released versions. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Functionscan estimates
On Fri, Apr 08, 2005 at 03:15:50PM -0700, Josh Berkus wrote: I'm wondering if it might be useful to be able to add estimated selectivity to a function definition for purposes of query estimation. Currently function scans automatically return a flat default 1000 estimated rows. It seems like the DBA ought to be able to ALTER FUNCTION and give it a row estimate for planning purposes. About a month ago I mentioned that I'd find that useful. In a followup, Christopher Kings-Lynne brought up the idea of a GUC variable that could give hints about the expected row count. http://archives.postgresql.org/pgsql-hackers/2005-03/msg00146.php http://archives.postgresql.org/pgsql-hackers/2005-03/msg00153.php -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] date - range
On Fri, Apr 01, 2005 at 09:59:44PM -0800, Mischa wrote: select ... where first_date <= today and last_date >= today Whatever index we create system always does a sequential scan (which I can understand). Has someone a smarter solution? Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index. That might not be necessary in this case. CREATE TABLE foo ( id serial PRIMARY KEY, first_date date NOT NULL, last_date date NOT NULL, CONSTRAINT check_date CHECK (last_date >= first_date) ); /* populate table */ CREATE INDEX foo_date_idx ON foo (first_date, last_date); ANALYZE foo; EXPLAIN SELECT * FROM foo WHERE first_date <= current_date AND last_date >= current_date; QUERY PLAN Index Scan using foo_date_idx on foo (cost=0.01..15.55 rows=97 width=12) Index Cond: ((first_date <= ('now'::text)::date) AND (last_date >= ('now'::text)::date)) (2 rows) -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]