Re: [PERFORM] query plan question, nested loop vs hash join
On Fri, Oct 3, 2014 at 6:38 PM, Andrey Lizenko <lizenk...@gmail.com> wrote:
> Is it possible to force optimizer choose the second plan without doing
> set enable_hashjoin = off; ?
> Increasing of 'effective_cache_size' leads to similar thing with mergejoin,
> other options (work_mem, shared_buffers, etc.) do not change anything.

Have you tried changing random_page_cost?

In small databases where most of the data is cached anyway, lowering random_page_cost to somewhere between 1 and 2 usually leads to better planner decisions.

Regards,
Marti

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
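A minimal sketch of that experiment -- the setting names are real, the value is only a starting point, and the query itself is stood in by a placeholder:

```sql
-- Applies only to the current session, so it is safe to experiment:
SET random_page_cost = 1.5;

-- Re-run the query and see whether the planner now picks the merge join:
EXPLAIN ANALYZE SELECT ...;

-- Revert to the server-wide default (persist via postgresql.conf if it helps):
RESET random_page_cost;
```

Testing per-session first avoids destabilizing plans for every other query on the server.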
Re: [PERFORM] performance of SELECT * much faster than SELECT colname with large offset
On Fri, Oct 3, 2014 at 5:39 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Marc Slemko <ma...@znep.com> writes:
>> I ran into this oddity lately that goes against everything I thought I
>> understood and was wondering if anyone had any insight.
> SELECT * avoids a projection step ... see ExecAssignScanProjectionInfo.

It would be cool if OFFSET could somehow signal the child nodes "don't bother constructing the actual tuple". Not sure if that could work in more complex queries.

But this is just one of many performance problems with large OFFSETs. Of course you can always work around this using a subquery:

    select description from (
      select * from ccrimes offset 514 limit 1
    ) subq;

But most of the time it's better to use scalable paging techniques:
http://use-the-index-luke.com/sql/partial-results/fetch-next-page

Regards,
Marti
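A sketch of the scalable-paging alternative, assuming ccrimes has a unique, indexed id column (the actual key column isn't shown in the thread):

```sql
-- OFFSET forces Postgres to generate and then discard every skipped row:
SELECT description FROM ccrimes ORDER BY id LIMIT 20 OFFSET ?;

-- Keyset paging instead seeks straight to where the last page ended,
-- using an index on (id); response time stays constant for any page:
SELECT description FROM ccrimes WHERE id > ? ORDER BY id LIMIT 20;
```

The application remembers the last id it served and passes it as the parameter for the next page.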
Re: [PERFORM] Performance issue: index not used on GROUP BY...
On Thu, Aug 28, 2014 at 11:50 AM, gmb <gmbou...@gmail.com> wrote:
> Can somebody please confirm whether aggregate functions such as GROUP BY
> should use indexes ?

Yes, if the planner deems it faster than other approaches. It can make wrong choices for many reasons, but usually when planner tunables like random_page_cost and effective_cache_size aren't set appropriately. There's some advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Just for the purpose of testing, you could try "set enable_sort=false" in your session and see if that makes it faster.

On Thu, Aug 28, 2014 at 12:08 PM, gmb <gmbou...@gmail.com> wrote:
> Sort Key: co_id, client_id, doc_no,

Something went missing from this line...

> Sort Method: external merge  Disk: 80304kB

Depending on your hardware and workloads, more work_mem may also improve queries, by letting sorts and hashes avoid spilling to disk. But beware: setting it too high may result in your server running out of memory.

Regards,
Marti
Re: [PERFORM] Window functions, partitioning, and sorting performance
On Thu, Aug 21, 2014 at 4:29 PM, Eli Naeher <enae...@gmail.com> wrote:
> Clearly the bulk of the time is spent sorting the rows in the original
> table, and then again sorting the results of the subselect. But I'm afraid
> I don't really know what to do with this information. Is there any way I
> can speed this up?

> Sort Method: external merge  Disk: 120976kB

The obvious first step is to bump up work_mem to avoid disk-based sort. Try setting it to something like 256MB in your session and see how it performs then. This may also allow the planner to choose HashAggregate instead of sort.

It's not always straightforward how to tune correctly; it depends on your hardware, concurrency and query complexity. Here's some advice:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem_maintainance_work_mem

Also, you could create an index on (route, direction, stop, stop_time) to avoid the inner sort entirely.

And it seems that you can move the "INNER JOIN stop" to the outer query as well; not sure if that will change much.

Try these, and if it's still problematic, report back with a new EXPLAIN ANALYZE.

Regards,
Marti
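The session-level experiment described above can be sketched like this (the query itself is stood in by a placeholder):

```sql
SET work_mem = '256MB';       -- affects only this session

EXPLAIN ANALYZE SELECT ...;   -- re-run the window-function query here; look for
                              -- "Sort Method: quicksort  Memory: ..." instead of
                              -- "Sort Method: external merge  Disk: ..."

RESET work_mem;
```

If the in-memory sort is what helps, work_mem can then be raised more conservatively in postgresql.conf, keeping in mind it is allocated per sort node, per backend.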
Re: [PERFORM] Window functions, partitioning, and sorting performance
On Thu, Aug 21, 2014 at 7:19 PM, Eli Naeher <enae...@gmail.com> wrote:
> However, when I try to do a test self-join using it, Postgres does two seq
> scans across the whole table, even though I have indexes on both id and
> previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why those
> indexes are not being used?

Because the planner thinks seq scan + hash join is going to be faster than incurring the overhead of index scans for other kinds of plans.

You can try out alternative plan types by running "set enable_hashjoin=off" in your session. If it does turn out to be faster, then it usually means you haven't set planner tunables right (random_page_cost, effective_cache_size and possibly cpu_tuple_cost).

Regards,
Marti
Re: [PERFORM] how does the planer to estimate row when i use order by and group by
On Tue, Aug 12, 2014 at 5:59 AM, 楊新波 <silent0...@gmail.com> wrote:
> why does the planner estimate 200 rows when i use order by and group by .
> env: postgresql 8.4 and 9.3
> Can anybody suggest something or explain this behavior?

Because the table is empty, analyze doesn't store any stats for it, so the planner uses some default guesses.

This is actually beneficial for cases where you have done some inserts to a new table and autovacuum hasn't gotten around to analyzing it yet. And it rarely hurts, because any query plan will be fast when there's no data.

Regards,
Marti
Re: [PERFORM] two table join with order by on both tables attributes
On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin <itparan...@gmail.com> wrote:
> select * from users join notifications on users.id=notifications.user_id
> ORDER BY users.priority desc, notifications.priority desc limit 10;
>
> In my understanding, i need to have two indexes on
> users(priority desc, id) and notifications(user_id, priority desc)
>
> And actually with this kind of query we really want the most wanted
> notifications, by the user. So we really can rewrite to
> order by users.priority desc, id asc, notifications.priority desc
> according to business logic.

You can rewrite it with LATERAL to trick the planner into sorting each user's notifications separately. This should give you the nestloop plan you expect:

    SELECT *
    FROM users,
    LATERAL (
      SELECT * FROM notifications
      WHERE notifications.user_id = users.id
      ORDER BY notifications.priority DESC
    ) AS notifications
    ORDER BY users.priority DESC, users.id;

It would be great if Postgres could do this transformation automatically.

There's a "partial sort" patch in the current CommitFest, which would solve the problem partially (it could use the index on users, but the notifications sort would still have to be done in memory):
https://commitfest.postgresql.org/action/patch_view?id=1368

Regards,
Marti
Re: [PERFORM] Evaluating query performance with caching in PostgreSQL 9.1.6
On Fri, May 31, 2013 at 7:32 PM, <fburg...@radiantblue.com> wrote:
> 1.) Is there any way to clear the cache so that we can ensure that when we
> run explain analyze on a query and make some minor adjustments to that
> query and re-execute, the plan is not cached.

PostgreSQL doesn't cache query plans if you do a normal SELECT or EXPLAIN ANALYZE SELECT query. Plans are cached only if you use prepared queries:
1. Embedded queries within PL/pgSQL procedures
2. Explicit PREPARE/EXECUTE commands
3. PQprepare in the libpq library (or other client library)

If you don't use these, then you are experiencing something else, not plan caching. Maybe you're referring to the disk cache. The only way to clear PostgreSQL's own cache (shared buffers) is to restart it, but there is another level of caching done by the operating system. On Linux you can drop the OS cache using:

    echo 1 > /proc/sys/vm/drop_caches

> 2.) I am noticing that when I look at pg_stat_activities: autovacuum is
> re-processing some old Partition tables way back in 2007, which are static
> and are essentially read-only partitions. The line item in pg_stat reads
> as follows: autovacuum: VACUUM public.digi_sas_y2007m07 (to prevent
> wraparound). Is there a way to have autovacuum skip these static type
> partition tables?

No. This is a necessary and critical operation. PostgreSQL stores row visibility information based on 32-bit transaction IDs (xids). This value is small enough that it can wrap around, so very old tables need to be "frozen". Details here:
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

If this is a problem for you, then you may want to schedule manual VACUUM FREEZE on old tables during low-usage periods.

Regards,
Marti
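To see which tables are next in line for an anti-wraparound vacuum, one can check relfrozenxid age from the catalogs (a sketch; autovacuum forces a freeze once age exceeds autovacuum_freeze_max_age, 200 million by default):

```sql
-- Tables with the oldest relfrozenxid will be frozen first; once age()
-- exceeds autovacuum_freeze_max_age, autovacuum launches the
-- "(to prevent wraparound)" VACUUM seen in pg_stat_activity.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```

Running VACUUM FREEZE manually on the top entries during a quiet period resets their xid_age and keeps autovacuum from picking them at an inconvenient time.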
Re: [PERFORM] statistics target for columns in unique constraint?
On Mon, May 13, 2013 at 6:01 PM, ach <alanchi...@gmail.com> wrote:
> what I'm wondering is, since the unique constraint already covers the whole
> table and all rows in entirety, is it really necessary for statistics to be
> set that high on those?

AFAIK if there are exact-matching unique constraints/indexes for a query's WHERE clause, the planner will deduce that the query only returns 1 row and won't consult statistics at all.

> Or does that only serve to slow down inserts to that table?

It doesn't slow down inserts directly. Tables are analyzed in the background by autovacuum. However, I/O traffic from autovacuum analyze may slow down inserts running concurrently.

Regards,
Marti
Re: [PERFORM] limit order by performance issue
On Tue, Oct 16, 2012 at 10:47 PM, Karl Denninger <k...@denninger.net> wrote:
> Put an index on time_stamp (I assume there is one on id_signal already)

Well, the optimal index for this particular query would include both columns: (id_signal, time_stamp) -- in this order.

Additionally, if you want to take advantage of the index-only scans feature, add the SELECTed column too: (id_signal, time_stamp, var_value)

Regards,
Marti
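For illustration, the two suggestions might look like this (the table name is assumed, since the thread only shows the column names):

```sql
-- Optimal index for: WHERE id_signal = ? ORDER BY time_stamp DESC LIMIT n
-- The equality column goes first, the sort column second.
CREATE INDEX ON signal_values (id_signal, time_stamp);

-- Covering variant: with var_value included, PostgreSQL 9.2+ can answer
-- the query from the index alone (an index-only scan), skipping the heap.
CREATE INDEX ON signal_values (id_signal, time_stamp, var_value);
```

Column order matters: an index on (time_stamp, id_signal) could not satisfy both the filter and the ordering in one pass.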
Re: [PERFORM] timing != log duration
On Wed, Mar 21, 2012 at 13:42, Rafael Martinez <r.m.guerr...@usit.uio.no> wrote:
> I am wondering why the time reported by \timing in psql is not the same as
> the time reported by duration in the log file when log_duration or
> log_min_duration_statement are on?

psql's \timing measures time on the client -- which includes the network communication time (time to send the query to the server, and receive back the results).

log_min_duration_statement measures time on the server, so it doesn't know how long network transmission takes.

Regards,
Marti
Re: [PERFORM] Large insert and delete batches
On Thu, Mar 1, 2012 at 21:06, Kääriäinen Anssi <anssi.kaariai...@thl.fi> wrote:
> The queries are "select * from the_table where id = ANY(ARRAY[list_of_numbers])"
> and the similar delete, too.
[...]
> However, once you go into millions of items in the list, the query will OOM
> my Postgres server.

The problem with IN() and ARRAY[] is that the whole list of numbers has to be parsed by the SQL syntax parser, which has significant memory and CPU overhead (it has to accept arbitrary expressions in the list).

But there's a shortcut around the parser: you can pass in the list as an array literal string, e.g.:

    select * from the_table where id = ANY('{1,2,3,4,5}')

The SQL parser considers the value one long string and passes it to the array input function, which is a much simpler routine. This should scale up much better.

Even better if you could pass in the array as a query parameter, so the SQL parser doesn't even see the long string -- but I think you have to jump through some hoops to do that in psycopg2.

Regards,
Marti
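A small sketch of building that array literal on the client side. The helper name is mine, and the_table comes from the thread; the commented cur.execute line assumes a psycopg2-style cursor:

```python
def int_array_literal(ids):
    """Format integers as a PostgreSQL array literal string, e.g. "{1,2,3}".

    Passing this one string as a single query parameter keeps the SQL
    parser from having to parse millions of individual expressions:

        cur.execute("SELECT * FROM the_table WHERE id = ANY(%s::int[])",
                    (int_array_literal(ids),))

    int() guards against injection; this simple version handles plain
    integers only (no NULLs, no quoting of text elements).
    """
    return "{" + ",".join(str(int(i)) for i in ids) + "}"

print(int_array_literal([1, 2, 3, 4, 5]))  # -> {1,2,3,4,5}
```

Since the literal travels as a bound parameter, its size no longer affects parser memory use, only the array-input routine on the server.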
Re: [PERFORM] UPDATE on NOT JOIN
On Wed, Feb 15, 2012 at 20:33, Gabriel Biberian <ad...@beemotechnologie.com> wrote:
> Currently, i use the following query to update the filesystem table with
> the missing files :
> UPDATE filesystem SET dead=some_value WHERE dead=0 AND (SELECT 1 FROM
> temporary AS t WHERE t.hash=filesystem.hash LIMIT 1) IS NULL

I don't know if this solves your problem entirely, but an obvious improvement would be using the NOT EXISTS (SELECT ...) construct:

    UPDATE filesystem SET dead=some_value WHERE dead=0 AND
      NOT EXISTS (SELECT 1 FROM temporary AS t WHERE t.hash=filesystem.hash);

PostgreSQL 8.4+ can optimize this into an "anti join" query (you didn't mention what version you are using).

Also, if your hardware isn't very limited, you should increase the work_mem setting from the default (1MB).

If the above doesn't help significantly, please post the full EXPLAIN ANALYZE output.

Regards,
Marti
Re: [PERFORM] Performance on large, append-only tables
On Wed, Feb 8, 2012 at 20:03, David Yeu <david@skype.net> wrote:
> * WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?;
> * Pages of twenty rows.

A good improvement for this sort of queries is the "scalable paging" trick. Instead of increasing the OFFSET argument -- which means that Postgres has to scan more and more rows -- you should remember an index key where the last page ended.

In other words, you get the first page using:

    WHERE group_id = ? ORDER BY created_at DESC LIMIT 20

Say, this page returns created_at values between 2012-01-01 and 2012-01-10. If the user clicks "next page", you run a query like this instead:

    WHERE group_id = ? AND created_at < '2012-01-10'
    ORDER BY created_at DESC LIMIT 20

Thus, every "next page" fetch always takes a constant time. Of course there's a small problem when two rows have equal times. Then, you can add the primary key to the sort key to disambiguate those rows:

    WHERE group_id = ? AND (created_at, pkey_col) < ('2012-01-10', 712)
    ORDER BY created_at DESC, pkey_col DESC LIMIT 20

Of course an index on (group_id, created_at) or (group_id, created_at, pkey_col) is necessary for these to work well.

Regards,
Marti
Re: [PERFORM] Postgress is taking lot of CPU on our embedded hardware.
On Sat, Jan 28, 2012 at 19:11, Jayashankar K B <jayashankar...@lnties.com> wrote:
> But we are stumped by the amount of CPU Postgres is eating up.

You still haven't told us *how* slow it actually is and how fast you need it to be. What's your database layout like (tables, columns, indexes, foreign keys)? What do the queries look like that you have problems with?

> Our database file is located on a class 2 SD Card. So it is understandable
> if there is lot of IO activity and speed is less.

Beware that most SD cards are unfit for database write workloads, since they only perform very basic wear levelling (in my experience anyway -- things might have changed, but I'm doubtful). It's a matter of time before you wear out some frequently-written blocks and they start returning I/O errors or corrupted data.

If you can spare the disk space, increase checkpoint_segments, as that means at least WAL writes are spread out over a larger number of blocks. (But heap/index writes are still a problem.)

SD cards can also corrupt your data if you lose power in the middle of a write -- since they use much larger physical block sizes than regular hard drives, they can lose a whole block, which file systems and Postgres are not designed to handle. They also tend to not respect the flush/barrier requests that are required for database consistency. Certainly you should do such power-loss tests before you release your product.

I've built an embedded platform with a database. Due to disk corruptions, in the end I opted for mounting all file systems read-only and keeping the database only in RAM.

> Any configuration settings we could check up?

For one, you should reduce max_connections to a more reasonable number -- I'd guess you don't need more than 5 or 10 concurrent connections. Also set synchronous_commit=off; this means that you may lose some committed transactions after power loss, but I think with SD cards all bets are off anyway.

Regards,
Marti
Re: [PERFORM] PostgreSQL Parallel Processing !
On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally <sridhar@gmail.com> wrote:
> --------------------------------------------------------------------------
> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
> --------------------------------------------------------------------------
> |   0 | SELECT STATEMENT  |      | 7444K |  944M | 16077   (4)| 00:03:13 |
> |   1 |  TABLE ACCESS FULL| EMP  | 7444K |  944M | 16077   (4)| 00:03:13 |
> --------------------------------------------------------------------------

Sorry to take this off topic, but...

Seriously, over 3 minutes to read 944 MB of data? That's less than 5 MB/s. What's wrong with your database? :)

Regards,
Marti
Re: [PERFORM] Dramatic change in memory usage with version 9.1
On Mon, Dec 19, 2011 at 17:04, Rafael Martinez <r.m.guerr...@usit.uio.no> wrote:
> * Sudden decrease of swap when running backup/vacuum+analyze jobs

Do you know for certain that this memory use is attributed to vacuum/analyze/backup, or are you just guessing? You should isolate whether it's the vacuum or a backup process/backend that takes this memory.

Do you launch vacuum/analyze manually or are you just relying on autovacuum? How many parallel vacuum jobs are there? What's your autovacuum_max_workers set to? How large is your database?

How did you perform the upgrade -- via pg_upgrade or pg_dump?

> Any ideas about why this dramatic change in memory usage when the only
> thing apparently changed from our side is the postgres version?

Well, for one, there have been many planner changes that make it use memory more aggressively, these probably being the most significant:

* Materialize for nested loop queries in 9.0:
  http://rhaas.blogspot.com/2010/04/materialization-in-postgresql-90.html
* Hash join usage for RIGHT and FULL OUTER JOINs in 9.0

However, none of these would apply to vacuum, analyze or backups.

Regards,
Marti
Re: [PERFORM] will the planner ever use an index when the condition is ?
On Sun, Dec 18, 2011 at 16:52, Roxanne Reid-Bennett <r...@tara-lu.com> wrote:
> Is there an index type that can check "not equal"? This specific column has
> a limited number of possible values - it is essentially an enumerated list.

Instead of writing WHERE foo <> 3, you could rewrite it as WHERE foo IN (1,2,4,...) or WHERE foo < 3 OR foo > 3. Both of these are indexable queries, but obviously the planner may choose not to use an index scan if it's not worth it.

Regards,
Marti
Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?
2011/12/14 Rural Hunter <ruralhun...@gmail.com>:
> for example, the where condition is: where '' ~ col1. I created a normal
> index on col1 but seems it is not used.

I assume you want to search values that match one particular pattern, that would be: col1 ~ ''

The answer is: only very simple patterns that start with '^'. Note that you MUST use the text_pattern_ops index opclass:

    # create table words (word text);
    # copy words from '/usr/share/dict/words';
    # create index on words (word text_pattern_ops);
    # explain select * from words where word ~ '^post';
     Index Scan using words_word_idx on words  (cost=0.00..8.28 rows=10 width=9)
       Index Cond: ((word ~>=~ 'post'::text) AND (word ~<~ 'posu'::text))
       Filter: (word ~ '^post'::text)

If you just want to search for arbitrary strings, in PostgreSQL 9.1+ you can use the pg_trgm extension with a LIKE expression:

    # create extension pg_trgm;
    # create index on words using gist (word gist_trgm_ops);
    # explain select * from words where word like '%post%';
     Bitmap Heap Scan on words  (cost=4.36..40.23 rows=10 width=9)
       Recheck Cond: (word ~~ '%post%'::text)
       ->  Bitmap Index Scan on words_word_idx1  (cost=0.00..4.36 rows=10 width=0)
             Index Cond: (word ~~ '%post%'::text)

There's also the "wildspeed" external module which is somewhat faster at this:
http://www.sai.msu.su/~megera/wiki/wildspeed

And someone is working to get pg_trgm support for arbitrary regular expression searches. This *may* become part of the next major PostgreSQL release (9.2):
http://archives.postgresql.org/message-id/CAPpHfduD6EGNise5codBz0KcdDahp7--MhFz_JDD_FRPC7-i=a...@mail.gmail.com

Regards,
Marti
Re: [PERFORM] Response time increases over time
On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk <ai...@highrise.ca> wrote:
> Let me guess, debian squeeze, with data and xlog both on a single ext3
> filesystem, and the fsync done by your commit (xlog) is flushing all the
> dirty data of the entire filesystem (including PG data writes) out before
> it can return...

This is fixed with the data=writeback mount option, right? (If it's the root file system, you need to add rootfsflags=data=writeback to your kernel boot flags.)

While this setting is safe and recommended for PostgreSQL and other transactional databases, it can cause garbage to appear in recently written files after a crash/power loss -- for applications that don't correctly fsync data to disk.

Regards,
Marti
Re: [PERFORM] Intersect/Union X AND/OR
On Mon, Dec 5, 2011 at 14:14, Thiago Godoi <thiagogodo...@gmail.com> wrote:
> My original query :
>
> select table1.id
> from table1, (select function(12345) id) table2
> where table1.kind = 1234
> and table1.id = table2.id
>
> Nested Loop  (cost=0.00..6.68 rows=1 width=12)
>   Join Filter: ()
>   ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)
>         Filter: (id = 616)
>   ->  Result  (cost=0.00..0.26 rows=1 width=0)

Note that this EXPLAIN output is quite different from your query. Instead of a kind=1234 clause, there's id=616. Also, please post EXPLAIN ANALYZE results instead whenever possible.

> When I changed the query to use intersect : [...]
> The second plan is about 10 times faster than the first one.

Judging by these plans, the 1st one should not be slower.

Note that just running the query once and comparing times is often misleading, especially for short queries, since noise often dominates the query time -- depending on how busy the server was at the moment, what kind of data was cached, CPU power management/frequency scaling, etc.

ESPECIALLY don't compare pgAdmin timings, since those also include network variance, the time taken to render results on your screen, and who knows what else.

A simple way to benchmark is with pgbench. Just write the query to a text file (it needs to be a single line and not more than ~4000 characters). Then run 'pgbench -n -f pgbench_script -T 5' to run it for 5 seconds.

These results are still not entirely reliable, but much better than pgAdmin timings.

Regards,
Marti
Re: [PERFORM] SSL encryption makes bytea transfer slow
On Tue, Nov 8, 2011 at 12:25, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> I can't get oprofile to run on this RHEL6 box, it doesn't record anything,
> so all I can test is total query duration.

Maybe this helps you with OProfile?
http://people.planetpostgresql.org/andrew/index.php?/archives/224-The-joy-of-Vx.html

Regards,
Marti
Re: [PERFORM] SSL encryption makes bytea transfer slow
On Fri, Oct 28, 2011 at 14:02, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> Without SSL the SELECT finished in about a second, with SSL it took over 23
> seconds (measured with \timing in psql).

When you query with psql, it requests columns in text format. Since bytea hex-encodes its value if output is text, this means it's transmitting 60 MB for a 30 MB bytea value.

If you could make sure that your app is requesting binary output, then you could cut 50% off this time.

As others mentioned, most of the overhead is in SSL compression (not encryption), which can be disabled, but is not very easy to do.

But 23 seconds for 60 MB is still *very* slow; something else could be going wrong. What kind of CPU is this?

On Thu, Nov 3, 2011 at 16:48, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> Disabling OpenSSL compression in the source (which is possible since
> OpenSSL 1.0.0) does not give me any performance improvement.

If it doesn't give you any performance improvement, then you haven't disabled compression. Modern CPUs can easily saturate 1 GbitE with AES256-encrypted connections. Compression is usually the bottleneck, at 20-30 MB/s.

Regards,
Marti
Re: [PERFORM] Ineffective autovacuum
1. First things first: vacuum cannot delete tuples that are still visible to any old running transactions. You might have some very long queries or transactions that prevent it from cleaning properly:

    select * from pg_stat_activity
    where xact_start < now() - interval '10 minutes';

2. On 8.3 and earlier servers with large tables, it's critical that your max_fsm_pages and max_fsm_relations are tuned properly. Failing that, autovacuum will permanently "leak" space that can only be fixed with a VACUUM FULL (which will take an exclusive lock and run for a very long time).

PostgreSQL version 8.4 addressed this problem, but for the unfortunate, you have to follow the tuning advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#autovacuum_max_fsm_pages.2C_max_fsm_relations

On Tue, Sep 27, 2011 at 08:08, Royce Ausburn <royce...@inomial.com> wrote:
> I've noticed that the same tables seem to be auto vacuum'd over and over
> again… Some of the tables are a bit surprising in that they're updated
> semi-regularly, but not enough (I'd think) to warrant an autovacuum every
> few minutes… Is this unusual?

Maybe they're just auto-analyze processes? Those get triggered on insert-only tables too, when vacuum normally wouldn't run.

> Perhaps unrelated: I've done some digging around and happened across a
> nightly task doing: select pg_stat_reset()

AFAIK (but I could be wrong), vacuum uses a separate set of statistics not affected by pg_stat_reset.

Regards,
Marti
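To check what autovacuum has actually been doing per table, the standard statistics views can help (a sketch; last_autovacuum/last_autoanalyze exist in pg_stat_user_tables on 8.3 and later):

```sql
-- Which tables were autovacuumed/autoanalyzed most recently?
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum DESC NULLS LAST
LIMIT 20;
```

Note that a nightly pg_stat_reset() would wipe these timestamps along with the rest of the statistics counters, so run the check before the reset job fires.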
Re: [PERFORM] [PERFORMANCE] Insights: fseek OR read_cluster?
On Mon, Sep 26, 2011 at 15:51, Antonio Rodriges <antonio@gmail.com> wrote:
>> What is read_cluster() ? Are you talking about some kind of async and/or
> I meant that if you want to read a chunk of data from file you (1) might
> not call traditional fseek but rather memorize hard drive cluster numbers
> to boost disk seeks and, (2) perform the read of disk cluster directly.

PostgreSQL accesses regular files on a file system via lseek(), read() and write() calls, no magic.

In modern extent-based file systems, mapping a file offset to a physical disk sector is very fast -- compared to the time of actually accessing the disk.

I can't see how direct cluster access would even work, unless you'd give the database direct access to a raw partition, in which case Postgres would effectively have to implement its own file system. The gains are simply not worth it for Postgres; our developer resources are better spent elsewhere.

Regards,
Marti
Re: [PERFORM] Odd misprediction
On Fri, Sep 16, 2011 at 17:50, Claudio Freire <klaussfre...@gmail.com> wrote:
> It's not an issue for me (it's not really impacting performance), but since
> it was odd I thought I might ask.
>
> I have this supermegaquery:
>
> SELECT
>     t.date AS status_date, lu.id AS memberid, lu.username AS username,
>     u.url AS url, ub.url_pattern AS urlpattern, lu.email AS email,
>     lu.birth_date AS birthdate, lu.creation_date AS creationdate,
>     s.name AS state, co.name AS country, opd.survey_id AS originalSurvey,
>     c.name AS city, lu.confirmed AS confirmed, pd.name AS action,
>     sd.duration AS loi
> FROM tracks t
> LEFT JOIN surveyduration_v sd
>     ON sd.member_id = t.member_id AND sd.survey_id = 5936
> INNER JOIN all_users_v lu
>     ON lu.id = t.member_id AND lu.panel_source_id = 1
> LEFT JOIN track_status ts ON ts.id = t.track_status_id
> LEFT JOIN partners p ON p.id = t.partner_id
> LEFT JOIN urls u ON u.id = t.url_id AND u.survey_id = 5936
> LEFT JOIN url_batchs ub ON u.url_batch_id = ub.id
> LEFT JOIN states s ON lu.state_id = s.id
> LEFT JOIN cities c ON lu.city_id = c.id
> LEFT JOIN countries co ON lu.country_id = co.id
> LEFT JOIN partner_deliveries pd
>     ON pd.id = t.partner_delivery_id AND t.partner_id IS NOT NULL
> LEFT JOIN partner_deliveries opd ON opd.id = pd.originator_id
> WHERE t.survey_id = 5936 AND t.track_status_id IN (5)
>
> With the views:
>
> CREATE OR REPLACE VIEW surveyduration_v AS
> SELECT date_part('epoch'::text, t.date - tl2.date) / 60::double precision AS duration,
>     t.member_id, t.survey_id
> FROM tracks t
> JOIN track_logs tl2 ON t.id = tl2.track_id
> WHERE tl2.track_status_id = 8 AND t.track_status_id = 7;
>
> CREATE OR REPLACE VIEW all_users_v AS
> SELECT 1 AS panel_source_id, livra_users.id, livra_users.birth_date,
>     livra_users.creation_date, livra_users.email, livra_users.first_name,
>     livra_users.last_name, livra_users.username, livra_users.image_link,
>     livra_users.confirmed, livra_users.is_panelist, livra_users.unregistered,
>     livra_users.reason, livra_users.privacy, livra_users.sex, livra_users.site,
>     livra_users.country_id, livra_users.state_id, livra_users.city_id,
>     livra_users.last_activity_date, livra_users.partner_id, livra_users.survey_id,
>     livra_users.panelist_update, livra_users.panelist_percentage
> FROM livra_users
> UNION ALL
> SELECT 2 AS panel_source_id, - external_users.id AS id,
>     NULL::timestamp without time zone AS birth_date, external_users.creation_date,
>     external_users.email, NULL::character varying AS first_name,
>     NULL::character varying AS last_name, external_users.username,
>     NULL::character varying AS image_link, true AS confirmed,
>     external_users.is_panelist, false AS unregistered, NULL::integer AS reason,
>     0 AS privacy, NULL::integer AS sex, external_users.site,
>     external_users.country_id, NULL::integer AS state_id, NULL::integer AS city_id,
>     NULL::timestamp without time zone AS last_activity_date,
>     NULL::integer AS partner_id, external_users.survey_id,
>     NULL::bigint AS panelist_update, NULL::smallint AS panelist_percentage
> FROM external_users;
>
> Server is 9.0.3 running on linux
>
> The BIG tables are tracks, track_logs and urls, all 30M rows. One detail
> that could be related is that tracks.member_id is an undeclared
> (denormalized) foreign key to livra_users.
The resulting plan is:
Hash Left Join (cost=51417.93..974563.27 rows=2241518 width=1276)
  Hash Cond: (*SELECT* 1.country_id = co.id)
  -> Hash Left Join (cost=51415.40..941722.50 rows=2241518 width=1271)
       Hash Cond: (*SELECT* 1.state_id = s.id)
       -> Hash Left Join (cost=51373.45..910859.68 rows=2241518 width=1263)
            Hash Cond: (t.partner_delivery_id = pd.id)
            Join Filter: (t.partner_id IS NOT NULL)
            -> Hash Left Join (cost=32280.78..854175.26 rows=2241518 width=1256)
                 Hash Cond: (*SELECT* 1.city_id = c.id)
                 -> Hash Join (cost=24183.20..792841.63 rows=2241518 width=1249)
                      Hash Cond: (*SELECT* 1.id = t.member_id)
                      -> Append (cost=0.00..148254.38 rows=3008749 width=168)
                           -> Subquery Scan on *SELECT* 1 (cost=0.00..140223.96 rows=3008748 width=168)
                                -> Seq Scan on livra_users (cost=0.00..110136.48 rows=3008748 width=168)
                           -> Subquery Scan on *SELECT* 2 (cost=0.00..8030.42 rows=1 width=60)
                                -> Result (cost=0.00..8030.41 rows=1 width=60)
                                     One-Time Filter: false
                                     -> Seq Scan on external_users (cost=0.00..8030.41 rows=1 width=60)
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Tue, Sep 13, 2011 at 00:26, Robert Klemme shortcut...@googlemail.com wrote: In the case of PG this particular example will work:
1. TX inserts new PK row
2. TX tries to insert same PK row => blocks
1. TX commits
2. TX fails with PK violation
2. TX does the update (if the error is caught)
That goes against the point I was making in my earlier comment. In order to implement this error-catching logic, you'll have to allocate a new subtransaction (transaction ID) for EVERY ROW you insert. If you're going to be loading billions of rows this way, you will invoke the wrath of the vacuum freeze process, which will seq-scan all older tables and re-write every row that it hasn't touched yet. You'll survive it if your database is a few GB in size, but in terabyte land that's unacceptable. Transaction IDs are a scarce resource there. In addition, such blocking will limit the parallelism you will get from multiple inserters. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
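[For reference, the error-catching pattern under discussion looks roughly like this in PL/pgSQL. This is only a sketch with made-up table and column names; the point is that the block with an EXCEPTION clause sets an implicit savepoint, i.e. a subtransaction, on every call:]

```sql
-- Hypothetical upsert via error catching. Entering a BEGIN block that
-- has an EXCEPTION clause establishes a subtransaction for each call,
-- which is what consumes an extra transaction ID per row.
CREATE FUNCTION upsert_row(_id integer, _value text) RETURNS void AS $$
BEGIN
    INSERT INTO foo (id, value) VALUES (_id, _value);
EXCEPTION WHEN unique_violation THEN
    UPDATE foo SET value = _value WHERE id = _id;
END;
$$ LANGUAGE plpgsql;
```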
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Tue, Sep 13, 2011 at 19:34, Robert Klemme shortcut...@googlemail.com wrote: I don't think so. You only need to catch the error (see attachment). Or does this create a sub transaction? Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error. Yes, I mentioned the speed issue. But regardless of the solution for MySQL's INSERT..ON DUPLICATE KEY UPDATE which Igor mentioned you will have the locking problem anyhow if you plan to insert concurrently into the same table and be robust. In a mass-loading application you can often divide the work between threads in a manner that doesn't cause conflicts. For example, if the unique key is foobar_id and you have 4 threads, thread 0 will handle rows where (foobar_id%4)=0, thread 1 takes (foobar_id%4)=1 etc. Or potentially hash foobar_id before dividing the work. I already suggested this in my original post. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
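[A sketch of the kind of split described above, with hypothetical table names. Each loader thread runs only its own slice, so no two threads ever insert the same key:]

```sql
-- thread 0 of 4
INSERT INTO target SELECT * FROM staging WHERE foobar_id % 4 = 0;
-- thread 1 of 4
INSERT INTO target SELECT * FROM staging WHERE foobar_id % 4 = 1;
-- ... threads 2 and 3 likewise. If the raw modulo gives a skewed
-- distribution, hash the key first, e.g.:
--   WHERE abs(hashtext(foobar_id::text)) % 4 = 0
```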
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Mon, Sep 12, 2011 at 23:04, Shaun Thomas stho...@peak6.com wrote: I was alluding to the fact that if a DBA had his system running for a week at our transaction level, and PG didn't have forced auto vacuum, and their maintenance lapsed even slightly, they could end up with a corrupt database. It doesn't actually corrupt your database. If you manage to hit the wraparound age, PostgreSQL disallows new connections and tells you to run a VACUUM from a standalone backend. (But that should never happen due to the forced vacuum freeze processes) Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote: On 09/11/2011 08:59 AM, Igor Chudov wrote: By the way, does that INSERT UPDATE functionality or something like this exist in Postgres? You have two options: 1) write a function like: create function doinsert(_id integer, _value text) returns void as 2) use two sql statements: Unfortunately both of these options have caveats. Depending on your I/O speed, you might need to use multiple loader threads to saturate the write bandwidth. However, neither option is safe from race conditions. If you need to load data from multiple threads at the same time, they won't see each other's inserts (until commit) and will thus cause unique violations. If you could somehow partition their operation by some key, so threads are guaranteed not to conflict with each other, then that would be perfect. The 2nd option given by Andy is probably faster. You *could* code a race-condition-safe function, but that would be a no-go on a data warehouse, since each call needs a separate subtransaction which involves allocating a transaction ID. Which brings me to another important point: don't do lots of small write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides being inefficient, they introduce a big maintenance burden. In PostgreSQL's MVCC, each tuple contains a reference to the 32-bit transaction ID that inserted it (xmin). After hitting the maximum 32-bit transaction ID value, the number wraps around. To prevent old rows from appearing as new, a vacuum freeze process will run after passing autovacuum_freeze_max_age transactions (200 million by default) to update all old rows in your database. Using fewer transaction IDs means it runs less often. On small databases, this is usually not important. But on a 10TB data warehouse, rewriting a large part of your database totally kills performance for any other processes.
This is detailed in the documentation: http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
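[For completeness, the two-statement option mentioned above usually takes a shape like this (a sketch with hypothetical names); note it is only race-free when loaders are partitioned by key so that two sessions never touch the same row:]

```sql
-- try the update first
UPDATE foo SET value = 'new' WHERE id = 42;
-- if no row was affected, the key doesn't exist yet, so insert it
INSERT INTO foo (id, value)
SELECT 42, 'new'
WHERE NOT EXISTS (SELECT 1 FROM foo WHERE id = 42);
```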
Re: [PERFORM] seq scan in the case of max() on the primary key column
On Thu, Jun 16, 2011 at 21:36, Shaun Thomas stho...@peak6.com wrote: You can call that instead of max, and it'll be much faster. You can create an analog for min if you need it. So for this, you'd call: Cool, I've needed this function sometimes but never bothered enough to write it myself. Now I created a wiki snippet page for this handy feature here: https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table With Jim Nasby's idea to use regclass instead of relation names, the function is now half its length and probably more reliable. There's no need to touch pg_class directly at all. I also changed it to return bigint instead of integer, as that's more versatile, and the performance loss is probably negligible. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 100% CPU Utilization when we run queries.
On Wed, Jun 8, 2011 at 07:19, bakkiya bakk...@gmail.com wrote: We have a postgresql 8.3.8 DB which consumes 100% of the CPU whenever we run any query. We got vmstat output Machine details are below: Any query? Does even SELECT 1 not work? Or SELECT * FROM sometable LIMIT 1? Or are you having problems with only certain kinds of queries? If so, please follow this for how to report it: https://wiki.postgresql.org/wiki/SlowQueryQuestions Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgresql on itanium server
On Thu, Jun 9, 2011 at 13:03, muthu krishnan muthu.krishnan.li...@gmail.com wrote: Thank you for suggesting the valuable URL, we are getting 3 floating point assist fault errors every second, will it impact the performance for postgresql? Probably. The kernel throttles these messages, so you're probably performing many more of these calculations than the number of messages. Is there any option to turn on flush to zero mode in itanium cpu while compiling postgresql from source? As the URL mentions, you can build with CFLAGS=-ffast-math, that should work for PostgreSQL too. But since you know you're operating with denormal numbers, you WILL get different results to queries. Whether that's a problem for you depends on your application. You could start getting division by zero errors, for instance. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: {Spam} [PERFORM] Will shared_buffers crash a server
Qiang Wang forest_qi...@yahoo.com wrote: We have PostgreSQL 8.3 running on Debian Linux server. We built an application using PHP programming language and Postgres database. There are approximately 150 users using the software constantly. We had some performance degradation before and after some studies we figured out we will need to tune PostgreSQL configurations. However we suffered 2 times server crashes after tuning the configuration. Does anyone have any idea how this can happen? Could you explain in more detail, *how* it crashed? On Linux, the first suspect for crashes is usually the OOM (out-of-memory) killer. When the kernel thinks it's run out of memory, it picks a task and kills it. Due to the way PostgreSQL uses shared memory, it's more likely to be killed than other processes. To figure out whether you've suffered an OOM kill, run dmesg; you would see something like:
[2961426.424851] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[2961426.424857] postgres cpuset=/ mems_allowed=0
[2961426.424861] Pid: 932, comm: postgres Not tainted 2.6.31-22-server #65-Ubuntu
[2961426.424863] Call Trace: ...
The first step in solving OOM kills is disabling memory overcommit; add 'vm.overcommit_memory = 2' to /etc/sysctl.conf and run the command 'echo 2 > /proc/sys/vm/overcommit_memory'. This doesn't prevent OOM kills entirely, but usually reduces them significantly; queries will now abort with an out of memory error if they're responsible for memory exhaustion. You can also reduce the chance that PostgreSQL is chosen for killing by changing its oom_adj, documented here: http://blog.credativ.com/en/2010/03/postgresql-and-linux-memory-management.html Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] multiple table scan performance
On Wed, Mar 30, 2011 at 01:16, Samuel Gendler sgend...@ideasculptor.com wrote: I've got some functionality that necessarily must scan a relatively large table Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select as opposed to 3 separate insert into () select ... statements. Apparently not, as explained by Claudio Freire. This seems like missed opportunity for the planner, however. If it scanned all three UNION subqueries in parallel, the synchronized seqscans feature would kick in and the physical table would only be read once, instead of 3 times. (I'm assuming that seqscan disk access is your bottleneck) You can trick Postgres (8.3.x and newer) into doing it in parallel anyway: open 3 separate database connections and issue each of these 'INSERT INTO ... SELECT' parts separately. This way all the queries should execute in about 1/3 the time, compared to running them in one session or with UNION ALL. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
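[Concretely, the trick above means that instead of one UNION query, each of the three sessions issues its own statement at roughly the same time (a sketch; table and column names hypothetical), letting synchronized seqscans share a single physical pass over the big table:]

```sql
-- session 1
INSERT INTO dest_a SELECT id, payload FROM big_table WHERE kind = 'a';
-- session 2, started concurrently in its own connection
INSERT INTO dest_b SELECT id, payload FROM big_table WHERE kind = 'b';
-- session 3, started concurrently in its own connection
INSERT INTO dest_c SELECT id, payload FROM big_table WHERE kind = 'c';
```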
Re: [PERFORM] pg9.0.3 explain analyze running very slow compared to a different box with much less configuration
On Thu, Mar 24, 2011 at 11:11, Achilleas Mantzios ach...@matrix.gatewaynet.com wrote: My problem had to do with the speed of gettimeofday. You might want to do some special setting regarding your box's way of reading time for the hw clock. Just for extra info, on x86, TSC is usually the fast timeofday implementation. On recent CPUs in single-socket configurations, TSC should always be available, regardless of any power management. I don't know about multi-socket. If you want to know whether your kernel is using tsc, run: cat /sys/devices/system/clocksource/clocksource0/current_clocksource On older CPUs, you often had to disable some sort of power management in order to get a stable TSC -- the ondemand scaling governor is the top suspect. Disabling this is distro-specific. You have to reboot to get the kernel to re-test TSC. Unfortunately disabling power management later at boot doesn't help you, you have to prevent it from activating at all. For debugging, grepping dmesg for tsc or clocksource is often helpful. On machines with unstable TSC you'll see output like this:
[0.00] Fast TSC calibration using PIT
[0.164068] checking TSC synchronization [CPU#0 - CPU#1]: passed.
[0.196730] Switching to clocksource tsc
[0.261347] Marking TSC unstable due to TSC halts in idle
[0.261536] Switching to clocksource acpi_pm
If you just want to get repeatable timings, you can force both machines to use the hpet clocksource: echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query on CLUTER -ed tables
2011/3/23 Laszlo Nagy gand...@shopzeus.com:
GroupAggregate (cost=5553554.25..5644888.17 rows=2283348 width=50)
  -> Sort (cost=5553554.25..5559262.62 rows=2283348 width=50)
       Sort Key: pph.hid, ppoh.merchantid, pph.hdate
       -> Nested Loop (cost=0.00..5312401.66 rows=2283348 width=50)
            -> Index Scan using idx_product_price_history_id_hdate on product_price_history pph (cost=0.00..8279.80 rows=4588 width=16)
                 Index Cond: (id = 37632081)
            -> Index Scan using pk_product_price_offer_history on product_price_offer_history ppoh (cost=0.00..1149.86 rows=498 width=42)
                 Index Cond: (ppoh.hid = pph.hid)
                 Filter: (ppoh.isfeatured = 1)
I suspect that, since the matched hid's probably aren't sequential, many of those ~500 product_price_offer_history rows will be far apart on disk. Please show the EXPLAIN ANALYZE output in the slow case, not just EXPLAIN. Also, PostgreSQL version? What configuration options have you changed? (http://wiki.postgresql.org/wiki/SlowQueryQuestions) Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
On Sun, Mar 13, 2011 at 18:36, runner run...@winning.com wrote: Tried removing the indexes and other constraints just for the import but for a noob like me, this was too much to ask. Maybe when I get more experience. pgAdmin should make it pretty easy. Choose each index and constraint, save the code from the SQL pane for when you need to restore it, and do a right click -> Drop. Other than being very inefficient, and consuming more time than necessary, is there any other down side to importing into an indexed table? Doing so will result in somewhat larger (more bloated) indexes, but generally the performance impact of this is minimal. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
On Fri, Mar 11, 2011 at 21:06, fork forkandw...@gmail.com wrote: Like the following? Will it rebuild the indexes in a sensical way? Don't insert data into an indexed table. A very important point with bulk-loading is that you should load all the data first, then create the indexes. Running multiple (different) CREATE INDEX queries in parallel can additionally save a lot of time. Also don't move data back and forth between the tables, just drop the original when you're done. Doing this should give a significant performance win. Partitioning them to fit in cache should improve it further, but I'm not sure anymore that it's worthwhile considering the costs and extra maintenance. Is there a rule of thumb on tradeoffs in a partitioned table? The only certain thing is that you'll lose group aggregate and merge join query plans. If you only see HashAggregate plans when you EXPLAIN your GROUP BY queries then it probably won't make much of a difference. I would use the partition column whatever I am most likely to cluster by in a single big table, right? Yes. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
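[The load order described above, as a sketch with hypothetical table names and file path; the two CREATE INDEX statements can be issued from two separate sessions to build them in parallel:]

```sql
-- 1. create the table bare and load all the data first
CREATE TABLE new_data (id bigint, payload text);
COPY new_data FROM '/path/to/dump.csv' (FORMAT csv);

-- 2. only now build the indexes, ideally from concurrent sessions
CREATE INDEX new_data_id_idx ON new_data (id);
CREATE INDEX new_data_payload_idx ON new_data (payload);

-- 3. swap in the new table instead of copying rows back
DROP TABLE old_data;
ALTER TABLE new_data RENAME TO old_data;
```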
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
On Thu, Mar 10, 2011 at 17:40, fork forkandw...@gmail.com wrote: The data is not particularly sensitive; if something happened and it rolled back, that wouldnt be the end of the world. So I don't know if I can use dangerous setting for WAL checkpoints etc. There are also aren't a lot of concurrent hits on the DB, though a few. If you don't mind long recovery times in case of a crash, set checkpoint_segments to ~100 and checkpoint_completion_target=0.9; this will improve write throughput significantly. Also, if you don't mind CORRUPTing your database after a crash, setting fsync=off and full_page_writes=off gives another significant boost. I am loathe to create a new table from a select, since the indexes themselves take a really long time to build. UPDATE on a table with many indexes will probably be slower. If you want to speed up this part, use INSERT INTO x SELECT and take this chance to partition your table, such that each individual partition and most indexes will fit in your cache. Index builds from a warm cache are very fast in PostgreSQL. You can create several indexes at once in separate sessions, and the table will only be scanned once. Don't forget to bump up maintenance_work_mem for index builds, 256MB might be a reasonable arbitrary value. The downside is that partitioning can interfere with your read queries if they expect the data in a sorted order. But then, HashAggregate tends to be faster than GroupAggregate in many cases, so this might not matter for your queries. Alternatively you can experiment with PostgreSQL 9.1 alpha, which has mostly fixed this shortcoming with the merge append plan node. As the title alludes, I will also be doing GROUP BY's on the data, and would love to speed these up, mostly just for my own impatience... I think regular tuning is the best you can do here. 
Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
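[The bulk-load settings mentioned above, collected in one place. The values are the ones from this thread, not universal recommendations, and the last two can CORRUPT the database after a crash, as noted:]

```
# postgresql.conf fragment for a one-off bulk load
checkpoint_segments = 100
checkpoint_completion_target = 0.9
maintenance_work_mem = 256MB     # for the index builds

# dangerous pair: only if losing the database on a crash is acceptable
fsync = off
full_page_writes = off
```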
Re: [PERFORM] Perl Binding affects speed?
On Fri, Feb 25, 2011 at 05:02, Sam Wong s...@hellosam.net wrote: * But if I do this - using binding: $dbh->selectall_arrayref(SELECT * from shipment_lookup WHERE (UPPER(lookup) LIKE ?), undef, '0GURG5YGVQA9%'); It took 10 seconds to finish the query, just like it was using a full table scan instead! Even though the 'explain' shows the same query plan. This is a pretty common shortcoming with placeholders. Since planning of parameterized queries is done *before* binding parameters, the planner has no knowledge of what the ? placeholder actually is. Thus it often gets the selectivity statistics wrong and produces worse plans for your values. AFAIK the only workaround is to not use variable binding in these cases, but to escape and insert your variables straight into the SQL query. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
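[The same effect can be reproduced in psql with a server-side prepared statement (a sketch, reusing the table name from the post above):]

```sql
-- planned once, before the parameter value is known
PREPARE q(text) AS
    SELECT * FROM shipment_lookup WHERE upper(lookup) LIKE $1;

-- the plan cannot assume the pattern has a fixed prefix,
-- so it may differ from planning the literal query
EXPLAIN EXECUTE q('0GURG5YGVQA9%');
```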
Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem
On Tue, Feb 15, 2011 at 20:01, Scott Marlowe scott.marl...@gmail.com wrote: run htop and look for red. if you've got lots of red bar on each CPU but no io wait then it's waiting for memory access. I don't think this is true. AFAICT the red bar refers to system time, time that's spent in the kernel -- either in syscalls or kernel background threads. Operating systems don't generally account memory accesses (cache misses) for processes, if you don't specially ask for it. The closest thing I know of is using Linux perf tools, e.g. perf top -e cache-misses. OProfile, DTrace and SystemTap can probably do something similar. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] LIMIT on partitioned-table!?
On Tue, Feb 15, 2011 at 21:33, Kim A. Brandt kimabra...@gmx.de wrote: removing the ORDER BY worked. But I am afraid to ask this. How can I order by partition? It seems that the planner has picked a random(!?) order of partitions to select from. The returned records, from the selected partition, are correctly sorted by the index though. If a single query accesses more than one partition, PostgreSQL currently cannot read the values in index-sorted order. Hence with ORDER BY and LIMIT, PostgreSQL cannot return *any* results before it has read all matching rows and then sorted them; adding a LIMIT doesn't help much. Your only bet is to reduce the number of matched rows, or make sure that you only access a single partition. Increasing work_mem may speed up the sort step if you're hitting the disk (EXPLAIN ANALYZE VERBOSE will tell you whether that's the case). This will change in PostgreSQL 9.1, which has a new Merge Append plan node. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
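[If the partitioning key can be pinned down in the WHERE clause, constraint exclusion leaves only one partition in play and the index order becomes usable again. A sketch, with a hypothetical table partitioned by month on log_date:]

```sql
-- touches only the February partition, so the index can
-- return rows in order and LIMIT can stop early
SELECT *
FROM measurements
WHERE log_date >= DATE '2011-02-01'
  AND log_date <  DATE '2011-03-01'
ORDER BY log_date
LIMIT 100;
```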
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 18:36, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Yeah, current behavior with that shutdown option is the opposite of smart for any production environment I've seen. (I can see where it would be handy in development, though.) What's best in production is the equivalent of the fast option with escalation to immediate if necessary to ensure shutdown within the time limit. +1, we should call it dumb :) Not accepting new connections with a "database system is shutting down" error makes it even worse -- it means you can't log in to the server to inspect who's querying it or call pg_terminate_backend() on them. I couldn't find any past discussions about changing the default to fast. Are there any reasons why that cannot be done in a future release? Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 22:09, Greg Smith g...@2ndquadrant.com wrote: Kevin and I both suggested a fast plus timeout then immediate behavior is what many users seem to want. My comments were at http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an example of how fast shutdown can fail see http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php True, I've hit that a few times too. Seems that a better solution would be implementing a new -m option that does this transparently? Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Mon, Feb 7, 2011 at 05:03, Craig Ringer cr...@postnewspapers.com.au wrote: What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching. OTOH, many people don't check the logs, so they'd think their new setting had taken effect and it hadn't - you've traded one usability problem for another. Even if Pg issued WARNING messages to each client that connected, lots of (non-psql) clients don't display them, so many users would never know. Do you have a suggestion about how to do this better? The current approach is known to be rather unlovely, but nobody's come up with a better one that works reasonably and doesn't trample on other System V shared memory users that may exist on the system. We could do something similar to what Apache does -- provide distros with a binary to check the configuration file in advance. This check program is launched before the restart command, and if it fails, the server is not restarted. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different execution plans for semantically equivalent queries
On Mon, Feb 7, 2011 at 00:03, Mikkel Lauritsen ren...@tala.dk wrote: SELECT * FROM table t1 WHERE 0 = (SELECT COUNT(*) FROM table t2 WHERE t2.type = t1.type AND t2.timestamp > t1.timestamp) I suspect that *any* database is going to have trouble optimizing that. Just out of curiosity I've been looking a bit at the optimizer code in PostgreSQL, and it seems as if it would be at least theoretically possible to add support for things like transforming the query at hand into the NOT EXISTS form; a bit like how = NULL is converted to IS NULL. Would a change like that be accepted, or would you rather try to indirectly educate people into writing better SQL? There are some reasonable and generic optimizations that could be done here. Being able to inline subqueries with aggregates into joins would be a good thing, e.g. transform your query into this: SELECT t1.* FROM table t1 LEFT JOIN table t2 ON (t2.type = t1.type AND t2.timestamp > t1.timestamp) GROUP BY t1.* HAVING COUNT(t2.*)=0 However, this is probably still worse than a NOT EXISTS query. I am less excited about turning a COUNT(x)=0 query into NOT EXISTS because that's just a bad way to write a query. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
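[The NOT EXISTS form referred to above would look like this, mirroring the tables and predicates of the quoted query:]

```sql
-- for each row, keep it only if no later row of the same type exists
SELECT *
FROM table t1
WHERE NOT EXISTS (
    SELECT 1
    FROM table t2
    WHERE t2.type = t1.type
      AND t2.timestamp > t1.timestamp
);
```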
Re: [PERFORM] COPY TO stdout statements occurrence in log files
On Fri, Jan 14, 2011 at 23:19, Chris Browne cbbro...@acm.org wrote: 2. In 9.1, there will be a new answer, as there's a GUC to indicate the application_name. Actually this was already introduced in PostgreSQL 9.0 :) You can add application_name to your log_line_prefix with %a. For pg_dump it will display pg_dump Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] encourging bitmap AND
On Thu, Dec 23, 2010 at 22:52, Tom Lane t...@sss.pgh.pa.us wrote: Ben midfi...@gmail.com writes: i have a schema similar to the following create index foo_s_idx on foo using btree (s); create index foo_e_idx on foo using btree (e); i want to do queries like select * from foo where 150 between s and e; That index structure is really entirely unsuited to what you want to do, so it's not surprising that the planner isn't impressed with the idea of a bitmap AND. Why is it unsuited for this query? It expands to (150 >= s AND 150 <= e), which should work nicely with bitmap AND as far as I can tell. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MySQL HandlerSocket - Is this possible in PG?
On Tue, Dec 21, 2010 at 11:09, Michael Ben-Nes mich...@epoch.co.il wrote: Just stumbled on the following post: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html The post claims that MySQL can do more qps than Memcached or any other NoSQL when doing simple queries like: SELECT * FROM table WHERE id=num; And I wonder if: 1. Currently, is it possible to achieve the same using PG 9.0.x? 2. Is it possible at all? I was curious what could be done currently, without any modifications to PostgreSQL itself, so I ran a simple benchmark.

Table:
create table usr (user_id int primary key not null, user_name text not null, user_email text not null, created timestamp not null);
insert into usr select generate_series(1, 100), 'Yukari Takeba', 'yukari.tak...@dena.jp', '2010-02-03 11:22:33';

<?php
$db = pg_connect('');
$res = pg_prepare($db, 'get_user', 'select user_name, user_email, created from usr where user_id=$1');
$res = pg_query($db, 'begin');

$args = array();
for ($i = 0; $i < 250000; $i++) {
  $args[0] = rand(1, 100);
  $res = pg_execute($db, 'get_user', $args);
  $row = pg_fetch_row($res);
}
?>

Each process does 250k queries, so when I run 4 in parallel it's 1M queries total. I'm running PostgreSQL 9.1alpha2, PHP 5.3.4, kernel 2.6.36.2 on Arch Linux; AMD Phenom II X4 955. The only tuning I did was setting shared_buffers=256M

Results:
% time php pg.php & time php pg.php & time php pg.php & time php pg.php & sleep 11
[1] 29792
[2] 29793
[3] 29795
[4] 29797
php pg.php 1,99s user 0,97s system 30% cpu 9,678 total
[2]   done       time php pg.php
php pg.php 1,94s user 1,06s system 30% cpu 9,731 total
[3] - done       time php pg.php
php pg.php 1,92s user 1,07s system 30% cpu 9,746 total
[1]   done       time php pg.php
php pg.php 2,00s user 1,04s system 31% cpu 9,777 total
[4] + done       time php pg.php
So around 10 seconds to run the test in total.
These numbers aren't directly comparable to their test -- I tested over a local UNIX socket, with PHP client on the same machine -- but it's a datapoint nevertheless. Bottom line, you can expect up to 100 000 QPS using pg_execute() on a cheap quad-core gamer CPU. You won't be beating memcached with current PostgreSQL, but I think it's a respectable result. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Auto-clustering?
2010/12/17 Filip Rembiałkowski filip.rembialkow...@gmail.com: regarding clustering: it does not help with index bloat. I'm almost sure it does, CLUSTER re-creates all indexes from scratch after copying the tuples. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] postgres performance tunning
On Thu, Dec 16, 2010 at 14:33, selvi88 selvi@gmail.com wrote: I have a requirement for running more than 15000 queries per second. Can you please tell what all postgres parameters need to be changed to achieve this. You have not told us anything about what sort of queries they are or what you're trying to do. PostgreSQL is not the solution to all database problems. If all you have is a dual-core machine then other software can possibly make better use of the available hardware. First of all, if they're mostly read-only queries, you should use a caching layer (like memcache) in front of PostgreSQL. And you can use replication to spread the load across multiple machines (but you will get some latency until the updates fully propagate to slaves). If they're write queries, memory databases (like Redis), or disk databases specifically optimized for writes (like Cassandra) might be more applicable. Alternatively, if you can tolerate some latency, use message queuing middleware like RabbitMQ to queue up a larger batch and send updates to PostgreSQL in bulk. As for optimizing PostgreSQL itself, if you have a high connection churn then you will need connection pooling middleware in front -- such as pgbouncer or pgpool. But avoiding reconnections is a better idea. Also, use prepared queries to avoid parsing overheads for every query. Obviously all of these choices involve tradeoffs and caveats, in terms of safety, consistency, latency and application complexity. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] only one index is using, why?
On Wed, Dec 15, 2010 at 08:56, AI Rumman rumman...@gmail.com wrote: My question is why crmentity_setype_idx index is being used only. crmentity_deleted_idx index is not being used. Any idea please. Because the planner determined that the cost of scanning *two* indexes and combining the results is more expensive than scanning one index and filtering the results afterwards. Looks like your query could use a composite index on both columns: (deleted, setype) Or a partial index: (setype) WHERE deleted=0 Regards, Marti
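The two suggested indexes would look roughly like this (index names are illustrative; the table and column names are taken from the quoted question):

```sql
-- Composite index covering both filter columns:
CREATE INDEX crmentity_deleted_setype_idx ON crmentity (deleted, setype);

-- Or a smaller partial index, if queries nearly always filter on deleted = 0:
CREATE INDEX crmentity_setype_live_idx ON crmentity (setype) WHERE deleted = 0;
```

The partial index is usually the better choice when the deleted=0 predicate matches the common query pattern, since it stays smaller and cheaper to maintain.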
Re: [PERFORM] Hardware recommendations
On Thu, Dec 9, 2010 at 04:28, Scott Marlowe scott.marl...@gmail.com wrote: On Wed, Dec 8, 2010 at 5:03 PM, Benjamin Krajmalnik k...@servoyant.com wrote: My biggest concern with SSD drives is their life expectancy, Generally that's not a big issue, especially as the SSDs get larger. Being able to survive a power loss without corruption is more of an issue, so if you go SSD get ones with a supercapacitor that can write out the data before power down. I agree with Benjamin here. Even if you put multiple SSD drives into a RAID array, all the drives get approximately the same write load and thus will likely wear out and fail at the same time! As for the Areca controllers, I haven't tested them with the latest drivers or firmware, but we would routinely get 180 to 460 days of uptime between lockups That sucks! But does a BBU even help with SSDs? The flash eraseblock is larger than the RAID cache unit size anyway, so as far as I can tell, it might not save you in the case of a power loss. Any thoughts whether software RAID on SSD is a good idea? Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware recommendations
On Thu, Dec 9, 2010 at 01:26, Andy angelf...@yahoo.com wrote: If you are IO-bound, you might want to consider using SSD. A single SSD could easily give you more IOPS than 16 15k SAS in RAID 10. Are there any that don't risk your data on power loss, AND are cheaper than SAS RAID 10? Regards, Marti
Re: [PERFORM] Slow query to get last created row using CURRVAL
On Sat, Dec 4, 2010 at 13:56, Mathieu De Zutter math...@dezutter.org wrote: I have no idea why in some cases the index scan is not considered. Does anyone have an idea? I guess that it's because the currval() function is volatile -- its value has to be tested again for each row. Try this instead: SELECT user_id FROM log_event WHERE id = (SELECT CURRVAL('log_event_id_seq')); This will ensure that there's only one call to currval(). Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wed, Nov 17, 2010 at 01:31, Tom Lane t...@sss.pgh.pa.us wrote: Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? In my testing, when running a large bulk insert query with fdatasync on ext4, changing wal_buffers has very little effect: http://ompldr.org/vNjNiNQ/wal_sync_method1.png (More details at http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php ) It would take some more testing to say this conclusively, but looking at the raw data, there only seems to be an effect when moving from 8 to 16MB. Could be different on other file systems though. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MVCC performance issue
On Sat, Nov 13, 2010 at 07:53, Craig Ringer cr...@postnewspapers.com.au wrote: Oracle's MVCC approach has its own costs. Like Pg's, those costs increase with update/delete frequency. Instead of table bloat, Oracle suffers from redo log growth (or redo log size management issues). Instead of increased table scan costs from dead rows, Oracle suffers from random I/O costs as it looks up the out-of-line redo log for old rows. Instead of long-running writer transactions causing table bloat, Oracle can have problems with long-running reader transactions aborting when the redo log runs out of space. Another advantage of Oracle's approach seems that they need much less tuple-level overhead. IMO the 23-byte tuple overhead is a much bigger drawback in Postgres than table fragmentation. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MVCC performance issue
On Thu, Nov 11, 2010 at 20:25, Kyriacos Kyriacou kyriac...@prime-tel.com wrote: By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a new copy of the row in a new location. result is to have huge fragmentation on table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance on SELECT that retrieves big record sets (i.e. reports etc) and slower updates. Have you tried reducing the table fillfactor and seeing if HOT update ratio increases? PostgreSQL 8.3 introduced HOT updates as kind of a middle ground -- if the update doesn't affect indexed columns and there's enough space in the same page that is being updated, then the new version will be written in the same page and indexes don't need to be touched at all. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
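A sketch of the fillfactor experiment suggested above (the table name is hypothetical); HOT update effectiveness can be checked from the pg_stat_user_tables view:

```sql
-- Leave 30% free space in each heap page so updated row versions
-- can land on the same page, enabling HOT updates:
ALTER TABLE mytable SET (fillfactor = 70);
-- Rewrite the table so existing pages honor the new fillfactor
-- (CLUSTER also works; both take an exclusive lock):
VACUUM FULL mytable;

-- Compare total updates vs. HOT updates over time:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'mytable';
```

If n_tup_hot_upd rises to a large fraction of n_tup_upd, the index-bloat and fragmentation problems described above should be much reduced.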
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sat, Nov 13, 2010 at 20:01, Tom Lane t...@sss.pgh.pa.us wrote: What's your basis for asserting he's uninterested? Please have a little patience. My apologies, I was under the impression that he hadn't answered your request, but he did in the -hackers thread. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 20:40, Tom Lane t...@sss.pgh.pa.us wrote: The latter choice is the one that requires testing to prove that it is the proper and preferred default from the performance and data reliability POV. And, in fact, the game plan is to do that testing and see which default we want. I think it's premature to argue further about this until we have some test results. Who will be doing that testing? You said you're relying on Greg Smith to manage the testing, but he's obviously uninterested, so it seems unlikely that this will go anywhere. I posted my results with the simple INSERT test, but nobody cared. I could do some pgbench runs, but I have no idea what parameters would give useful results. Meanwhile, PostgreSQL performance is regressing and there's still no evidence that open_datasync is any safer. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 02:05, Greg Smith g...@2ndquadrant.com wrote: Where's your benchmarks proving it then? If you're right about this, and I'm not saying you aren't, it should be obvious in simple benchmarks by stepping through various sizes for wal_buffers and seeing the throughput/latency situation improve. Since benchmarking is the easy part, I did that. I plotted the time taken by inserting 2 million rows to a table with a single integer column and no indexes (total 70MB). Entire script is attached. If you don't agree with something in this benchmark, please suggest improvements. Chart: http://ompldr.org/vNjNiNQ/wal_sync_method1.png Spreadsheet: http://ompldr.org/vNjNiNg/wal_sync_method1.ods (the 2nd worksheet has exact measurements) This is a different machine from the original post, but similar configuration. One 1TB 7200RPM Seagate Barracuda, no disk controller cache, 4G RAM, Phenom X4, Linux 2.6.36, PostgreSQL 9.0.1, Arch Linux. This time I created a separate 20GB ext4 partition specially for PostgreSQL, with all default settings (shared_buffers=32MB). The partition is near the end of the disk, so hdparm gives a sequential read throughput of ~72 MB/s. I'm getting frequent checkpoint warnings; should I try larger checkpoint_segments too? The partition is re-created and 'initdb' is re-run for each test, to prevent file system allocation from affecting results. I did two runs of all benchmarks. The points on the graph show a sum of INSERT time + COMMIT time in seconds. One surprising thing on the graph is a plateau, where open_datasync performs almost equally with wal_buffers=128kB and 256kB. Another noteworthy difference (not visible on the graph) is that with open_datasync -- but not fdatasync -- and wal_buffers=128M, INSERT time keeps shrinking, but COMMIT takes longer. The total INSERT+COMMIT time remains the same, however. I have a few expendable hard drives here so I can test reliability by pulling the SATA cable as well.
Is this kind of testing useful? What workloads do you suggest? Regards, Marti [Attachment: pgtest.sh (Bourne shell script)]
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 01:35, Greg Smith g...@2ndquadrant.com wrote: Yes; it's supposed to, and that logic works fine on some other platforms. No, the logic was broken to begin with. Linux technically supported O_DSYNC all along. PostgreSQL used fdatasync as the default. Now, because Linux added proper O_SYNC support, PostgreSQL suddenly prefers O_DSYNC over fdatasync? Until you've quantified which of the cases do that--which is required for reliable operation of PostgreSQL--and which don't, you don't have any data that can be used to draw a conclusion from. If some setups are faster because they write less reliably, that doesn't automatically make them the better choice. I don't see your point. If fdatasync worked on Linux, AS THE DEFAULT, all the time until recently, then how does it all of a sudden need proof NOW? If anything, the new open_datasync should be scrutinized because it WASN'T the default before and it hasn't gotten as much testing on Linux. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Fri, Nov 5, 2010 at 13:32, A B gentosa...@gmail.com wrote: I was just thinking about the case where I will have almost 100% selects, but still needs something better than a plain key-value storage so I can do some sql queries. The server will just boot, load data, run, hopefully not crash but if it would, just start over with load and run. If you want fast read queries then changing fsync/full_page_writes/synchronous_commit won't help you. Just follow the regular tuning guide. shared_buffers, effective_cache_size, work_mem, default_statistics_target can make a difference. http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server Regards, Marti
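A hedged starting point for the read-oriented settings mentioned above, as a postgresql.conf fragment (the values are purely illustrative, sized for a machine with roughly 8GB of RAM, not a recommendation for any particular workload):

```
shared_buffers = 2GB               # ~25% of RAM is a common starting point
effective_cache_size = 6GB         # roughly shared_buffers plus the OS page cache
work_mem = 32MB                    # per sort/hash, per backend -- raise with care
default_statistics_target = 100    # raise further for complex queries on skewed data
```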
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Fri, Nov 5, 2010 at 13:11, Guillaume Cottenceau g...@mnc.ch wrote: Don't use PostgreSQL, just drop your data, you will end up with the same results and be even faster than any use of PostgreSQL. If anyone needs data, then just say you had data corruption, and that since 100% dataloss is accepted, then all's well. You're not helping. There are legitimate reasons for trading off safety for performance. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Fri, Nov 5, 2010 at 23:10, Greg Smith g...@2ndquadrant.com wrote: Not having a real O_DSYNC on linux until recently makes it even more dubious to have it as a default... If Linux is now defining O_DSYNC Well, Linux always defined both O_SYNC and O_DSYNC, but they used to have the same value. The defaults changed due to an unfortunate heuristic in PostgreSQL, which boils down to:

#if O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#else
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#endif

(see src/include/access/xlogdefs.h for details) In fact, I was wrong in my earlier post. Linux always offered O_DSYNC behavior. What's new is POSIX-compliant O_SYNC, and the fact that these flags are now distinguished. Here's the change in Linux: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b2f3d1f769be5779b479c37800229d9a4809fc3 Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sat, Nov 6, 2010 at 00:06, Greg Smith g...@2ndquadrant.com wrote: Please refrain from making changes to popular documents like the tuning guide on the wiki based on speculation about what's happening. I will grant you that the details were wrong, but I stand by the conclusion. I can state for a fact that PostgreSQL's default wal_sync_method varies depending on the fcntl.h header. I have two PostgreSQL 9.0.1 builds, one with older /usr/include/bits/fcntl.h and one with newer. When I run show wal_sync_method; on one instance, I get fdatasync. On the other one I get open_datasync. So let's get down to code. Older fcntl.h has:

#define O_SYNC 01
# define O_DSYNC O_SYNC /* Synchronize data. */

Newer has:

#define O_SYNC 0401
# define O_DSYNC 01 /* Synchronize data. */

So you can see that in the older header, O_DSYNC and O_SYNC are equal. src/include/access/xlogdefs.h does:

#if defined(O_SYNC)
#define OPEN_SYNC_FLAG O_SYNC
...
#if defined(OPEN_SYNC_FLAG)
/* O_DSYNC is distinct? */
#if O_DSYNC != OPEN_SYNC_FLAG
#define OPEN_DATASYNC_FLAG O_DSYNC

^ it's comparing O_DSYNC != O_SYNC

#if defined(OPEN_DATASYNC_FLAG)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#elif defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC

^ depending on whether O_DSYNC and O_SYNC were equal, the default wal_sync_method will change. Regards, Marti
Re: [PERFORM] Simple (hopefully) throughput question?
Just some ideas that went through my mind when reading your post. On Wed, Nov 3, 2010 at 17:52, Nick Matheson nick.d.mathe...@noaa.gov wrote: than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below, it takes 1 GB to store 350 MB of nominal data. PostgreSQL 8.3 and later have 22 bytes of overhead per row, plus page-level overhead and internal fragmentation. You can't do anything about row overheads, but you can recompile the server with larger pages to reduce page overhead. Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads? Perhaps a simpler alternative would be writing your own aggregate function with four arguments. If you write this aggregate function in C, it should have similar performance to the sum() query. Regards, Marti
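Registering such a custom C aggregate would look roughly like this (a sketch only: the function and library names are hypothetical, and the C transition function itself -- compiled into a shared library -- is omitted):

```sql
-- Assumes a C transition function bulk_accum() compiled into my_agg.so.
-- The state is a float8 array; the four float8 inputs are folded into it
-- on each row, avoiding per-row executor overhead of four separate sum()s.
CREATE FUNCTION bulk_accum(float8[], float8, float8, float8, float8)
RETURNS float8[]
AS 'my_agg', 'bulk_accum'
LANGUAGE C IMMUTABLE;

CREATE AGGREGATE bulk_sums (float8, float8, float8, float8) (
    sfunc = bulk_accum,
    stype = float8[],
    initcond = '{0,0,0,0}'
);
```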
Re: [PERFORM] Insert performance with composite index
On Mon, Nov 1, 2010 at 14:49, Divakar Singh dpsma...@yahoo.com wrote: I am trying to tune my libpq program for insert performance. When I tried inserting 1M rows into a table with a Primary Key, it took almost 62 seconds. After adding a composite index of 2 columns, the performance degrades to 125 seconds. This sounds a lot like the bottleneck I was hitting. What Linux kernel version are you running? If it's 2.6.33 or later, see: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#wal_sync_method_wal_buffers http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Insert performance with composite index
On Mon, Nov 1, 2010 at 14:56, Divakar Singh dpsma...@yahoo.com wrote: Thanks for your tips. i will try those. I am on Solaris Sparc 5.10 Sorry, I assumed you were running Linux. But still it could be the same problem as I had. Be careful changing your wal_sync_method, as it has the potential to corrupt your database. I have no experience with Solaris. For what it's worth, Jignesh Shah recommends using wal_sync_method=fsync on Solaris: http://blogs.sun.com/jkshah/entry/postgresql_on_solaris_better_use http://blogs.sun.com/jkshah/entry/postgresql_wal_sync_method_and Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Hi pgsql-performance, I was doing mass insertions on my desktop machine and getting at most 1 MB/s disk writes (apart from occasional bursts of 16MB). Inserting 1 million rows with a single integer (data+index 56 MB total) took over 2 MINUTES! The only tuning I had done was shared_buffers=256MB. So I got around to tuning the WAL writer and found that wal_buffers=16MB works MUCH better. wal_sync_method=fdatasync also got similar results. First of all, I'm running PostgreSQL 9.0.1 on Arch Linux * Linux kernel 2.6.36 (also tested with 2.6.35) * Quad-core Phenom II * a single Seagate 7200RPM SATA drive (write caching on) * ext4 FS over LVM, with noatime, data=writeback I am creating a table like: create table foo(id integer primary key); Then measuring performance with the query: insert into foo (id) select generate_series(1, 1000000);

130438,011 ms  wal_buffers=64kB, wal_sync_method=open_datasync (all defaults)
 29306,847 ms  wal_buffers=1MB, wal_sync_method=open_datasync
  4641,113 ms  wal_buffers=16MB, wal_sync_method=open_datasync
^ from 130s to 4.6 seconds by just changing wal_buffers.
  5528,534 ms  wal_buffers=64kB, wal_sync_method=fdatasync
  4856,712 ms  wal_buffers=16MB, wal_sync_method=fdatasync
^ fdatasync works well even with small wal_buffers
  2911,265 ms  wal_buffers=16MB, fsync=off
^ Not bad, getting 60% of ideal throughput

These defaults are not just hurting bulk-insert performance, but also everyone who uses synchronous_commit=off. Unless fdatasync is unsafe, I'd very much want to see it as the default for 9.1 on Linux (I don't know about other platforms). I can't see any reasons why each write would need to be sync-ed if I don't commit that often. Increasing wal_buffers probably has the same effect wrt data safety. Also, the tuning guide on wiki is understating the importance of these tunables.
Reading it I got the impression that some people change wal_sync_method but it's dangerous, and it even literally claims about wal_buffers that "1MB is enough for some large systems". But the truth is that if you want any write throughput AT ALL on a regular Linux desktop, you absolutely have to change one of these. If the defaults were better, it would be enough to set synchronous_commit=off to get all that your hardware has to offer. I was reading mailing list archives and didn't find anything against it either. Can anyone clarify the safety of wal_sync_method=fdatasync? Are there any reasons why it shouldn't be the default? Regards, Marti
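For reference, the two settings under discussion as a postgresql.conf fragment (a sketch of the configuration tested above, not a general recommendation; whether fdatasync is safe on a given setup is exactly the open question of this thread):

```
wal_sync_method = fdatasync   # instead of open_datasync, the compiled-in default on newer Linux
wal_buffers = 16MB            # default in 9.0 is only 64kB
```

Both take effect only after a server restart.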
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sun, Oct 31, 2010 at 21:59, Greg Smith g...@2ndquadrant.com wrote: open_datasync support was just added to Linux itself very recently. Oh, I didn't realize it was a new feature. Indeed, O_DSYNC support was added in 2.6.33. It seems like bad behavior on PostgreSQL's part to default to new, untested features. I have updated the tuning wiki page with my understanding of the problem: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#wal_sync_method_wal_buffers Regards, Marti