[PERFORM] Bad query plan with high-cardinality column
I have a planner problem that looks like a bug, but I'm not familiar enough with how the planner works to say for sure. This is my schema:

create table comments (
    id serial primary key,
    conversation_id integer,
    created_at timestamp
);
create index comments_conversation_id_index on comments (conversation_id);
create index comments_created_at_index on comments (created_at);

The table has 3.5M rows, and 650k unique values for conversation_id, where the histogram goes up to 54000 rows for the most frequent ID, with a long tail. There are only 20 values with a frequency of 1000 or higher. The created_at column has 3.5M distinct values.

Now, I have this query:

select comments.id from comments
where conversation_id = 3975979
order by created_at limit 13

This filters about 5000 rows and returns the oldest 13 rows. But the query is consistently planned wrong:

Limit (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
  Buffers: shared hit=2400709 read=338923 written=21
  -> Index Scan using comments_created_at_index on comments (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
       Filter: (conversation_id = 3975979)
       Rows Removed by Filter: 2817751
       Buffers: shared hit=2400709 read=338923 written=21
Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on how warm the disk cache is.
This is the correct plan and index usage:

Limit (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
  Buffers: shared hit=197 read=4510
  -> Sort (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
       Sort Key: created_at
       Sort Method: top-N heapsort  Memory: 25kB
       Buffers: shared hit=197 read=4510
       -> Index Scan using comments_conversation_id_index on comments (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
            Index Cond: (conversation_id = 3975979)
            Buffers: shared hit=197 read=4510
Total runtime: 25.500 ms

Now, the problem for Postgres is obviously to estimate how many rows have a given conversation_id value, but it does have that number. I'm at a loss to explain why it thinks scanning a huge index that covers the entire table will ever beat scanning a small index that contains only 17% of the table's values. It will consistently use the bad plan for higher-frequency values, and the good plan for lower-frequency values. If I run ANALYZE repeatedly, the planner will sometimes, oddly enough, choose the correct plan. This behaviour actually seems related to effective_cache_size: if it's set small (128MB), the planner will sometimes favour the good plan, but if it's large (2GB) it will consistently use the bad plan. I have bumped the statistics target up to 1, but it does not help. I have also tried setting n_distinct for the column manually, since Postgres guesses it's 285k instead of 650k, but that does not help. What is odd is that the bad plan is really never correct in any situation for this query. It will *always* be better to branch off the comments_conversation_id_index index. Our environment: 9.2.2, tweaked memory parameters (work_mem,
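The "top-N heapsort" in the good plan is the standard bounded-heap trick for ORDER BY ... LIMIT. A minimal Python sketch of the idea (illustrative only, not Postgres internals; the fake rows are made up):

```python
import heapq
import random

def top_n(rows, n, key):
    """Return the n smallest rows by key without sorting everything.

    This is the moral equivalent of the plan's "top-N heapsort": a
    bounded heap of n candidates is kept while scanning, so memory
    stays at O(n) -- the 25kB in the plan -- instead of O(rows).
    """
    return heapq.nsmallest(n, rows, key=key)

random.seed(1)
# Stand-in for the ~5000 comments in one conversation: (id, created_at)
rows = [(i, random.randrange(10**9)) for i in range(5000)]
oldest_13 = top_n(rows, 13, key=lambda r: r[1])
```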
Re: [PERFORM] Bad query plan with high-cardinality column
On Friday, February 22, 2013 at 21:33, Tom Lane wrote:

> The reason is that the LIMIT may stop the query before it's scanned all of the index. The planner estimates on the assumption that the desired rows are roughly uniformly distributed within the created_at index, and on that assumption, it looks like this query will stop fairly soon ... but evidently, that's wrong. On the other hand, it knows quite well that the other plan will require pulling out 5000-some rows and then sorting them before it can return anything, so that's not going to be exactly instantaneous either.
>
> In this example, I'll bet that conversation_id and created_at are pretty strongly correlated, and that most or all of the rows with that specific conversation_id are quite far down the created_at ordering, so that the search through the index takes a long time to run. OTOH, with another conversation_id the same plan might run almost instantaneously.

That's right. So I created a composite index, and not only does this make the plan correct, but the planner now chooses a much more efficient plan than the previous index that indexed only on conversation_id:

Limit (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 rows=13 loops=1)
  Buffers: shared hit=8
  -> Index Scan using index_comments_on_conversation_id_and_created_at on comments (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 rows=13 loops=1)
       Index Cond: (conversation_id = 3975979)
       Buffers: shared hit=8
Total runtime: 0.094 ms

Is this because it can get the value of created_at from the index, or because it knows that the index is pre-sorted, or both? Very impressed that Postgres can use a multi-column index for this. I just assumed, wrongly, that it couldn't. I will have to go review my other tables now and see if they can benefit from multi-column indexes. Thanks!
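For reference, the composite index named in that plan would have been created along these lines (the exact statement isn't in the thread; note that column order matters, with the equality column first and the ORDER BY column second):

```sql
-- Equality column first, then the sort column, so an index scan over
-- one conversation_id returns rows already ordered by created_at.
create index index_comments_on_conversation_id_and_created_at
    on comments (conversation_id, created_at);
```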
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bad query plan with high-cardinality column
On Friday, February 22, 2013 at 21:47, Kevin Grittner wrote:

> I suspect you would be better off without those two indexes, and instead having an index on (conversation_id, created_at). Not just for the query you show, but in general.

Indeed, that solved it, thanks!

> In my experience these problems come largely from the planner not knowing the cost of dealing with each tuple. I see a lot less of this if I raise cpu_tuple_cost to something in the 0.03 to 0.05 range.

Is this something I can just frob a bit without worrying about it adversely impacting database performance across the board, or should I be very careful and do lots of testing on a staging box first?
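One low-risk way to experiment is to try the setting per-session before touching postgresql.conf, which makes the staging-box question less scary. A sketch (the SELECT is a placeholder for whatever query you care about):

```sql
-- Per-session experiment; other connections keep the server default.
set cpu_tuple_cost = 0.04;    -- within the suggested 0.03-0.05 range
explain analyze select ...;   -- re-check the plans that matter to you
reset cpu_tuple_cost;         -- back to the server default
```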
Re: [PERFORM] Sorting by an arbitrary criterion
On Thu, Jul 9, 2009 at 6:26 PM, Craig James <craig_ja...@emolecules.com> wrote:

> Suppose I have a large table with a small-cardinality CATEGORY column (say, categories 1..5). I need to sort by an arbitrary (i.e. user-specified) mapping of CATEGORY, something like this:
>
>   1 => 'z'
>   2 => 'a'
>   3 => 'b'
>   4 => 'w'
>   5 => 'h'
>
> So when I get done, the sort order should be 2,3,5,4,1.

If the object is to avoid a separate table, you can do it with a CASE statement:

select ... from ...
order by case category
    when 1 then 'z'
    when 2 then 'a'
    when 3 then 'b'
    when 4 then 'w'
    when 5 then 'h'
end

If this sounds slow, you're right. But it might perform well enough for your use case.

A.
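The CASE expression is just a lookup table in disguise; the same ordering in application code is a one-liner. A sketch using the thread's mapping:

```python
# User-specified mapping of category -> sort token, as in the thread.
mapping = {1: 'z', 2: 'a', 3: 'b', 4: 'w', 5: 'h'}

rows = [1, 2, 3, 4, 5]  # category values fetched from the table
ordered = sorted(rows, key=lambda category: mapping[category])
# ordered is [2, 3, 5, 4, 1], the order the original poster asked for
```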
Re: [PERFORM] How would you store read/unread topic status?
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra <mate...@siteduzero.com> wrote:

> This flags table has more or less the following fields:
>
>   UserID - TopicID - LastReadAnswerID

We are doing pretty much the same thing.

> My problem is that every time a user READS a topic, it UPDATES this flags table to remember he has read it. This leads to multiple updates at the same time on the same table, and an update can take a few seconds. This is not acceptable for my users.

First of all, and I'm sure you thought of this, an update isn't needed every time a user reads a topic; only when there are new answers that need to be marked as read. So an update qualified with "... where last_read_answer_id < ?" should avoid the unnecessary writes. (That said, I believe PostgreSQL diffs tuple updates, so in practice it might not write anything if you run an update with the same value. I will let someone more intimate with the internal details of updates comment on this.) Secondly, an update should not take a few seconds. You might want to investigate that before you turn to further optimizations. In our application we defer the updates to a separate asynchronous process using a simple queue mechanism, but in our case we found that the updates are fast enough (on the order of a few milliseconds) not to warrant batching them into single transactions.

A.
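A sketch of the guarded update (table and column names are my guesses from the thread's description, and the $n placeholders stand for bound parameters):

```sql
-- Only touch the row when there is actually something new to mark as
-- read; otherwise the WHERE clause matches nothing and no tuple is
-- written at all.
update flags
   set last_read_answer_id = $3
 where user_id = $1
   and topic_id = $2
   and last_read_answer_id < $3;
```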
Re: [PERFORM] How would you store read/unread topic status?
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra <mate...@siteduzero.com> wrote:

> The flags table keeps track of every topic a member has visited and remembers the last answer which was posted at this moment. It allows the user to come back a few days after and immediately jump to the last answer he has not read.

I forgot to mention that we speed up our queries by caching the last read ID in Memcached. This is the kind of thing that Memcached is ideal for. For example, we show the list of the most recent posts along with a comment count, e.g. "42 comments (6 new)". We found that joining posts against the last-read table is expensive, so instead we read from Memcached on every post to find the number of unread comments. We use the thread's last-commented-at timestamp as part of the key, so that when somebody posts a new comment, every user's cached unread count is invalidated; it is automatically recalculated the next time they view the post.

A.
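The invalidation scheme can be sketched in a few lines (the function name and key layout are made up; the idea of embedding the thread's last-commented-at timestamp in the key is the one described above):

```python
def unread_count_key(topic_id, user_id, last_commented_at):
    """Memcached key for a user's unread-comment count.

    Embedding the thread's last-commented-at timestamp means a new
    comment changes the key for every user, so stale counts are never
    read -- they simply miss and get recalculated on the next view.
    """
    return "unread:%d:%d:%s" % (topic_id, user_id, last_commented_at)

before = unread_count_key(42, 7, "2009-06-23T13:12:00")
after = unread_count_key(42, 7, "2009-06-23T13:15:00")  # new comment posted
```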
[PERFORM] Yet another slow nested loop
Here's the query:

select photos.* from photos
inner join event_participations on
    event_participations.user_id = photos.creator_id and
    event_participations.attend = true
inner join event_instances on
    event_instances.id = event_participations.event_instance_id
where (
    (event_instances.venue_id = 1290) and
    (photos.taken_at > (event_instances.time + interval '-3600 seconds')) and
    (photos.taken_at < (event_instances.time + interval '25200 seconds'))
)
order by taken_at desc
limit 20

It occasionally takes four minutes to run:

QUERY PLAN
----------
Limit (cost=0.00..10997.65 rows=20 width=116) (actual time=262614.474..262614.474 rows=0 loops=1)
  -> Nested Loop (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
       Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
       -> Nested Loop (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
            -> Index Scan Backward using photos_taken_at on photos (cost=0.00..40924.34 rows=544171 width=116) (actual time=14.997..1357.724 rows=544171 loops=1)
            -> Index Scan using event_participations_user_id_index on event_participations (cost=0.00..2.95 rows=60 width=8) (actual time=0.007..0.159 rows=26 loops=544171)
                 Index Cond: (event_participations.user_id = photos.creator_id)
                 Filter: event_participations.attend
       -> Index Scan using event_instances_pkey on event_instances (cost=0.00..0.29 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=14013998)
            Index Cond: (event_instances.id = event_participations.event_instance_id)
            Filter: (event_instances.venue_id = 1290)
Total runtime: 262614.585 ms

With enable_nestloop set to false, it takes about 1 second to run. The database is freshly analyzed and vacuumed. The default statistics target is 100. I have tried increasing the stats on event_participations.user_id, event_participations.event_instance_id and photos.taken_at to 1000, but saw no improvement.
This is PostgreSQL 8.3.3.

A.
Re: [PERFORM] Yet another slow nested loop
On Tue, Jun 16, 2009 at 3:56 PM, Dave Dutcher <d...@tridecap.com> wrote:

> -----Original Message-----
> From: Alexander Staubo
>> -> Nested Loop (cost=0.00..5729774.95 rows=10420 width=116) (actual time=262614.470..262614.470 rows=0 loops=1)
>>      Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval)) AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
>>      -> Nested Loop (cost=0.00..2055574.35 rows=11869630 width=120) (actual time=21.750..121838.012 rows=14013998 loops=1)
>
> Do you have any of the other enable_* options set to false?

No.

> What do you have random_page_cost set to? I ask because I'm surprised to see postgres choose to loop when it knows it will have to loop 11 million times.

The default, i.e. 4.0.

A.
Re: [PERFORM] Yet another slow nested loop
On Tue, Jun 16, 2009 at 4:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Actually the easiest way to fix that is to get rid of the LIMIT. (Maybe use a cursor instead, and fetch only twenty rows.) LIMIT magnifies the risks from any estimation error, and you've got a lot of that here ...

There's no cursor support in ActiveRecord, the ORM library we use, and I'm not going to write it. Anyway, I would prefer not to gloss over the underlying problem with something that requires a TODO next to it. What can be done to fix the underlying problem? Nothing?

A.
Re: [PERFORM] Storing sensor data
On Thu, May 28, 2009 at 2:54 PM, Ivan Voras <ivo...@freebsd.org> wrote:

> The volume of sensor data is potentially huge, on the order of 500,000 updates per hour. Sensor data is few numeric(15,5) numbers.

The size of that dataset, combined with the apparent simplicity of your schema and the apparent requirement for most-sequential access (I'm guessing about the latter two), all lead me to suspect you would be happier with something other than a traditional relational database.

I don't know how exact your historical data has to be. Could you get by with something like RRDTool? RRDTool is a round-robin database that stores multiple levels of historical values aggregated by function. So you typically create an average database, a max database and so on, with the appropriate functions to transform the data, and you subdivide these into day, month, year and so on, by the granularity of your choice. When you store a value, the historical data is aggregated appropriately -- at appropriate levels of granularity, so the current-day database is more precise than the monthly one, and so on -- and you always have access to the exact current data. RRDTool is used by software such as Munin and Cacti that track a huge number of readings over time for graphing.

If you require precise data with the ability to filter, aggregate and correlate over multiple dimensions, something like Hadoop -- or one of the Hadoop-based column database implementations, such as HBase or Hypertable -- might be a better option, combined with MapReduce/Pig to execute analysis jobs.

A.
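The consolidation idea behind RRDTool can be sketched in a few lines (the bucket layout and function names here are mine, not RRDTool's API):

```python
from collections import defaultdict

def consolidate(samples, bucket_seconds, fn):
    """Collapse (timestamp, value) samples into fixed-width buckets,
    the way an RRD rolls raw readings up into coarser archives."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: fn(values) for start, values in buckets.items()}

samples = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
# Keep a fine "average" archive and a coarse "max" archive, RRD-style.
minute_avg = consolidate(samples, 60, lambda v: sum(v) / len(v))
minute_max = consolidate(samples, 60, max)
```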
Re: [PERFORM] Storing sensor data
On Thu, May 28, 2009 at 5:06 PM, Ivan Voras <ivo...@freebsd.org> wrote:

>> If you require precise data with the ability to filter, aggregate and correlate over multiple dimensions, something like Hadoop -- or one of the Hadoop-based column database implementations, such as HBase or Hypertable -- might be a better option, combined with MapReduce/Pig to execute analysis jobs.
>
> This looks like an interesting idea to investigate. Do you have more experience with such databases? How do they fare with the following requirements:

We might want to take this discussion off-list, since this list is about PostgreSQL. Feel free to reply privately.

> * Storing large datasets (do they pack data well in the database? No wasted space like in e.g. hash tables?)

Column databases like Hypertable and HBase are designed to store data quite efficiently. Each column is grouped in a unit called a column family and stored together in chunks usually called SSTables, after the Google Bigtable paper. (When you design your database you must determine which columns are usually accessed together, in order to avoid incurring the I/O cost of loading non-pertinent columns.) Each SSTable is like a partition. When storing a chunk to disk, each column is stored sequentially, and the chunk is compressed; storing each column contiguously makes for optimal compression. I have used HBase, but I don't have any feel for how much space it wastes. In theory, though, space usage should be more optimal than with PostgreSQL.

I have also used Cassandra, another column database I would recommend, which is very efficient. In many ways I prefer Cassandra to HBase -- it's leaner, completely decentralized (no single point of failure) and independent of the rather huge, monolithic Hadoop project -- but it does not currently support MapReduce. If you want to implement some kind of distributed analysis system, you will need to write it yourself.

All three column stores support mapping information by a time dimension.
Each time you write a key, you also provide a timestamp. In theory you can retain the entire history of a single key. HBase lets you specify how many revisions to retain; I'm not sure what Cassandra does. However, Cassandra introduces the notion of a supercolumn family, another grouping level which lets you use the timestamp as a column key. To explain how this works, consider the following inserts:

# insert(table_name, key, column, value, timestamp)
db.insert("readings", "temperature_sensor", "value:1", 23, 200905281725023)
db.insert("readings", "temperature_sensor", "value:2", 27, 200905281725023)
db.insert("readings", "temperature_sensor", "value:3", 21, 200905281725023)

The resulting temperature_sensor row will have three column values:

  value:1  value:2  value:3
  23       27       21

You can keep adding values and the row will get bigger. Because columns are dynamic, only that row will grow; all other rows will stay the same size. Cassandra users usually use the column name as a kind of value -- imagine it's like subindexing an array. As you can see, I also passed a timestamp (the 2009... bit), which is used for versioning: since anyone can write to any node in a cluster, Cassandra needs to be able to resolve conflicts.

Note that these databases are inherently distributed. You can run them on a single node just fine -- and that might be appropriate in your case -- but they really shine when you run a whole cluster. Cassandra is multi-master, so you can just boot up a number of nodes and read from/write to any of them.

> * Retrieving specific random records based on a timestamp or record ID?

Absolutely.

> * Storing infinite datasets (i.e. whose size is not known in advance - cf. e.g. hash tables)

This is one area where column databases are better than relational ones. The schema is completely dynamic, and you can treat it as a hash table.

> On the other hand, we could periodically transfer data from PostgreSQL into a simpler database (e.g. BDB) for archival purposes (at the expense of more code).
> Would they be better suited?

Considering the size and sequential nature of the data, I think they would be a better match than a simple key-value store like BDB.

A.
Re: [PERFORM] Bad plan for nested loop + limit
On Sun, Mar 1, 2009 at 4:32 AM, Robert Haas <robertmh...@gmail.com> wrote:

> What do you have default_statistics_target set to? If it's less than 100, you should probably raise it to 100 and re-analyze (the default value for 8.4 will be 100, but for 8.3 and prior it is 10).

Changing it to 100 fixed the problem. Thanks for alerting me to the existence of default_statistics_target.

Alexander.
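For the record, the change can be tried per-session before being persisted in postgresql.conf (a sketch):

```sql
set default_statistics_target = 100;  -- session-level; 8.4's default
analyze;  -- re-gather statistics so the planner sees the finer histograms
```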
Re: [PERFORM] Bad plan for nested loop + limit
On Fri, Feb 27, 2009 at 11:54 PM, Robert Haas <robertmh...@gmail.com> wrote:

> The problem here is that the planner estimates the cost of a Limit plan node by adding up (1) the startup cost of the underlying plan node, in this case 0 for the nestjoin, and (2) a percentage of the run cost, based on the ratio of the number of rows expected to be returned to the total number of rows. In this case, the nested loop is expected to return 6944 rows, so it figures it won't have to get very far to find the 4 you requested. [...] I will think about this some more but nothing is occurring to me off the top of my head.

Thanks for explaining. Is there any way to rewrite the query so that it avoids the nested loop join -- other than actually disabling nested loop joins? If I do the latter, the resulting query uses a hash join and completes in 80-100 ms, which is still pretty horrible, especially for a query that returns nothing, but a vast improvement over the unthinkable 4-5 seconds for the current query.

Alexander.
Re: [PERFORM] Bad plan for nested loop + limit
On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <a...@bengler.no> wrote:

> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wil...@gmail.com> wrote:
>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <a...@bengler.no> wrote:
>>> Output from explain analyze:
>>>
>>> Limit (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>>>   -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>>>        -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
>>
>> Those estimates are pretty far off. Did you try increasing the statistics target? Also, is the first query repeatable (that is, is it already in cache when you do the test, or alternately, are all queries *out* of cache when you test?)

All right, this query keeps coming back to bite me. If this part of the join:

  ... and section_items.sandbox_id = 16399

yields a sufficiently large number of matches, then performance goes "boink", like so:

Limit (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1)
  -> Nested Loop (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1)
       -> Index Scan using index_event_occurrences_on_start_time on event_occurrences (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1)
            Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone)
       -> Index Scan using index_section_items_on_subject_type_and_subject_id on section_items (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108)
            Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
            Filter: (section_items.sandbox_id = 9)
Total runtime: 4348.777 ms

In this case:

# select count(*) from section_items where sandbox_id = 9;
 count
-------
  3126

If I remove the start_time ... clause, performance is fine.
Upping the statistics setting on any of the columns involved seems to have no effect. Is this a pathological border case, or is there something I can do to *generally* make this query run fast? Keep in mind that the query itself returns no rows at all. I want to avoid doing an initial "select count(...)" just to sidestep the bad plan. Suffice it to say, having a web request take 5 seconds is asking too much of our users.

Alexander.
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Sun, Feb 15, 2009 at 6:35 PM, Greg Smith <gsm...@gregsmith.com> wrote:

> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm goes over this topic, with "Appendix B: pg_stat_bgwriter sample analysis" covering a look at what to do based on a pg_stat_bgwriter snapshot.

Wonderful, thank you.

Alexander.
Re: [PERFORM] Bad plan for nested loop + limit
On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wil...@gmail.com> wrote:

> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <a...@bengler.no> wrote:
>> Output from explain analyze:
>>
>> Limit (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>>   -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>>        -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
>
> Those estimates are pretty far off. Did you try increasing the statistics target? Also, is the first query repeatable (that is, is it already in cache when you do the test, or alternately, are all queries *out* of cache when you test?)

All in the cache when I do the test. Ok, so upping the statistics to 100 on section_items.subject_id fixed it:

Limit (cost=3530.95..3530.96 rows=4 width=48) (actual time=0.107..0.107 rows=1 loops=1)
  -> Sort (cost=3530.95..3531.12 rows=66 width=48) (actual time=0.106..0.106 rows=1 loops=1)
       Sort Key: event_occurrences.start_time
       Sort Method: quicksort  Memory: 25kB
       -> Nested Loop (cost=0.00..3529.96 rows=66 width=48) (actual time=0.098..0.100 rows=1 loops=1)
            -> Index Scan using index_section_items_on_sandbox_id on section_items (cost=0.00..104.29 rows=22 width=4) (actual time=0.017..0.033 rows=7 loops=1)
                 Index Cond: (sandbox_id = 16399)
                 Filter: ((subject_type)::text = 'Event'::text)
            -> Index Scan using index_event_occurrences_on_event_id on event_occurrences (cost=0.00..154.79 rows=74 width=48) (actual time=0.008..0.008 rows=0 loops=7)
                 Index Cond: (event_occurrences.event_id = section_items.subject_id)
                 Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
Total runtime: 0.142 ms

Thanks.

Alexander.
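The per-column statistics change described above would look something like this (a sketch, using 8.3 syntax):

```sql
-- Raise the sample detail for just this column, then re-analyze so the
-- planner's histogram for subject_id reflects the finer sampling.
alter table section_items alter column subject_id set statistics 100;
analyze section_items;
```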
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Fri, Feb 13, 2009 at 6:35 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:

> You should definitely set effective_cache_size.

Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for cache), and today's average write frequency went from 20MB/sec to just 1MB/sec. The documentation suggests that effective_cache_size is only used for query planning in conjunction with indexes. So how come it affects write access?

> If you still see the problem after that, I suggest testing different settings for:
>
>   bgwriter_lru_maxpages
>   bgwriter_lru_multiplier
>   checkpoint_segments
>   checkpoint_timeout
>   checkpoint_completion_target
>
> Both the checkpoint process and the background writer changed quite a bit, and these are the main knobs for tuning the new version.

We are hoping to set up a duplicate instance and play back the SQL log against it so we can experiment with different settings. Until we have such a setup, I'm not sure what to do with the knobs other than frob them wildly. :-) Are there any statistics, either in PostgreSQL proper or in the OS, that I can use as metrics to guide the tuning? For example, is there anything in pg_stat_bgwriter that can help me tune the bgwriter_lru_* settings?

Do transactions that only contain query statements end up writing entries to the WAL when they commit? If so, can we avoid the writes by wrapping our queries in read-only transactions, and would it be worth the effort? Our web application is handling 30 requests per second at peak time; each request performs dozens of queries in autocommit mode, i.e. one transaction per query. Only a minority of those requests actually end up modifying the database. PostgreSQL is committing and fsyncing 600-800 transactions per second, so that's probably a good chunk of disk/CPU usage wasted, right?

Alexander.
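As a starting point for metrics, pg_stat_bgwriter itself is queryable; these columns exist in 8.3 (the interpretation hints in the comments are my own rules of thumb, not official guidance):

```sql
select checkpoints_timed,   -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,     -- forced checkpoints; if high relative to
                            -- checkpoints_timed, raise checkpoint_segments
       buffers_checkpoint,  -- buffers written during checkpoints
       buffers_clean,       -- buffers written by the background writer
       maxwritten_clean,    -- bgwriter halted by bgwriter_lru_maxpages;
                            -- if this climbs, consider raising that knob
       buffers_backend,     -- buffers the backends had to write themselves
       buffers_alloc        -- total buffers allocated
  from pg_stat_bgwriter;
```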
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Sat, Feb 14, 2009 at 9:49 AM, Craig Ringer cr...@postnewspapers.com.au wrote: Is there any chance you had pg_xlog stored separately on your old database, and I/O for it wasn't being recorded? No, the database files have always been on a single volume. Alexander.
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Sat, Feb 14, 2009 at 8:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Staubo a...@bengler.no writes: Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for cache), and today's average write frequency went from 20MB/sec to just 1MB/sec. The documentation suggests that effective_cache_size is only used for query planning in conjunction with indexes. So how come it affects write access? It *is* only used for query planning. A plausible theory is that you caused some join queries to change from hash or merge joining involving a temporary hash or sort file to an index nestloop that doesn't use any temporary storage. If the required hash or sort storage exceeded work_mem, which you have set to just 10MB, that would have created some write traffic. Interesting. Is there any statistic available that can tell me whether work_mem is being exceeded? The tools to monitor exactly what PostgreSQL is doing -- especially on a production box -- are fairly limited, especially since Linux does not yet have anything close to DTrace in functionality. Did you happen to notice whether your queries got faster or slower when you did this? Watching only aggregate write traffic is a pretty limited view of what is happening in your database. Unfortunately we don't log SQL queries at the moment. We do log application response times, but they look roughly the same as before the change. I could revert the effective_cache_size setting, turn on SQL logging for a while, then reapply the change and compare. Alexander.
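On the question of whether work_mem is being exceeded: one way to see it (assuming 8.3, which added the log_temp_files parameter) is to log every sort or hash that spills to disk, and to look at the Sort Method line in EXPLAIN ANALYZE output:

```sql
-- Log the name and size of every temporary file at deletion; any such file
-- means some operation exceeded work_mem. 0 means log all temp files.
set log_temp_files = 0;

-- In EXPLAIN ANALYZE output, an in-memory sort reports
--   Sort Method: quicksort  Memory: ...kB
-- while one that spilled to disk reports
--   Sort Method: external merge  Disk: ...kB
```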
[PERFORM] Bad plan for nested loop + limit
This dinky little query takes about 4 seconds to run: select event_occurrences.* from event_occurrences join section_items on section_items.subject_id = event_occurrences.event_id and section_items.subject_type = 'Event' and section_items.sandbox_id = 16399 where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100' order by event_occurrences.start_time limit 4; Output from explain analyze: Limit (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1) -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1) -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1) -> Index Scan using index_event_occurrences_on_start_time on event_occurrences (cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152 rows=155197 loops=1) Index Cond: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) -> Index Scan using index_section_items_on_subject_type_and_subject_id on section_items (cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=155197) Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id)) Filter: (section_items.sandbox_id = 16399) -> Index Scan using event_instances_pkey on events (cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: (events.id = event_occurrences.event_id) Filter: (events.read_permissions = (-1)) Total runtime: 4039.788 ms Now, if I use limit 50 it uses a plan that is several orders of magnitude more efficient: Limit (cost=6202.38..6202.51 rows=50 width=48) (actual time=0.170..0.171 rows=1 loops=1) -> Sort (cost=6202.38..6203.20 rows=326 width=48) (actual time=0.170..0.170 rows=1 loops=1) Sort Key: event_occurrences.start_time Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=5.09..6191.55 rows=326 width=48) (actual time=0.160..0.161 rows=1 loops=1) -> Bitmap Heap Scan on
section_items (cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7 loops=1) Recheck Cond: (sandbox_id = 16399) Filter: ((subject_type)::text = 'Event'::text) -> Bitmap Index Scan on index_section_items_on_sandbox_id (cost=0.00..5.06 rows=107 width=0) (actual time=0.018..0.018 rows=7 loops=1) Index Cond: (sandbox_id = 16399) -> Index Scan using index_event_occurrences_on_event_id on event_occurrences (cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0 loops=7) Index Cond: (event_occurrences.event_id = section_items.subject_id) Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) Total runtime: 0.210 ms Similarly if I disable nested joins with set enable_nestloop = off: Limit (cost=10900.13..10900.14 rows=4 width=48) (actual time=191.476..191.478 rows=1 loops=1) -> Sort (cost=10900.13..10900.95 rows=326 width=48) (actual time=191.474..191.475 rows=1 loops=1) Sort Key: event_occurrences.start_time Sort Method: quicksort Memory: 25kB -> Hash Join (cost=8944.52..10895.24 rows=326 width=48) (actual time=162.104..191.463 rows=1 loops=1) Hash Cond: (section_items.subject_id = event_occurrences.event_id) -> Bitmap Heap Scan on section_items (cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7 loops=1) Recheck Cond: (sandbox_id = 16399) Filter: ((subject_type)::text = 'Event'::text) -> Bitmap Index Scan on index_section_items_on_sandbox_id (cost=0.00..5.06 rows=107 width=0) (actual time=0.019..0.019 rows=7 loops=1) Index Cond: (sandbox_id = 16399) -> Hash (cost=5580.54..5580.54 rows=157752 width=48) (actual time=161.832..161.832 rows=155197 loops=1) -> Seq Scan on event_occurrences (cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406 rows=155197 loops=1) Filter: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) Total runtime: 192.496 ms Some statistics:

# select attname, n_distinct from pg_stats where tablename = 'event_occurrences';
  attname   | n_distinct
------------+------------
 id         | -1
 created_at | -0.291615
 updated_at | -0.294081
 created_by | 715
 updated_by | 715
 event_id   | 2146
 start_time | -0.10047
 end_time   | 5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
 attname | n_distinct
---------+------------
 id
[PERFORM] I/O increase after upgrading to 8.3.5
After upgrading from 8.2 to 8.3.5, the write load on our database server has increased dramatically and inexplicably -- as has the CPU usage. Here's a Munin graph of iostat showing the sudden increase in blocks written/sec: http://purefiction.net/paste/Totals-iostatwrite1-week.png We expected the upgrade to give us better, not worse, performance. How do I diagnose this problem? The application code has not changed, and neither have the usage patterns. The number of tuple inserts/updates/deletes per second has not deviated from the normal. However, CPU usage has increased by around 30%, and the load average has similarly increased. The other performance metrics I monitor, such as transaction load, HTTP traffic, etc., show everything else as either normal or slightly decreased, as you would expect when the database server slows down. The upgrade was done with dump/restore using pg_dump -Fc. The old database lived on a SAN volume, whereas the new database lives on a local disk volume. OS is Linux 2.6.24 on Intel (x86_64). Alexander.
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Fri, Feb 13, 2009 at 3:46 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Alexander Staubo a...@bengler.no wrote: After upgrading from 8.2 to 8.3.5, the write load on our database server has increased dramatically and inexplicably -- as has the CPU usage. Did you do a VACUUM ANALYZE of the database after loading it? Without the database VACUUM, the first read of any page causes it to be rewritten to set hint bits. Without an ANALYZE, it might be picking very inefficient plans. I actually run a VACUUM FREEZE ANALYZE after loading a database (as with the upgrade), to prevent massive rewrites of everything in the database at some later date due to freeze activity. Thanks, the lack of statistics should explain why things are a bit slow. I ran a vacuum freeze analyze now, but the I/O level is still quite high. I have verified using pidstat that the I/O is all caused by PostgreSQL. Here's some sample output from iostat, interval 1 second (the 4th column is KB written):

sda 1173.00   68.00 149672.00     68 149672
sda   14.00    0.00   1712.00      0   1712
sda    2.00    0.00    336.00      0    336
sda  679.00  344.00 115200.00    344 115200
sda  238.00    0.00  61764.00      0  61764
sda  436.00    0.00  95004.00      0  95004
sda   14.00    0.00   1032.00      0   1032
sda 1882.00   72.00  82380.00     72  82380
sda  173.00    8.00   7936.00      8   7936

What I find odd is that PostgreSQL is only clocking at about 8 tuple modifications per second average (on our user tables). There are around 800 transactions per second, but most of these are only reads. How can 8 tuples/sec result in 115MB writes per second? The database is not large enough and the tuples not large enough to explain those huge amounts of data. At 115MB/s you could rewrite the entire database in 1.5 minutes. Is there any way to determine the *sizes* of the tuple mods, not just the frequency, that PostgreSQL performs? What kinds of tools are available to track down the causes of these writes? Alexander.
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Fri, Feb 13, 2009 at 12:53 PM, Alexander Staubo a...@bengler.no wrote: The upgrade was done with dump/restore using pg_dump -Fc. The old database lived on a SAN volume, whereas the new database lives on a local disk volume. I need to correct myself: The Munin graphs were never set to track the SAN volume where the old database lived. So when the graph goes from near-zero to lots, it's actually correct. When I compare the correct graph, however, it's apparent that I/O writes have, on average, doubled. The new volume uses the same file system and block size as the old one. Alexander.
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Fri, Feb 13, 2009 at 5:17 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Could you show the non-commented lines from old and new postgresql.conf files, please? Attached. The differences are not performance-related, as far as I can see, aside from the addition of synchronous_commit = off. Alexander.
Re: [PERFORM] Identifying the nature of blocking I/O
On Mon, Aug 25, 2008 at 3:34 AM, Scott Carey [EMAIL PROTECTED] wrote: DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD. Linux however is still in the dark ages when it comes to system monitoring, especially with I/O. While that's true, newer 2.6 kernel versions at least have I/O accounting built in, something which only used to be available through the atop accounting kernel patch:

$ cat /proc/22785/io
rchar: 31928
wchar: 138
syscr: 272
syscw: 4
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0

Alexander.
Re: [PERFORM] Query Optimization with Kruskal’s Algorithm
On 5/7/08, Tarcizio Bini [EMAIL PROTECTED] wrote: I'm working on optimizing queries using the Kruskal algorithm (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4318118). That paper looks very interesting. I would love to hear what the PostgreSQL committers think of this algorithm. Alexander.
Re: [PERFORM] Putting files into fields in a table
On 12/13/07, Campbell, Lance [EMAIL PROTECTED] wrote: I am looking at the possibility of storing files in some of my database tables. My concern is obviously performance. I have configured PostgreSQL to take advantage of Linux file caching. So my PostgreSQL does not have a large setting for shared_buffers even though I have 24G of memory. This used to be the recommended way before 8.0. In 8.0, it is advantageous to give PostgreSQL more buffers. You should still make some room for the kernel cache. By storing files, I assume you mean a lot of data imported from files. The pros and cons of storing large amounts of data as PostgreSQL tuples have been debated before. You might want to search the archives. My opinion is that PostgreSQL is fine up to a point (let's say 10,000 tuples x 2KB), above which I would merely store references to file-system objects. Managing these objects can be painful, especially in a cluster of more than one machine, but at least it's fast and lightweight. What data type should I use for fields that hold files? PostgreSQL has two ways of storing large amounts of data in a single tuple: variable-length columns, and blobs. Blobs are divided into chunks and stored in separate tables, one tuple per chunk, indexed by offset, and PostgreSQL allows random access to the data. The downside is that they take up more room, they're slower to create, slower to read from end to end, and I believe there are some operations (such as function expressions) that don't work on them. Some replication products, including (the last time I looked) Slony, do not support replicating blobs. Blobs are not deprecated, I think, but they feel like they are. Variable-length columns such as bytea and text support a system called TOAST, which allows the first few kilobytes of the data to be stored in-place in the tuple, and the overflow to be stored elsewhere and potentially compressed. This system is much faster and tighter than blobs, but does not offer random I/O.
Is there anything that I should be aware of when putting files into a field in a table? Backup dumps will increase in size in proportion to the size of your data. PostgreSQL is no speed demon at loading/storing data, so this might turn out to be the Achilles heel. When PostgreSQL accesses a table that has fields that contain files does it put the fields that contain the files into the shared_buffers memory area? I believe so. Alexander. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
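To illustrate the variable-length-column route described above, here is a minimal sketch; the table and column names are made up for the example:

```sql
-- File contents stored as bytea; values beyond roughly 2 kB are TOASTed
-- out of line and compressed automatically.
create table attachments (
    id       serial primary key,
    filename text not null,
    content  bytea not null
);

insert into attachments (filename, content)
values ('hello.txt', 'hello world'::bytea);

-- Whole-value access only; no random I/O within the value, unlike blobs.
select filename, length(content) from attachments;
```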
Re: [PERFORM] PostgreSQL performance on various distribution stock kernels
On 11/26/07, Damon Hart [EMAIL PROTECTED] wrote: So, what's different between these tests? I'm seeing performance differences of between +65% to +90% transactions per second of the OpenVZ kernel running on the HN over the stock Fedora 8 kernel. Is this reflective of different emphasis between RHEL and Fedora kernel builds? Some OpenVZ optimization on top of the RHEL5 build? Something else? Where should I look? A recent FreeBSD benchmark (which also tested Linux performance) found major performance differences between recent versions of the kernel, possibly attributable to the new so-called completely fair scheduler: http://archives.postgresql.org/pgsql-performance/2007-11/msg00132.php No idea if it's relevant. Alexander.
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
On 11/7/07, Guillaume Smet [EMAIL PROTECTED] wrote: While studying a query taking forever after an ANALYZE on a never analyzed database (a bad estimate causes a nested loop on a lot of tuples), I found the following problem: [snip] Total runtime: 31.097 ms [snip] Total runtime: 31.341 ms [snip] Total runtime: 34.778 ms Which is a really good estimate. That's a difference of less than *three milliseconds* -- a difference probably way within the expected overhead of running explain analyze. Furthermore, all three queries use the same basic plan: a sequential scan with a filter. At any rate you're microbenchmarking in a way that is not useful to real-world queries. In what way are these timings a problem? Have you tried using an index which supports prefix searches? The text_pattern_ops operator class lets you do this with a plain B-tree index: create index cms_items_ancestors_index on cms_items (ancestors text_pattern_ops); analyze cms_items; Now all like 'prefix%' queries should use the index. Alexander.
Re: [PERFORM] TEXT or LONGTEXT?
On 9/24/07, Fabiola Fernández [EMAIL PROTECTED] wrote: I have a database with an amount of tables and in several of them I have an attribute for a semantic definition, for which I use a field of type text. I am trying to decide if it would be worth using LONGTEXT instead of TEXT, as maybe it would slow down the data insertion and extraction. I hope that you could help me. Thank you. Easy choice -- PostgreSQL does not have a data type named longtext. Alexander.
Re: [PERFORM] Replication
On 6/15/07, Craig James [EMAIL PROTECTED] wrote: I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server. Are you sure? I have been considering this possibility, too, but I didn't find anything in the documentation. The main mechanism of the proxy is taking received updates and playing them on multiple servers with 2PC, and the proxies should not need to keep any state about this, so why couldn't you install multiple proxies? Alexander.
Re: [PERFORM] Replication
On 6/15/07, Craig James [EMAIL PROTECTED] wrote: [snip] Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything?

* Mammoth Replicator, commercial.
* Continuent uni/cluster, commercial (http://www.continuent.com/index.php?option=com_content&task=view&id=212&Itemid=169).
* pgpool-II. Supports load-balancing and replication by implementing a proxy that duplicates all updates to all slaves. It can partition data by doing this, and it can semi-intelligently route queries to the appropriate servers.
* Cybertec. This is a commercial packaging of PGCluster-II from an Austrian company.
* Greenplum Database (formerly Bizgres MPP), commercial. Not so much a replication solution as a way to parallelize queries, and targeted at the data warehousing crowd. Similar to ExtenDB, but tightly integrated with PostgreSQL.
* DRBD (http://www.drbd.org/), a device driver that replicates disk blocks to other nodes. This works for failover only, not for scaling reads. Easy migration of devices if combined with an NFS export.
* Skytools (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools), a collection of replication tools from the Skype people. Purports to be simpler to use than Slony.

Lastly, and perhaps most promisingly, there's the Google Summer of Code effort by Florian Pflug (http://code.google.com/soc/postgres/appinfo.html?csaid=6545828A8197EBC6) to implement true log-based replication, where PostgreSQL's transaction logs are used to keep live slave servers up to date with a master. In theory, such a system would be extremely simple to set up and use, especially since it should, as far as I can see, also transparently replicate the schema for you. Alexander.
Re: [PERFORM] Some info to share: db_STRESS Benchmark results
On 5/31/07, Dimitri [EMAIL PROTECTED] wrote: just wanted to share some benchmark results from one long performance study comparing MySQL, PostgreSQL and Oracle transactions throughput and engine scalability on T2000 and V890 (under Solaris). Interesting, if awfully cryptic. The lack of axis labels, the lack of axis normalization, and the fact that you put the graphs for different databases and parameters on separate pages makes it rather hard to compare the various results. Alexander.
Re: [PERFORM] Feature suggestion : FAST CLUSTER
On 5/27/07, PFC [EMAIL PROTECTED] wrote: PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster since I run it on dual core ; InnoDB uses only one core). However, InnoDB can automatically cluster tables without maintenance. How does it know what to cluster by? Does it gather statistics about query patterns on which it can decide an optimal clustering, or does it merely follow a clustering previously set up by the user? Alexander.
Re: [PERFORM] LIKE search and performance
On 5/23/07, Andy [EMAIL PROTECTED] wrote: An example would be: SELECT * FROM table WHERE name like '%john%' or street like '%srt%' Anyway, the query planner always does seq scan on the whole table and that takes some time. How can this be optimized or made in another way to be faster? There's no algorithm in existence that can index arbitrary substrings the way you think. The only rational way to accomplish this is to first break the text into substrings using some algorithm (e.g., words delimited by whitespace and punctuation), and index the substrings individually. You can do this using vanilla PostgreSQL, and you can use Tsearch2 and/or its GIN indexes to help speed up the searches. The simplest solution would be to put all the substrings in a table, one row per substring, along with an attribute referencing the source table/row. At this point you have effectively reduced your search space: you can use a query to isolate the words in your dictionary that contain the substrings. So a query might be: select ... from persons where id in ( select person_id from person_words where word like '%john%' ) The like search, even though it will use a sequential scan, is bound to be much faster on these small words than searching for substrings through large gobs of text in the persons table. Note that PostgreSQL *can* exploit the index for *prefix* matching, if you tell it to use the right operator class: create index persons_name_index on persons (name text_pattern_ops); or, if you're using varchars (though there is rarely any reason to): create index persons_name_index on persons (name varchar_pattern_ops); (These two may be identical internally. Check the docs.) Now you can do: select ...
from persons where name like 'john%'; which will yield a query plan such as this: Index Scan using persons_name_index on persons (cost=0.00..8.27 rows=1 width=29) (actual time=0.184..0.373 rows=51 loops=1) Index Cond: ((name ~>=~ 'john'::text) AND (name ~<~ 'joho'::text)) Filter: (title ~~ 'john%'::text) Alexander.
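The Index Cond in a plan like the one above comes from the planner rewriting a left-anchored LIKE into a half-open range over the B-tree. Writing the range by hand makes the trick visible (using the hypothetical persons table from the example above):

```sql
-- 'john%' is equivalent to this range scan: the lower bound is the prefix
-- itself, the upper bound is the prefix with its last character incremented.
select id, name
  from persons
 where name >= 'john'
   and name <  'joho';
```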
Re: [PERFORM] Performace comparison of indexes over timestamp fields
On 5/22/07, Arnau [EMAIL PROTECTED] wrote: On older versions of PostgreSQL, at least in my experience, queries on timestamp fields, even with indexes, were performing quite badly; mainly sequential scans were performed. PostgreSQL uses B-tree indexes for scalar values. For an expression such as t between a and b, I believe it's going to match both sides of the table independently (i.e., t >= a and t <= b) and intersect these subsets. This is inefficient. You should get better performance by mapping timestamps to a one-dimensional plane and indexing them using GiST. GiST implements an R-tree-like structure that supports bounding-box searches. This involves setting up a functional index: create index ... on payment_transactions using gist ( box(point(extract(epoch from time), 0), point(extract(epoch from time), 0)) box_ops) I'm using box() here because GiST doesn't have a concept of points. Then insert as usual, and then query with something like: select ... from payment_transactions where box( point(extract(epoch from '2006-04-01'::date), 0), point(extract(epoch from '2006-08-01'::date), 0)) && box( point(extract(epoch from time), 0), point(extract(epoch from time), 0)); PostgreSQL should be able to exploit the GiST index by recognizing that the box() expression is already computed in the index. This is much less convenient and less portable -- I would love for PostgreSQL to provide syntactic sugar and special-casing to make this transparent -- but worth it if you are dealing with a lot of range searches. Now I have a newer version of PostgreSQL and I've done some tests comparing the performance of an index over a timestamp field with a numeric field.
To do so, I have the following table:

Table public.payment_transactions
     Column     |            Type             |   Modifiers
----------------+-----------------------------+---------------
 transaction_id | character varying(32)       | not null
 timestamp_in   | timestamp without time zone | default now()
 credits        | integer                     |
 epoch_in       | bigint                      |
 epoch_in2      | double precision            |

[snip] A timestamp is stored internally as an 8-byte double-precision float. Therefore, timestamp_in and epoch_in2 should behave identically. While doing the tests this table has about 100,000 entries. Make sure PostgreSQL is able to keep the entire table in memory by setting shared_buffers; you don't want to be hitting the disk. Make sure you run analyze on the table before you execute the test. To test the different indexes I have executed the following: Your query plans are roughly identical. The difference in the timings implies that you only ran the queries once. I suggest you run each query at least 10 times, and report the individual numbers (the total runtime parts of the output) you get. Arithmetic means are not that interesting. Alexander.
Re: [PERFORM] Performace comparison of indexes over timestamp fields
On 5/22/07, Steinar H. Gunderson [EMAIL PROTECTED] wrote: On Tue, May 22, 2007 at 02:39:33PM +0200, Alexander Staubo wrote: PostgreSQL uses B-tree indexes for scalar values. For an expression such as t between a and b, I believe it's going to match both sides of the table independently (i.e., t >= a and t <= b) and intersect these subsets. This is inefficient. A B-tree index can satisfy range queries such as this. You're right, and I'm wrong -- my head is not in the right place today. B-trees are inefficient for intervals, but quite satisfactory for range searches. Alexander.
Re: [PERFORM] Tips & Tricks for validating hardware/os
On 5/22/07, Stephane Bailliez [EMAIL PROTECTED] wrote: Out of curiosity, can anyone share his tips & tricks to validate a machine before labelling it as 'ready to use postgres - you probably won't trash my data today'? I'm looking for a way to stress test components, especially kernel/disk, to have confidence > 0 that I can use postgres on top of it. Any secret trick is welcome (beside the memtest one :) Compile the Linux kernel -- it's a pretty decent stress test. You could run pgbench, which comes with PostgreSQL (as part of the contrib package). Give a database size that's larger than the amount of physical memory in the box. Alexander.
Re: [PERFORM] bitmap index and IS NULL predicate
On 5/15/07, Jason Pinnix [EMAIL PROTECTED] wrote: Does the bitmap index not store a bit vector for the NULL value (i.e. a bit vector that contains a 1 for each row with a NULL value and 0 for other rows) ? You should be able to do this with a conditional index: create index ... (col) where col is null; Alexander.
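Spelled out with concrete (made-up) table and column names, the conditional-index suggestion looks like this:

```sql
-- Index only the rows where the column is null; the index stays tiny
-- when nulls are rare in the table.
create index orders_unshipped_idx
    on orders (id)
 where shipped_at is null;

-- Queries whose predicate matches the index condition can use it:
select count(*) from orders where shipped_at is null;
```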
Re: [PERFORM] Best OS for Postgres 8.2
On 5/8/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: [snip] I personally don't trust reiserfs, jfs seems to be a tool for transitioning from AIX more than anything else [...] What makes you say this? I have run JFS for years with complete satisfaction, and I have never logged into an AIX box. JFS has traditionally been seen as an underdog, but undeservedly so, in my opinion; one cause might be the instability of the very early releases, which seems to have tainted its reputation, or the alienness of its AIX heritage. However, every benchmark I have come across puts it on par with, and often surpassing, the more popular file systems in performance. In particular, JFS seems to shine with respect to CPU overhead. Alexander.
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
On 5/8/07, Tom Lane [EMAIL PROTECTED] wrote: You're not getting the indexscan optimization of the LIKE clause, which is most likely due to having initdb'd the 8.1 installation in something other than C locale. You can either redo the initdb in C locale (which might be a good move to fix other inconsistencies from the 7.3 behavior you're used to) or create a varchar_pattern_ops index on the column(s) you're using LIKE with. Given the performance implications of setting the wrong locale, and the high probability of accidentally doing this (I run my shells with LANG=en_US.UTF-8, so all my databases have inherited this locale), why is there no support for changing the database locale after the fact? # alter database test set lc_collate = 'C'; ERROR: parameter lc_collate cannot be changed Alexander.
Re: [PERFORM] pg_stat_* collection
On 5/3/07, Greg Smith [EMAIL PROTECTED] wrote: Today's survey is: just what are *you* doing to collect up the information about your system made available by the various pg_stat views? I have this hacked together script that dumps them into a file, imports them into another database, and then queries against some of the more interesting data. You would think there would be an organized project addressing this need around to keep everyone from reinventing that wheel, but I'm not aware of one. I have a bunch of plugin scripts for Munin (http://munin.projects.linpro.no/) that collect PostgreSQL statistics. Graphs like this are useful: http://purefiction.net/paste/pg_munin_example.png I have been considering tarring them up as a proper release at some point. Anyone interested? Alexander.
Re: [PERFORM] index structure for 114-dimension vector
On 5/1/07, Andrew Lazarus [EMAIL PROTECTED] wrote: Let me just thank the list, especially for the references. (I found similar papers myself with Google: and to think I have a university library alumni card and barely need it any more!) I'll write again on the sorts of results I get. Looking forward to hearing about them. I have worked with such dataset problems, but never attempted to apply them to a relational database such as PostgreSQL. If you want speed, nothing beats in-memory vectors on a modern CPU architecture. Alexander. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] index structure for 114-dimension vector
On 4/20/07, Andrew Lazarus [EMAIL PROTECTED] wrote: I have a table with 2.5 million real[] arrays. (They are points in a time series.) Given a new array X, I'd like to find, say, the 25 closest to X in some sense--for simplification, let's just say in the usual vector norm. Speed is critical here, and everything I have tried has been too slow. Let me chime in with the observation that this is a multidimensional nearest-neighbour problem (reverse nearest neighbour and its close cousin, k-NN) that is well known in statistics, and particularly relevant to statistical learning and classification. Knowing the jargon might help you dig up efficient algorithms to mine your data; there are tons of fascinating papers available through Citeseer. In particular, I recommend the paper Efficient k-NN Search on Vertically Decomposed Data by de Vries et al, SIGMOD 2002 (PDF here: http://citeseer.ist.psu.edu/618138.html), if only for inspiration. It proposes an algorithm called BOND to drastically reduce the search space by probabilistic means. They give an example using image histograms, but the algorithm applies to any multidimensional data. Briefly put, it points out that proximity comparison can be computed vertically, a few dimensions at a time, and entire subsets can be thrown away when it's apparent that they are below a statistically derived lower bound. The only gotcha is that the algorithm derives much of its performance from the assumption that your data is vertically decomposed, one table per dimension; otherwise the search effectively incurs a sequential scan of the entire dataset, and then you're pretty much back to square one. The most common approach to nearest-neighbour search is to use a spatial data structure. The classic algorithm is the kd-tree (http://en.wikipedia.org/wiki/Kd-tree) and there's the newer K-D-B tree, neither of which is available in PostgreSQL.
If I remember correctly, R-trees have also been shown to be useful for high numbers of dimensions; with PostgreSQL you have R-trees and even better R-tree-equivalent support through GiST. I have no idea whether you can actually munge your integer vectors into something GiST can index and search, but it's a thought. (GiST, presumably, can also theoretically index kd-trees and other spatial trees.) Alexander. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
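For comparison, the brute-force scan that these spatial structures are meant to beat is easy to sketch outside the database. This is a hypothetical in-memory approach (toy data, not the poster's actual table); a bounded heap keeps the best 25 candidates without sorting the full set:

```python
import heapq
import math
import random

def knn(query, dataset, k=25):
    """Brute-force k-nearest-neighbour scan in the usual vector norm.

    O(n log k): heapq.nsmallest keeps only the k best candidates."""
    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query, v)))
    return heapq.nsmallest(k, dataset, key=dist)

# Toy stand-in for the 2.5M-row real[] table: 1000 random 114-dim vectors.
random.seed(0)
data = [[random.random() for _ in range(114)] for _ in range(1000)]
query = [0.5] * 114
nearest = knn(query, data)
```

In high dimensions this linear scan is often surprisingly competitive, since kd-tree-style pruning degrades toward a full scan well below 114 dimensions (the curse of dimensionality), which is exactly why BOND-style statistical pruning is attractive.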
Re: [PERFORM] Simple query, 10 million records...MySQL ten times faster
On 24 Apr 2007 14:26:46 -0700, zardozrocks [EMAIL PROTECTED] wrote: I have this table: CREATE TABLE test_zip_assoc ( id serial NOT NULL, f_id integer DEFAULT 0 NOT NULL, lat_radians numeric(6,5) DEFAULT 0.0 NOT NULL, long_radians numeric(6,5) DEFAULT 0.0 NOT NULL ); CREATE INDEX lat_radians ON test_zip_assoc USING btree (lat_radians); CREATE INDEX long_radians ON test_zip_assoc USING btree (long_radians); This is a spatial search -- B-tree indexes are much less efficient for this than certain other data structures. The R-tree and its many variants are based on subdividing the space into regions, allowing you to do efficient checks on containment, intersection, etc., based on points or bounding boxes. PostgreSQL implements R-trees natively as well as through a mechanism called GiST, a framework for implementing pluggable tree-like indexes. It also provides some geometric data types. However, as far as I know, PostgreSQL's R-tree/GiST indexes do *not* provide the operators to do bounding box searches. For this you need PostGIS. PostGIS implements the whole GIS stack, and it's so good at this that it's practically the de facto tool among GIS analysts. Installing PostGIS into a database is simple, and once you have done this, you can augment your table with a geometry (*): alter table test_zip_assoc add column lonlat geometry; update test_zip_assoc set lonlat = makepoint( long_radians / (3.14159265358979 / 180), lat_radians / (3.14159265358979 / 180)); The division is to convert your radians into degrees; PostGIS works with degrees, at least out of the box. Now you can query on a bounding box (although, are you sure you got your lons and lats in order? That's Antarctica, isn't it?): select * from test_zip_assoc where lonlat && makebox2d( makepoint(-77.6049721697096, 39.5425768302107), makepoint(-76.1592790300818, 40.9882699698386)) This is bound to be blazingly fast.
Next you can order by geographic distance if you like: order by distance_sphere(lonlat, makepoint(-77.6049721697096, 39.5425768302107)) Nobody has mentioned PostGIS so far, so I hope I'm not missing some crucial detail, like "no spatial indexes allowed!". (*) I cheated here. The PostGIS manual recommends that you use a function to create the geometry column, because it will set up some auxiliary data structures for you that are needed for certain operations. The recommended syntax is: select AddGeometryColumn('', 'test_zip_assoc', 'geom', -1, 'POINT', 2); Alexander. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
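The unit conversion and the containment test themselves are simple enough to check outside the database. A minimal sketch in plain Python, mirroring the arithmetic in the SQL above (makebox2d and the bounding-box search are PostGIS's job; this only verifies the geometry, with hypothetical coordinate values):

```python
import math

def radians_to_degrees(value_rad):
    """Same conversion as dividing by (3.14159... / 180) in the SQL update."""
    return value_rad / (math.pi / 180)

def in_bbox(lon, lat, min_lon, min_lat, max_lon, max_lat):
    """Point-in-box test: the per-point check a bounding-box search performs."""
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Hypothetical stored values, in radians as in the original table.
lon = radians_to_degrees(-1.343)   # about -76.95 degrees
lat = radians_to_degrees(0.702)    # about 40.22 degrees
hit = in_bbox(lon, lat, -77.6049721697096, 39.5425768302107,
              -76.1592790300818, 40.9882699698386)
```

The R-tree/GiST index earns its keep by pruning whole subtrees with this same box test, instead of evaluating it per row.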
Re: [PERFORM] Simple query, 10 million records...MySQL ten times faster
On 4/27/07, Alexander Staubo [EMAIL PROTECTED] wrote: [snip] PostGIS implements the whole GIS stack, and it's so good at this that it's practically the de facto tool among GIS analysts. Installing PostGIS into a database is simple, and once you have done this, you can augment your table with a geometry (*): alter table test_zip_assoc add column lonlat geometry; I forgot to include the crucial step, of course: create index test_zip_assoc_lonlat_index on test_zip_assoc using gist (lonlat gist_geometry_ops); analyze test_zip_assoc; This creates a GiST index on the geometry and (significantly) updates the table statistics. Alexander. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Warm - standby system.
On 4/24/07, Nimesh Satam [EMAIL PROTECTED] wrote: Can anybody let me know the steps which are supposed to be followed to make the standby machine for read access? and how it should be one. Not possible at the moment. The warm standby is not hot -- it cannot be used for queries while it's acting as a standby. Explained here: http://www.postgresql.org/docs/8.2/static/warm-standby.html Alexander. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Two hard drives --- what to do with them?
On Feb 25, 2007, at 04:39 , Carlos Moreno wrote: I do have the option to configure it in RAID-0, but I'm sort of reluctant; I think there's the possibility that having two filesystems that can be accessed truly simultaneously can be more beneficial. The question is: does PostgreSQL have separate, independent areas that require storage such that performance would be noticeably boosted if the multiple storage operations could be done simultaneously? Putting the WAL (aka pg_xlog) on a separate disk will take some load off your main database disk. See http://www.varlena.com/GeneralBits/Tidbits/perf.html for this. It is also possible to put individual tables and/or indexes on separate disks by using tablespaces: For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system. (http://www.postgresql.org/docs/8.2/interactive/manage-ag-tablespaces.html) In both cases, the performance benefits tend to be relative to the amount of write activity you experience, and the latter solution assumes you know where the hotspots are. If you have two tables that see continuous, intense write activity, for example, putting each on a separate disk may give a noticeable benefit. Alexander. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 15, 2006, at 04:09 , Ron wrote: At 07:27 PM 12/14/2006, Alexander Staubo wrote: Sorry, I neglected to include the pertinent graph: http://purefiction.net/paste/pgbench2.pdf In fact, your graph suggests that using arch specific options in addition to -O3 actually =hurts= performance. The difference is very slight. I'm going to run without -funroll-loops and -pipe (which are not arch-related) to get better data. Alexander. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 15, 2006, at 17:53 , Ron wrote: At 09:50 AM 12/15/2006, Greg Smith wrote: On Fri, 15 Dec 2006, Merlin Moncure wrote: The slower is probably due to the unroll loops switch which can actually hurt code due to the larger footprint (less cache coherency). The cache issues are so important with current processors that I'd suggest throwing -Os (optimize for size) into the mix people test. That one may stack usefully with -O2, but probably not with -O3 (3 includes optimizations that increase code size). -Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. So far I have been compiling PostgreSQL and running my pgbench script manually, but this makes me want to modify my script to run pgbench automatically using all possible permutations of a set of compiler flags. Last I tried GCC to produce 32-bit code on this Opteron system, though, it complained about the lack of a compiler; can I persuade it to generate 32-bit code (or 32-bit pointers for that matter) without going the cross-compilation route? Alexander. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 14, 2006, at 16:00 , Greg Smith wrote: On Wed, 13 Dec 2006, Ron wrote: The slowest results, Michael's, are on the system with what appears to be the slowest CPU of the bunch; and the ranking of the rest of the results seem to similarly depend on relative CPU performance. This is not what one would naively expect when benching a IO intensive app like a DBMS. pgbench with 3000 total transactions and fsync off is barely doing I/O to disk; it's writing a bunch of data to the filesystem cache and ending the benchmark before the data even makes it to the hard drive. This is why his results become completely different as soon as the number of transactions increases. With little or no actual disk writes, you should expect results to be ranked by CPU speed. I also second your suggestion that pgbench should be run with -S to disable updates. As far as I can see, nobody has reported numbers for this setting, so here goes. I also increased the buffer size, which I found was needed to avoid hitting the disk for block reads, and increased the memory settings. My PostgreSQL config overrides, then, are: shared_buffers = 1024MB work_mem = 1MB maintenance_work_mem = 16MB fsync = off Environment: Linux 2.6.15-23-amd64-generic on Ubuntu. Dual-core AMD Opteron 280 with 4GB of RAM. LSI PCI-X Fusion-MPT SAS. Running with: pgbench -S -v -n -t 5000 -c 5. 
Results as a graph: http://purefiction.net/paste/pgbench.pdf Stats for CFLAGS=-O0: 18440.181894 19207.882300 19894.432185 19635.625622 19876.858884 20032.597042 19683.597973 20370.19 19989.157881 20207.343510 19993.745956 20081.353580 20356.416424 20047.810017 20319.834190 19417.807528 19906.788454 20536.039929 19491.308046 20002.144230 Stats for CFLAGS=-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe: 23830.358351 26162.203569 25569.091264 26762.755665 26590.822550 26864.908197 26608.029665 26796.116921 26323.742015 26692.576261 26878.859132 26106.770425 26328.371664 26755.595130 25488.304946 26635.527959 26377.485023 24817.590708 26480.245737 26223.427801 Alexander. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
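Summarizing those two samples (stdlib statistics over the raw numbers above) makes the effect size explicit:

```python
import statistics

# tps figures copied verbatim from the two pgbench runs above.
o0 = [18440.181894, 19207.882300, 19894.432185, 19635.625622, 19876.858884,
      20032.597042, 19683.597973, 20370.19, 19989.157881, 20207.343510,
      19993.745956, 20081.353580, 20356.416424, 20047.810017, 20319.834190,
      19417.807528, 19906.788454, 20536.039929, 19491.308046, 20002.144230]
o3 = [23830.358351, 26162.203569, 25569.091264, 26762.755665, 26590.822550,
      26864.908197, 26608.029665, 26796.116921, 26323.742015, 26692.576261,
      26878.859132, 26106.770425, 26328.371664, 26755.595130, 25488.304946,
      26635.527959, 26377.485023, 24817.590708, 26480.245737, 26223.427801]

speedup = statistics.mean(o3) / statistics.mean(o0)  # roughly a 1.3x gain
```

The gap (about 30%) dwarfs the within-sample spread, which is what makes the later claim that -O3 matters while arch flags do not statistically defensible.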
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 14, 2006, at 20:28 , Ron wrote: Can you do runs with just CFLAGS=-O3 and just CFLAGS=-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe as well ? All right. From my perspective, the effect of -O3 is significant, whereas architecture-related optimizations have no statistically significant effect. As far as I'm aware, though, there are no other arch targets on the Opteron that will make sense, there being no predecessor CPU instruction set to choose from; -march=pentium4 doesn't exist. Also, what HDs are you using? How many in what config? I believe the volume is a two-drive RAID 1 configuration, but I'm not managing these servers, so I'll ask the company's support people. Interestingly enough I see that PostgreSQL seems to be writing around 1MB/s during the pgbench run, even though I'm running pgbench in the -S mode. I haven't had the chance to look at the source yet; is it really only doing selects? Alexander. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 15, 2006, at 01:16 , Ron wrote: At 05:39 PM 12/14/2006, Alexander Staubo wrote: On Dec 14, 2006, at 20:28 , Ron wrote: Can you do runs with just CFLAGS=-O3 and just CFLAGS=-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe as well ? All right. From my perspective, the effect of -O3 is significant, whereas architecture-related optimizations have no statistically significant effect. Is this opinion? Or have you rerun the tests using the flags I suggested? If so, can you post the results? Sorry, I neglected to include the pertinent graph: http://purefiction.net/paste/pgbench2.pdf The raw data: CFLAGS=-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe: 18480.899621 19977.162108 19640.562003 19823.585944 19500.293284 19964.383540 20228.664827 20515.766366 19956.431120 19740.795459 20184.551390 19984.907398 20457.260691 19771.395220 20159.225628 19907.248149 20197.580815 19947.498185 20209.450748 20088.501904 CFLAGS=-O3 23814.672315 26846.761905 27137.807960 26957.898233 27109.057570 26997.227925 27291.056939 27565.553643 27422.624323 27392.397185 27757.144967 27402.365372 27563.365421 27349.544685 27544.658154 26957.200592 27523.824623 27457.380654 27052.910082 24452.819263 CFLAGS=-O0 18440.181894 19207.882300 19894.432185 19635.625622 19876.858884 20032.597042 19683.597973 20370.19 19989.157881 20207.343510 19993.745956 20081.353580 20356.416424 20047.810017 20319.834190 19417.807528 19906.788454 20536.039929 19491.308046 20002.144230 CFLAGS=-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe 23830.358351 26162.203569 25569.091264 26762.755665 26590.822550 26864.908197 26608.029665 26796.116921 26323.742015 26692.576261 26878.859132 26106.770425 26328.371664 26755.595130 25488.304946 26635.527959 26377.485023 24817.590708 26480.245737 26223.427801 If -O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe results in a 30-40% speed up over -O0, and -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe
results in a 5-10% speedup, then ~ 1/8 - 1/3 of the total possible speedup is due to arch specific optimizations. Unfortunately, I don't see a 5-10% speedup; -O0 and -msse2 ... are statistically identical. Alexander. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 11, 2006, at 23:22 , Daniel van Ham Colchete wrote: I ran this test at a Gentoo test machine I have here. It's a Pentium 4 3.0GHz (I don't know witch P4) Try cat /proc/cpuinfo. TESTS RESULTS == On a dual-core Opteron 280 with 4G RAM with an LSI PCI-X Fusion-MPT SAS controller, I am getting wildly uneven results: tps = 264.775137 (excluding connections establishing) tps = 160.365754 (excluding connections establishing) tps = 151.967193 (excluding connections establishing) tps = 148.010349 (excluding connections establishing) tps = 260.973569 (excluding connections establishing) tps = 144.693287 (excluding connections establishing) tps = 148.147036 (excluding connections establishing) tps = 259.485717 (excluding connections establishing) I suspect the system's real maximum sustained performance is ~150 tps, but that the LSI's write cache is buffering the writes. I would love to validate this hypothesis, but I'm not sure how. Alexander. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
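One cheap sanity check of the write-cache hypothesis is to split the runs into burst and sustained groups; the 200 tps cutoff is an arbitrary threshold chosen by eye from the numbers above:

```python
# tps values copied from the eight pgbench runs above.
tps = [264.775137, 160.365754, 151.967193, 148.010349,
       260.973569, 144.693287, 148.147036, 259.485717]

bursts = [t for t in tps if t > 200]       # runs absorbed by the controller cache
sustained = [t for t in tps if t <= 200]   # runs bounded by the disks themselves
avg_sustained = sum(sustained) / len(sustained)  # about 150 tps
```

The sustained cluster sits right at the suspected ~150 tps disk-bound rate, while the three burst runs are ~70% higher, consistent with a cache that periodically absorbs a whole run's writes.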
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 12, 2006, at 13:32 , Michael Stone wrote: On Tue, Dec 12, 2006 at 12:29:29PM +0100, Alexander Staubo wrote: I suspect the hardware's real maximum performance of the system is ~150 tps, but that the LSI's write cache is buffering the writes. I would love to validate this hypothesis, but I'm not sure how. With fsync off? The write cache shouldn't really matter in that case. (And for this purpose that's probably a reasonable configuration.) No, fsync=on. The tps values are similarly unstable with fsync=off, though -- I'm seeing bursts of high tps values followed by low-tps valleys, a kind of staccato flow indicative of a write caching being filled up and flushed. Alexander. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 11, 2006, at 04:35 , Tatsuo Ishii wrote: That's not the whole story. UTF-8 and other variable-width encodings don't provide a 1:1 mapping of logical characters to single bytes; in particular, combining characters open the possibility of multiple different byte sequences mapping to the same code point; therefore, string comparison in such encodings generally cannot be done at the byte level (unless, of course, you first ascertain that the strings involved are all normalized to an unambiguous subset of your encoding). Can you tell me such encodings supported by PostgreSQL other than UTF-8? http://www.postgresql.org/docs/8.1/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED Alexander. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] File Systems Compared
On Dec 6, 2006, at 16:40 , Brian Wipf wrote: All tests are with bonnie++ 1.03a [snip] Care to post these numbers *without* word wrapping? Thanks. Alexander. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] query produces 1 GB temp file
While I can't explain why PostgreSQL would use that memory, I recommend looking into tweaking the work_mem parameter. This setting specifies how much memory PostgreSQL may use for certain temporary data structures (hash tables, sort buffers) before it starts using temporary files. Quoting the docs: work_mem (integer) Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries. Alexander. On Feb 5, 2005, at 18:25 , Dirk Lutzebaeck wrote: Hi, here is a query which produces over 1G temp file in pgsql_tmp. This is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB sort_mem and 320MB shared_mem. Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All tables have been analyzed before. Can someone please explain why the temp file is so huge? I understand there are a lot of rows. All relevant indices seem to be used.
Thanks in advance, Dirk EXPLAIN SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart, ft.flatobj, bi.oid, bi.en FROM bi, en, df AS ft, es WHERE bi.rc=130170467 AND bi.en=ft.en AND bi.co=117305223 AND bi.hide=FALSE AND ft.en=en.oid AND es.en=bi.en AND es.co=bi.co AND es.spec=122293729 AND (ft.val_2='DG' OR ft.val_2='SK') AND ft.docstart=1 ORDER BY ft.val_9 ASC, ft.created DESC LIMIT 1000 OFFSET 0; Limit (cost=8346.75..8346.78 rows=3 width=1361) - Unique (cost=8346.75..8346.78 rows=3 width=1361) - Sort (cost=8346.75..8346.76 rows=3 width=1361) Sort Key: ft.val_9, ft.created, ft.flatid - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) - Nested Loop (cost=0.00..5757.17 rows=17 width=51) - Nested Loop (cost=0.00..5606.39 rows=30 width=42) - Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) Index Cond: ((spec = 122293729) AND (co = 117305223::oid)) - Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) Index Cond: (outer.en = bi.en) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false)) - Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) Index Cond: (outer.en = en.oid) - Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) Index Cond: (outer.en = ft.en) Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1)) (17 rows) -- EXPLAIN ANALYZE gives: Limit (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1) - Unique (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1) - Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1) Sort Key: ft.val_9, ft.created, ft.flatid - Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1) - Nested Loop (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1) - Nested Loop (cost=0.00..5606.39 rows=30 width=42) 
(actual time=0.381..1677.014 rows=48563 loops=1) - Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1) Index Cond: ((spec = 122293729) AND (co = 117305223::oid)) - Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863) Index Cond: (outer.en = bi.en) Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false)) - Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1
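The size of the temp file is consistent with the EXPLAIN ANALYZE numbers: the nested loop (misestimated at 3 rows) actually feeds 703,677 rows of estimated width 1,361 bytes into the Sort node, which is close to a gigabyte of raw payload before any per-tuple overhead. A back-of-the-envelope check:

```python
rows = 703_677    # actual rows entering the Sort node, from the plan above
width = 1_361     # estimated tuple width in bytes, from the plan above

payload_bytes = rows * width
payload_gib = payload_bytes / 2**30  # just under 1 GiB before overhead
```

No sort_mem setting short of a gigabyte could hold that in memory, so the sort spills to pgsql_tmp; the real fix is the row-count misestimate upstream, not the memory knob.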
Re: [PERFORM] Optimization of this SQL sentence
On Oct 17, 2006, at 11:33 , Ruben Rubio wrote: CREATE TABLE comment ( idcomment int4 NOT NULL DEFAULT nextval('comment_idcomment_seq'::regclass), [snip 28 columns] CONSTRAINT comment_pkey PRIMARY KEY (idcomment) ) Ficha structure: No indexes in ficha Ficha rows: 17.850 CREATE TABLE ficha ( idficha int4 NOT NULL DEFAULT nextval ('ficha_idficha_seq'::regclass), [snip 67 (!) columns] CONSTRAINT pk_ficha PRIMARY KEY (idficha), CONSTRAINT fk_ficha_geonivel6 FOREIGN KEY (idlocacion) REFERENCES geonivel6 (idgeonivel6) ON UPDATE NO ACTION ON DELETE NO ACTION ) These tables are particularly egregious examples of ignorant database design. You need to understand the relational model (http://en.wikipedia.org/wiki/Relational_model), specifically data normalization (http://en.wikipedia.org/wiki/Database_normalization) and 3NF (http://en.wikipedia.org/wiki/3NF). These columns are particularly telling: searchengine1 int4, searchengine2 int4, searchengine3 int4, searchengine4 int4, searchengine5 int4, searchengine6 int4, deseo1 int4, deseo2 int4, deseo3 int4, deseo4 int4, deseo5 int4, deseo6 int4, titulomapa_l0 varchar(255), titulomapa_l1 varchar(255), titulomapa_l2 varchar(255), titulomapa_l3 varchar(255), titulomapa_l4 varchar(255), titulomapa_l5 varchar(255), titulomapa_l6 varchar(255), titulomapa_l7 varchar(255), titulomapa_l8 varchar(255), titulomapa_l9 varchar(255), Refactor into three separate tables: create table searchengine ( idficha int references ficha (idficha), searchengine int, primary key (idficha, searchengine) ); create table deseo ( idficha int references ficha (idficha), deseo int, primary key (idficha, deseo) ); create table titulomapa ( idficha int references ficha (idficha), titulomapa text, primary key (idficha, titulomapa) ); Now you can find all search engines for a single ficha row: select searchengine from searchengine where idficha = n This design allows for more than six search engines per ficha row, and allows expressive joins such as: select
ficha.idficha, searchengine.searchengine from ficha inner join searchengine on searchengine.idficha = ficha.idficha Also, most of your columns are nullable. This alone shows that you don't understand your own data. Lastly, note that in PostgreSQL these length declarations are not necessary: contacto varchar(255), fuente varchar(512), prefijopais varchar(10) Instead, use: contacto text, fuente text, prefijopais text See the PostgreSQL manual for an explanation of varchar vs. text. Alexander. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimization of this SQL sentence
On Oct 17, 2006, at 17:10 , Craig A. James wrote: These tables are particularly egregious examples of ignorant database design. You need to understand the relational model This email is a *particularly* egregious example of rudeness. You owe Mr. Staubo, and the Postgress community, an apology. I'm sorry you feel that way, but I don't think I was out of line. I did point to several informative sources of documentation, and described some of the problems (but by no means all) with the person's schema and how to solve them. If you think the database design in question is *not* ignorant database design, please do explain why, but on technical grounds. (Ignorance, of course, is not a sin.) Alexander. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Optimization of this SQL sentence
On Oct 17, 2006, at 17:29 , Mario Weilguni wrote: Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: Lastly, note that in PostgreSQL these length declarations are not necessary: contacto varchar(255), fuente varchar(512), prefijopais varchar(10) Enforcing length constraints with varchar(xyz) is good database design, not a bad one. Using text everywhere might be tempting because it works, but it's not a good idea. Enforcing length constraints is generally a bad idea because it assumes you know the data domain as expressed in a quantity of characters. Off the top of your head, do you know the maximum length of a zip code? A street address? The name of a city? In almost all cases the limit you invent is arbitrary, and the probability of being incompatible with any given input is inversely proportional to that arbitrary limit. Encoding specific length constraints in the database makes sense when they relate explicitly to business logic, but I can think of only a few cases where it would make sense: restricting the length of passwords, user names, and so on. In a few cases you do know with 100% certainty the limit of your field, such as with standardized abbreviations: ISO 3166 country codes, for example. And sometimes you want to cap data due to storage or transmission costs. The length constraint on text fields is primarily a historical artifact stemming from the way databases have traditionally been implemented, as fixed-length fields in fixed-length row structures. The inexplicable, improbable space-padded (!) character data type in ANSI SQL is a vestige of this legacy. PostgreSQL's variable-length rows and TOAST mechanism makes the point moot. Quoth the PostgreSQL manual, section 8.3: There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type. While character(n) has performance advantages in some other database systems, it has no such advantages in PostgreSQL. 
In most situations text or character varying should be used instead. Alexander. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Poor performance on very simple query ?
On Oct 3, 2006, at 13:25 , Arnaud Lesauvage wrote: The problem is that simple select queries with the primary key in the WHERE statement take very long to run. For example, this query returns only 7 rows and takes about 1 second to run ! SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41); This is a very small table, but generally speaking, such queries benefit from an index; e.g., create index table1_gid on table1 (gid); Note that PostgreSQL may still perform a sequential scan if it thinks this has a lower cost, e.g. for small tables that span just a few pages. I have run VACUUM FULL on this table many times... I don't know what to try next ! PostgreSQL's query planner relies on table statistics to perform certain optimizations; make sure you run analyze table1. Alexander. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly