Re: [PERFORM] The state of PG replication in 2008/Q2?
Dan Harris wrote:
> My desire would be to have a parallel server that could act as a hot
> standby system with automatic fail over in a multi-master role.

I will add my "me too" for DRBD + Heartbeat.
Re: [PERFORM] Postgres not using array
André Volpato <[EMAIL PROTECTED]> writes:

> Tom Lane escreveu:
>>> We are guessing that a dual core 3.0GHz will beat up a quad core 2.2,
>>> at least in this environment with less than 4 concurrent queries.
>>
>> The most you could hope for from that is less than a 50% speedup. I'd
>> suggest investing some tuning effort first. Some rethinking of your
>> schema, for example, might buy you orders of magnitude ... with no new
>> hardware investment.
>
> I think we almost reached the tuning limit, without changing the schema.

It's hard to tell from the plan you posted (and with only a brief look), but it looks to me like your query with that function is basically doing a join. Because the inner side of the join is inside your function's index lookup, it effectively forces the use of a "nested loop" join. That's usually a good choice for small queries against big tables, but if you're joining a lot of data there are other join types which are much faster. You might find the planner can do a better job if you write your query as a plain SQL query and let the optimizer figure out the best way instead of forcing its hand.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
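A minimal sketch of the kind of rewrite Greg is describing, using hypothetical table, column, and function names (the original schema is not shown in the thread):

  -- Hypothetical function-based form: the function does its own index lookup
  -- for each outer row, which pins the plan to nested-loop-style execution.
  SELECT o.id, my_lookup_func(o.key) AS val
  FROM   outer_tab o;

  -- Equivalent plain-SQL form: with the lookup expressed as a join, the
  -- planner is free to choose a hash or merge join when many rows are involved.
  SELECT o.id, i.val
  FROM   outer_tab o
  JOIN   inner_tab i ON i.key = o.key;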
[PERFORM] Identifying the nature of blocking I/O
[For the purpose of this post, 'blocking' refers to an I/O operation taking a long time for reasons other than the amount of work the I/O operation itself actually implies; not to the use of blocking I/O calls or anything like that.]

Hello,

I have a situation in which deterministic latency is a lot more important than throughput. I realize this is a hugely complex topic and that there is interaction between many different things (pg buffer cache, os buffer cache, raid controller caching, wal buffers, storage layout, etc). I already know several things I definitely want to do to improve matters.

But in general, it would be very interesting to see, at any given moment, what PostgreSQL backends are actually blocking on, from the perspective of PostgreSQL. So for example, if I have 30 COMMITs that are active, to know whether they are simply waiting on the WAL fsync or actually waiting on a data fsync because a checkpoint is being created. Or similarly, for non-commits, whether they are blocking because the WAL buffers are full and writing them out is blocking, etc.

This would make it easier to observe and draw conclusions when tweaking different things in pg/the os/the raid controller.

Is there currently a way of dumping such information? I.e., asking PG "what are backends waiting on right now?"

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org
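For context, a sketch of the closest thing 8.3 offers out of the box. It only surfaces heavyweight-lock waits via pg_stat_activity.waiting and pg_locks; it cannot distinguish a WAL fsync from a checkpoint-related data fsync, which is exactly the gap being asked about (8.3-era column names procpid and current_query are assumed):

  -- Which backends are waiting on a lock, and on what, right now.
  SELECT a.procpid,
         a.waiting,
         a.current_query,
         l.locktype,
         l.mode,
         l.granted
  FROM   pg_stat_activity a
  LEFT   JOIN pg_locks l ON l.pid = a.procpid AND NOT l.granted
  ORDER  BY a.waiting DESC;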
Re: [PERFORM] The state of PG replication in 2008/Q2?
Hi Mathias,

On Aug 22, 2008, at 8:35 AM, Mathias Stjernström wrote:
> I agree with Robert, but I never heard of Cybercluster before.
> Does anyone have any experience with Cybercluster? It sounds really interesting!

Some months ago I took a look into Cybercluster. At that point Cybercluster was basically the PostgreSQL 8.3 source already patched with the pgcluster sources.

Best regards,
Jan
[PERFORM] Nested Loop join being improperly chosen
Hello,

I'm having trouble with a Nested Loop being selected for a rather complex query; it turns out this is a pretty bad plan, as the nested loop's row estimates are quite off (1 estimated / 1207881 actual). If I disable enable_nestloop, the query executes much faster (42 seconds instead of 605). The tables in the query have all been ANALYZEd just before generating these plans.

Here are the plans with and without enable_nestloop: http://pastie.org/258043

The inventory table is huge; it currently has about 1.3 x 10^9 tuples. The items table has around 10,000 tuples, and the other tables in the query are tiny.

Any ideas or suggestions would be greatly appreciated.

Thanks!
-- 
Brad Ediger
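A sketch of the workaround described above, scoped to a single session so the default planner behaviour is untouched elsewhere, plus one common way to attack the underlying misestimate (the column name item_id is a placeholder, not taken from the original query):

  BEGIN;
  SET LOCAL enable_nestloop = off;
  -- ... run the problematic query here ...
  COMMIT;

  -- Raise the statistics target on the relevant join/filter column and
  -- re-analyze, so the planner's row estimates have a chance to improve.
  ALTER TABLE inventory ALTER COLUMN item_id SET STATISTICS 1000;
  ANALYZE inventory;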
Re: [PERFORM] Optimizing a VIEW
On Aug 20, 2008, at 1:18 PM, Tom Lane wrote:
> If you're worried about storage space, I wouldn't go for arrays of
> composite :-(. The tuple header overhead is horrendous, almost certainly
> a lot worse than hstore.

Oh holy cow, I didn't realize we had a big header in there. Is that to allow for changing the definition of the composite type?

-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
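A quick way to see the overhead Tom is referring to, sketched with a throwaway composite type; this assumes the hstore contrib module is installed, and the exact byte counts will vary by version and alignment:

  -- Each element of a composite-type array carries a full tuple header
  -- (roughly 24 bytes plus alignment), while hstore stores bare key/value
  -- pairs, so the same two pairs come out noticeably smaller.
  CREATE TYPE kv AS (k text, v text);

  SELECT pg_column_size(ARRAY[ROW('a','1')::kv, ROW('b','2')::kv]) AS composite_array_bytes,
         pg_column_size('a=>1, b=>2'::hstore)                      AS hstore_bytes;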
Re: [PERFORM] Postgres not using array
Gregory Stark escreveu:
> André Volpato <[EMAIL PROTECTED]> writes:
>> I think we almost reached the tuning limit, without changing the schema.
>
> It's hard to tell from the plan you posted (and with only a brief look) but
> it looks to me like your query with that function is basically doing a join
> but because the inner side of the join is in your function's index lookup
> it's effectively forcing the use of a "nested loop" join. That's usually a
> good choice for small queries against big tables but if you're joining a
> lot of data there are other join types which are much faster. You might
> find the planner can do a better job if you write your query as a plain SQL
> query and let the optimizer figure out the best way instead of forcing its
> hand.

Thanks Greg,

I rewrote the query with an explicit join, removing the function. The planner uses a nested loop, because there are only a few rows (none, in the end). A HashAggregate is used for the same join when running against a bigger database.

The good side of the function is how easy it is to use from a dynamic application. We're going to change it and save some bucks...

It's an impressive win, look:

 HashAggregate  (cost=19773.60..19773.61 rows=1 width=160) (actual time=0.511..0.511 rows=0 loops=1)
   ->  Nested Loop  (cost=19143.21..19773.58 rows=1 width=160) (actual time=0.509..0.509 rows=0 loops=1)
         Join Filter: ((b.benef_cod_arquivo)::text = (internacoes.cod_benef)::text)
         ->  Bitmap Heap Scan on internacoes  (cost=13.34..516.70 rows=1 width=8) (actual time=0.507..0.507 rows=0 loops=1)
               Recheck Cond: ((((ano * 100) + mes) >= 200805) AND (((ano * 100) + mes) <= 200806))
               Filter: (tipo_internacao = 'P'::bpchar)
               ->  Bitmap Index Scan on iinternacoes4  (cost=0.00..13.34 rows=708 width=0) (actual time=0.143..0.143 rows=708 loops=1)
                     Index Cond: ((((ano * 100) + mes) >= 200805) AND (((ano * 100) + mes) <= 200806))
         ->  Limit  (cost=19129.87..19209.26 rows=2117 width=48) (never executed)
               ->  HashAggregate  (cost=19129.87..19209.26 rows=2117 width=48) (never executed)
                     ->  Bitmap Heap Scan on bds_beneficiario b  (cost=822.41..18009.61 rows=56013 width=48) (never executed)
                           Recheck Cond: ((benef_referencia >= 200805) AND (benef_referencia <= 200806))
                           ->  Bitmap Index Scan on ibds_beneficiario2  (cost=0.00..808.41 rows=56013 width=0) (never executed)
                                 Index Cond: ((benef_referencia >= 200805) AND (benef_referencia <= 200806))
 Total runtime: 0.642 ms

-- 
[]´s, ACV
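For readers following along, a purely hypothetical reconstruction of what the explicit-join rewrite might look like, inferred only from the relation and column names visible in the plan above; the real query surely selects different columns, and the Limit node shown in the plan is omitted here:

  SELECT b.benef_cod_arquivo
  FROM   internacoes i
  JOIN  (SELECT benef_cod_arquivo
         FROM   bds_beneficiario
         WHERE  benef_referencia BETWEEN 200805 AND 200806
         GROUP  BY benef_cod_arquivo) b
         ON b.benef_cod_arquivo::text = i.cod_benef::text
  WHERE  (i.ano * 100) + i.mes BETWEEN 200805 AND 200806
  AND    i.tipo_internacao = 'P'
  GROUP  BY b.benef_cod_arquivo;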
Re: [PERFORM] Slow query with a lot of data
On Fri, Aug 22, 2008 at 2:31 AM, Moritz Onken <[EMAIL PROTECTED]> wrote:
> Am 21.08.2008 um 19:08 schrieb Merlin Moncure:
>
>> On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <[EMAIL PROTECTED]> wrote:
>>>
>>> Am 21.08.2008 um 16:39 schrieb Scott Carey:
>>>
>>>> It looks to me like the work_mem did have an effect. Your earlier
>>>> queries had a sort followed by group aggregate at the top, and now it's
>>>> a hash-aggregate. So the query plan DID change. That is likely where
>>>> the first 10x performance gain came from.
>>>
>>> But it didn't change as I added the sub select.
>>> Thank you guys very much. The speed is now ok and I hope I can finish
>>> this work soon.
>>>
>>> But there is another problem. If I run this query without the limitation
>>> of the user id, postgres consumes about 150GB of disk space and dies with
>>>
>>> ERROR: could not write block 25305351 of temporary file: No space left
>>> on device
>>>
>>> After that the available disk space is back to normal.
>>>
>>> Is this normal? The resulting table (setup1) is not bigger than 1.5 GB.
>>
>> Maybe the result is too big. If you explain the query, you should get
>> an estimate of rows returned. If this is the case, you need to
>> rethink your query or do something like a cursor to browse the result.
>>
>> merlin
>
> There will be a few million rows. But I don't understand why these rows
> bloat up so much. If the query is done the new table is about 1 GB in size.
> But while the query is running it uses >150GB of disk space.

Can we see the explain?

merlin
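A sketch of the cursor approach Merlin mentions, with a placeholder query; fetching in batches keeps the client from materializing millions of rows at once, though it does not by itself reduce the temporary space a large server-side sort or hash may still need:

  BEGIN;

  -- Placeholder for the real query.
  DECLARE big_result NO SCROLL CURSOR FOR
      SELECT * FROM setup1;

  -- Pull the result in manageable chunks instead of all at once.
  FETCH FORWARD 10000 FROM big_result;
  -- ... repeat FETCH until no rows are returned ...

  CLOSE big_result;
  COMMIT;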
Re: [PERFORM] Why do my hash joins turn to nested loops?
On Thu, 21 Aug 2008, Tom Lane wrote:
> I think you need to raise from_collapse_limit and/or
> join_collapse_limit.

Ahah, that was it... a much simpler solution than I was fearing. I had already re-written the queries to get around it, but ran into another snag with that method, so this was good timing.

Thanks!
-- 
Ian Smith
www.ian.org
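For reference, a sketch of what raising those limits looks like; the value 12 is arbitrary, and both settings can also be set per session, per user, or in postgresql.conf:

  -- Let the planner flatten larger FROM lists and explicit JOIN chains
  -- before choosing a join order (the default for both is 8).
  SET from_collapse_limit = 12;
  SET join_collapse_limit = 12;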
[PERFORM] Big delete on big table... now what?
Hi list.

I have a table with over 30 million rows. Performance was dropping steadily, so I moved old data not needed online to a historic table. Now the table has about 14 million rows. I don't need the disk space returned to the OS, but I do need to improve performance. Will a plain vacuum do, or is a vacuum full necessary?

Would a vacuum full improve performance at all?

Thanks for your insight.

Regards,
Fernando.
Re: [PERFORM] Big delete on big table... now what?
>>> "Fernando Hevia" <[EMAIL PROTECTED]> wrote: > I have a table with over 30 million rows. Performance was dropping steadily > so I moved old data not needed online to an historic table. Now the table > has about 14 million rows. I don't need the disk space returned to the OS > but I do need to improve performance. Will a plain vacuum do or is a vacuum > full necessary? > *Would a vacuum full improve performance at all? If this database can be out of production for long enough to run it (possibly a few hours, depending on hardware, configuration, table width, indexes) your best option might be to CLUSTER and ANALYZE the table. It gets more complicated if you can't tolerate down-time. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Big delete on big table... now what?
"Fernando Hevia" <[EMAIL PROTECTED]> wrote: > > Hi list. > > I have a table with over 30 million rows. Performance was dropping steadily > so I moved old data not needed online to an historic table. Now the table > has about 14 million rows. I don't need the disk space returned to the OS > but I do need to improve performance. Will a plain vacuum do or is a vacuum > full necessary? > ¿Would a vacuum full improve performance at all? If you can afford the downtime on that table, cluster would be best. If not, do the normal vacuum and analyze. This is unlikely to improve the performance much (although it may shrink the table _some_) but regular vacuum will keep performance from getting any worse. You can also reindex pretty safely. Any queries that run during the reindex will just have to do so without the indexes. Longer term, if you remove smaller groups of rows more frequently, you'll probably be able to maintain performance and table bloat at a reasonable level with normal vacuum. -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Big delete on big table... now what?
"Bill Moran" <[EMAIL PROTECTED]> writes: > "Fernando Hevia" <[EMAIL PROTECTED]> wrote: >> >> Hi list. >> >> I have a table with over 30 million rows. Performance was dropping steadily >> so I moved old data not needed online to an historic table. Now the table >> has about 14 million rows. I don't need the disk space returned to the OS >> but I do need to improve performance. Will a plain vacuum do or is a vacuum >> full necessary? >> ¿Would a vacuum full improve performance at all? > > If you can afford the downtime on that table, cluster would be best. > > If not, do the normal vacuum and analyze. This is unlikely to improve > the performance much (although it may shrink the table _some_) but > regular vacuum will keep performance from getting any worse. Note that CLUSTER requires enough space to store the new and the old copies of the table simultaneously. That's the main reason for VACUUM FULL to still exist. There is also the option of doing something like (assuming id is already an integer -- ie this doesn't actually change the data): ALTER TABLE x ALTER id TYPE integer USING id; which will rewrite the whole table. This is effectively the same as CLUSTER except it doesn't order the table according to an index. It will still require enough space to hold two copies of the table but it will be significantly faster. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Large number of tables slow insert
Hi,

I use PostgreSQL 8.3.1-1 to store a lot of data coming from a large number of sensors. In order to have good performance when querying by timestamp on each sensor, I partitioned my measures table per sensor, which means I create a lot of tables. I simulated a large sensor network with 3000 nodes, so I have ~3000 tables. It appears that each insert (in separate transactions) into the database takes about 300ms (3-4 inserts per second), even in tables with just a few tuples (< 10). I think you can understand that this is not efficient at all, because I need to handle a lot of inserts.

Do you have any idea why it is that slow? And how can I get good insert performance?

My test machine: Intel P4 2.4GHz, 512MB RAM, Ubuntu Server (ext3).
iostat tells me that I do 0.5MB/s of reading and ~6-7MB/s of writing during constant inserts.

Here is the DDL of the measures tables:

---
CREATE TABLE measures_0
(
  "timestamp" timestamp without time zone,
  storedtime timestamp with time zone,
  count smallint,
  "value" smallint[]
)
WITH (OIDS=FALSE);

CREATE INDEX measures_0_1_idx ON measures_0 USING btree ((value[1]));

-- Index: measures_0_2_idx
CREATE INDEX measures_0_2_idx ON measures_0 USING btree ((value[2]));

-- Index: measures_0_3_idx
CREATE INDEX measures_0_3_idx ON measures_0 USING btree ((value[3]));

-- Index: measures_0_count_idx
CREATE INDEX measures_0_count_idx ON measures_0 USING btree (count);

-- Index: measures_0_timestamp_idx
CREATE INDEX measures_0_timestamp_idx ON measures_0 USING btree ("timestamp");

-- Index: measures_0_value_idx
CREATE INDEX measures_0_value_idx ON measures_0 USING btree (value);
---

Regards,
Loïc Petit
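To make the described workload concrete, here is what one insert presumably looks like under this scheme (sensor number and values are made up); each such single-row transaction has to update the table plus its six indexes and then wait for a WAL flush at commit, which is consistent with only a few inserts per second on a single ext3 disk:

  -- Assumed shape of one measurement insert, committed on its own.
  BEGIN;
  INSERT INTO measures_42 ("timestamp", storedtime, count, "value")
  VALUES ('2008-08-22 10:15:00', now(), 3, '{17, 42, 7}');
  COMMIT;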