Re: [PERFORM] postgres performance
On 06/12/13 17:36, chidamparam muthusamy wrote:

I rather think Alan is right - you either want a lot more RAM or faster disks. Have a look at your first query...

Query:

EXPLAIN (analyze, buffers) SELECT text(client) as client, text(gateway) as gateway, count(*)::bigint as total_calls, (avg(duration)/1000.0)::numeric(10,2) as acd, (avg(pdd))::numeric(10,2) as pdd, sum(call_duration_recv)/1000.0 as duration_recv, sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as call_amount_recv, sum(call_amount_pay) as call_amount_pay FROM detailed_report WHERE end_time >= '2013-05-01 00:00' and end_time < '2013-07-01 00:00' and group_id = 'admin' and client = 'CHOICE' GROUP BY client, gateway ORDER BY call_amount_recv DESC;

QUERY PLAN
--------------------------------------------------------------------------
Sort (cost=3422863.06..3422868.69 rows=2254 width=44) (actual time=137852.474..137852.474 rows=5 loops=1)
  Sort Key: (sum(call_amount_recv))
  Sort Method: quicksort Memory: 25kB
  Buffers: shared read=2491664
  -> HashAggregate (cost=3422664.28..3422737.53 rows=2254 width=44) (actual time=137852.402..137852.454 rows=5 loops=1)
       Buffers: shared read=2491664
       -> Bitmap Heap Scan on detailed_report (cost=644828.11..3399506.87 rows=1029218 width=44) (actual time=4499.558..125443.122 rows=5248227 loops=1)
            Recheck Cond: ((end_time >= '2013-05-01 00:00:00+00'::timestamp with time zone) AND (end_time < '2013-07-01 00:00:00+00'::timestamp with time zone) AND ((group_id)::text = 'admin'::text) AND ((client)::text = 'CHOICE'::text))
            Buffers: shared read=2491664
            -> Bitmap Index Scan on endtime_groupid_client_tsidx_detail_report (cost=0.00..644570.81 rows=1029218 width=0) (actual time=3418.754..3418.754 rows=5248227 loops=1)
                 Index Cond: ((end_time >= '2013-05-01 00:00:00+00'::timestamp with time zone) AND (end_time < '2013-07-01 00:00:00+00'::timestamp with time zone) AND ((group_id)::text = 'admin'::text) AND ((client)::text = 'CHOICE'::text))
                 Buffers: shared read=95055
Total runtime: *137868.946 ms*
(13 rows)

The index is being used, but most of your time is
going on the Bitmap Heap Scan. You're processing 5.2 million rows in about 120 seconds - that's about 43 rows per millisecond - not too bad. It's not getting any cache hits though; it's having to read all the blocks. Looking at the number of blocks, that's ~2.5 million at 8KB each, or about 20GB. You just don't have the RAM to cache that. If you have lots of similar reporting queries to run, you might get away with dropping the index and letting them run in parallel. Each individual query would be slow, but they should be smart enough to share each other's sequential scans - the disks would basically be looping through your data continuously. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
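The block arithmetic above is easy to check directly (a sketch; 8KB is PostgreSQL's default block size, and the figures are taken from the EXPLAIN output):

```python
# Figures from the EXPLAIN (analyze, buffers) output above.
blocks_read = 2_491_664    # Buffers: shared read
block_size = 8 * 1024      # default 8KB PostgreSQL block
rows = 5_248_227           # rows from the Bitmap Heap Scan
scan_seconds = 120         # roughly, per the discussion above

total_gb = blocks_read * block_size / 1e9      # ~20.4 GB read from disk
rows_per_ms = rows / (scan_seconds * 1000)     # ~43.7 rows per millisecond

print(f"data read: {total_gb:.1f} GB, throughput: {rows_per_ms:.1f} rows/ms")
```

With ~20GB touched and no shared-buffer hits, the working set simply exceeds the available RAM, which is the point being made.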
Re: [PERFORM] PostgreSQL 9.2.4 very slow on laptop with windows 8
On 22/08/13 12:30, girish subbaramu wrote: I am running PostgreSQL 9.2.4 on Windows 8, 64-bit operating system, 4GB RAM. A laptop with an i3-3110M, 2.4 GHz. The database came bundled with WAPP stack 5.4.17-0. We have a PHP application that serves data from PostgreSQL 9.2.4. The configuration runs with very good performance (3 sec response, PHP + DB) on Windows 7 32/64-bit OS, 4GB RAM (desktops with an i3-2100, 3.10 GHz). But it takes around 25 seconds to render on Windows 8, the laptop. I was able to eliminate PHP, as the performance was as expected (without DB calls). On the other hand, the database calls take more than 100 ms for simple queries (for example, a table with just 10 rows sometimes takes around 126 ms). I was able to collect this information from the pg_log. First step - check the antivirus / security tools aren't interfering. That can slow you down immensely. Second step - have a quick look in your performance monitoring (you can get to it through -- Richard Huxton Archonet Ltd
Re: [PERFORM] How to properly index hstore tags column to faster search for keys
On 07/07/13 08:28, Radu-Stefan Zugravu wrote: Each node has a geometry column called geom and a hstore column called tags. I need to extract nodes along a line that have certain keys in the tags column. To do that I use the following query: SELECT id, tags FROM nodes WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001) AND tags ? '{$type}'; CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags); After creating the index I searched again for nodes using the same first query, but there is no change in performance. How can I properly use GIN and GiST to index the tags column so I can search faster for nodes that have a certain key in the tags column? Your index definition looks OK. Try showing the output of EXPLAIN ANALYSE for your query - that way we'll see if the index is being used. You can always paste explain output to: http://explain.depesz.com/ if it's too long for the email. -- Richard Huxton Archonet Ltd
Re: [PERFORM] How to properly index hstore tags column to faster search for keys
On 08/07/13 10:20, Radu-Stefan Zugravu wrote: Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not-so-distant start and end points. So you have to call this query 1000 times with different start and end points? I want to improve this query as much as I can. There are only two ways I can see to get this much below 20ms. The first will only work if you want a very restricted range of tags. Drop the tag index and create multiple geometry indexes instead:

CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist';
CREATE INDEX node_geo_history_idx <index details> WHERE tags ? 'history';

etc. This will only work if you have a literal WHERE clause that checks the tag. It should be fast though. The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them, of course. How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back. Not important - I was just curious. -- Richard Huxton Archonet Ltd
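The one-partial-index-per-literal-tag idea can be sketched with SQLite standing in for PostGIS/hstore (all names are illustrative, not the poster's schema; a LIKE test stands in for the hstore `?` operator):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, lon REAL, lat REAL, tags TEXT);
-- One partial "geometry" index per tag of interest: each index only
-- contains the rows carrying that tag, so it is small and selective.
CREATE INDEX node_geo_tourist_idx ON nodes (lon, lat) WHERE tags LIKE '%tourist%';
CREATE INDEX node_geo_history_idx ON nodes (lon, lat) WHERE tags LIKE '%history%';
""")
conn.executemany("INSERT INTO nodes VALUES (?,?,?,?)", [
    (1, 0.001, 0.001, "tourist,museum"),
    (2, 0.002, 0.002, "history,castle"),
    (3, 5.0,   5.0,   "tourist,viewpoint"),  # outside the search box
    (4, 0.003, 0.003, "shop"),
])

# The *literal* tag test in the WHERE clause is what lets the planner
# (in SQLite as in PostgreSQL) consider the matching partial index.
rows = conn.execute(
    "SELECT id FROM nodes "
    "WHERE lon BETWEEN 0 AND 0.01 AND lat BETWEEN 0 AND 0.01 "
    "AND tags LIKE '%tourist%'"
).fetchall()
```

The restriction Richard mentions falls out naturally: a query whose tag comes in as a bound parameter cannot be matched to any one partial index at plan time.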
Re: [PERFORM] WHERE 1 = 2 OR ... makes planner choose a very inefficient plan
On 18/04/13 15:20, dmitry potapov wrote: Hello, I recently stumbled upon what could be a planner bug or a corner case. If a "false condition OR ..." is added to the WHERE clause of a SELECT query, then the planner chooses a very inefficient plan. Consider a query: If I comment out the "1=2 OR", then the plan changes dramatically. What happens if you substitute:
1. 1=3 OR
2. false OR
-- Richard Huxton Archonet Ltd
Re: [PERFORM] PsqL8.3
On 11/04/13 22:09, Thiyagarajan, Palaniappan wrote: All, We have a performance issue with vacuum runs on PostgreSQL 8.3. The CPU usage goes to 300+ and the application gets slow. How do we avoid the high CPU and vacuum tables smoothly? I'm afraid this isn't nearly enough information for anyone to help.
1. Full version details, some idea of hardware and database size might be useful. Exactly when this happens, etc.
2. Is this a manual vacuum or autovacuum?
3. Are you familiar with the manual page regarding the vacuum settings? http://www.postgresql.org/docs/8.3/static/runtime-config-autovacuum.html
4. If so, what changes have you made?
5. You are aware that 8.3 is end-of-life? And you are running 8.3.23 until you upgrade, aren't you? http://www.postgresql.org/support/versioning/
Typically you'd expect disk I/O to be the limiting factor with vacuum rather than CPU. However, it might just be that I've misunderstood your description. More details please. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Fighting the planner :-(
On 01/02/13 17:54, Casey Allen Shobe wrote: My apologies - I included the wrong version of the query before... during testing I had tried deparameterizing a few of the input parameters. I also accidentally left out the schema for the network_config_tot2 table from the initial paste. Here is an updated paste, which shows the correct query in a prepare statement. The explain plans are from explain execute hewitt_test (...): http://pgsql.privatepaste.com/00c582c840 Here is the correct explain plan for this statement (still bad): http://explain.depesz.com/s/c46

Three quick observations before the weekend:
1. You said config_id was now smallint in your email, but it reads int in the pastes above. It doesn't matter much which, but just checking we've got the right pastes.
2. The total estimated cost of both queries is about the same (477,225.19 for the varchar, 447,623.86 for the int). This suggests something about your configuration doesn't match the performance of your machine, since presumably the int version is taking at least twice as long as the varchar one.
3. Interestingly, the config_id search in both plans seems to be using a Bitmap Index, so I'm not sure that's the root cause. However, the varchar version seems to have a literal string it's matching against. If you've manually substituted in a literal value, that could be skewing the tests.

And two things for you to try if you would:
1. Can you just check and see if any of the row estimates are horribly off for any particular clause in the query?
2. You mention your config settings are mostly at default. What's your work_mem and can you increase it? You can issue a SET for the current session, no need to change it globally. If you've got the RAM try doubling it, then double it again. See what happens to your plan then.
-- Richard Huxton Archonet Ltd
Re: [PERFORM] Triggers and transactions
On 28/01/13 18:54, Craig James wrote: If I drop and then recreate a trigger inside of a single transaction, how does it affect other processes trying to use the same table? Can they just merrily go along their way using the table, or will they be blocked by an exclusive lock? I *think* it blocks, but in any case, read on... We have a trigger that detects illegal drugs and dangerous chemicals (such as explosives and flammable compounds that can't be shipped by air). <pedantry mode=full>detects a reference to illegal... (unless you've hooked your RDBMS up to some sort of x-ray scanner, in which case I salute you sir)</pedantry> Unfortunately, the trigger's function is necessarily heavyweight and slow. The drop-and-restore-trigger operation is needed when we're copying data from one server to another. Run the copy as a different user than ordinary applications (a good idea anyway). Then the function can just check current_user and exit early for the copy. -- Richard Huxton Archonet Ltd
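The "skip the heavy check for the copy user" pattern can be sketched with SQLite via Python (a stand-in: SQLite has no current_user, so a one-row settings flag plays that role here; table and compound names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE settings (bulk_copy INTEGER NOT NULL);
INSERT INTO settings VALUES (0);
CREATE TABLE shipments (compound TEXT);
-- Stand-in for the heavyweight trigger: reject flagged compounds,
-- but bail out immediately while a bulk copy is running (in
-- PostgreSQL the function would test current_user instead).
CREATE TRIGGER check_compound BEFORE INSERT ON shipments
BEGIN
  SELECT RAISE(ABORT, 'dangerous compound')
  WHERE (SELECT bulk_copy FROM settings) = 0
    AND NEW.compound IN ('tnt', 'nitroglycerin');
END;
""")

conn.execute("INSERT INTO shipments VALUES ('water')")  # passes the check
try:
    conn.execute("INSERT INTO shipments VALUES ('tnt')")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True  # trigger fired for an ordinary insert

# Flip the flag for the copy; the expensive check is now skipped
# without ever dropping and recreating the trigger.
conn.execute("UPDATE settings SET bulk_copy = 1")
conn.execute("INSERT INTO shipments VALUES ('tnt')")
```

The point of the design is that the trigger stays in place, so concurrent sessions are never exposed to a window with no trigger and no DDL locking is needed.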
Re: [PERFORM] Savepoints in transactions for speed?
On 27/11/12 22:04, Mike Blackwell wrote: I need to delete about 1.5 million records from a table and reload it in one transaction. The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL) so I'm not sure the COPY alternative applies here. No reason why it shouldn't. https://metacpan.org/module/DBD::Pg#COPY-support -- Richard Huxton Archonet Ltd
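The shape of the suggestion can be sketched with Python's sqlite3 standing in for Perl DBI + DBD::Pg's COPY support (table and values are invented): delete and bulk-load inside one transaction, sending rows in batches rather than one INSERT round-trip per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO report VALUES (?,?)",
                 [(i, "old") for i in range(1000)])
conn.commit()

# Rows fetched from the other (non-PostgreSQL) database.
fresh = [(i, "new") for i in range(1000)]

# Delete and reload atomically: concurrent readers see either the old
# data or the new data, never an empty or half-loaded table.
with conn:
    conn.execute("DELETE FROM report")
    conn.executemany("INSERT INTO report VALUES (?,?)", fresh)
```

With DBD::Pg the executemany step would be replaced by pg_putcopydata calls inside a COPY ... FROM STDIN, which avoids per-row statement overhead entirely.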
Re: [PERFORM] transactions start time
On 24/07/12 12:14, Aleksei Arefjev wrote: Hi, In statistical reports gathered by PgBadger on our PostgreSQL databases, almost always the "Queries that took up the most time" report table contains information about transaction start time (the 'BEGIN;' command). Something like the example below:

23h34m52.26s | 48,556,167 | 0.00s | BEGIN;
0.82s | BEGIN;
0.82s | BEGIN;
0.82s | BEGIN;
0.81s | BEGIN;
0.81s | BEGIN;
0.81s | BEGIN;
0.80s | BEGIN;
0.80s | BEGIN;
0.79s | BEGIN;
0.79s | BEGIN;

I'm not sure if I'm reading this right, but are there more than 48 million BEGINs that took 0s each (presumably rounded down) and then a handful taking about 0.8s? If so, then it's likely nothing to do with the BEGIN and just that the machine was busy doing other things when you started a transaction. Databases are placed on different hardware; OS - Debian GNU/Linux, PostgreSQL 9.1. So, the questions are: 1. Is this a normal situation with transaction start time (the BEGIN command)? See above. 2. How can we reduce transaction start time, if it's possible in principle? Below 0.00? Probably not. 3. What happens in PostgreSQL at transaction start? Can someone describe this process in detail? (Of course, I saw in the PostgreSQL source code, for example, the definition of functions like StartTransaction, but it's not so easy for a third-party researcher to understand what all of these operations mean in practice for performance.) Well, there are two important things to understand: 1. All* commands run in a transaction. 2. I think most of the work in getting a new snapshot etc. gets pushed back until it's needed. So - the overall impact of issuing BEGIN should be close to zero. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Create tables performance
On 06/07/12 16:15, Sylvain CAILLET wrote: Hi to all, I run PostgreSQL 8.3.9 on a dedicated server running Debian 5.0.4, a strong dual quad-core with 16GB RAM. My biggest db contains at least 100,000 tables. That is a *lot* of tables and it's probably going to be slow whatever you do. Last time I started a Java process I use to make some changes on it, it created 170 new tables and it took one full minute. What are you using all these tables for? I'm assuming most of them have identical structure. -- Richard Huxton Archonet Ltd
Re: [PERFORM] scale up (postgresql vs mssql)
On 15/04/12 13:43, Eyal Wilde wrote: version: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit. You've probably checked this, but if not it's worth making sure your disk I/O is roughly equivalent for the two operating systems. It might be poor drivers on the CentOS system. Do you have two equivalent machines, or are you dual-booting? -- Richard Huxton Archonet Ltd
Re: [PERFORM] Very long deletion time on a 200 GB database
On 23/02/12 08:39, Reuven M. Lerner wrote: (4) I tried chunking the deletes, such that instead of trying to delete all of the records from the B table, I would instead delete just those associated with 100 or 200 rows from the R table. On a 1 GB subset of the data, this seemed to work just fine. But on the actual database, it was still far too slow. This is the approach I'd take. You don't have enough control / access to come up with a better solution. Build a temp table with 100 ids to delete. Time that, and then next night you can increase to 200 etc until it takes around 3 hours. Oh - and get the Windows admins to take a look at disk activity - the standard performance monitor can tell you more than enough. If it is swapping constantly, performance will be atrocious but even if the disks are just constantly busy then updates and deletes can be very slow. -- Richard Huxton Archonet Ltd
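The chunked-delete loop might look like this sketch (SQLite via Python as a stand-in; the table names and the "old data" condition are invented, and the chunk size is the knob to turn night by night):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (id INTEGER PRIMARY KEY, old INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, r_id INTEGER)")
conn.executemany("INSERT INTO r VALUES (?,?)",
                 [(i, 1 if i < 500 else 0) for i in range(1000)])
conn.executemany("INSERT INTO b VALUES (?,?)",
                 [(i, i % 1000) for i in range(5000)])

chunk = 100  # start small; raise it until a nightly run takes ~3 hours
deleted = 0
while True:
    # Pick the next batch of R rows to delete (the temp-table step).
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM r WHERE old = 1 LIMIT ?", (chunk,))]
    if not ids:
        break
    marks = ",".join("?" * len(ids))
    # Delete the dependent B rows first, then the R rows themselves,
    # committing per chunk so each transaction stays short.
    conn.execute(f"DELETE FROM b WHERE r_id IN ({marks})", ids)
    conn.execute(f"DELETE FROM r WHERE id IN ({marks})", ids)
    conn.commit()
    deleted += len(ids)
```

Keeping each transaction small is what makes the run timeable and interruptible, which is the point of the chunking advice.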
Re: [PERFORM] unlogged tables
On 14/11/11 08:10, Sergey Konoplev wrote: Hi, On 12 November 2011 00:18, Stephen Frost sfr...@snowman.net wrote: In a crash, unlogged tables are automatically truncated. BTW I wonder why they are truncated in the crash case? Because they bypass the transaction-log (WAL), hence unlogged. There's no way to know whether there were partial updates applied when the system restarts. -- Richard Huxton Archonet Ltd
Re: [PERFORM] unlogged tables
On 14/11/11 10:08, Sergey Konoplev wrote: On 14 November 2011 12:58, Richard Huxton d...@archonet.com wrote: Because they bypass the transaction-log (WAL), hence unlogged. There's no way to know whether there were partial updates applied when the system restarts. I probably did not understand the meaning of "truncate" correctly. Does it truncate all the records of the table, or only several recent records? All. Let's say you were doing something like UPDATE unlogged_table SET x=1 WHERE y=2. If a crash occurs during this command, there's no guarantee that the affected disk pages were all updated. Worse, a single page might be partially updated or even have rubbish in it (depending on the nature of the crash). Without the WAL there's no way to check whether the table is good or not, or even to know what the last updates were. So - the only safe thing to do is truncate the unlogged tables. In the event of a normal shutdown, we can flush all the writes to disk so we know all the data has been written, so there is no need to truncate. -- Richard Huxton Archonet Ltd
Re: [PERFORM] where clause + function, execution order
On 11/11/11 15:54, Julius Tuskenis wrote: On 2011.11.11 17:38, Sorin Dudui wrote: I have the following function: CREATE OR REPLACE FUNCTION xxx(text) [snip] LANGUAGE sql STABLE The function's execution plan is prepared when creating it, so the WHERE clause should check the function result, not alter its execution. Not true for SQL functions. They can be inlined, but I'm not sure if this one will be. What does EXPLAIN ANALYSE show for this query? -- Richard Huxton Archonet Ltd
Re: [PERFORM] where clause + function, execution order
On 11/11/11 16:28, Sorin Dudui wrote: Hi, this is the EXPLAIN ANALYSE output:

Merge Left Join (cost=0.00..2820.34 rows=23138 width=777) (actual time=0.049..317.935 rows=26809 loops=1)
  Merge Cond: ((a.admin10)::text = (b.link_id)::text)
  -> Index Scan using admin_lookup_admin10 on admin_lookup a (cost=0.00..845.04 rows=5224 width=742) (actual time=0.015..40.263 rows=8100 loops=1)
       Filter: (((admin40)::text <> '-1'::text) AND (((admin40)::text = 'ITA10'::text) OR ((admin40)::text = 'ITA15'::text) OR ((admin40)::text = 'ITA19'::text) OR ((admin40)::text = 'ITA04'::text) OR ((admin40)::text = 'ITA09'::text) OR ((admin40)::text = 'ITA03'::text) OR ((admin40)::text = 'ITA08'::text) OR ((admin40)::text = 'ITA17'::text) OR ((admin40)::text = 'ITA02'::text) OR ((admin40)::text = 'ITA18'::text) OR ((admin40)::text = 'ITA01'::text) OR ((admin40)::text = 'ITA20'::text) OR ((admin40)::text = 'ITA13'::text) OR ((admin40)::text = 'ITA11'::text) OR ((admin40)::text = 'ITA14'::text) OR ((admin40)::text = 'ITA16'::text) OR ((admin40)::text = 'ITA07'::text) OR ((admin40)::text = 'ITA06'::text) OR ((admin40)::text = 'ITA12'::text) OR ((admin40)::text = 'ITA05'::text)))
  -> Index Scan using reg_data_a08id_copy on registrations_data b (cost=0.00..1496.89 rows=24174 width=45) (actual time=0.008..70.408 rows=24174 loops=1)
Total runtime: 372.765 ms

That certainly looks like it's been inlined. You are testing for ITA10, ITA15 etc. outside the function call, no? It's pushing those tests down, using the index admin_lookup_admin10 to test for them, then joining afterwards. -- Richard Huxton Archonet Ltd
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
On 14/02/11 06:59, Gnanakumar wrote: How can we boost performance of queries containing pattern matching characters? QUERY: DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%' As is clear from the above query, email is matched partially and case-insensitively, which my application requirement demands. Well, for that exact pattern you're not going to find an index that's much help. Do you really need something so wide-ranging though? The above will match all of the following:

us...@domain.com
us...@sub.domain.com
us...@domain.com.au
us...@unrelated-domain.com
us...@unrelated-domain.com.au
us...@sub.domain.com.au
us...@sub.unrelated-domain.com
us...@sub.unrelated-domain.com.au
us...@sub.unrelated-domain.completely-wrong.com

Is that really what you are after? Or did you just want to match:

us...@domain.com
us...@sub.domain.com

-- Richard Huxton Archonet Ltd
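How wide-ranging the unanchored pattern is can be shown in a few lines of Python (a sketch; the addresses are made up, and a case-insensitive search for the escaped literal behaves like ILIKE '%domain.com%'):

```python
import re

# ILIKE '%domain.com%' == case-insensitive substring match on the literal.
pattern = re.compile(re.escape("domain.com"), re.IGNORECASE)

addresses = [
    "user@domain.com",
    "user@sub.domain.com",
    "user@Domain.COM.au",         # matches despite the case difference
    "user@unrelated-domain.com",  # matches, though it's a different domain
    "user@example.org",           # no match
]
matched = [a for a in addresses if pattern.search(a)]
```

Because the pattern is unanchored on both sides, any index ordered on the column value is useless; every row has to be checked.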
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
On 14/02/11 07:28, Gnanakumar wrote: Is that really what you are after? Or did you just want to match: us...@domain.com us...@sub.domain.com I understand that because I have (%) at the beginning and end, it's going to match unrelated domains, etc., which, as you rightly said, is wide-ranging. But my point here is: how can I improve the performance of queries containing pattern-matching characters? If you really need to match all those options, you can't use an index. A substring-matching index would need to have multiple entries per character per value (since it doesn't know what you will search for). The index size becomes unmanageable very quickly. That's why I asked what you really wanted to match. So, I'll ask again: do you really want to match all of those options? -- Richard Huxton Archonet Ltd
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
On 14/02/11 07:38, Artur Zając wrote: I had almost the same problem. To resolve it, I created my own text search parser (myftscfg) which divides the text in the column into three-letter parts; for example, someem...@domain.com is divided into som, ome, mee, eem, ema, mai, ail, il@, l@d, @do, dom, oma, mai, ain, in., n.c, .co, com. There should also be an index on the email column: CREATE INDEX email_fts ON mytable USING gin (to_tsvector('myftscfg'::regconfig, email)) Every query like email ILIKE '%domain.com%' should be rewritten to: WHERE to_tsvector('myftscfg',email) @@ to_tsquery('dom') AND to_tsvector('myftscfg',email) @@ to_tsquery('oma') AND to_tsvector('myftscfg',email) @@ to_tsquery('mai') AND ... Looks like you've almost re-invented the trigram module: http://www.postgresql.org/docs/9.0/static/pgtrgm.html -- Richard Huxton Archonet Ltd
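The logic behind Artur's rewrite can be sketched in plain Python (names are mine; note the AND of trigram tests can admit false positives, so the original ILIKE is still needed as a final recheck, and it assumes search strings of at least 3 characters as the application requires):

```python
def trigrams(s):
    """Split a string into overlapping three-character parts, like the
    custom text-search parser described above."""
    return [s[i:i + 3] for i in range(len(s) - 2)]

def candidate_match(haystack, needle):
    """Rewrite of ILIKE '%needle%': every trigram of the needle must
    appear among the haystack's trigrams. A haystack can pass this test
    without containing the needle contiguously, hence 'candidate'."""
    hay = set(trigrams(haystack.lower()))
    return all(t in hay for t in trigrams(needle.lower()))
```

An index only has to answer the trigram-containment question, which is exactly what a GIN index over trigrams (or tsvectors, in Artur's version) can do quickly.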
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
On 14/02/11 07:46, Gnanakumar wrote: If you really need to match all those options, you can't use an index. A substring-matching index would need to have multiple entries per character per value (since it doesn't know what you will search for). The index size becomes unmanageable very quickly. That's why I asked what you really wanted to match. To be more specific, in fact, our current application allows deleting email(s) with a minimum of 3 characters. There is also a note/warning given to application users before deleting, explaining the implication of this delete action (partial, case-insensitive, and it could be wide-ranging too). So, I'll ask again: do you really want to match all of those options? Yes, as explained above, I want to match all those. Then you can't use a simple index. If you did use an index, it would probably be much slower for 'com' or 'yah' or 'gma' and so on. The closest you can do is something like Artur's option (or the pg_trgm module - handy since you are looking at 3 chars and up) to select likely matches, combined with a separate search on '%domain.com%' to confirm the match. P.S. - I'd be inclined to just match the central domain parts, so for us...@europe.megacorp.com you would index 'europe' and 'megacorp' and only allow matching on the start of each string. Of course, if your application spec says you need to match on 'p.c' too, then that's what you have to do. -- Richard Huxton Archonet Ltd
Re: [PERFORM] performance libpq vs JDBC
On 16/12/10 09:21, Werner Scholtes wrote: I assume that the wire protocol of PostgreSQL allows transmitting multiple rows at once, but libpq doesn't have an interface to access it. Is that right? Sounds wrong to me. The libpq client is the default reference implementation of the protocol. If there were large efficiencies that could be copied, they would be. Anyway - you don't need to assume what's in the protocol. It's documented here: http://www.postgresql.org/docs/9.0/static/protocol.html I'd stick wireshark or some other network analyser on the two sessions - see exactly what is different. -- Richard Huxton Archonet Ltd
Re: [PERFORM] performance libpq vs JDBC
On 16/12/10 12:28, Werner Scholtes wrote: Thanks a lot for your advice. I found the difference: My Java program sends one huge SQL string containing 1000 INSERT statements separated by ';' (without using prepared statements at all!), whereas my C++ program sends one INSERT statement with parameters to be prepared and after that 1000 sets of parameters. Now I refactored my C++ program to also send 1000 INSERT statements in one call to PQexec and reached the same performance as my Java program. So - it was the network round-trip overhead. Like Divakar suggested, COPY or VALUES (),(),() would work too. You mention multiple updates/deletes too. Perhaps the cleanest and fastest method would be to build a TEMP table containing the IDs/values required and join against that for your updates/deletes. I just wonder why anyone should use prepared statements at all? Not everything is a simple INSERT. Preparing saves planning time on repeated SELECTs. It also provides some SQL-injection safety, since you provide parameters rather than building a SQL string. -- Richard Huxton Archonet Ltd
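The injection-safety point can be demonstrated with any parameter-binding driver; here Python's sqlite3 stands in for libpq's prepared-statement API (the table and the malicious input are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

evil = "x' OR '1'='1"

# String building: the input becomes part of the SQL text, the quote
# breaks out of the literal, and the condition matches every row.
unsafe = conn.execute(
    "SELECT count(*) FROM users WHERE name = '" + evil + "'").fetchone()[0]

# Parameter binding: the input is passed as a plain value and can
# never change the shape of the statement.
safe = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (evil,)).fetchone()[0]
```

The concatenated query returns both rows while the bound query returns none, which is exactly the difference between interpolating a string and binding a parameter.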
Re: [PERFORM] How to get FK to use new index without restarting the database
On 16/12/10 12:12, Eric Comeau wrote: The problem surfaced at one of our client installs where a maintenance DELETE query was running for over 24 hrs. We have since identified the missing indexes and have sent the client a script to create them, but in our testing we have not been able to get postgres to use the new index for the FK cascade delete without bouncing the database. Well, an ongoing DELETE isn't going to see a new index. I'd have thought a new connection should though. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Using more tha one index per table
On 22/07/10 03:27, Greg Smith wrote: Steve Atkins wrote: If http://postgresql.org/docs/9.0/* were to 302 redirect to http://postgresql.org/docs/current/* while 9.0 is the current release (and similarly for 9.1 and so on) I suspect we'd find many more links to current and fewer links to specific versions after a year or two. True, but this would leave people with no way to bookmark a permanent link to whatever is the current version, which will represent a regression for how some people want the site to work. Having a quick look at the website, a simple change might be to have a large CURRENT MANUALS link above all the versioned links. That should help substantially. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure
On 26/03/10 03:56, Eliot Gable wrote: I really am chasing milliseconds here, and I appreciate all your feedback. You've given me a relatively large number of possible optimizations I can try out. I will definitely try out the libpqtypes. That sounds like a promising way to further cut down on execution time. I think most of my performance penalty is in transfering the results back to the C++ application. In addition to all of Merlin's good advice, if the client is on a different machine to the server then try sticking wireshark or similar onto the connection. That should make it pretty clear where the main costs are in getting your data back. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Multiple data base on same server
On 26/02/10 09:37, elias ghanem wrote: Hi, I'm using PostgreSQL 8.4. I need to install multiple PostgreSQL DBs on one server, but I have some questions: - Are there any problems (performance-wise or other) if I have 10 to 15 DBs on the same server? Clearly that's going to depend on what they're all doing and how big a server you have. There's no limitation in PostgreSQL that stops you though. - Each DB needs 10 tablespaces, so if I create 10 different tablespaces for each DB I will have 100 to 150 tablespaces on the same server. Can this also cause any problems? Do you have 200-300+ disks to put these tablespaces on? If not, I'm not clear what you are trying to do. Why does each DB need 10 tablespaces? -- Richard Huxton Archonet Ltd
Re: [PERFORM] Multiple data base on same server
On 26/02/10 12:45, elias ghanem wrote: Hi, Thanks for your answer. Concerning the second point, each db has different tables that are logically related (for example, tables for configuration, tables for business...), plus I'm planning to put the indexes in their own tablespaces. Concerning the disks, they will maybe be stored on multiple disks (but surely not 200-300). So I'm just wondering if this big number of tablespaces on the same db server may cause problems. If the tablespaces aren't on different disks, I'm not sure what the point is. Do you perhaps mean schemas? So you have e.g. a system schema with tables users, activity_log etc? There's no problem with 20-30 schemas per database. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Extracting superlatives - SQL design philosophy
On 24/02/10 23:37, Dave Crooke wrote: 1. The city temps table is a toy example, not meant to be realistic :-) You knew that and I guessed it, but it's worth stating these things for people who read the archives a year from now. 2. Yes, my (Java) algorithm is deterministic ... it will return exactly one row per city, and that will be the row (or strictly, *a* row) containing the highest temp. Temp value ties will break in favour of earlier rows in Guinness Book of Records tradition :-) It's equivalent to a HashAggregate implementation. But not when you add in other columns (which is what you're trying to do). The following two query plans (from my real schema) illustrate the itch I am trying to scratch - I want the functionality of the 2nd one, but with the execution plan structure of the first:

# explain analyse select a, max(b) from perf_raw_2010_02_23 group by a;
                              QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=117953.09..117961.07 rows=639 width=8) (actual time=10861.845..10863.008 rows=1023 loops=1)
   ->  Seq Scan on perf_raw_2010_02_23  (cost=0.00..91572.39 rows=5276139 width=8) (actual time=0.038..4459.222 rows=5276139 loops=1)
 Total runtime: 10863.856 ms
(3 rows)
Time: 10864.817 ms

# explain analyse select distinct on (a) * from perf_raw_2010_02_23 order by a, b desc;

One big bit of the cost difference is going to be the ordering you need to get a repeatable result.

                              QUERY PLAN
------------------------------------------------------------------------
 Unique  (cost=1059395.04..1085775.73 rows=639 width=28) (actual time=46011.204..58428.210 rows=1023 loops=1)
   ->  Sort  (cost=1059395.04..1072585.39 rows=5276139 width=28) (actual time=46011.200..53561.112 rows=5276139 loops=1)
         Sort Key: a, b
         Sort Method: external merge  Disk: 247584kB  -- actually OS RAM buffers

Even if the sort never actually reaches a physical disk, you should still see an improvement by increasing work_mem for the duration of the one query. It's not going to be the magnitude you want, but it's probably worth doing.

         ->  Seq Scan on perf_raw_2010_02_23  (cost=0.00..91572.39 rows=5276139 width=28) (actual time=0.047..6491.036 rows=5276139 loops=1)
 Total runtime: 58516.185 ms
(6 rows)
Time: 58517.233 ms

The only difference between these two is that the second query returns the whole row. The *ratio* in cost between these two plans increases in proportion to log(n) of the table size ... at 5.5m rows it's livable, at 500m it's probably not :-! If performance on this query is vital to you, and the table doesn't change after initial population (which I'm guessing is true) then try an index on (a asc, b desc) and CLUSTER that index. Depending on the ratio of distinct a:b values that could be what you're after. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
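The HashAggregate-style behaviour Dave is after - one pass, keeping the whole best row per key, no sort - is easy to sketch outside the database. A minimal Python sketch of the idea (this is an illustration of the technique, not PostgreSQL's executor code; the row layout is invented):

```python
# Keep the complete row with the highest b for each a, in a single pass,
# the way a HashAggregate avoids sorting the whole input.
def best_row_per_key(rows):
    best = {}  # a-value -> full row seen so far
    for row in rows:
        a, b = row[0], row[1]
        # Strict > means the earliest row wins ties, matching the
        # "earlier rows win" behaviour described above.
        if a not in best or b > best[a][1]:
            best[a] = row
    return best

rows = [
    ("austin", 91, "sunny"),
    ("boston", 72, "rain"),
    ("austin", 84, "cloudy"),
    ("boston", 72, "fog"),   # tie on b: the earlier row is kept
]
result = best_row_per_key(rows)
```

The point of the sketch is that the whole-row answer needs only O(distinct keys) memory and one scan, which is why the GROUP BY/max plan is so much cheaper than the sort-based DISTINCT ON plan.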
Re: [PERFORM] Extracting superlatives - SQL design philosophy
On 24/02/10 22:47, Dave Crooke wrote: I'd imagine it would be possible to have a query planner optimization that would convert Garrett's DISTINCT ON syntax to do what I was trying to, by realizing that DISTINCT ON X ... ORDER BY Y DESC is going to return the one row for each X which has the highest value of Y, and so use a MAX-structured accumulation instead of a sort. Why is there only one row? For city temperatures, that seems unlikely. In the event of more than one row does your algorithm give repeatable results? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On 22/01/10 19:06, Tory M Blue wrote: Here is the explain plan for the query. Actual rows that the query returns is 6369. Actually, it processes 19,799 rows (see the actual rows= below).

SLOW:
 Bitmap Heap Scan on userstats  (cost=797.69..118850.46 rows=13399 width=8) (actual time=281.604..31190.290 rows=19799 loops=1)
 Total runtime: 31219.536 ms

FAST:
 Bitmap Heap Scan on userstats a  (cost=802.66..118855.43 rows=33276 width=23) (actual time=55.400..3807.908 rows=2606 loops=1)
 Total runtime: 3813.626 ms

OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms per row). The second processes 2,606 rows in 3,813 ms (about 1.3ms per row). You are asking for DISTINCT user-ids, so it seems reasonable that it will take slightly longer to check a larger set of user-ids. Otherwise, both queries are the same. I'm still a little puzzled by the bitmap scan, but the planner probably knows more about your data than I do. The main time is spent in the bitmap heap scan, which is where it's grabbing actual row data (and presumably building a hash over the uid column). You can see how long in the "actual time" figures: the first number (e.g. 281.604) is the time spent before the first row is produced, and the second is the total time at finish (31190.290). If loops were greater than 1 you would multiply the times by the number of loops to get a total. So - there's nothing wrong in the sense that the second query does the same as the first. Let's take a step back. What you really want is your reports to be faster. You mentioned you were running this query thousands of times with a different makeid each time. Running it once for all possible values and stashing the results in a temp table will probably be *much* faster. The planner can just scan the whole table once and build up its results as it goes. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
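The "run it once for all values" advice above can be sketched concretely. This uses Python's stdlib sqlite3 purely so the example is self-contained and runnable; the table and values are invented stand-ins for the userstats table, and in PostgreSQL you would do the same thing with a temp table or a plain GROUP BY:

```python
import sqlite3

# Compare the per-makeid approach (thousands of round trips in the real
# case) with one grouped query that builds every answer in a single scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userstats (makeid TEXT, uid INTEGER)")
conn.executemany("INSERT INTO userstats VALUES (?, ?)",
                 [("a", 1), ("a", 1), ("a", 2), ("b", 5), ("b", 6), ("c", 9)])

# Per-makeid: one query per distinct makeid.
per_id = {m: conn.execute(
    "SELECT count(DISTINCT uid) FROM userstats WHERE makeid = ?", (m,)
).fetchone()[0] for m in ("a", "b", "c")}

# Single pass: one scan of the table answers every makeid at once.
one_query = dict(conn.execute(
    "SELECT makeid, count(DISTINCT uid) FROM userstats GROUP BY makeid"))
```

Both produce identical results; the grouped form just pays the table-scan cost once instead of once per makeid.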
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On 22/01/10 18:03, Tory M Blue wrote: On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton <d...@archonet.com> wrote: On 21/01/10 22:15, Tory M Blue wrote: 2. If it's mostly tagged=true you are interested in you can always use a partial index: CREATE INDEX ... (makeid) WHERE tagged This might be a win even if you need a second index with WHERE NOT tagged. Partial index doesn't seem to fit here due to the fact that there are 35-40% Marked True. Didn't think about creating a second index for false, may give that a shot. If you're mostly searching tagged=true, try the partial index - it'll mean the planner is just scanning the index for the one term. Also, either I've not had enough coffee yet, or a bitmap scan is an odd choice for only ~13,000 rows out of 100 million.

 Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
   Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))

Otherwise, see what Craig said. I'm assuming this isn't the query that is CPU bound for a long time. Unless your table is horribly bloated, there's no reason for that judging by this plan. It is, but not always - only when there are 10K or more matches. And the explain unfortunately is sometimes way high or way low, so the explain is hit and miss. But the same SQL that returns maybe 500 rows is pretty fast; it's the return of 10K+ rows that seems to stall and is CPU bound. Hmm - might be able to push that cross-over point up a bit by tweaking various costs, but you've got to be careful you don't end up making all your other queries worse. It'd be good to figure out what the problem is first. Looking at the query there are four stages:
1. Scan the index, build a bitmap of heap pages with matching rows
2. Scan those pages, find the rows that match
3. Run DISTINCT on the uids
4. Count them
I wonder if it could be the DISTINCT. What happens with a count(*) or count(uid) instead? Also - you might find EXPLAIN ANALYZE more useful than straight EXPLAIN here. That will show actual times for each stage. On Craig's branch of this thread, you say you call it 6000 times with different makeids. Any reason why you can't join to a temp table and just do it in one query? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
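The partial-index suggestion above is easy to demonstrate. Again this uses stdlib sqlite3 (which also supports partial indexes) so the snippet is self-contained and runnable; the schema and data are invented, and the equivalent PostgreSQL DDL is shown in the comment:

```python
import sqlite3

# A partial index only stores entries for rows matching its WHERE clause,
# so lookups that include that condition scan a much smaller structure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userstats (makeid TEXT, tagged INTEGER, uid INTEGER)")
conn.executemany("INSERT INTO userstats VALUES (?, ?, ?)",
                 [("b1mw", 1, 1), ("b1mw", 0, 2), ("b1mw", 1, 3), ("ford", 1, 4)])

# PostgreSQL equivalent: CREATE INDEX idx ON userstats (makeid) WHERE tagged;
conn.execute("CREATE INDEX idx_tagged ON userstats (makeid) WHERE tagged = 1")

n = conn.execute(
    "SELECT count(DISTINCT uid) FROM userstats WHERE makeid = ? AND tagged = 1",
    ("b1mw",)).fetchone()[0]
```

With 35-40% of rows tagged, the index is roughly a third the size of a full one, which is a modest but real win for cache residency on a 100-million-row table.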
Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?
On 19/01/10 10:50, fka...@googlemail.com wrote: However, the deeper question is (sounds ridiculous): Why am I I/O bound *this much* here. To recall: The write performance in pg is about 20-25% of the worst case serial write performance of the disk (and only about 8-10% of the best disk perf) even though pg_xlog (WAL) is moved to another disk, only 10 simple INSERT commands, a simple table of 5 columns (4 unused, one bytea) and one index for OID, no compression since STORAGE EXTERNAL, ntfs tweaks (noatime etc), ... I'm no Windows expert, but the sysinternals tools (since bought by Microsoft) have always proved useful to me. Diskmon should show you what's happening on your machine: http://technet.microsoft.com/en-us/sysinternals/bb896646.aspx Be aware that this will generate a *lot* of data very quickly and you'll need to spend a little time analysing it. Try it without PG running to see what your system is up to when idle first to get a baseline. Unfortunately it doesn't show disk seek times (which is probably what you want to measure) but it should let you decode what reads/writes are taking place when. If two consecutive disk accesses aren't adjacent then that implies a seek of course. Worst case you get two or more processes each accessing different parts of the disk in an interleaved arrangement. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Queryplan within FTS/GIN index -search.
jes...@krogh.cc wrote: That structure isn't exposed to the planner though, so it doesn't benefit from any re-ordering the planner would normally do for normal (exposed) AND/OR clauses. Now, to_tsquery() can't re-order the search terms because it doesn't know what column it's being compared against. In fact, it might not be a simple column at all. I can't follow this logic based on explain output, but I may have misunderstood something. The only difference in these two query-plans is that we have an additional or'd term in the to_tsquery(). Hmm - I've had a poke through the source. I've slightly misled you... What we see is that the query-planner indeed has knowledge about changes in the row estimates based on changes in the query to to_tsquery(). Yes, new in 8.4 - sorry, thought that hadn't made it in. The two plan-nodes in question are in: backend/executor/nodeBitmapIndexscan.c backend/executor/nodeBitmapHeapscan.c The actual tsearch stuff is in src/backend/utils/adt/ts*.c It looks like TS_execute (tsvector_op.c) is the bit of code that handles the tsquery tree. That uses a callback to actually check values (checkcondition_gin). The gin_extract_tsquery function is presumably the extractQuery function as described in the manuals (Ch 52). So, I'm guessing what you would want to do is generate a reduced query tree for the index scan (A AND B AND C reduces to just A, if A is an uncommon word) and use the full query tree for the heap check. Now, what isn't clear to me on first glance is how to determine which phase of the bitmap scan we are in. HTH. Just checking, because I don't think it's useful in this case - but you do know about gin_fuzzy_search_limit? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Jeff Davis wrote: On Fri, 2009-10-23 at 09:26 +0100, Richard Huxton wrote: That structure isn't exposed to the planner though, so it doesn't benefit from any re-ordering the planner would normally do for normal (exposed) AND/OR clauses. I don't think that explains it, because in the second plan you only see a single index scan with two quals: Index Cond: ((ftsbody_body_fts @@ to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@ to_tsquery('spellerror'::text))) So it's entirely up to GIN how to execute that. http://www.postgresql.org/docs/8.4/static/gin-extensibility.html Datum *extractQuery(...) Returns an array of keys given a value to be queried; that is, query is the value on the right-hand side of an indexable operator whose left-hand side is the indexed column So - that is presumably two separate arrays of keys being matched against, and the AND means if the first fails it'll never check the second. What I'm not sure about is if tsquery('commonterm spellerror') produces two sets of keys or if it just produces one. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance problems with DISTINCT ON
Sgarbossa Domenico wrote: I need to retrieve the most recent prices per product from a price list table: select distinct on (articolo) articolo,data_ent,prezzo from listini_anagrafici order by articolo, data_ent desc but it seems that this query runs slowly... about 5/6 seconds. The table contains more or less 500K records, PostgreSQL version is 8.1.11 and the server has 4GB of RAM entirely dedicated to the db.

 Unique  (cost=73893.89..76551.25 rows=88312 width=24) (actual time=4022.578..5076.206 rows=193820 loops=1)
   ->  Sort  (cost=73893.89..75222.57 rows=531472 width=24) (actual time=4022.574..4505.538 rows=531472 loops=1)
         Sort Key: articolo, data_ent
         ->  Seq Scan on listini_anagrafici  (cost=0.00..16603.72 rows=531472 width=24) (actual time=0.009..671.797 rows=531472 loops=1)
 Total runtime: 5217.452 ms

You've got 531472 rows in the table and the query is going to output 193820 of them. Scanning the whole table is almost certainly the way to go. If the table doesn't change much, you could try running a CLUSTER on the index you've created. That will lock the table while it re-orders the physical layout of the rows based on your index though, so it's no good if the table is updated much. Failing that, you could try issuing set work_mem = ... before the query with increasing sizes for work_mem. That might make the sort faster too. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Possible causes of sometimes slow single-row UPDATE with trivial indexed condition?
Vlad Romascanu wrote: Problem occurs when running (in production) Postgres 8.3.7 64-bit (from RPM) on Ubuntu 8.04.2, on an Amazon EC2 (xen) Large instance (8GB RAM), with the DB on a 50GB EC2 block device. Hmm - don't know what the characteristics of running PG on EC2 are. This might be something peculiar to that. Problem does not occur when running (in staging/pre-production) Postgres 8.3.5 32-bit (from RPM) on Ubuntu 8.04.1, on a less beefy Amazon EC2 (xen) Small instance, with the DB on a 5GB EC2 block device. I am running with slow query logging on, and somewhat inexplicably I've been getting the following slow UPDATE query several times in the past weeks (I'm also including some context lines above and below): 2009-09-14 08:12:30.363 UTC u...@database pid=26474 ip=127.0.0.1(58364) sid=4aadf58d.676a:1 LOG: duration: 13472.892 ms statement: UPDATE document_sets SET status = E'rejected', This is one of the faster occurrences; at times the query has been logged as having taken 100+ seconds: That's *very* slow indeed, and clearly the query itself is simple enough. Typically in a situation like this you might suspect checkpointing was the problem. Lots of dirty disk pages being flushed to disk before a checkpoint. The stats for disk activity you give don't support that idea, although 10 minute intervals is quite far apart. Your table-stats show this is a small table. If it's updated a lot then it might be that your autovacuum settings aren't high enough for this table. The log_autovacuum_min_duration setting might be worth enabling too - to see if autovacuum is taking a long time over anything. Another thing that can cause irregular slowdowns is if you have a trigger with some custom code that takes an unexpectedly long time to run (takes locks, runs a query that plans badly occasionally). I don't know if that's the case here. Oh, if you don't have indexes on status or updated_at then you might want to read up on HOT and decrease your fill-factor on the table too. 
That's unrelated to this though. It looks like the problem is common enough that you could have a small script check pg_stat_activity once every 10 seconds and dump a snapshot of pg_locks, vmstat etc. If you can catch the problem happening that should make it easy to diagnose. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Under the hood of views
David Kerr wrote: create view test as select a,b,c,d,e,f,g from testtable; select a from test; (does the engine retrieve b-g?) Shouldn't - the query just gets rewritten macro-style. I don't think it eliminates joins if you don't need any columns, but that's not possible without a bit of analysis. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Greenplum MapReduce
Suvankar Roy wrote: Hi all, Has anybody worked on Greenplum MapReduce programming? I am facing a problem while trying to execute the below Greenplum MapReduce program written in YAML (in blue). The other poster suggested contacting Greenplum and I can only agree. The error is thrown in the 7th line as: Error: YAML syntax error - found character that cannot start any token while scanning for the next token, at line 7 (in red) There is no red, particularly if viewing messages as plain text (which most people do on mailing lists). Consider indicating a line some other way next time (commonly, below the line in question you put something like "this is line 7 ^"). The most common problem I get with YAML files though is when a tab is accidentally inserted instead of spaces at the start of a line. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Greenplum MapReduce
Suvankar Roy wrote: Hi Richard, I sincerely regret the inconvenience caused. No big inconvenience, but the lists can be very busy sometimes and the easier you make it for people to answer your questions, the better the answers you will get.

%YAML 1.1
---
VERSION: 1.0.0.1
DATABASE: test_db1
USER: gpadmin
DEFINE:
- INPUT:   #** This is the line which is causing the error **#
    NAME: doc
    TABLE: documents

If it looks fine, always check for tabs. Oh, and you could have cut out all the rest of the file, really. I have learnt that unnecessary TABs can be the cause of this, so I am trying to overcome that; hopefully the problem will subside then. I'm always getting this. It's easy to accidentally introduce a tab character when reformatting YAML. It might be worth checking if your text editor has an option to always replace tabs with spaces. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Incr/Decr Integer
William Scott Jordan wrote: Hey all! Is there a better way to increase or decrease the value of an integer than doing something like: UPDATE the_table SET the_int = the_int + 1 WHERE the_id = 123 ; No. We seem to be getting a lot of deadlocks using this method under heavy load. Just wondering if we should be doing something different. You can't get deadlocks with that - it only references one table. What is the purpose of this query - how are you using it? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very big insert/join performance problem (bacula)
Marc Cousin wrote: Your effective_cache_size is really small for the system you seem to have - it's the size of I/O caching your OS is doing and uses no resources itself. And 800MB of that on a system with that amount of data seems a bit unlikely ;-) Using `free` you can see the amount of I/O caching your OS is doing at the moment, in the 'cached' column. That possibly might tip some plans in a direction you prefer. What kind of machine are you running this on? I played with this parameter too, and it didn't influence the plan. Anyway, the doc says it's the OS cache available for one query, No they don't. I'm guessing you're getting mixed up with work_mem. and there may be a lot of insert queries at the same time, so I chose to be conservative with this value. I tried it with 8GB too, the plans were the same. The OS cache is around 8-10GB by the way. That's what you need to set effective_cache_size to then. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very big insert/join performance problem (bacula)
Marc Cousin wrote: Temporarily I moved the problem at a bit higher sizes of batch by changing random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like an apprentice sorcerer with this, as I told PostgreSQL that fetching rows from disk is much cheaper than it is. These values are, I think, completely abnormal. They certainly don't have anything to do with reality. Try putting them back to (say) seq_page_cost=1 and random_page_cost=2. So, finally, to my questions: - Is it normal that PostgreSQL is this off base on these queries (sorry I don't have the plans; if they are required I'll do my best to get some, but they really are the two obvious plans for this kind of query)? What could make it choose the hash join for too-small batch tables? No point in speculating without plans. - Is changing the 2 costs the way to go? Not the way you have. - Is there a way to tell PostgreSQL that it's more costly to sort than it thinks (instead of telling it that fetching data from disk doesn't cost anything)? That's what the configuration settings do. But if you put a couple of them way off from reality it'll be pure chance if it gets any estimates right. Here are the other non-default values from my configuration:

shared_buffers = 2GB
work_mem = 64MB

Set this *much* higher when you are running your bulk imports. You can do it per-connection. Try 256MB, 512MB, 1GB (but keep an eye on total memory used).

maintenance_work_mem = 256MB
max_fsm_pages = 1500 # There are quite big deletes with bacula ...
effective_cache_size = 800MB

See other emails on this one.

default_statistics_target = 1000

Probably don't need this for all columns, but it won't cause problems with these queries. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Data caching
Martin Chlupac wrote: Hello everybody, I have a simple query which selects data from not very large table ( 434161 rows) and takes far more time than I'd expect. I believe it's due to a poor disk performance because when I execute the very same query for a second time I get much better results (caching kicks in?). Can you please confirm my theory or do you see any other possible explanation? Yep - it's the difference between fetching from memory and from disk. - Bitmap Heap Scan on records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac (cost=76.75..3819.91 rows=1912 width=206) (actual time=329.416..3677.521 rows=2161 loops=1) - Bitmap Heap Scan on records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac (cost=76.75..3819.91 rows=1912 width=206) (actual time=1.616..10.369 rows=2161 loops=1) The plan scans the index, and builds up a bitmap of which disk-blocks contain (potential) matches. It then has to read the blocks (the heap scan above), confirm they match and then return the rows. If you look at the actual time above you can see about 90% of the slow query is spent doing this. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow DELETE on 12 M row table
Greg Stark wrote: waiting means it's blocked trying to acquire a lock. Some open transaction has the table you're trying to index locked. Look in pg_locks and pg_stat_activity to find out who. Or you might find CREATE INDEX CONCURRENTLY fits your situation. http://www.postgresql.org/docs/8.3/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] EXPLAIN understanding? (restarted from Censorship)
On Wed, 10 Jun 2009, Richard Huxton wrote: Send it to the list again, and cc: me directly if you like. If it doesn't show up in the next 20 minutes, I'll try sending it. Okay, here we go. I have (per Tom's advice) found some acknowledgement knobs on Majordomo. Here follows my original rejected mail: Hi. I thought by now I would be fairly good at understanding EXPLAIN ANALYSE results, but I can't quite figure this one out. Perhaps someone could help me.

EXPLAIN ANALYSE SELECT * FROM GeneGoAnnotation a1, GOAnnotation a2, OntologyTermRelations a3 WHERE a1.GoAnnotation = a2.id AND a2.ontologyTermId = a3.OntologyTerm;

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..673587.67 rows=330437962 width=95) (actual time=0.056..1924645.797 rows=344491124 loops=1)
   ->  Merge Join  (cost=0.00..28369.58 rows=361427 width=87) (actual time=0.039..4620.912 rows=361427 loops=1)
         Merge Cond: (a1.goannotation = a2.id)
         ->  Index Scan using genegoannotation__goannotation on genegoannotation a1  (cost=0.00..9710.32 rows=361427 width=8) (actual time=0.015..840.547 rows=361427 loops=1)
         ->  Index Scan using goannotation_pkey on goannotation a2  (cost=0.00..13133.12 rows=403323 width=79) (actual time=0.014..1427.179 rows=403323 loops=1)
   ->  Index Scan using ontologytermrelations__ontologyterm on ontologytermrelations a3  (cost=0.00..1.20 rows=47 width=8) (actual time=0.022..1.908 rows=953 loops=361427)
         Index Cond: (a3.ontologyterm = a2.ontologytermid)
 Total runtime: 2524647.064 ms
(8 rows)

If I look at the actual results of the outer-most join, the nested loop, then I can take the number rows=344491124 and divide it by loops=361427 to get rows=953. Clearly this means that on average each index scan on a3 returned 953 rows. However, if I apply the same logic to the estimated results, it all falls apart. The total estimated number of rows is remarkably accurate, as is the estimated number of loops (results from the merge join). However the average number of rows expected to be returned from the index scan is only 47.
I don't know how the planner is getting its accurate final estimate of rows=330437962, because it is not from multiplying rows=361427 by rows=47. That would only give 16987069 rows. Any ideas/explanations? Matthew -- Now, you would have thought these coefficients would be integers, given that we're working out integer results. Using a fraction would seem really stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to use complex numbers.-- Computer Science Lecturer -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
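Matthew's arithmetic from the plan above is easy to check directly:

```python
# Reproducing the arithmetic from the EXPLAIN ANALYSE output above.
inner_loops = 361427          # loops= on the inner index scan (= merge join rows)
total_actual = 344491124      # actual rows= on the nested loop
total_estimated = 330437962   # estimated rows= on the nested loop
inner_est_rows = 47           # estimated rows= per inner index scan

# Actual average rows per index scan: matches the 953 in the plan.
avg_actual = total_actual // inner_loops

# What naive multiplication of the per-loop estimate would give:
# nowhere near the planner's (accurate) join-level estimate.
naive_estimate = inner_loops * inner_est_rows
```

This confirms the puzzle as stated: the join-level row estimate is not derived by multiplying the inner scan's per-loop estimate by the loop count; the planner estimates join selectivity separately from the per-scan estimate.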
Re: [PERFORM] EXPLAIN understanding? (restarted from Censorship)
Please ignore - Matthew has discovered what was blocking this message. Use his thread instead. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using index for bitwise operations?
Shaul Dar wrote: Hi, I have a column that is a bit array of 16, each bit specifying if a certain property, out of 16, is present or not. Our typical query selects 300 random rows (could be located in different blocks) from the table based on another column+index, and then filters them down to ~50 based on this bit field. [snip] W/o an index this might be overly expensive, even as a filter (on selected 300 rows). Have you _tried_ just not having an index at all? Since you are only accessing a relatively small number of rows to start with, even an infinitely efficient index isn't going to make that much difference. Combine that with the fact that you're going to have the indexes competing with the table for cache space and I'd see how much difference it makes just not having it. Failing that, perhaps have an index on a single bit if there is one you always/mostly check against. The relational way to do this would be one or more property tables joined to your main table. If the majority of your properties are not set then this could be faster too. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
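The filter step being discussed - keeping only rows whose 16-bit property field has all the wanted bits set - is cheap enough to sketch in a few lines. The property names and bit positions here are invented for illustration:

```python
# Filter candidate rows by a 16-bit property mask: a row qualifies only
# if every wanted bit is set in its property field.
HAS_COLOUR = 1 << 0   # hypothetical property bits
HAS_SOUND  = 1 << 2

rows = [
    (1, 0b0000000000000101),  # colour + sound
    (2, 0b0000000000000001),  # colour only
    (3, 0b0000000000000100),  # sound only
]

wanted = HAS_COLOUR | HAS_SOUND
# props & wanted == wanted tests that all wanted bits are present,
# which is what the query's bit-field filter does per candidate row.
matches = [rid for rid, props in rows if props & wanted == wanted]
```

Applied to ~300 already-fetched rows this is a handful of AND/compare operations per row, which supports Richard's point that an index on the bit field buys little here.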
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB How do you know? This isn't just based on the explain values reported, is it? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Richard, no, of course it's not based on explain :-) I've run several tests before and now going in depth to understand if there is nothing wrong. Due such a single query time difference InnoDB is doing 2-3 times better TPS level comparing to PostgreSQL.. And you are satisfied that it is the planned query time that is the dominant factor here, and not parsing time, connection time, data transport, disk bandwidth etc? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: I'll try to answer all mails at once :-)) - query is running fully in RAM, no I/O, no network, only CPU time - looping 100 times the same query gives 132ms total time (~1.32ms per query), while it's 44ms on InnoDB (~0.44ms per query) Well, assuming you're happy that PG is tuned reasonably for your machine and that MySQL's query cache isn't returning the results here it looks like MySQL is faster for this particular query. The only obvious place there could be a big gain is with the hashing algorithm. If you remove the ORDER BY and the query-time doesn't fall by much then it's the hash phase. The other thing to try is to alter the query to be a SELECT count(*) rather than returning rows - that will let you measure the time to transfer the result rows. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] performance for high-volume log insertion
da...@lang.hm wrote: On Tue, 21 Apr 2009, Stephen Frost wrote: * da...@lang.hm (da...@lang.hm) wrote: while I fully understand the 'benchmark your situation' need, this isn't that simple. It really is. You know your application, you know its primary use cases, and probably have some data to play with. You're certainly in a much better situation to at least *try* and benchmark it than we are. rsyslog is a syslog server. it replaces (or for debian and fedora, has replaced) your standard syslog daemon. it receives log messages from every app on your system (and possibly others), filters, manipulates them, and then stores them somewhere. among the places that it can store the logs are database servers (native support for MySQL, PostgreSQL, and Oracle, plus libdbi for others). Well, from a performance standpoint the obvious things to do are: 1. Keep a connection open, do NOT reconnect for each log-statement 2. Batch log statements together where possible 3. Use prepared statements 4. Partition the tables by day/week/month/year (configurable I suppose) The first two are vital, the third takes you a step further. The fourth is a long-term admin thing. And possibly 5. Have two connections, one for fatal/error etc and one for info/debug level log statements (configurable split?). Then you can use the synchronous_commit setting on the less important ones. Might buy you some performance on a busy system. http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS other apps then search and report on the data after it is stored. what apps? I don't know either. pick your favorite reporting tool and you'll be a step ahead of me (I don't know a really good reporting tool) as for sample data, you have syslog messages, just like I do. so you have the same access to data that I have. how would you want to query them? how would people far less experienced than you want to query them?
I can speculate that some people would do two columns (time, everything else), others will do three (time, server, everything else), and others will go further (I know some who would like to extract IP addresses embedded in a message into their own column). some people will index on the time and host, others will want to do full-text searches of everything. Well, assuming it looks much like traditional syslog, I would do something like: (timestamp, host, facility, priority, message). It's easy enough to stitch back together if people want that. PostgreSQL's full-text indexing is quite well suited to logfiles I'd have thought, since it knows about filenames, URLs etc already. If you want to get fancy, add a msg_type column and one subsidiary table for each msg_type. So - you might have smtp_connect_from (hostname, ip_addr). A set of perl regexps can match and extract the fields for these extra tables, or you could do it with triggers inside the database. I think it makes sense to do it in the application. Easier for users to contribute new patterns/extractions. Meanwhile, the core table is untouched so you don't *need* to know about these extra tables. If you have subsidiary tables, you'll want to partition those too and perhaps stick them in their own schema (logs200901, logs200902 etc). -- Richard Huxton Archonet Ltd
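The suggestions in these two messages (core table, monthly partitions, prepared statements, full-text indexing, subsidiary tables) can be sketched together. This is a hypothetical layout, not rsyslog's actual schema, and the declarative partitioning syntax shown needs PostgreSQL 11 or later; the 8.3-era approach under discussion would use inheritance plus CHECK constraints instead:

```sql
-- Hypothetical core log table: one row per syslog message.
CREATE TABLE syslog (
    ts       timestamptz NOT NULL,
    host     text,
    facility text,
    priority smallint,
    message  text
) PARTITION BY RANGE (ts);   -- suggestion 4: partition by month

CREATE TABLE syslog_200904 PARTITION OF syslog
    FOR VALUES FROM ('2009-04-01') TO ('2009-05-01');

-- Suggestion 3: prepare once, then execute per message over one
-- long-lived connection (suggestions 1 and 2).
PREPARE log_insert (timestamptz, text, text, smallint, text) AS
    INSERT INTO syslog VALUES ($1, $2, $3, $4, $5);
EXECUTE log_insert (now(), 'web01', 'daemon', 6, 'service started');

-- Full-text search over messages; the tsvector parser already
-- recognises filenames, URLs etc.
CREATE INDEX syslog_message_fts ON syslog
    USING gin (to_tsvector('english', message));

-- One subsidiary table per msg_type, e.g. the smtp_connect_from example.
CREATE TABLE smtp_connect_from (
    ts       timestamptz NOT NULL,
    hostname text,
    ip_addr  inet
);
```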
Re: [PERFORM] Need help with one query
Anne Rosset wrote: EXPLAIN ANALYZE SELECT audit_change.id AS id, audit_change.audit_entry_id AS auditEntryId, audit_entry.object_id AS objectId, audit_change.property_name AS propertyName, audit_change.property_type AS propertyType, audit_change.old_value AS oldValue, audit_change.new_value AS newValue, audit_change.flexfield AS flexField FROM audit_entry audit_entry, audit_change audit_change WHERE audit_change.audit_entry_id = audit_entry.id AND audit_entry.object_id = 'artf414029'; [query reformatted to make it more readable] Not quite clear why you are aliasing the tables to their own names... Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual time=4612.674..6683.158 rows=4 loops=1) Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text) -> Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852 width=123) (actual time=0.009..2838.216 rows=7584852 loops=1) -> Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049 rows=4 loops=1) -> Index Scan using audit_entry_object on audit_entry (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1) Index Cond: ((object_id)::text = 'artf414029'::text) Total runtime: 6683.220 ms Very odd. It knows the table is large and that the seq-scan is going to be expensive. Try issuing set enable_seqscan = off and run the explain analyse again. That should show the cost of using the indexes. -- Richard Huxton Archonet Ltd
Re: [PERFORM] LIMIT confuses the planner
Kouber Saparev wrote: db=# EXPLAIN ANALYZE SELECT * FROM login_attempt WHERE username='kouber' ORDER BY login_attempt_sid DESC; QUERY PLAN -- Sort (cost=1415.15..1434.93 rows=7914 width=38) (actual time=0.103..0.104 rows=2 loops=1) Sort Key: login_attempt_sid Sort Method: quicksort Memory: 25kB -> Index Scan using login_attempt_username_idx on login_attempt (cost=0.00..902.71 rows=7914 width=38) (actual time=0.090..0.091 rows=2 loops=1) Index Cond: ((username)::text = 'kouber'::text) Total runtime: 0.140 ms It's expecting 7914 rows returned and is getting only 2. That is probably the root of the problem. However when I add a LIMIT clause to the same query the planner no longer uses the right index, hence the query becomes very slow: db=# EXPLAIN ANALYZE SELECT * FROM login_attempt WHERE username='kouber' ORDER BY login_attempt_sid DESC LIMIT 20; Since it's expecting 7914 rows for kouber it thinks it will find the 20 rows you want fairly quickly by just looking backward through the login_attempt_pkey index. Try increasing the stats on the username column. ALTER TABLE login_attempt ALTER COLUMN username SET STATISTICS 100; ANALYZE login_attempt; You can try different values of statistics up to 1000, but there's no point in setting it too high. -- Richard Huxton Archonet Ltd
Re: [PERFORM] col1 ILIKE 'foo%' not behaving the same as lower(col1) LIKE 'foo%'
milos d wrote: Hello, I have a table 'foo_bar' with a column 'col1' defined as 'col1 varchar(512)'. This column is indexed using an expression index defined as CREATE INDEX ix_foo_bar_by_col1 ON foo_bar (lower(col1) varchar_pattern_ops) The problem is when I try matching using ILIKE, (col1 ILIKE 'foo%') PostgreSQL does not use an index scan but a seq scan of the whole table, but when I try (lower(col1) LIKE 'foo%') PostgreSQL uses an index scan. Why should it use the index? They're not even equivalent queries: SELECT ... WHERE lower(col1) LIKE 'FOO%' SELECT ... WHERE col1 ILIKE 'FOO%' One is guaranteed to return no rows, the other not. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Can't locate Test/Parser/Dbt2.pm in DBT2 tests
Rohan Pethkar wrote: Hi All, I am conducting DBT2 tests on PostgreSQL. After completing the test, while analyzing and creating the results I am getting the following error: ./dbt2-run-workload: line 514: 731 Terminated dbt2-client ${CLIENT_COMMAND_ARGS} -p ${PORT} -o ${CDIR} ${CLIENT_OUTPUT_DIR}/`hostname`/client-${SEG}.out 2>&1 waiting for server to shut down done server stopped Can't locate Test/Parser/Dbt2.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/rohan/NEW_DBT2/Installer/DBT2_SETUP/bin/dbt2-post-process line 13. Well, if Test::Parser::Dbt2 isn't somewhere in that list of directories, you'll need to tell perl where to look. Simplest is probably just to: export PERL5LIB=/path/to/extra/libs before running your tests. Can't exec gnuplot: No such file or directory at /home/rohan/NEW_DBT2/Installer/DBT2_SETUP/bin/dbt2-pgsql-analyze-stats line 113. It also looks like you're missing gnuplot for your charts. I am not sure why it doesn't find Test/Parser/Dbt2.pm even though I have installed DBT2 completely. Did I miss any steps? Do I need to install some extra packages? If any then please let me know.
You can always run perldoc perlrun for more info (google it if you don't have the docs installed locally). -- Richard Huxton Archonet Ltd
Re: [PERFORM] PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Phoenix Kiula wrote: [Posted similar note to PG General but I suppose it's more appropriate in this list. Apologies for cross-posting.] Hi. Further to my bafflement with the count(*) queries as described in this thread: http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php It seems that whenever this question has come up, Postgresql comes up very short in terms of count(*) functions. Sorry - I'm confused. That thread doesn't seem to contain a slow count(*) query. You seem to be saying you're having problems with the query taking 10-15 seconds, but the example takes less than half a second. How have you identified the count() as being the problem here? The performance is always slow, because of the planner's need to guess and such. I don't fully understand how the statistics work (and the explanation on the PG website is way too geeky) but the columns I work with already have a stat level of 100. Not helping at all. But your own email says it's slow sometimes: "My queries are fast in general *except* the first time" I'm not sure how the planner comes into this. We are now considering a web based logging functionality for users of our website. This means the table could be heavily INSERTed into. We get about 10 million hits a day, and I'm guessing that we will have to keep this data around for a while. My question: with that kind of volume and the underlying aggregation functions (by product id, dates, possibly IP addresses or at least countries of origin..) will PG ever be a good choice? A good choice compared to what? Or should I be looking at some other kind of tools? I wonder if OLAP tools would be overkill for something that needs to look like a barebones version of google analytics limited to our site.. Typically you'd summarise the data by hour/day via triggers / a scheduled script if you weren't going towards a pre-packaged OLAP toolkit. Otherwise you're going to have to scan the hundreds of millions of rows you've accumulated.
Appreciate any thoughts. If possible I would prefer to tone down any requests for MySQL and such! I'm not sure MySQL is going to help you here - if you were running lots of small, simple queries it might make sense. If you want to aggregate data by varying criteria I don't think there is any sensible optimisation (other than pre-calculating summaries). -- Richard Huxton Archonet Ltd
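The pre-calculated summaries suggested above might look like this sketch; the table and column names are hypothetical:

```sql
-- One row per (hour, product) instead of rescanning millions of raw hits.
CREATE TABLE hits_hourly (
    hour       timestamptz NOT NULL,
    product_id integer     NOT NULL,
    hit_count  bigint      NOT NULL,
    PRIMARY KEY (hour, product_id)
);

-- Run from a scheduled script (or maintained by triggers) to roll up the
-- latest window; the window bounds would come from the script.
INSERT INTO hits_hourly
SELECT date_trunc('hour', hit_time), product_id, count(*)
FROM   raw_hits
WHERE  hit_time >= '2009-03-10 00:00' AND hit_time < '2009-03-11 00:00'
GROUP  BY 1, 2;

-- Reports then aggregate the small summary table, not the raw log:
SELECT product_id, sum(hit_count)
FROM   hits_hourly
WHERE  hour >= '2009-03-01'
GROUP  BY product_id;
```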
Re: [PERFORM] Sort causes system to freeze
Don't reply to another message when starting a new thread. People will miss your message. Craig James wrote: Maybe this is an obviously dumb thing to do, but it looked reasonable to me. Looks reasonable here too - except I'm not sure what I'd do with 2 million rows of sorted table in my console. I'm guessing you're piping the output into something. The problem is, the seemingly simple sort below causes a fairly powerful computer to completely freeze for 5-10 minutes. During the sort, you can't login, you can't use any shell sessions you already have open, the Apache server barely works, and even if you do nice -20 top before you start the sort, the top(1) command comes to a halt while the sort is proceeding! As nearly as I can tell, the sort operation is causing a swap storm of some sort -- nothing else in my many years of UNIX/Linux experience can cause a nice -20 process to freeze. Nothing should cause that to your machine. I've never seen top just freeze unless you set up some sort of fork-bomb and ramp the load up so fast it can't cope. Oh, and nice-ing the client isn't going to do anything to the backend actually doing the sorting. The sort operation never finishes -- it's always killed by the system. Once it dies, everything returns to normal. You're running out of memory then. It'll be the out-of-memory killer (assuming you're on Linux). This is 8.3.0. (Yes, I'll upgrade soon.) Make soon more urgent than it has been up to now - no point in risking all your data to some already fixed bug is there? Unless you've been carefully tracking the release notes and have established that there's no need in your precise scenario. Is this a known bug, or do I have to rewrite this query somehow? Maybe add indexes to all four columns being sorted? Indexes won't necessarily help if you're sorting the whole table. Maybe if you had one on all four columns. 
=> explain select * from plus order by supplier_id, compound_id, units, price; max_connections = 1000 shared_buffers = 2000MB work_mem = 256MB So can you support (1000 * 256 * 2) + 2000 MB of RAM? effective_cache_size = 4GB ...while leaving 4GB free for disk caching? Machine: Dell, 8x64-bit CPUs, 8GB ram, Perc6i battery-backed RAID controller, 8 disks as RAID10 It appears not. Remember that work_mem is not only per-connection, a single query can use multiples of it (hence the *2 above). If you genuinely have a lot of connections I'd drop it down to (say) 4MB to make sure you don't swap on a regular basis (should probably be even lower to be truly safe). Then, for the odd case when you need a large value, issue a SET work_mem before the query. -- Richard Huxton Archonet Ltd
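The arithmetic above is worth spelling out: the worst case implied by the quoted settings dwarfs the 8GB fitted in the machine.

```sql
-- 1000 connections x 256MB work_mem x 2 sort/hash nodes per query,
-- plus 2000MB of shared_buffers:
SELECT (1000 * 256 * 2) + 2000 AS worst_case_mb;   -- 514000 MB, roughly 502 GB

-- The suggested fix: a small default in postgresql.conf (work_mem = 4MB),
-- raised per-session only when a big sort genuinely needs it:
SET work_mem = '256MB';
```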
Re: [PERFORM] Increasing pattern index query speed
Andrus wrote: So the pattern index is 10 .. 20 times (!) faster always. No idea why. Because you don't have a normal index on the product_id column? You can't use xxx_pattern_ops indexes for non-pattern tests. Test data creation script: The only change to the script was the obvious char(nn) => varchar(nn) and I didn't use TEMP tables (so I could see what I was doing). Then, I created the standard index on order_products.product_id. EXPLAIN ANALYSE results from my slow dev box are listed below. Database is in LATIN9 encoding with locale=C. QUERY PLAN Aggregate (cost=2993.69..2993.70 rows=1 width=0) (actual time=2.960..2.960 rows=1 loops=1) -> Nested Loop (cost=10.81..2993.23 rows=182 width=0) (actual time=0.972..2.901 rows=189 loops=1) -> Index Scan using products_pkey on products (cost=0.00..8.27 rows=1 width=18) (actual time=0.017..0.019 rows=1 loops=1) Index Cond: ((product_id)::text = '3370'::text) -> Nested Loop (cost=10.81..2983.14 rows=182 width=18) (actual time=0.951..2.785 rows=189 loops=1) -> Bitmap Heap Scan on orders_products (cost=10.81..942.50 rows=251 width=22) (actual time=0.296..0.771 rows=261 loops=1) Recheck Cond: ((product_id)::text = '3370'::text) -> Bitmap Index Scan on order_product_pattern_eq_idx (cost=0.00..10.75 rows=251 width=0) (actual time=0.230..0.230 rows=261 loops=1) Index Cond: ((product_id)::text = '3370'::text) -> Index Scan using orders_pkey on orders (cost=0.00..8.12 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=261) Index Cond: (orders.order_id = orders_products.order_id) Filter: (orders.order_date > '2006-01-01'::date) Total runtime: 3.051 ms (13 rows) QUERY PLAN --- Aggregate (cost=25.56..25.57 rows=1 width=0) (actual time=8.244..8.245 rows=1 loops=1) -> Nested Loop (cost=0.00..25.55 rows=1 width=0) (actual time=1.170..8.119 rows=378 loops=1) -> Nested Loop (cost=0.00..17.17 rows=1 width=4) (actual time=0.043..4.167 rows=522 loops=1) -> Index Scan using order_product_pattern_eq_idx on orders_products (cost=0.00..8.88 rows=1 width=22) (actual time=0.029..1.247 rows=522 loops=1) Index Cond: (((product_id)::text >= '3370'::text) AND ((product_id)::text < '3371'::text)) Filter: ((product_id)::text ~~ '3370%'::text) -> Index Scan using products_pkey on products (cost=0.00..8.27 rows=1 width=18) (actual time=0.004..0.004 rows=1 loops=522) Index Cond: ((products.product_id)::text = (orders_products.product_id)::text) -> Index Scan using orders_pkey on orders (cost=0.00..8.37 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=522) Index Cond: (orders.order_id = orders_products.order_id) Filter: (orders.order_date > '2006-01-01'::date) Total runtime: 8.335 ms (12 rows) -- Richard Huxton Archonet Ltd
Re: [PERFORM] Increasing pattern index query speed
Andrus wrote: Richard and Mario, You can't use xxx_pattern_ops indexes for non-pattern tests. I missed the regular index. Sorry for that. Now the issue with the testcase is solved. Thank you very much. I researched the issue in the live 8.1.4 db a bit more. Performed vacuum and a whole-db reindex. Tried several times to run the two same pattern queries in a quiet db. And the results were? additional condition One problem at a time. Let's get the pattern-matching speed problems on your live server sorted, then we can look at different queries. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Increasing pattern index query speed
Andrus wrote: Richard, And the results were? Results are provided in bottom of the message to which you replied. No - the explains there were contrasting a date test BETWEEN versus =. One problem at a time. Let's get the pattern-matching speed problems on your live server sorted, then we can look at different queries. First message in this thread described the issue with a query having the additional condition AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21' Ah, I think I understand. The test case was *missing* this clause. It seems that this problem occurs when pattern matching and BETWEEN conditions are used in the same query. According to Scott Garey's great recommendation I added ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000; This fixes the testcase in the live server, see my other message. Is it OK to run ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000 in the prod database or should I try to decrease 1000 to a smaller value? rid is a big increasing table and is changed frequently, mostly by adding rows. This will try to track the 1000 most-common values of toode, whereas the default is to try to track the most common 10 values. Tracking more values means the planner has more accurate information but makes ANALYSE take longer to run, and also makes planning each query take slightly longer. Try 100, 200, 500 and see if they work *for a range of queries* - there's no point in having it much higher than it needs to be. -- Richard Huxton Archonet Ltd
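The statistics-target change discussed above can be verified directly: after re-ANALYSEing, the pg_stats view shows what the planner now knows about rid.toode.

```sql
-- Try 100, 200, 500 as suggested; 1000 is the maximum worth considering.
ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 100;
ANALYZE rid;

-- Inspect the gathered statistics for the column:
SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'rid' AND attname = 'toode';
```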
Re: [PERFORM] Increasing pattern index query speed
Andrus wrote: Richard, Results are provided in bottom of the message to which you replied. No - the explains there were contrasting a date test BETWEEN versus =. I changed the rid.toode statistics target to 100: ALTER TABLE firma2.rid ALTER COLUMN toode SET STATISTICS 100; analyze firma2.rid; Analyze takes 3 seconds and the testcase ran fast. I'm planning to monitor results by looking in the log file for queries which take longer than 10 seconds. Sensible. I don't know if 10 seconds is the right value for your database, but there will be a point that filters out most of your traffic but still gives enough to find problems. Do you still need results? If yes, which query and how many times should I run it? If changing the statistics seems to help, you're not going to want to go back just to repeat tests. Ah, I think I understand. The test case was *missing* this clause. I added this clause to the testcase. Also added char(70) columns containing padding characters to all three tables. I still cannot reproduce this issue in the testcase on the fast 8.3 devel notebook. In the testcase order_products contains product_id values in a very regular order, maybe this affects the results. No idea how to use random() to generate random products for every order. Ideally you don't even want random products. You want a distribution of products that matches the same shape as you have in your production database. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Increasing pattern index query speed
Andrus wrote: Both queries return same result (19) and return same data. Pattern query is much slower (93 sec) than the equality check (13 sec). How to fix this? Using 8.1.4, utf-8 encoding, et-EE locale They're different queries. The fact that they return the same results is a coincidence. This WHERE rid.toode = '9910' Is a different condition to this WHERE rid.toode like '9910%' You aren't going to get the same plans. Anyway, I think the problem is in the dok JOIN rid bit. Look: Aggregate (cost=43.09..43.10 rows=1 width=0) (actual time=12674.675..12674.679 rows=1 loops=1) -> Nested Loop (cost=29.57..43.08 rows=1 width=0) (actual time=2002.045..12673.645 rows=19 loops=1) -> Nested Loop (cost=29.57..37.06 rows=1 width=24) (actual time=2001.922..12672.344 rows=19 loops=1) Aggregate (cost=15.52..15.53 rows=1 width=0) (actual time=92966.501..92966.505 rows=1 loops=1) -> Nested Loop (cost=0.00..15.52 rows=1 width=0) (actual time=24082.032..92966.366 rows=19 loops=1) -> Nested Loop (cost=0.00..9.50 rows=1 width=24) (actual time=24081.919..92965.116 rows=19 loops=1) These are the same but the times are different. I'd be very surprised if you can reproduce these times reliably. Can I give you some wider-ranging suggestions, Andrus? 1. Fix the vacuuming issue in your hash-join question. 2. Monitor the system to make sure you know if/when disk activity is high. 3. *Then* start to profile individual queries and look into their plans. Change the queries one at a time and monitor again. Otherwise, it's very difficult to figure out whether changes you make are effective. HTH -- Richard Huxton Archonet Ltd
Re: [PERFORM] Hash join on int takes 8..114 seconds
Andrus wrote: - what's the size of the dataset relative to the RAM ? Db size is 7417 MB relevant table sizes in desc by size order: 140595 dok 2345 MB 2 1214 pg_shdepend 2259 MB 6 1232 pg_shdepend_depender_index 795 MB 7 1233 pg_shdepend_reference_index 438 MB These three are highly suspicious. They track dependencies between system objects (so you can't drop function F because trigger T depends on it). http://www.postgresql.org/docs/8.3/static/catalog-pg-shdepend.html You've got 3.5GB of data there, which is a *lot* of dependencies. Try SELECT count(*) FROM pg_shdepend. If it's not a million rows, then the table is bloated. Try (as postgres or some other db superuser) a VACUUM FULL pg_shdepend and a REINDEX TABLE pg_shdepend. If it is a million rows, you'll need to find out why. Do you have a lot of temporary tables that aren't being dropped or something similar? -- Richard Huxton Archonet Ltd
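The diagnostic steps above, run as a superuser, might look like:

```sql
-- Is the row count consistent with 3.5GB on disk?
SELECT count(*) FROM pg_shdepend;
SELECT pg_size_pretty(pg_total_relation_size('pg_shdepend'));

-- If the count is small relative to the size, the table is bloated:
-- compact it and rebuild its indexes.
VACUUM FULL pg_shdepend;
REINDEX TABLE pg_shdepend;
```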
Re: [PERFORM] Hash join on int takes 8..114 seconds
Andrus wrote: Query below seems to use indexes everywhere in most optimal way. dokumnr column is of type int Speed of this query varies rapidly: In live db fastest response I have got is 8 seconds. Re-running same query after 10 seconds may take 60 seconds. Re-running it again after 10 seconds may take 114 seconds. Any idea how to speed it up? Is it possible to optimize it, will upgrading to 8.3.5 help or should I require to add more RAM, disk or CPU speed? At a quick glance, the plans look the same to me. The overall costs are certainly identical. That means whatever is affecting the query times it isn't the query plan. Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=62164.496..62164.500 rows=1 loops=1) Total runtime: 62164.789 ms Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=40185.499..40185.503 rows=1 loops=1) Total runtime: 40186.102 ms Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=29650.398..29650.402 rows=1 loops=1) Total runtime: 29650.696 ms Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=11131.392..11131.396 rows=1 loops=1) Total runtime: 11131.694 ms So - what other activity is happening on this machine? Either other queries are taking up noticeable resources, or some other process is (it might be disk activity from checkpointing, logging, or some other application). -- Richard Huxton Archonet Ltd
Re: [PERFORM] Hash join on int takes 8..114 seconds
Andrus wrote: Richard, At a quick glance, the plans look the same to me. The overall costs are certainly identical. That means whatever is affecting the query times it isn't the query plan. So - what other activity is happening on this machine? Either other queries are taking up noticeable resources, or some other process is (it might be disk activity from checkpointing, logging, or some other application). Thank you. This is a dedicated server running only PostgreSQL which serves approx 6 points of sale at this time. Maybe those other clients make queries which invalidate a lot of data loaded into the server cache. The next time, the server must read it again from disk, which causes those performance differences. In addition to top below, you'll probably find vmstat 5 useful. top output is currently: top - 13:13:10 up 22 days, 18:25, 1 user, load average: 0.19, 0.12, 0.19 Tasks: 53 total, 2 running, 51 sleeping, 0 stopped, 0 zombie Cpu(s): 13.7% us, 2.0% sy, 0.0% ni, 78.3% id, 6.0% wa, 0.0% hi, 0.0% si Mem: 2075828k total, 2022808k used, 53020k free, 0k buffers Swap: 3911816k total, 88k used, 3911728k free, 1908536k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5382 postgres 15 0 144m 43m 40m S 15.0 2.2 0:00.45 postmaster 5358 postgres 15 0 152m 87m 75m S 0.3 4.3 0:00.97 postmaster 1 root 16 0 1480 508 444 S 0.0 0.0 0:01.35 init Looks pretty quiet.
A few seconds later: top - 13:14:01 up 22 days, 18:26, 1 user, load average: 1.72, 0.53, 0.32 Tasks: 52 total, 2 running, 50 sleeping, 0 stopped, 0 zombie Cpu(s): 5.3% us, 3.0% sy, 0.0% ni, 0.0% id, 91.0% wa, 0.0% hi, 0.7% si Mem: 2075828k total, 2022692k used, 53136k free, 0k buffers Swap: 3911816k total, 88k used, 3911728k free, 1905028k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1179 postgres 18 0 155m 136m 122m D 6.7 6.7 1:32.52 postmaster 4748 postgres 15 0 145m 126m 122m D 1.3 6.2 0:14.38 postmaster 5358 postgres 16 0 160m 98m 81m D 0.7 4.9 0:01.21 postmaster 1 root 16 0 1480 508 444 S 0.0 0.0 0:01.35 init Here you're stuck waiting for disks (91.0% wa). Check out vmstat and iostat to see what's happening. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Difference in query plan
Patrice Beliveau wrote: I have a database on a production server (8.1.9) with two schemas containing the same tables, same indexes, same everything, but with different data. When I execute a query in one schema, it takes much more time to execute than in the other schema. [snip] I'm wondering where to start searching to fix this problem Production server schema 1 query plan: Nested Loop (cost=569.23..634.43 rows=1 width=121) (actual time=1032.811..1032.811 rows=0 loops=1) [snip] Total runtime: 1034.204 ms Production server schema 2 query plan: Nested Loop (cost=133.42..793.12 rows=1 width=123) (actual time=0.130..0.130 rows=0 loops=1) [snip] Total runtime: 0.305 ms Well there's something strange - the estimated costs are fairly similar (634.43 vs 793.12) but the times are clearly very different (1034ms vs 0.3ms) The suspicious line from the first plan is: -> Seq Scan on mrp m (cost=0.00..119.92 rows=5892 width=39) (actual time=0.343..939.462 rows=5892 loops=1) This is taking up almost all the time in the query and yet only seems to be scanning 5892 rows. Run a vacuum verbose against table mrp and see if it's got a lot of dead rows. If it has, run VACUUM FULL and REINDEX against it and see if that solves your problem. I'm guessing you have / had a long-running transaction interfering with vacuum on this table, or perhaps a bulk update/delete? -- Richard Huxton Archonet Ltd
Re: [PERFORM] Difference in query plan
Patrice Beliveau wrote: Thanks, I'm already doing a vacuum full every night on all database, but the REINDEX fix it and now it's working fine Are you sure it was the REINDEX? The plan was using a sequential scan. But this raise a question 1) This table is cleared every night and recomputed, does this mean that I should REINDEX every night also Looks like you should. Or drop the indexes, load the data, re-create the indexes, that can be quicker. 2) Why this thing didn't happen in the other schema Have you re-loaded schema1 more often? It might even be the particular order that rows are loaded - a btree can become unbalanced sometimes. -- Richard Huxton Archonet Ltd
Re: [PERFORM] slow full table update
[EMAIL PROTECTED] wrote: Hi, I've changed settings, but with no effect on speed. I try explain query with this result for 10.000 rows update songs set views = 0 where sid > 2 and sid < 3 Bitmap Heap Scan on songs (cost=151.59..6814.29 rows=8931 width=526) (actual time=4.848..167.855 rows=8945 loops=1) This query says it is taking 167 milliseconds, not 10 minutes as your first message said. Is this query actually slow? Recheck Cond: ((sid > 2) AND (sid < 3)) -> Bitmap Index Scan on pk_songs2 (cost=0.00..151.59 rows=8931 width=0) (actual time=4.071..4.071 rows=9579 loops=1) Index Cond: ((sid > 2) AND (sid < 3)) Is there a way to run this query in a single pass with no Recheck Cond? Only a sequential scan. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Oddity with view
Jim 'Decibel!' Nasby wrote: On Nov 10, 2008, at 7:06 AM, Tom Lane wrote: Jim 'Decibel!' Nasby [EMAIL PROTECTED] writes: loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION ALL SELECT * FROM loan_tasks_pending;. You seem to have neglected to mention a join or two. Yeah, though I did show them at the end of the message...

SELECT true AS committed, loan_tasks_committed.id, ..., loan_tasks_committed.task_amount FROM loan_tasks_committed UNION ALL SELECT false AS committed, ltp.id, ..., NULL::unknown AS task_amount FROM loan_tasks_pending ltp JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id;

Thing is, there's no data to be had on that side. All of the time is going into the seqscan of loan_tasks_committed. But here's what's really disturbing...

- Seq Scan on loan_tasks_committed (cost=0.00..929345.35 rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689 loops=1)
- Seq Scan on loan_tasks_committed (cost=0.00..929345.35 rows=26112135 width=162) (actual time=0.014..22531.902 rows=26115689 loops=1)

It's the width - the view is fetching all the rows. Is the "true AS committed" bit confusing it? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] server space increasing very fast but transaction are very low
1. Don't email people directly to start a new thread (unless you have a support contract with them of course). 2. Not much point in sending to two mailing lists simultaneously. You'll just split your responses.

brahma tiwari wrote: Hi all My database server db01 is on a linux environment and the size of the base folder is increasing very fast unexpectedly (creating renamed files of 1 GB in the base folder like 1667234568.10) details as below

These are the files containing your tables / indexes. When a file gets larger than 1GB the file gets split and you get .1, .2 etc on the end.

what is significance of these files and how can i avoid it.can i delete these renamed files from base folder or any thing else. Please help

NEVER delete any files in .../data/base. Since these files all seem to have the same number they are all the same object (table or index). You can see which by looking in pg_class. You'll want to use the number 1667234568 of course.

= SELECT relname,relpages,reltuples,relfilenode FROM pg_class WHERE relfilenode=2336591;
 relname | relpages | reltuples | relfilenode
---------+----------+-----------+-------------
 outputs |        3 |       220 |     2336591
(1 row)

This is the table outputs on mine, which occupies 3 pages on disk and has about 220 rows. You can find out the reverse (the size of any table by name) with some useful functions:

select pg_size_pretty(pg_total_relation_size('my_table_name'));

I'm guessing what you've got is a table that's not being vacuumed because you've had a transaction that's been open for weeks. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
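To find the runaway object without walking the base directory by hand, the same catalogue can be sorted by size - a sketch using the standard size functions (available from 8.1 onwards):

```sql
-- Ten largest tables/indexes in the current database.
SELECT relname, relkind,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;
```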
Re: [PERFORM] PostgreSQL OR performance
Віталій Тимчишин wrote: As you can see from the other plans, it does have all the indexes to perform its work fast (when given part by part). It simply does not wish to use them. My question: Is this a configuration problem, or can the postgresql optimizer simply not do such a query rewrite? I must admit, I haven't managed to figure out what your query is trying to do, but then that's a common problem with autogenerated queries. The main question that needs answering is why the planner thinks you're going to get 1.3 billion rows in the OR query:

Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32)

You don't show explain analyse for this query, so there's no way of knowing how many rows get returned, but presumably you're expecting around 88000. What does explain analyse return? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7.4 - basic tuning question
Simon Waters wrote: The best advice is to upgrade at your earliest convenience with performance questions and 7.4 - you're missing a *lot* of improvements. You say you're planning to anyway, and I'd recommend putting effort into the upgrade rather than wasting effort on tuning a system you're leaving.

I assume that the histogram_bounds for strings are in alphabetical order, so that DEMOSTART falls between DELETE and IDEMAIL. Even on a worst case of including both these common values, the planner ought to have assumed that less than 10% of records were likely covered by the value selected, so it seems unlikely to me that not using the index would be a good idea.

Well, the real question is how many blocks need to be read to find those DEMOSTART rows. At some point around 5-10% of the table it's easier just to read the whole table than go back and forth between index and table. The precise point will depend on how much RAM you have, disk speeds etc.

= SELECT COUNT(*) FROM log WHERE event='DEMOSTART'; (...lots of time passes...)
 count
-------
  1432
(1 row)

OK, not many. The crucial bit is below though. These are the 10 values it will hold stats on, and all it knows is that DEMOSTART has less than 57000 entries. OK, it's more complicated than that, but basically there are values it tracks and everything else. So - it assumes that all other values have the same chance of occurring.

= SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;
 count  | event
--------+-----------
[snip]
  57022 | NEWUSR
  64907 | PUBREC0
  65449 | UNPUBLISH
  92843 | LOGOUT
  99018 | KILLSESS
 128900 | UPLOAD
 134994 | LOGIN
 137608 | NEWPAGE
 447556 | PUBREC1
 489572 | PUBLISH

Which is why it guesses 20436 rows below.
If you'd done SET enable_seqscan = off and then run the explain again, it should have estimated a cost for the index that was more than 54317.14

= EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
QUERY PLAN
Seq Scan on log (cost=0.00..54317.14 rows=20436 width=93)
  Filter: (event = 'DEMOSTART'::text)
(2 rows)

= ALTER TABLE log ALTER COLUMN event SET STATISTICS 50; ANALYSE log(event);
ALTER TABLE
ANALYZE

= EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
QUERY PLAN
Aggregate (cost=5101.43..5101.43 rows=1 width=0)
  - Index Scan using log_event on log (cost=0.00..5098.15 rows=1310 width=0)
    Index Cond: (event = 'DEMOSTART'::text)
(3 rows)

Not bad - now it knows how many rows it will find, and it sees that the index is cheaper. It's not completely accurate - it uses statistical sampling (and of course it's out of date as soon as you update the table). HTH -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
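What the planner "knows" can also be inspected directly; a sketch using the standard `pg_stats` view (table and column names from the thread):

```sql
-- The tracked most-common values and their frequencies for log.event.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 'log' AND attname = 'event';

-- Track more values, then re-sample, as done in the session above.
ALTER TABLE log ALTER COLUMN event SET STATISTICS 50;
ANALYZE log (event);
```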
Re: [PERFORM] bizarre query performance question
H. William Connors II wrote: fa_assignment has 44184945 records fa_assignment_detail has 82196027 records

explain select * from fa_assignment fa JOIN fa_assignment_detail fad ON (fad.assignment_id = fa.assignment_id) where fa.scenario_id = 0;

QUERY PLAN
Hash Join (cost=581289.72..4940729.76 rows=9283104 width=91)

Are you really expecting 9 million rows in the result? If so, this is probably a reasonable plan.

  Hash Cond: (fad.assignment_id = fa.assignment_id)
  - Seq Scan on fa_assignment_detail fad (cost=0.00..1748663.60 rows=82151360 width=61)
  - Hash (cost=484697.74..484697.74 rows=4995439 width=30)
    - Bitmap Heap Scan on fa_assignment fa (cost=93483.75..484697.74 rows=4995439 width=30)
      Recheck Cond: (scenario_id = 0)
      - Bitmap Index Scan on fa_assignment_idx2 (cost=0.00..92234.89 rows=4995439 width=0)
        Index Cond: (scenario_id = 0)

It's restricting on scenario_id, building a bitmap to identify which disk-blocks will contain one or more matching rows and then scanning those. If those 5 million scenario_id=0 rows are spread over 10% of the blocks then that's a good idea. If it was expecting only a handful of rows with scenario_id=0 then I'd expect it to switch to a standard index scan. If your work_mem is small try something like: set work_mem = '50MB'; before running the query - maybe even larger. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
[EMAIL PROTECTED] wrote: Thanks, Unfortunately, i can't update pgsql to 8.3 since it's not in debian stable. That's why backports.org was invented :-) Or does "can't" mean "not allowed to"? So i'm going to play with work_mem / shared_buffers. With big shared_buffers pgsql tells me shmget(cle=5432001, taille=11183431680, 03600). so i do echo 13183431680 > /proc/sys/kernel/shmmax ( 10Go + 2Go just in case) but pgsql tells me again that there's not enough shm.. How can i compute the right shmmax for my server ? I'm not seeing anything terribly wrong there. Are you hitting a limit with shmall? Oh - and I'm not sure there's much point in having more shared_buffers than you have data. Try much larger work_mem first, I think that's the biggest gain for you. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
Tommy Gildseth wrote: Richard Huxton wrote: [EMAIL PROTECTED] wrote: Thanks, Unfortunately, i can't update pgsql to 8.3 since it's not in debian stable. That's why backports.org was invented :-) Or does "can't" mean "not allowed to"? Well, running production servers from backports can be a risky proposition too, and can land you in situations like the one discussed in "Debian packages for Postgres 8.2" on the General list. Well, there's a reason why stable is a popular choice for production servers. I must admit that I build from source for my PostgreSQL packages (because I care which version I run). I was reading one of the Perl fellows recommending the same. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
[EMAIL PROTECTED] wrote: Hello I'm running pgsql 8.1.11 (from debian stable) on a server with 16GB RAM (Linux helios 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux). Unless you're committed to this version, I'd seriously look into 8.3 from backports (or compiled yourself). I'd expect some serious performance improvements for the workload you describe. I have a table tickets with 1 000 000 inserts per month ( ~2600 each 2 hours ) (for the moment 1300 rows for 5GB ) and i have to extract statistics ( number of calls, number of calls less than X seconds, number of new calls, number of calls from new callers, ...) OK, so not a lot of updates, but big aggregation queries. You might want to pre-summarise older data as the system gets larger. 1°) The server will handle max 15 queries at a time. So this is my postgresql.conf

max_connections = 15

Well, I'd allow 20 - just in case.

shared_buffers = 995600 # ~1Go
temp_buffers = 1000
work_mem = 512000 # ~512Ko

I'd be tempted to increase work_mem by a lot, possibly even at the expense of shared_buffers. You're going to be summarising large amounts of data, so the larger the better, particularly as your database is currently smaller than RAM. Start with 5MB then try 10MB, 20MB and see what difference it makes.

maintenance_work_mem = 1048576 # 1Mo
max_fsm_pages = 41522880 # ~40Mo
max_fsm_relations = 8000

See what a vacuum full verbose says for how much free space you need to track.

checkpoint_segments = 10
checkpoint_timeout = 3600

With your low rate of updates this shouldn't matter.

effective_cache_size = 13958643712 # 13Go

Assuming that's based on what top or free say, that's fine. Don't forget it will need to be reduced if you increase work_mem or shared_buffers.

stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
autovacuum = off

Make sure you're vacuuming if autovacuum is off. How can i optimize the configuration?
Looks reasonable, so far as you can tell from an email. Try playing with work_mem though. 2°) My queries look like SELECT tday AS n, COUNT(DISTINCT(a.appelant)) AS new_callers, COUNT(a.appelant) AS new_calls FROM cirpacks.tickets AS a WHERE LENGTH(a.appelant) 4 AND a.service_id IN ( 95, 224, 35, 18 ) AND a.exploitant_id = 66 AND a.tyear = 2008 AND a.tmonth = 08 Index on (tyear,tmonth) might pay off, or one on exploitant_id perhaps. AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b WHERE b.appelant = a.appelant AND b.service_id IN ( 95, 224, 35, 18 ) AND b.heberge_id = 66 HAVING to_char(MIN(b.premier_appel), 'MMDD') = to_char(a.date, 'MMDD') ) It looks like you're comparing two dates by converting them to text. That's probably not the most efficient way of doing it. Might not be an issue here. GROUP BY n ORDER BY n; or select ... SUM( CASE WHEN condition THEN value ELSE 0) ... FROM cirpacks.tickets WHERE tyear = ... and tmonth = ... and tday = ... AND audiotel IN ( '...', '...' ); or select ... SUM( CASE WHEN condition THEN value ELSE 0) ... FROM cirpacks.tickets WHERE '2007-01-01' = date AND date = '2008-08-31' AND audiotel IN ( '...', '...' ); which indexes are the best ? The only way to find out is to test. You'll want to run EXPLAIN after adding each index to see what difference it makes. Then you'll want to see what impact this has on overall workload. Mostly though, I'd try out 8.3 and see if that buys you a free performance boost. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
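The index suggestions above, sketched out (index names are illustrative; verify each candidate with EXPLAIN before keeping it):

```sql
-- For the WHERE tyear = ... AND tmonth = ... filters:
CREATE INDEX tickets_tyear_tmonth_idx ON cirpacks.tickets (tyear, tmonth);

-- Or, if the exploitant_id filter is more selective:
CREATE INDEX tickets_exploitant_idx ON cirpacks.tickets (exploitant_id);

-- Avoiding the text round-trip: instead of
--   to_char(MIN(b.premier_appel), 'MMDD') = to_char(a.date, 'MMDD')
-- compare the dates directly (note: this also matches on year, which the
-- 'MMDD' form does not - only equivalent if the years always agree):
--   date_trunc('day', MIN(b.premier_appel)) = date_trunc('day', a.date)
```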
Re: [PERFORM] Restoration of datas
dforums wrote: Could you advise me on which restoration method is the fastest? To upgrade from postgresql 8.1.11 to 8.3.3. Using the pg_dump from your 8.3 package, dump the database using -Fc to get a nicely compressed dump. Then use pg_restore to restore it. If you add a --verbose flag then you will be able to track it. You might want to set fsync=off while doing the restore. This is safe since if the machine crashes during restore you just start again. Oh, and increase work_mem too - there's only going to be one process. What will take the most time is the creating of indexes etc. It will take a long time to do a full restore though - you've got 64GB of data and slow disks. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Unexpectedly Long DELETE Wait
Volkan YAZICI wrote: Hi, The below command has been running for ~700 minutes on one of our PostgreSQL servers.

DELETE FROM mugpsreglog WHERE NOT EXISTS (SELECT 1 FROM mueventlog WHERE mueventlog.eventlogid = mugpsreglog.eventlogid);

Seq Scan on mugpsreglog (cost=0.00..57184031821394.73 rows=6590986 width=6)
  Filter: (NOT (subplan))
  SubPlan
    - Seq Scan on mueventlog (cost=0.00..4338048.00 rows=1 width=0)
      Filter: (eventlogid = $0)

Ouch - look at the estimated cost on that! And there aren't any constraints (FK/PK), triggers, indexes, etc. on any of the tables. (We're in the phase of a migration; many DELETE commands similar to the above get executed to relax constraints before they are introduced.) Well there you go. Add an index on eventlogid for mueventlog. Alternatively, if you increased your work_mem that might help. Try SET work_mem='64MB' (or even higher) before running the explain and see if it tries a materialize. For situations like this where you're doing big one-off queries you can afford to increase resource limits. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
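A sketch of both suggestions (object names from the thread; the subplan currently re-scans mueventlog once per mugpsreglog row, so an index on the probed column turns each probe into a cheap lookup):

```sql
-- Index the column the subplan filters on.
CREATE INDEX mueventlog_eventlogid_idx ON mueventlog (eventlogid);
ANALYZE mueventlog;

-- Alternatively (or additionally), give this one-off query enough memory
-- that the planner can materialize/hash the inner table:
SET work_mem = '64MB';
EXPLAIN DELETE FROM mugpsreglog
WHERE NOT EXISTS (SELECT 1 FROM mueventlog
                  WHERE mueventlog.eventlogid = mugpsreglog.eventlogid);
```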
Re: [PERFORM] Plz Heeeelp! performance settings
dforum wrote: Tx for your reply. You mean that RAID use fsync method for keeping data's copy. No, Merlin means PostgreSQL will issue a sync to force WAL to actual disk. So you invite me to desactivate fsync to increase the performance ? He means you might have to if you can't afford new hardware. Is disk activity the problem? Have you looked at the output of vmstat to check? Desactivating fsync. my second disk will not be uptodate, No - the RAID stuff is happening in the operating-system. so if the machine crash, I wont be able to get the server working quickly??? Not quickly, perhaps not at all. But if I use a second machine to replicate the database, I escape this problem isn't it ? You reduce the chance of a single failure causing disaster. If I understand right, could you tell me how to do desactivate fsync please ? There's an fsync = on setting in your postgresql.conf, but don't change it yet. I have a database of 38Go and take 6Go per week. What do you mean by take 6Go per week? You update/delete that much data? It's growing by that amount each week? I have a lot of update and insert, especially in 8 tables. 2 tables are using for temporary storage, so I right something like 15000 request per 2 minutes and empty it into 10 min. I'm not sure what 15000 request per 2 minutes and empty it into 10 min means. Do you have 7500 requests per minute? Are these updates? To the temporary storage? What is this temporary storage - an ordinary table? I'm making some update or select on tables including more than 20 millions of entrance. Again, I'm not sure what this means. Oh - *important* - which version of PostgreSQL are you running? Is an upgrade practical? Looking at your postgresql.conf settings: max_connections = 624 That's an odd number. Do you usually have that many connections? What are they doing? They can't all be active, the machine you've got wouldn't cope. shared_buffers = 25 work_mem = 9000 temp_buffers = 500 These three are important. 
The shared_buffers are workspace shared between all backends, and you've allocated about 2GB. You've also set work_mem=9MB, which is how much each backend can use for a single sort. That means it can use double or triple that in a complex query. If you're using temporary tables, then you'll want to make sure the temp_buffers setting is correct. I can't say whether these figures are good or bad without knowing how the database is being used. effective_cache_size = 625000 That's around 5GB - is that roughly the amount of memory used for caching (what does free -m say for buffers/cache)? max_prepared_transactions = 200 Do you use a lot of prepared transactions in two-phase commit? I'm guessing that you don't. I'm sure that it could be more optimised. I don't know any thing on WAL, autovacuum, fsm, bgwriter, kernel process, geqo or planner cost settings. If you run a vacuum verbose it will recommend fsm settings at the end of its output. I think you probably need to make your autovacuum more aggressive, but that's something you'll be able to tell by monitoring your database. It's quite likely that Merlin's right, and you need better hardware to cope with the number of updates you're making - that's something where you need fast disks. However, he's just guessing because you've not told us enough to tell where the problem really lies. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query Plan choice with timestamps
Giorgio Valoti wrote: Hi, I have a timestamptz field that I want to use with a query, but I don’t need the full timestamp resolution, so I’ve created a day_trunc(timestamptz) immutable function which I’ll use with the query and with a new index: logs= create index test_idx on blackbox (day_trunc(ts)); However, the query plan doesn’t use the index: Does it use it ever? e.g. with SELECT * FROM blackbox WHERE day_trunc(ts) = '...' -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
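An expression index is only considered when the query repeats the indexed expression verbatim - a sketch, assuming `day_trunc` is the poster's immutable wrapper around `date_trunc('day', ...)`:

```sql
-- This shape can use test_idx, because the indexed expression appears
-- verbatim in the WHERE clause:
EXPLAIN SELECT * FROM blackbox WHERE day_trunc(ts) = day_trunc(now());

-- This shape cannot - the planner sees a bare column, not the expression:
EXPLAIN SELECT * FROM blackbox WHERE ts = now();
```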
Re: [PERFORM] Plz Heeeelp! performance settings
dforums wrote: vmstat is giving :

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache  si  so   bi   bo   in  cs  us sy id wa
 0  2   1540  47388  41684 7578976   0   0  131   25   90   1   9  3 82  7

This system is practically idle. Either you're not measuring it at a useful time, or there isn't a performance problem. But if I use a second machine to replicate the database, I escape this problem isn't it ? You reduce the chance of a single failure causing disaster. Not clear this reply. It scares me If server A fails, you still have server B. If server A fails so that replication stops working and you don't notice, server B won't help any more. What do you mean by take 6Go per week? You update/delete that much data? It's growing by that amount each week? YES That wasn't a yes/no question. Please choose one of: Are you updating 6Go per week? Are you adding 6Go per week? I'm not sure what 15000 request per 2 minutes and empty it into 10 min means. I insert 15000 datas every 2 min and delete 15000 every 10 min in those tables Do you have 7500 requests per minute? should be that, But in fact I'm not treating the datas in real time, and I buffer the datas and push the data into the database every 2 min Are these updates? during the delete the data are aggregated in other tables which make updates OK, so every 2 minutes you run one big query that adds 15000 rows. Every 10 minutes you run one big query that deletes 15000 rows. To the temporary storage? What is this temporary storage - an ordinary table? Yes, I tried to use temporary tables but I was never able to connect to these tables over 2 different sessions/connections; it seems that is a functionality of postgresql, or a misunderstanding from me. That's correct - temporary tables are private to a backend (connection). I'm making some update or select on tables including more than 20 millions of entrance. Again, I'm not sure what this means.
To aggregate the data, I have to check the presence of other information that is stored in 2 tables which include 24 millions of entries. OK. I assume you're happy with the plans you are getting on these queries, since you've not provided any information about them. Oh - *important* - which version of PostgreSQL are you running? 8.1.11 Is an upgrade practical? We are working on trying to upgrade to 8.3.3, but we are not yet ready for such a migration OK Looking at your postgresql.conf settings: max_connections = 624 That's an odd number. Now we could decrease this number, it's not so useful for now. we could decrease it to 350. I don't believe you've got 350 active connections either. It will be easier to help if you can provide some useful information. effective_cache_size = 625000 That's around 5GB - is that roughly the amount of memory used for caching (what does free -m say for buffers/cache)?

             total       used       free     shared    buffers     cached
Mem:          7984       7828        156          0         38       7349
-/+ buffers/cache:        440       7544
Swap:          509          1        508

Not far off - free is showing 7349MB cached. You're not running 350 clients there though - you're only using 440MB of RAM. I don't see anything to show a performance problem from these emails. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Plz Heeeelp! performance settings
dforums wrote: The performance problem is really only on the insertion and even more on the treatment for the aggregation. To treat the 3000 entries and to insert, or update, the tables it needs 10 minutes. As I told you I inject 14000 queries every 2 minutes, and it needs 10 minutes to treat 3000 of those queries. Sorry - I still don't understand. What is this treatment you are doing? OK. I assume you're happy with the plans you are getting on these queries, since you've not provided any information about them. The plan seems ok as it uses an index as well. here is the plan :

explain analyse SELECT insertUpdateTracks(137,2605, 852, ('2008-08-06 19:28:54'::text)::date,3,'dailydisplay',2,NULL);
INFO: method 1
QUERY PLAN
Result (cost=0.00..0.01 rows=1 width=0) (actual time=1.151..1.151 rows=1 loops=1)
Total runtime: 1.160 ms

There's nothing to do with an index here - this is a function call. As you can see, that's the runtime for a single update on this table; multiplying this per 1, it is too long. So - are you calling this function 14000 times to inject your data? You're doing this in one transaction, yes? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Plz Heeeelp! performance settings
dforums wrote: The delete is global, the procedure is called for each line/tracks. So - are you calling this function 14000 times to inject your data? You're doing this in one transaction, yes? NO I have to make it 14000 times cause, I use some inserted information for other insert to make links between data. Why does that stop you putting all 14000 calls in one transaction? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
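Batching the calls changes nothing about their order or the links between inserted rows - it only replaces 14000 commits (each one a WAL flush) with a single one. A sketch, using the function call shown earlier in the thread (argument values are illustrative):

```sql
BEGIN;
SELECT insertUpdateTracks(137, 2605, 852, ('2008-08-06 19:28:54'::text)::date, 3, 'dailydisplay', 2, NULL);
SELECT insertUpdateTracks(137, 2605, 853, ('2008-08-06 19:29:01'::text)::date, 3, 'dailydisplay', 2, NULL);
-- ... one call per buffered row ...
COMMIT;
```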
Re: [PERFORM] Nls sorting in Postgresql-8.3.3
Praveen wrote: Hello, I installed postgresql-8.3.3 on our local server with the option --enable-nls. After successful installation, I created a database and imported data. But I am not aware how to use nls sort in this postgresql-8.3.3. Please tell me the syntax for how to use nls sort in a query, if someone knows. What is "nls sort"? What do you expect --enable-nls to do? It looks like it's for multi-language message display rather than sorting. The locale options are already built-in. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to fix problem then when two queries run at the same time, it takes longer to complete then if run in sequence
Miernik wrote: Might be worth turning off autovacuum and running a manual vacuum full overnight if your database is mostly reads. I run autovacum, and the database has a lot of updates all the time, also TRUNCATING tables and refilling them, usually one or two INSERTS/UPDATES per second. OK Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro said) and set them to allow only one connection in the pool. I know that pgbouncer offers per-transaction connection sharing which will make this more practical. Even so, it will help if your application can co-operate by closing the connection as soon as possible. I just installed pgpool2 and whoaaa! Everything its like about 3 times faster! My application are bash scripts using psql -c UPDATE Probably spending most of their time setting up a new connection, then clearing it down again. I plan to rewrite it in Python, not sure if it would improve performance, but will at least be a cleaner implementation. Careful of introducing any more overheads though. If libraries end up using another 2.5MB of RAM then that's 10% of your disk-cache gone. In /etc/pgpool.conf I used: # number of pre-forked child process num_init_children = 1 # Number of connection pools allowed for a child process max_pool = 1 Might need to increase that to 2 or 3. Wanted to install pgbouncer, but it is broken currently in Debian. And why is it in contrib and not in main (speaking of Debian location)? Not well known enough on the Debian side of the fence? It's simple enough to install from source though. Takes about one minute. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to fix problem then when two queries run at the same time, it takes longer to complete then if run in sequence
Miernik wrote: Richard Huxton [EMAIL PROTECTED] wrote: I just installed pgpool2 and whoaaa! Everything its like about 3 times faster! My application are bash scripts using psql -c UPDATE Probably spending most of their time setting up a new connection, then clearing it down again. If I do it in Python it could do all queries in the same connection, so should be faster? Besides that 'psql' is written in perl, so its also heavy, by not using psql I get rid of perl library in RAM. Nope - C all through. Also the script uses wget to poll some external data sources a lot, also needlessly opening new connection to the webserver, so I want to make the script save the http connection, which means I must get rid of wget. Maybe I should write some parts in C? BTW, doesn't there exist any tool does what psql -c does, but is written in plain C, not perl? I was looking for such psql replacement, but couldn't find any Well ECPG lets you embed SQL directly in your C. # Number of connection pools allowed for a child process max_pool = 1 Might need to increase that to 2 or 3. Why? The website says: max_pool The maximum number of cached connections in pgpool-II children processes. pgpool-II reuses the cached connection if an incoming connection is connecting to the same database by the same username. But all my connections are to the same database and the same username, and I only ever want my application to do 1 connection to the database at a time, so why would I want/need 2 or 3 in max_pool? From the subject line of your question: how to fix problem then when two queries run at the same time... Of course if you don't actually want to run two simultaneous queries, then max_pool=1 is what you want. Not well known enough on the Debian side of the fence? It's simple enough to install from source though. Takes about one minute. But is there any advantage for me compared to pgpool2, which works really nice? Can't say. 
Given your limited RAM, it's probably worth looking at both and seeing which leaves you more memory. Your main concern has got to be to reduce wasted RAM. In some parts, like doing some count(*) stuff, it now does things in about one second, which took a few minutes to finish before (if the other part of the scripts where doing something else on the database at the same time). That will be because you're only running one query, I'd have thought. Two queries might be sending you into swap. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to fix problem then when two queries run at the same time, it takes longer to complete then if run in sequence
Miernik wrote: Theo Kramer [EMAIL PROTECTED] wrote: file `which psql` /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped [EMAIL PROTECTED]:~$ file `which psql` /usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper' [EMAIL PROTECTED]:~$ file /usr/share/postgresql-common/pg_wrapper /usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script text executable That's not psql though, that's Debian's wrapper around it which lets you install multiple versions of PostgreSQL on the same machine. Might be worth bypassing it and calling it directly. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Database size Vs performance degradation
Dave North wrote:
> Morning folks, Long time listener, first time poster.

Hi Dave

> Postgres 8.1.8
> shared_buffers = 2000
> max_fsm_pages = 40
> Redhat Enterprise 4
> Running on HP DL380 w/ 4GB RAM, dual 10K HDDs in RAID 0+1
> Also running on the server is a tomcat web server and other ancillaries

The value of 2000 seems a bit low for shared_buffers perhaps. Oh, and 8.1.13 seems to be the latest bugfix for 8.1 too.

> Now, the problem. We have an application that continually writes a bunch of data to a few tables which is then deleted by a batch job each night. We're adding around 80,000 rows to one table per day and removing around 75,000 that are deemed to be unimportant.

[snip]

> We had this problem around a month ago and again yesterday. Because the application needs reasonably high availability, we couldn't full vacuum so what we did was a dump and load to another system. What I found here was that after the load, the DB size was around 2.7GB - a decrease of 5GB. Re-loading this back onto the main system, and the world is good.

Well, that's pretty much the definition of bloat. Are you sure you're vacuuming enough? I don't have an 8.1 to hand at the moment, but a vacuum verbose in 8.2+ gives some details at the end about how many free-space slots need to be tracked. Presumably you're not tracking enough of them, or your vacuuming isn't actually taking place.

Check the size of your database every night. It will rise from 2.7GB, but it should stay roughly static (apart from whatever data you add of course). If you can keep it so that most of the working-set of your database fits in RAM, speed will stay just fine.

> Yes, I know we need to upgrade to 8.3 but that's going to take some time :)

I think you'll like some of the improvements, but it's probably more important to get 8.1.13 installed soon-ish.
-- Richard Huxton Archonet Ltd
Re: [PERFORM] Database size Vs performance degradation
Dave North wrote:
> -----Original Message----- From: Richard Huxton [mailto:[EMAIL PROTECTED]]
>> Well, that's pretty much the definition of bloat. Are you sure you're vacuuming enough?
> DN: Well, the auto-vac is kicking off pretty darn frequently...around once every 2 minutes. However, you just made me think of the obvious - is it actually doing anything?! The app is pretty darn write intensive so I wonder if it's actually able to vacuum the tables?

If you've got a big batch delete, it can't hurt to manually vacuum that table immediately afterwards.

-- Richard Huxton Archonet Ltd
Re: [PERFORM] Less rows - better performance?
Andreas Hartmann wrote:
> Dear PostgreSQL community, first some info about our application:
> - Online course directory for a University
> - Amount of data: complete dump is 27 MB
> - Semester is part of primary key in each table
> - Data for approx. 10 semesters stored in the DB
> - Read-only access from web application (JDBC)
>
> Our client has asked us if the performance of the application could be improved by moving the data from previous years to a separate archive application.

If you had 27GB of data maybe, but you've only got 27MB - that's presumably all sitting in memory. What in particular is slow?

-- Richard Huxton Archonet Ltd
Re: [PERFORM] Less rows - better performance?
Andreas Hartmann wrote:
> Here's some info about the actual amount of data:
>
> SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database WHERE pg_database.datname = 'vvz_live_1';
>
>   datname   |  size
> ------------+---------
>  vvz_live_1 | 2565 MB
>
> I wonder why the actual size is so much bigger than the data-only dump - is this because of index data etc.?

I suspect Guillaume is right and you've not been vacuuming. That or you've got a *LOT* of indexes. If the database is only 27MB dumped, I'd just dump/restore it. Since the database is read-only it might be worth running CLUSTER on the main tables if there's a sensible ordering for them.

>> What in particular is slow?
> There's no particular bottleneck (at least that we're aware of). During the first couple of days after the beginning of the semester the application request processing tends to slow down due to the high load (many students assemble their schedule). The customer upgraded the hardware (which already helped a lot), but they asked us to find further approaches to performance optimization.

1. Cache sensibly at the application (I should have thought there's plenty of opportunity here).
2. Make sure you're using a connection pool and have sized it reasonably (try 4, 8, 16 and see what loads you can support).
3. Use prepared statements where it makes sense. Not sure how you'll manage the interplay between this and connection pooling in JDBC. Not a Java man I'm afraid. If you're happy with the query plans you're looking to reduce overheads as much as possible during peak times.
4. Offload more of the processing to clients with some fancy ajax-ed interface.
5. Throw in a spare machine as an app server for the first week of term. Presumably your load is 100 times average at this time.

-- Richard Huxton Archonet Ltd
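Point 3 above (prepared statements) boils down to sending the statement text once and re-executing it with different parameters. A minimal sketch in Python, with the standard library's sqlite3 standing in for the database (the table name and data are made up for illustration; with JDBC the analogue is PreparedStatement, with psycopg2 a parameterized execute()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (semester TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [("2008S", "Databases"), ("2008S", "Networks"), ("2007W", "Logic")],
)

# The same statement text, re-executed with different parameters - the
# driver/server can plan it once and reuse the plan on each execution.
stmt = "SELECT count(*) FROM courses WHERE semester = ?"
print(conn.execute(stmt, ("2008S",)).fetchone()[0])  # 2
```

Besides the planning savings, parameterized statements also sidestep quoting/injection problems, which matters for a web application like this one.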
Re: [PERFORM] Posible planner improvement?
Albert Cervera Areny wrote:
> I've got a query similar to this:
>
> select * from t1, t2 where t1.id 158507 and t1.id = t2.id;
>
> That took 84 minutes (the query was a bit longer but this is the part that made the difference). After a little change the query took ~1 second:
>
> select * from t1, t2 where t1.id 158507 and t2.id 158507 and t1.id = t2.id;

Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and we'll see why it's better at the second one.

-- Richard Huxton Archonet Ltd
Re: [PERFORM] Posible planner improvement?
Luke Lonergan wrote:
> The problem is that the implied join predicate is not being propagated. This is definitely a planner deficiency.

IIRC only equality conditions are propagated and gt, lt, between aren't. I seem to remember that the argument given was that the cost of checking for the ability to propagate was too high for the frequency when it occurred. Of course, what was true for code and machines of 5 years ago might not be so today.

-- Richard Huxton Archonet Ltd
Re: [PERFORM] Big O notation for postgres?
Jonah H. Harris wrote:
> On Wed, May 21, 2008 at 10:10 AM, H. Hall [EMAIL PROTECTED] wrote:
>> Does anyone know if there is a source that provides Big O notation for postgres's aggregate functions and operations? For example is count(*) = O(1) or O(n)?
> I don't know of any document containing the complexity of each aggregate, but it's sometimes left as a comment in the source code.

Recent max() and min() can be O(n) or O(1) depending on the where-clause and presence of an index too, just to muddy the waters.

-- Richard Huxton Archonet Ltd
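The max()/min() point can be illustrated outside the database. This is a toy Python analogy, not Postgres internals: over an unindexed column the aggregate must visit every row, but a btree index keeps the extreme value at one end, so it is a single probe away.

```python
def max_by_scan(values):
    # O(n): touch every value once, as max() over an unindexed column must.
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

def max_by_index(sorted_values):
    # With a sorted structure (a btree index in Postgres), the maximum sits
    # at one end - effectively O(1) for "SELECT max(col)".
    return sorted_values[-1]

data = [7, 3, 9, 1]
assert max_by_scan(data) == max_by_index(sorted(data)) == 9
```

The where-clause caveat in the reply is real: a filter the index can't satisfy forces the planner back to the O(n) scan path.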
Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space
kevin kempter wrote:
> Hi List;
> I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below). I also have a table with 7,785 rows in it (see the describe of xsegment_dim below). I'm running the join shown below and it takes 10 hours and eventually runs out of disk space on a 1.4TB file system.
>
> QUERY PLAN
> ---------------------------------------------------------------
> Merge Join (cost=1757001.74..73569676.49 rows=3191677219 width=118)

Dumb question Kevin, but are you really expecting 3.2 billion rows in the result-set? Because that's approaching 400GB of result-set without any overheads.

-- Richard Huxton Archonet Ltd
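The "approaching 400GB" figure comes straight from the two numbers printed on the Merge Join line of the plan: the row estimate times the estimated row width.

```python
# Back-of-envelope check of the result-set size from the plan estimates.
rows = 3_191_677_219   # rows=... on the Merge Join node
width = 118            # width=... (estimated bytes per row)

gigabytes = rows * width / 1024**3
print(round(gigabytes))  # ~351 GB before any per-tuple or page overhead
```

Add per-tuple headers and page overhead and the on-disk figure lands comfortably around the 400GB mark, which is why the join fills the 1.4TB file system once sorts and spill files are counted too.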
Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space
kevin kempter wrote:
> I'm expecting 9,961,914 rows returned. Each row in the big table should have a corresponding key in the smaller table. I want to basically expand the big table column list by one, via adding the appropriate key from the smaller table for each row in the big table. It's not a Cartesian product join.

Didn't seem likely, to be honest. What happens if you try the query as a cursor, perhaps with an order-by on customer_id or something to encourage index use? Do you ever get a first row back? In fact, what happens if you slap an index over all your join columns on xsegment_dim? With 7,000 rows that should make it a cheap test.

-- Richard Huxton Archonet Ltd
Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space
kevin kempter wrote:
> Hi List;
> I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below). I also have a table with 7,785 rows in it (see the describe of xsegment_dim below).

Something else is puzzling me with this - you're joining over four fields.

> from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
> where f14.customer_id = segdim.customer_srcid
> and f14.show_id = segdim.show_srcid
> and f14.season_id = segdim.season_srcid
> and f14.episode_id = segdim.episode_srcid
> and segdim.segment_srcid is NULL;
>
> Merge Join (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>   -> Sort (cost=1570.35..1579.46 rows=3643 width=40)
>   -> Sort (cost=1755323.26..1780227.95 rows=9961874 width=126)

Here it's still expecting 320 matches against each row from the large table. That's ~10% of the small table (or that fraction of it that PG expects) which seems very high for four clauses ANDed together.

-- Richard Huxton Archonet Ltd
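The 320-matches and ~10% figures can be recomputed directly from the row estimates in the plan:

```python
# Selectivity check from the plan's row estimates.
estimated_result = 3_191_677_219  # rows=... on the Merge Join node
big_table_rows = 9_961_874        # rows=... on the big table's Sort node
small_table_rows = 3_643          # rows=... on xsegment_dim's Sort node

matches_per_big_row = estimated_result / big_table_rows
fraction_of_small = matches_per_big_row / small_table_rows
print(round(matches_per_big_row))   # ~320 expected matches per outer row
print(round(fraction_of_small, 3))  # ~0.088, i.e. roughly 10% of the small table
```

For four equality clauses ANDed together against a 3,643-row dimension table, an expected 320 matches per outer row is the planner effectively treating the clauses as barely selective, which is the heart of the misestimate.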
Re: [PERFORM] db size
Adrian Moisey wrote:
> Hi
>> Running VACUUM VERBOSE will give you a detailed view of space usage of each individual table.
> I did that. Not too sure what I'm looking for, can someone tell me what this means:
>
> INFO: "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and 0 dead rows; 1272 rows in sample, 1272 estimated total rows

This is a small table that takes up 27 pages and it scanned all of them. You have 1272 rows in it and none of them are dead (i.e. deleted/updated but still taking up space).

> INFO: free space map contains 4667977 pages in 1199 relations
> DETAIL: A total of 4505344 page slots are in use (including overhead). 4505344 page slots are required to track all free space. Current limits are: 15537488 page slots, 1200 relations, using 91172 kB.

You are tracking ~4.6 million pages and have space to track ~15.5 million, so that's fine. You are right up against your limit of relations (tables, indexes etc) being tracked though - 1200. You'll probably want to increase max_fsm_relations - see manual for details (server configuration / free space map).

-- Richard Huxton Archonet Ltd
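The headroom being read off the free-space-map summary can be checked with simple subtraction from the numbers in the DETAIL line:

```python
# Figures from the "free space map" lines of the VACUUM VERBOSE output.
page_slots_in_use = 4_505_344
page_slots_limit = 15_537_488    # max_fsm_pages
relations_tracked = 1_199
relations_limit = 1_200          # max_fsm_relations

print(page_slots_limit - page_slots_in_use)  # >11 million page slots spare
print(relations_limit - relations_tracked)   # only 1 relation slot spare
```

Page slots are fine; the relation count is one away from the ceiling, which is exactly why the advice is to raise max_fsm_relations rather than max_fsm_pages.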
Re: [PERFORM] db size
Adrian Moisey wrote:
> Hi
>>> INFO: "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and 0 dead rows; 1272 rows in sample, 1272 estimated total rows
>> This is a small table that takes up 27 pages and it scanned all of them. You have 1272 rows in it and none of them are dead (i.e. deleted/updated but still taking up space).
> I had a look through a few other tables...:
>
> INFO: "table1": scanned 22988 of 22988 pages, containing 2713446 live rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated total rows
> INFO: "table2": scanned 24600 of 24600 pages, containing 270585 live rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows
>
> Is that dead rows an issue? Should I try clean it out? Will it improve performance?

What you're hoping to see is that figure remain stable. The point of the free-space-map is to track these and allow the space to be re-used. If you find that the number of dead rows is increasing then either you are:
1. Just deleting rows
2. Not vacuuming enough - check your autovacuum settings

The effect on performance is that when you read in a page from disk you're reading dead rows along with the data you are after. Trying to keep 0 dead rows in a constantly updated table isn't worth the effort though - you'd end up wasting your disk I/O on maintenance rather than queries.

The figures above look high to me - 900,000 out of 2,700,000 and 65,000 out of 270,000. Of course, if these tables have just had bulk updates/deletes then that's fine. If there's a steady stream of updates though, you probably want to up your autovacuum settings.

-- Richard Huxton Archonet Ltd
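The dead-row proportions behind that judgement fall straight out of the VACUUM output quoted above:

```python
# Dead-row fraction for the two tables from the VACUUM VERBOSE lines.
tables = {
    "table1": {"live": 2_713_446, "dead": 895_662},
    "table2": {"live": 270_585, "dead": 65_524},
}
fractions = {
    name: t["dead"] / (t["live"] + t["dead"]) for name, t in tables.items()
}
print({name: round(f * 100, 1) for name, f in fractions.items()})
# {'table1': 24.8, 'table2': 19.5}
```

Roughly a fifth to a quarter of each page read is dead tuples, so on a steadily-updated table that is real wasted I/O; a one-off bulk delete would explain it harmlessly, as the reply notes.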