Re: [PERFORM] Postgres not using all RAM (Huge Page activated on a 96GB RAM system)
postgres 20 0 24.918g 214300 12688 R 99.3 0.2 2865:59 > postgres > > > 9063 postgres 20 0 24.722g 14812 12956 S 0.3 0.0 0:07.36 > postgres > > > 9068 postgres 20 0 24.722g 6380 4232 S 0.3 0.0 0:02.15 > postgres > > > 9065 postgres 20 0 24.727g 10368 3516 S 0.0 0.0 0:04.24 > postgres > > > 9066 postgres 20 0 24.722g 4100 2248 S 0.0 0.0 0:06.04 > postgres > > > 9067 postgres 20 0 24.722g 4100 2248 S 0.0 0.0 0:01.37 > postgres > > > 9069 postgres 20 0 161740 4596 2312 S 0.0 0.0 0:04.48 > postgres > > What’s wrong with this? There isn’t something wrong in RAM usage? > > Thank you all >Pietro > -- Andrew W. Kerber 'If at first you dont succeed, dont take up skydiving.'
Re: [PERFORM] jsonb_agg performance
On 01/29/2016 05:06 PM, jflem...@kispring.com wrote: The jsonb_agg function seems to have significantly worse performance than its json_agg counterpart: => explain analyze select pa.product_id, jsonb_agg(attributes) from product_attributes2 pa group by pa.product_id; QUERY PLAN -- GroupAggregate (cost=1127.54..1231.62 rows=3046 width=380) (actual time=28.632..241.647 rows=3046 loops=1) Group Key: product_id -> Sort (cost=1127.54..1149.54 rows=8800 width=380) (actual time=28.526..32.826 rows=8800 loops=1) Sort Key: product_id Sort Method: external sort Disk: 3360kB -> Seq Scan on product_attributes2 pa (cost=0.00..551.00 rows=8800 width=380) (actual time=0.010..7.231 rows=8800 loops=1) Planning time: 0.376 ms Execution time: 242.963 ms (8 rows) => explain analyze select pa.product_id, json_agg(attributes) from product_attributes3 pa group by pa.product_id; QUERY PLAN -- GroupAggregate (cost=1136.54..1240.62 rows=3046 width=387) (actual time=17.731..30.126 rows=3046 loops=1) Group Key: product_id -> Sort (cost=1136.54..1158.54 rows=8800 width=387) (actual time=17.707..20.705 rows=8800 loops=1) Sort Key: product_id Sort Method: external sort Disk: 3416kB -> Seq Scan on product_attributes3 pa (cost=0.00..560.00 rows=8800 width=387) (actual time=0.006..5.568 rows=8800 loops=1) Planning time: 0.181 ms Execution time: 31.276 ms (8 rows) The only difference between the two tables is the type of the attributes column (jsonb vs json). Each table contains the same 8800 rows. Even running json_agg on the jsonb column seems to be faster: => explain analyze select pa.product_id, json_agg(attributes) from product_attributes2 pa group by pa.product_id; QUERY PLAN -- GroupAggregate (cost=1127.54..1231.62 rows=3046 width=380) (actual time=30.626..62.943 rows=3046 loops=1) Group Key: product_id -> Sort (cost=1127.54..1149.54 rows=8800 width=380) (actual time=30.590..34.157 rows=8800 loops=1) Sort Key: product_id Sort Method: external sort Disk: 3360kB -> Seq Scan on product_attributes2 pa (cost=000..551.00 rows=8800 width=380) (actual time=0.014..7.388 rows=8800 loops=1) Planning time: 0.142 ms Execution time: 64.504 ms (8 rows) Is it expected that jsonb_agg performance would be that much worse than json_agg? I do expect it to be significantly worse. Constructing jsonb is quite a lot more expensive than constructing json, it's the later processing that provides the performance benefit of jsonb. For 99 out of 100 uses that I have seen there is no need to be using jsonb_agg, since the output is almost always fed straight back to the client, not stored or processed further in the database. Rendering json to the client is extremely cheap, since it's already just text. Rendering jsonb as text to the client involves a lot more processing. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] problem with select *
On 08/24/2015 03:04 AM, bhuvan Mitra wrote: Hello, I have a table with 12 columns and 20 Million rows. While writing the table I do not find any problem but when reading that I have some issues faced. When I perform a 'select * from table limit 1400;' (selecting 14million rows), it is working fine. If the limit value is 1500, it is throwing the error as 'out of memory'. If the query is 'select * from table' , The process is getting killed by displaying the message 'killed'. Kindly tell me where it is going wrong. I have 6MB cache, 1.6GHz CPU, linux 14.04 OS, 8GB RAM. You should be using a cursor. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] pg bouncer issue what does sv_used column means
Please do not cross-post on the PostgreSQL lists. Pick the most appropriate list to post to and just post there. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Fastest Backup Restore for perf testing
On 05/27/2015 04:24 PM, Wes Vaske (wvaske) wrote: Hi, I’m running performance tests against a PostgreSQL database (9.4) with various hardware configurations and a couple different benchmarks (TPC-C TPC-H). I’m currently using pg_dump and pg_restore to refresh my dataset between runs but this process seems slower than it could be. Is it possible to do a tar/untar of the entire /var/lib/pgsql tree as a backup restore method? If not, is there another way to restore a dataset more quickly? The database is dedicated to the test dataset so trashing rebuilding the entire application/OS/anything is no issue for me—there’s no data for me to lose. Thanks! Read all of this chapter. http://www.postgresql.org/docs/current/static/backup.html cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL disk fragmentation causes performance problems on Windows
On 04/29/2015 10:35 AM, k...@rice.edu wrote: On Wed, Apr 29, 2015 at 07:07:04AM -0700, Joshua D. Drake wrote: On 04/29/2015 01:08 AM, Andres Freund wrote: Which OS and filesystem is this done on? Because many halfway modern systems, like e.g ext4 and xfs, implement this in the background as 'delayed allocation'. Oh, it's in the subject. Stupid me, sorry for that. I'd consider testing how much better this behaves under a different operating system, as a shorter term relief. This is a known issue on the Windows platform. It is part of the limitations of that environment. Linux/Solaris/FreeBSD do not suffer from this issue in nearly the same manner. jD You might consider a CLUSTER or VACUUM FULL to re-write the table with less fragmentation. Or pg_repack if you can't handle the lockup time that these involve. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Number of Columns and Update
On 12/22/2014 03:53 PM, Robert DiFalco wrote: This may fall into the category of over-optimization but I've become curious. I have a user table with about 14 columns that are all 1:1 data - so they can't be normalized. When I insert a row all columns need to be set. But when I update, I sometimes only update 1-2 columns at a time. Does the number of columns impact update speed? For example: UPDATE users SET email = ? WHERE id = ?; I can easily break this up into logical tables like user_profile, user_credential, user_contact_info, user_summary, etc with each table only having 1-4 columns. But with the multiple tables I would often be joining them to bring back a collection of columns. I know I'm over thinking this but I'm curious of what the performance trade offs are for breaking up a table into smaller logically grouped tables. An update rewrites the whole row, not just the updated columns. I think you are overthinking it. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Re: [SQL] querying with index on jsonb slower than standard column. Why?
On 12/12/2014 04:44 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Yeah, I believe the core problem is that Postgres currently doesn't have any way to have variadic return times from a function which don't match variadic input types. Returning a value as an actual numeric from JSONB would require returning a numeric from a function whose input type is text or json. So a known issue but one which would require a lot of replumbing to fix. Well, it'd be easy to fix if we were willing to invent distinct operators depending on which type you wanted out (perhaps - for text output as today, add -# for numeric output, etc). That was my immediate reaction. Not sure about the operator name. I'd tentatively suggest -# (taking an int or text argument) and ## taking a text[] argument, both returning numeric, and erroring out if the value is a string, boolean, object or array. Doesn't seem terribly nice from a usability standpoint though. The usability issue could be fixed by teaching the planner to fold a construct like (jsonb - 'foo')::numeric into (jsonb -# 'foo'). But I'm not sure how we do that except in a really ugly and ad-hoc fashion. I would be inclined to add the operator and see how cumbersome people find it. I suspect in many cases it might be sufficient. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Re: [SQL] querying with index on jsonb slower than standard column. Why?
On 12/12/2014 08:20 PM, Tom Lane wrote: We can't just add the operator and worry about usability later; if we're thinking we might want to introduce such an automatic transformation, we have to be sure the new operator is defined in a way that allows the transformation to not change any semantics. What that means in this case is that if (jsonb - 'foo')::numeric would have succeeded, (jsonb -# 'foo') has to succeed; which means it'd better be willing to attempt conversion of string values to numeric, not just throw an error on sight. Well, I'm not 100% convinced about the magic transformation being a good thing. Json numbers are distinct from strings, and part of the justification for this is to extract a numeric datum from jsonb exactly as stored, on performance grounds. So turning round now and making that turn a string into a number if possible seems to me to be going in the wrong direction. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres does not use indexes with OR-conditions
On 11/07/2014 12:06 AM, Vlad Arkhipov wrote: It was just a minimal example. The real query looks like this. select * from commons.financial_documents fd where fd.creation_time '2011-11-07 10:39:07.285022+08' or (fd.creation_time = '2011-11-07 10:39:07.285022+08' and fd.financial_document_id 100) order by fd.creation_time desc limit 200 I need to rewrite it in the way below to make Postgres use the index. select * from commons.financial_documents fd where fd.creation_time = '2011-11-07 10:39:07.285022+08' and ( fd.creation_time '2011-11-07 10:39:07.285022+08' or (fd.creation_time = '2011-11-07 10:39:07.285022+08' and fd.financial_document_id 100) ) order by fd.creation_time desc limit 200 First, please do not top-post on the PostgreSQL lists. See http://idallen.com/topposting.html Second, the last test for fd.creation_time in your query seems redundant. Could you not rewrite it as something this?: where fd.creation_time = '2011-11-07 10:39:07.285022+08' and (fd.creation_time '2011-11-07 10:39:07.285022+08' or fd.financial_document_id 100) cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] assignment vs SELECT INTO
I found out today that direct assignment to a composite type is (at least in my test) about 70% faster than setting it via SELECT INTO. That seems like an enormous difference in speed, which I haven't really been able to account for. Test case: andrew=# \d abc Table public.abc Column | Type | Modifiers +-+--- x | text| y | text| z | integer | andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop select 'a','b',i into r.x,r.y,r.z; end loop; end; $x$; DO Time: 63731.434 ms andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop r := ('a','b',i); end loop; end; $x$; DO Time: 18744.151 ms Is it simply because the SELECT is in effect three assignments, so it takes nearly 3 times as long? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] assignment vs SELECT INTO
On 11/03/2014 03:24 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop select 'a','b',i into r.x,r.y,r.z; end loop; end; $x$; DO Time: 63731.434 ms andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop r := ('a','b',i); end loop; end; $x$; DO Time: 18744.151 ms Is it simply because the SELECT is in effect three assignments, so it takes nearly 3 times as long? I think it's more likely that the second example is treated as a simple expression so it has less overhead than a SELECT. Well, I accidetally left out this case: andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop select row('a','b',i) into r; end loop; end; $x$; DO Time: 81919.721 ms which is slower still. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query with large number of joins
On 10/21/2014 12:09 PM, Marco Di Cesare wrote: I did not mean to imply this works any better on SQL Server. We never tried. I just meant to say this is the first time we are using Postgres so we don't have much experience with it. We tried with work_mem set to 1GB (even as high as 3GB) but it didn't change the GroupAggregate and Sort or query run time. Sorry, I had to sanitize the query and a few of the relevant tables so hopefully I got it all right. SELECT foxtrot_india.juliet_alpha, foxtrot_india.foxtrot_yankee, foxtrot_india.hotel_sierra, foxtrot_india.juliet_alpha, foxtrot_india.bravo_romeo, oscar_bravo.golf_foxtrot, seven_kilo.november_lima, foxtrot_india.echo_six, uniform_six.seven_six, oscar_charlie.foxtrot_charlie, COUNT(DISTINCT foxtrot_india.bravo_romeo) FROM public.seven_kilo seven_kilo INNER JOIN public.papa_sierra papa_sierra ON (seven_kilo.golf_bravo = papa_sierra.golf_bravo) LEFT JOIN public.golf_two golf_two ON (seven_kilo.lima = golf_two.lima) LEFT JOIN public.bravo_xray bravo_xray ON (seven_kilo.lima = bravo_xray.lima) LEFT JOIN public.foo1 foo1 ON ((seven_kilo.bar1 = foo1.bar1) AND (seven_kilo.golf_bravo = foo1.golf_bravo)) INNER JOIN public.oscar_charlie oscar_charlie ON (seven_kilo.lima = oscar_charlie.lima) INNER JOIN public.oscar_bravo oscar_bravo ON (oscar_charlie.foxtrot_four = oscar_bravo.foxtrot_four) INNER JOIN public.foxtrot_india foxtrot_india ON (oscar_bravo.sierra = foxtrot_india.sierra) INNER JOIN public.hotel_romeo hotel_romeo ON (oscar_charlie.foxtrot_charlie = hotel_romeo.foxtrot_charlie) INNER JOIN public.uniform_six uniform_six ON (hotel_romeo.hotel_lima = uniform_six.hotel_lima) LEFT JOIN public.lookup foo2 ON (foxtrot_india.bar2 = foo2.lookup_id) LEFT JOIN public.uniform_two uniform_two ON (foxtrot_india.sierra = uniform_two.sierra) INNER JOIN public.lookup four_xray ON (uniform_two.quebec = four_xray.quebec) LEFT JOIN public.papa_four papa_four ON (foxtrot_india.sierra = papa_four.sierra) INNER JOIN public.lookup romeo_bravo ON (papa_four.quebec = romeo_bravo.quebec) LEFT JOIN public.juliet_two juliet_two ON (foxtrot_india.sierra = juliet_two.sierra) INNER JOIN public.lookup four_delta ON (juliet_two.quebec = four_delta.quebec) LEFT JOIN public.foo3 foo3 ON (foxtrot_india.bar3 = foo3.bar3) INNER JOIN public.xray xray ON (seven_kilo.lima = xray.lima) INNER JOIN public.romeo_echo romeo_echo ON (xray.echo_sierra = romeo_echo.echo_sierra) WHERE (((xray.echo_sierra = 'november_foxtrot') AND (romeo_echo.hotel_oscar = 'zulu') AND (oscar_charlie.five = 6) AND (oscar_charlie.whiskey = 'four_romeo') AND (oscar_charlie.charlie_romeo = 2014))) GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Please don't top-post on the PostgreSQL lists. See http://idallen.com/topposting.html Have you tried a) either turning off geqo or setting geqo_threshold fairly high b) setting join_collapse_limit fairly high (assuming all the above join targets are tables and not views, setting it to something like 25 should do the trick. You also haven't told us what settings you have for things like effective_cache_size, which can dramatically affect query plans. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] char(N), varchar(N), varchar, text
On 10/08/2014 10:22 AM, Emi Lu wrote: Good morning, For performance point of view, are there big differences between: char(N), varchar(N), varchar, text? Some comments from google shows: No difference, under the hood it's all varlena. Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/ A couple of highlights: To sum it all up: * char(n) – takes too much space when dealing with values shorter than n, and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit * varchar(n) – it's problematic to change the limit in live environment * varchar – just like text * text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name So, can I assume no big performance differences? Thanks alot! Emi Why do you need to ask if you already have the answer? Depesz is right. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] help: function failing
On 10/07/2014 04:41 PM, Sergey Konoplev wrote: On Thu, Oct 2, 2014 at 4:00 PM, George Neuner gneun...@comcast.net wrote: --- code CREATE OR REPLACE FUNCTION gen_random() RETURNS double precision AS $BODY$ DECLARE num float8 := 0; den float8 := 281474976710655; -- 0x bytes bytea[6]; BEGIN -- get random bytes from crypto module bytes := ext.gen_random_bytes(6); -- assemble a double precision value num := num + get_byte( bytes, 0 ); FOR i IN 1..5 LOOP num := num * 256; num := num + get_byte( bytes, i ); END LOOP; -- normalize value to range 0.0 .. 1.0 RETURN num / den; END; $BODY$ LANGUAGE plpgsql VOLATILE; --- code The error is: ERROR: array value must start with { or dimension information SQL state: 22P02 Context: PL/pgSQL function gen_random() line 8 at assignment which, if I'm counting correctly, is bytes := ext.gen_random_bytes(6); Guessing on the name of ext.gen_random_bytes(6) it returns a value that is incompatible with bytea[] array representation time from time, so take a closer look at ext.gen_random_bytes() first. You can test the case using DO block. If I comment out that line, it then tells me get_byte() is undefined, which should be impossible because it's built in. Feels like somewhere inside ext.gen_random_bytes() you set a search_path that allows to see get_byte() and the search_path that was set before the gen_random() call doesn't allow it. Why does this code want an array of byteas? It looks like the code thinks bytea[6] is a declaration of a bytea of length 6, which of course it is not. Shouldn't it just be declared as: bytes bytea; ? Oh, and pgsql-performance is completely the wrong forum for this query. usage questions should be on pgsql-general. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] postgres 9.3 vs. 9.4
On 09/18/2014 03:09 PM, Mkrtchyan, Tigran wrote: - Original Message - From: Josh Berkus j...@agliodbs.com To: pgsql-performance@postgresql.org Sent: Thursday, September 18, 2014 7:54:24 PM Subject: Re: [PERFORM] postgres 9.3 vs. 9.4 On 09/18/2014 08:09 AM, Mkrtchyan, Tigran wrote: 9.4beta2: ... 0.957854END; Looks like IO. Postgres internal IO? May be. We get 600MB/s on this SSDs. While it's possible that this is a Postgres issue, my first thought is that the two SSDs are not actually identical. The 9.4 one may either have a fault, or may be mostly full and heavily fragmented. Or the Dell PCIe card may have an issue. We have tested both SSDs and they have identical IO characteristics and as I already mentioned, both databases are fresh, including filesystem. You are using scale 1 which is a 1MB database, and one client and 1 thread, which is an interesting test I wouldn't necessarily have done myself. I'll throw the same test on one of my machines and see how it does. this scenario corresponds to our use case. We need a high transaction rate per for a single client. Currently I can get only ~1500 tps. Unfortunately, posgtress does not tell me where the bottleneck is. Is this is defensively not the disk IO. This is when you dig out tools like perf, maybe. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning Postgres for Single connection use
On 04/14/2014 05:46 PM, Nick Eubank wrote: Any rules of thumb for |work_mem|, |maintenance_work_mem|, |shared_buffer|, etc. for a database that DOESN'T anticipate concurrent connections and that is doing lots of aggregate functions on large tables? All the advice I can find online on tuning (this http://wiki.postgresql.org/wiki/Performance_Optimization, this http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf, this http://www.revsys.com/writings/postgresql-performance.html etc.) is written for people anticipating lots of concurrent connections. I'm a social scientist looking to use Postgres not as a database to be shared by multiple users, but rather as my own tool for manipulating a massive data set (I have 5 billion transaction records (600gb in csv) and want to pull out unique user pairs, estimate aggregates for individual users, etc.). This also means almost no writing, except to creation of new tables based on selections from the main table. I'm on a Windows 8 VM with 16gb ram, SCSI VMware HD, and 3 cores if that's important. First up would probably be don't run on Windows. shared_buffers above 512Mb causes performance to degrade on Windows, while that threshold is much higher on *nix. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] query against large table not using sensible index to find very small amount of data
I have a fairly large table (~100M rows), let's call it events, and among other things it has a couple of columns on it, columns that we'll call entity_type_id (an integer) and and published_at (a timestamp). It has, among others, indices on (published_at) and (entity_type_id, published_at). A very common query against this table is of the form... SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25; ... to get the most recent 25 events from the table for a given type of entity, and generally the query planner does the expected thing of using the two-part index on (entity_type_id, published_at). Every now and again, though, I have found the query planner deciding that it ought use the single column (published_at) index. This can, unsurprisingly, result in horrendous performance if events for a given entity type are rare, as we end up with a very long walk of an index. I had this happen again yesterday and I noticed something of particular interest pertaining to the event. Specifically, the query was for an entity type that the system had only seen for the first time one day prior, and furthermore the events table had not been analyzed by the statistics collector for a couple of weeks. My intuition is that the query planner, when working with an enormous table, and furthermore encountering an entity type that the statistics collector had never previously seen, would assume that the number of rows in the events table of that entity type would be very small, and therefore the two-part index on (entity_type_id, published_at) would be the right choice. Nonetheless, an EXPLAIN was showing usage of the (published_at) index, and since there were only ~20 rows in the entire events table for that entity type the queries were getting the worst possible execution imaginable, i.e. reading in the whole table to find the rows that hit, but doing it with the random I/O of an index walk. As an experiment, I ran a VACUUM ANALYZE on the events table, and then re-ran the EXPLAIN of the query, and... Same query plan again... Maybe for whatever issue I am having the random sampling nature of the statistics collector made it unhelpful, i.e. in its sampling of the ~100M rows it never hit a single row that had the new entity type specified? Other possibly relevant pieces of information... The entity type column has a cardinality in the neighborhood of a couple dozen. Meanwhile, for some of the entity types there is a large and ongoing number of events, and for other entity types there is a smaller and more sporadic number of events. Every now and again a new entity type shows up. I can't understand why the query planner would make this choice. Maybe it has gotten ideas into its head about the distribution of data? Or maybe there is a subtle bug that my data set is triggering? Or maybe I need to turn some knobs on statistics collection? Or maybe it's all of these things together? I worry that even if there is a knob turning exercise that helps that we're still going to get burned whenever a new entity type shows up until we re-run ANALYZE, assuming that I can find a fix that involves tweaking statistics collection. I just can't fathom how it would ever be the case that Postgres's choice of index usage in this case would make sense. It doesn't even slot cleanly into the problem space of why did Postgres do a sequential scan instead of an index scan?. If you're doing a query of the described form and the entity type is specified, wouldn't the two-part index theoretically _always_ yield better performance than the one-part index? Maybe I have a flawed understanding of the cost of using various indexes? Maybe there is something analogous between sequential-versus-index-scan and one-part-versus-two-part-index scan choices? FWIW, we're running on 8.4.X and using the out-of-the-box default_statistics_target setting and haven't dabbled with setting table level statistics configurations. Thoughts? Recommended reading? -- AWG -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query against large table not using sensible index to find very small amount of data
Your understanding of the utility of multi-part indices does not jive with my own. While I agree that a partial index might be in order here, that ought just be a performance optimization that lowers the footprint of the index from an index size and index maintenance standpoint, not something that governs when the index is used for an item whose entity type rarely comes up in the table. If a couple of the entity types were to constitute 80% of the events, then using a partial index would reduce the performance strain of maintaining the index by 80%, but this ought not govern the query planner's behavior when doing queries on entity types that were not among those. My general understanding of the utility of multi-part indices is that they will come into play when some number of the leading columns appear in the query as fixed values and furthermore if a subsequent column appears as part of a ranging operation. I know that a b-tree structure isn't exactly the same as a binary-tree, but it is roughly equivalent for the purposes of our conversation... I believe you can think of multi-part indices as (roughly) equivalent either to nested binary trees, or as equivalent to a binary tree whose keys are the concatenation of the various columns. In the former case, doing a range scan would be a matter of hopping through the nested trees until you got to the terminal range scan operation, and in the latter case doing a range scan would be a matter of finding the first node in the tree that fell within the values for your concatenation and then walking through the tree. Yes, that's not exactly what happens with a b-tree, but it's pretty similar, the main differences being performance operations, I believe. Given that, I don't understand how having a multi-part index with the column over which I intend to range comes _earlier_ than the column(s) that I intend to have be fixed would be helpful. This is especially true given that the timestamp columns are are the granularity of _milliseconds_ and my data set sees a constant stream of inputs with bursts up to ~100 events per second. I think what you are describing could only make sense if the date column were at a large granularity, e.g hours or days. Or maybe I have missed something... -- AWG On Tue, Apr 08, 2014 at 01:39:41PM +, Shaun Thomas wrote: Other possibly relevant pieces of information... The entity type column has a cardinality in the neighborhood of a couple dozen. Meanwhile, for some of the entity types there is a large and ongoing number of events, and for other entity types there is a smaller and more sporadic number of events. Every now and again a new entity type shows up. With that as the case, I have two questions for you: 1. Why do you have a low cardinality column as the first column in an index? 2. Do you have any queries at all that only use the entity type as the only where clause? I agree that the planner is probably wrong here, but these choices aren't helping. The low cardinality of the first column causes very large buckets that don't limit results very well at all. Combined with the order-by clause, the planner really wants to walk the date index backwards to find results instead. I would do a couple of things. First, remove the type/date index. Next, do a count of each type in the table with something like this: SELECT type_id, count(1) FROM my_table GROUP BY 2 Any type that is more than 20% of the table will probably never be useful in an index. At this point, you have a choice. You can create a new index with date and type *in that order* or create a new partial index on date and type that also ignores the top matches. For instance, if you had a type that was 90% of the values, this would be my suggestion: CREATE INDEX idx_foo_table_date_event_type_part ON foo_table (event_date, event_type) WHERE event_type != 14; Or whatever. If the IDs are basically evenly distributed, it won't really matter. In any case, index order matters. The planner wants to restrict data as quickly as possible. If you provide an order clause, it wants to read the index in that order. Your specified type as the first column disrupts that, so it has to fetch the values first, which is usually more expensive. Even if that's wrong in your particular case, planner stats are not precise enough to know that. Either way, try moving the indexes around. I can't think of many indexes in our database where I have the low cardinality value as the first column. Databases have an easier time managing many shallow buckets of values, than a few deep ones. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing
Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
On 12/28/2013 12:19 AM, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? I usually use the term multi-tenancy to refer to different postgres instances running on the same machine, rather than different databases within a single instance of postgres. So lease describe your setup in more detail. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
On 12/28/2013 08:46 AM, ankush upadhyay wrote: On Sat, Dec 28, 2013 at 6:50 PM, Andrew Dunstan and...@dunslane.net mailto:and...@dunslane.net wrote: On 12/28/2013 12:19 AM, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? I usually use the term multi-tenancy to refer to different postgres instances running on the same machine, rather than different databases within a single instance of postgres. So lease describe your setup in more detail. cheers andrew First of all Thanks Andrew for let me know email etiquette and extremely sorry for confusion. Here I meant to say that different postgres instances running on the same machine. Actually I have one application machine and one database server machine with multiple postgres instances running on it and accessing by application server. I hope this time I could explain it in more details. Why are you doing that, as opposed to running multiple databases in a single instance? Running more than a handful of instances in a single machine is almost always a recipe for poor performance. The vast majority of users in my experience run a single postgres instance per machine, possibly with a large number of databases. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Current query of the PL/pgsql procedure.
On 12/16/2013 05:26 AM, hubert depesz lubaczewski wrote: On Sun, Dec 15, 2013 at 04:18:18PM +, Yuri Levinsky wrote: Dear ALL, I am running PL/pgsql procedure with sql statements that taking a long time. I able to see them in the log just after their completion. How can I see currently running SQL statement? I am able to see in pg_stat_activity only my call to function. Many thanks in advance. pg_stat_activity and pg logs, can't see what your function does internally. What you can do, though, is to add some RAISE LOG to the function, so that it will log its progress. Check this for example: http://www.depesz.com/2010/03/18/profiling-stored-proceduresfunctions/ Also, the auto-explain module can peer inside functions. See http://www.postgresql.org/docs/current/static/auto-explain.html cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgresql in a Virtual Machine
On 11/26/2013 09:26 AM, Craig James wrote: On 25.11.2013 22:01, Lee Nguyen wrote: Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres? For those of us with small (a few to a dozen servers), we'd like to get out of server maintenance completely. Can anyone with experience on a cloud VM solution comment? Do the VM solutions provided by the major hosting companies have the same good performance as the VM's that that several have described here? Obviously there's Amazon's new Postgres solution available. What else is out there in the way of instant on solutions with Linux/Postgres/Apache preconfigured systems? Has anyone used them in production? If you want a full stack including Postgres, Heroku might be your best bet. Depends a bit on your application and your workload. And yes, I've used it. Full disclosure: I have done work paid for by Heroku. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgresql in a Virtual Machine
On 11/26/2013 08:51 AM, Boszormenyi Zoltan wrote: 2013-11-25 21:19 keltezéssel, Heikki Linnakangas írta: On 25.11.2013 22:01, Lee Nguyen wrote: Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go. Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application vm's. We are planning to run a few Postgres synchronous replication nodes. Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres? I've also heard people say that they've seen PostgreSQL to perform worse in a VM. In the performance testing that we've done in VMware, though, we haven't seen any big impact. So I guess the answer is that it depends on the specific configuration of CPU, memory, disks and the software. We at Cybertec tested some configurations about 2 months ago. The performance drop is coming from the disk given to the VM guest. When there is a dedicated disk (pass through) given to the VM guest, PostgreSQL runs at a speed of around 98% of the bare metal. When the virtual disk is a disk file on the host machine, we've measured 20% or lower. The host used Fedora 19/x86_64 with IIRC a 3.10.x Linux kernel with EXT4 filesystem (this latter is sure, not IIRC). The effect was observed both under Qemu/KVM and Xen. The virtual disk was not pre-allocated, since it was the default setting, i.e. space savings preferred over speed. The figure might be better with a pre-allocated disk but the filesystem journalling done twice (both in the host and the guest) will have an effect. Not-pre-allocated disk-file backed is just about the worst case in my experience. Try pre-allocated VirtIO disks on an LVM volume group - you should get much better performance. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgresql in a Virtual Machine
On 11/25/2013 03:19 PM, Heikki Linnakangas wrote: On 25.11.2013 22:01, Lee Nguyen wrote: Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go. Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application vm's. We are planning to run a few Postgres synchronous replication nodes. Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres? I've also heard people say that they've seen PostgreSQL to perform worse in a VM. In the performance testing that we've done in VMware, though, we haven't seen any big impact. So I guess the answer is that it depends on the specific configuration of CPU, memory, disks and the software. Synchronous replication is likely going to be the biggest bottleneck by far, unless it's mostly read-only. I don't know if virtualization will have a measurable impact on network latency, which is what matters for synchronous replication. So, I'd suggest that you try it yourself, and see how it performs. And please report back to the list, I'd also love to see some numbers! Yeah, and there are large numbers of public and/or private cloud-based offerings out there (from Amazon RDS, Heroku, EnterpriseDB and VMware among others.) Pretty much all of these are VM based, and can be suitable for many workloads. Maybe the advice is a bit out of date. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow sort
by a query generator. But a good generator should not just say grouo by everything that's not aggregated and think it's doing a good job. In your case it should be relatively straightforward. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 09/11/2013 02:35 PM, Josh Berkus wrote: All, We've confirmed that this issue is caused by having long-running idle transactions on the server. When we disabled their queueing system (which prodiced hour-long idle txns), the progressive slowness went away. Why that should affect 9.X far more strongly than 8.4, I'm not sure about. Does that mean that 8.4 was unsafe, or that this is something which *could* be fixed in later versions? I'm also confused as to why this would affect BIND time rather than EXECUTE time. One thing that this made me wonder is why we don't have transaction_timeout, or maybe transaction_idle_timeout. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 09/10/2013 08:20 AM, Andres Freund wrote: A backtrace for this would be useful. Alternatively you could recompile postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g. It's using a custom build, so this should be doable. Any chance you have older prepared xacts, older sessions or something like that around? I'd expect heap_prune* to be present in workloads that spend significant time in heap_hot_search_buffer... Not sure about prepared transactions. There are certainly probably old prepared statements around, and long running transactions alongside this one. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 09/10/2013 09:23 AM, Andres Freund wrote: On 2013-09-10 15:21:33 +0200, Andres Freund wrote: If I interpret things correctly you're using serializable? I guess there is no chance to use repeatable read instead? Err, that wouldn't help much. Read committed. That lets PGXACT-xmin advance these days and thus might help to reduce the impact of the longrunning transactions. Otherwise you will have to shorten those... Yeah, we're looking at eliminating them. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Intermittent hangs with 9.2
On 09/10/2013 11:04 AM, David Whittaker wrote: Hi All, I've been seeing a strange issue with our Postgres install for about a year now, and I was hoping someone might be able to help point me at the cause. At what seem like fairly random intervals Postgres will become unresponsive to the 3 application nodes it services. These periods tend to last for 10 - 15 minutes before everything rights itself and the system goes back to normal. During these periods the server will report a spike in the outbound bandwidth (from about 1mbs to about 5mbs most recently), a huge spike in context switches / interrupts (normal peaks are around 2k/8k respectively, and during these periods they‘ve gone to 15k/22k), and a load average of 100+. CPU usage stays relatively low, but it’s all system time reported, user time goes to zero. It doesn‘t seem to be disk related since we’re running with a shared_buffers setting of 24G, which will fit just about our entire database into memory, and the IO transactions reported by the server, as well as the disk reads reported by Postgres stay consistently low. We‘ve recently started tracking how long statements take to execute, and we’re seeing some really odd numbers. A simple delete by primary key, for example, from a table that contains about 280,000 rows, reportedly took 18h59m46.900s. An update by primary key in that same table was reported as 7d 17h 58m 30.415s. That table is frequently accessed, but obviously those numbers don't seem reasonable at all. Some other changes we've made to postgresql.conf: synchronous_commit = off maintenance_work_mem = 1GB wal_level = hot_standby wal_buffers = 16MB max_wal_senders = 10 wal_keep_segments = 5000 checkpoint_segments = 128 checkpoint_timeout = 30min checkpoint_completion_target = 0.9 max_connections = 500 The server is a Dell Poweredge R900 with 4 Xeon E7430 processors, 48GB of RAM, running Cent OS 6.3. So far we‘ve tried disabling Transparent Huge Pages after I found a number of resources online that indicated similar interrupt/context switch issues, but it hasn’t resolve the problem. I managed to catch it happening once and run a perf which showed: | + 41.40% 48154 postmaster 0x347ba9 f 0x347ba9 + 9.55% 10956 postmaster 0x2dc820 f set_config_option + 8.64%9946 postmaster 0x5a3d4 f writeListPage + 5.75%6609 postmaster 0x5a2b0 f ginHeapTupleFastCollect + 2.68%3084 postmaster 0x192483 f build_implied_join_equality + 2.61%2990 postmaster 0x187a55 f build_paths_for_OR + 1.86%2131 postmaster 0x794aa f get_collation_oid + 1.56%1822 postmaster 0x5a67e f ginHeapTupleFastInsert + 1.53%1766 postmaster 0x1929bc f distribute_qual_to_rels + 1.33%1558 postmaster 0x249671 f cmp_numerics| I‘m not sure what 0x347ba9 represents, or why it’s an address rather than a method name. That's about the sum of it. Any help would be greatly appreciated and if you want any more information about our setup, please feel free to ask. I have seen cases like this with very high shared_buffers settings. 24Gb for shared_buffers is quite high, especially on a 48Gb box. What happens if you dial that back to, say, 12Gb? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
[.] 0x5898 + 0.30% postgres [.] LockAcquireExtended + 0.30% postgres [.] _bt_first + 0.29% postgres [.] add_paths_to_joinrel + 0.28% postgres [.] MemoryContextCreate + 0.28% postgres [.] appendBinaryStringInfo + 0.28% postgres [.] MemoryContextStrdup + 0.27% postgres [.] heap_hot_search_buffer + 0.27% postgres [.] GetSnapshotData + 0.26% postgres [.] hash_search + 0.26% postgres [.] heap_getsysattr + 0.26% [vdso] [.] 0x7fff681ff70c + 0.25% postgres [.] compare_scalars + 0.25% postgres [.] pg_verify_mbstr_len cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Varchar vs foreign key vs enumerator - table and index size
On 09/02/2013 05:53 AM, Łukasz Walkowski wrote: On 1 wrz 2013, at 05:10, Vitalii Tymchyshyn tiv...@gmail.com wrote: Well, there are some more options: a) Store int keys and do mapping in the application (e.g. with java enums). This can save you a join, that is especially useful if you are going to do paged output with limit/offset scenario. Optimizer sometimes produce suboptimal plans for join in offset/limit queries. b) Store small varchar values as keys (up to char type if you really want to save space) and do user display mapping in application. It's different from (a) since it's harder to mess with the mapping and values are still more or less readable with simple select. But it can be less efficient than (a). c) Do mixed approach with mapping table, loaded on start into application memory. This would be an optimization in case you get into optimizer troubles. Best regards, Vitalii Tymchyshyn I'd like to leave database in readable form because before I add some new queries and rest endpoints to the application, I test them as ad-hoc queries using command line. So variant a) isn't good for me. Variant b) is worth trying and c) is easy to code, but I still prefer having all this data in database independent of application logic. I think the possible use of Postgres enums has been too easily written off in this thread. Looking at the original problem description they look like quite a good fit, despite the OP's skepticism. What exactly is wanted that can't be done with database enums? You can add new values to the type very simply. You can change the values of existing labels in the type slightly less simply, but still without any great difficulty. Things that are hard to do include removing labels in the set and changing the sort order, because those things would require processing tables where the type is used, unlike the simple things. But neither of these is required for typical use cases. For most uses of this kind they are very efficient both in storage and processing. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hstore VS. JSON
On 07/16/2013 11:05 AM, Niels Kristian Schjødt wrote: Hi, I'm in the process of implementing a table for storing some raw data in the format of a hash containing one level of keys and values. The hash can be quite big (up to 50 keys pointing at values varying from one to several hundred characters) Now, I'm in doubt whether to use JSON or Hstore for this task. Here is the facts: - I'm not going to search a lot (if any) in the data stored in the column, i'm only going to load it out. - The data is going to be heavily updated (not only inserted). Keys and values are going to be added/overwritten quite some times. In both cases, each hstore/json is a single datum, and updating it means writing out the whole datum - in fact the whole row containing the datum. - My database's biggest current issue is updates, so i don't want that to be a bottle neck. - I'm on postgresql 9.2 So, question is: Which will be better performance wise, especially for updates? Does the same issues with updates on the MVCC structure apply to updates in Hstore? What is taking up most space on the HDD? MVCC applies to all updates on all kinds of data. Hstore and JSON are not different in this respect. You should test the storage effects with your data. On 9.2 for your data hstore might be a better bet, since in 9.2 hstore has more operators available natively. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Process in state BIND, authentication, PARSE
On 07/08/2013 12:22 PM, Jeison Bedoya wrote: Hi, i want to know why in my database the process stay in BID, PARSE, autentication, startup by a couple minuts, generating slow in the process, perhaps tunning parameters? or configuration of operating system (Linux RHEL 6). You haven't given us nearly enough information about your setup. We'd need to see your configuration settings and have some details of the machine and where connections are coming from to diagnose it further. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Dynamic queries in stored procedure
On 07/05/2013 09:57 AM, Greg Jaskiewicz wrote: Hey, We have a search method that depending on search params will join 3-5 tables, craft the joins and where section. Only problem is, this is done in rather horrible java code. So using pgtap for tests is not feasible. I want to move the database complexity back to database, almost writing the query construction in the plpgsql or python as stores procedure, any suggestions ? Unfortunately PostgreSQL won't eliminate unnecessary joins from a view, so I can't just create one view and simple code adding where's, order by, etc. No, I don't want to use orm. It's a matter of taste. Pretty much every PL has facilities for constructing and running dynamic sql. PLPgsql ,PLPerl, PLV8 ... cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 03:14 PM, Shaun Thomas wrote: On 06/27/2013 01:42 PM, Tom Lane wrote: That will break things: CURRENT_DATE will then be equivalent to just writing today's date as a literal. Interesting. I tested it by creating a view and a table with a default, and it always seems to get translated to: ('now'::text)::date But I'll take your explanation at face value, since that doesn't imply what the output would be. What's interesting is that EnterpriseDB has their own pg_catalog.current_date function that gets called by the CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the current_date function as immutable without affecting a lot of other internals. On EDB, this actually works: UPDATE pg_proc SET provolatile = 'i' WHERE proname = 'current_date'; But that's a lie, surely. If it breaks you have nobody to blame but yourself. There's a reason EDB haven't marked their function immutable - it's not. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL settings for running on an SSD drive
On 06/20/2013 05:23 PM, Shaun Thomas wrote: On 06/20/2013 03:32 PM, Josh Berkus wrote: Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something else just slightly higher than SPC? Yes, actually. My favored setting when we were on 8.3 was 1.5. But something with the planner changed pretty drastically when we went to 9.1, and we were getting some really bad query plans unless we *strongly* suggested RPC was cheap. I was afraid I'd have to go lower, but 1 seemed to do the trick. That would be perverse, surely, but on Fusion-IO RPC = SPC seems to make sense unless you assume that cache misses will be higher for random reads than for sequential reads. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Thinking About Correlated Columns (again)
On 05/15/2013 12:23 PM, Craig James wrote: On Wed, May 15, 2013 at 8:31 AM, Shaun Thomas stho...@optionshouse.com mailto:stho...@optionshouse.com wrote: [Inefficient plans for correlated columns] has been a pain point for quite a while. While we've had several discussions in the area, it always seems to just kinda trail off and eventually vanish every time it comes up. ... Since there really is no fix for this aside from completely rewriting the query or horribly misusing CTEs (to force sequence scans instead of bloated nested loops, for example)... I'm worried that without an easy answer for cases like this, more DBAs will abuse optimization fences to get what they want and we'll end up in a scenario that's actually worse than query hints. Theoretically, query hints can be deprecated or have GUCs to remove their influence, but CTEs are forever, or until the next code refactor. I've seen conversations on this since at least 2005. There were even proposed patches every once in a while, but never any consensus. Anyone care to comment? It's a very hard problem. There's no way you can keep statistics about all possible correlations since the number of possibilities is O(N^2) with the number of columns. I don't see why we couldn't allow the DBA to specify some small subset of the combinations of columns for which correlation stats would be needed. I suspect in most cases no more than a handful for any given table would be required. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] template1 vacuuming consuming much time compared to another production DBs
On 04/22/2013 07:31 AM, pradeep singh wrote: Hi, I am using postgresql 8.1 DB. Why are you using a release of Postgres that is way out of date and unsupported? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Speed of EXCECUTE in PL/PGSQL
On 03/14/2013 03:22 PM, Artur Zając wrote: Why speed of executing (or planning) some very simple query from string in pl/pgsql is dependent from whole query or why “FOR r IN EXECUTE q” is significally slower from “FOR r IN query”? The whole point of EXECUTE is that it's reparsed and planned each time. You should expect it to be quite a bit slower, and avoid using EXECUTE wherever possible. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High CPU usage after partitioning
On 01/22/2013 09:21 AM, rudi wrote: On Tue, Jan 22, 2013 at 3:04 PM, Merlin Moncure mmonc...@gmail.com mailto:mmonc...@gmail.com wrote: let's see the query -- it's probably written in such a way so as to not be able to be optimized through CE. The query is pretty simple and standard, the behaviour (and the plan) is totally different when it comes to a partitioned table. Partioned table query = explain analyze SELECT sb_logs.* FROM sb_logs WHERE (device_id = 901 AND date_taken = (SELECT MAX(date_taken) FROM sb_logs WHERE device_id = 901)); And there you have it. Constraint exclusion does not work in cases like this. It only works with static expressions (such as a literal date in this case). cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High CPU usage after partitioning
On 01/21/2013 10:05 AM, rudi wrote: Hello, I'm running postgresl 9.0. After partitioning a big table, CPU usage raised from average 5-10% to average 70-80%. - the table contains about 20.000.000 rows - partitions are selected using a trigger, based on an indexed field, a date (IF date_taken = x AND date_taken y) - I created 5 partitions, the 2012 one now contains most of the rows. The 2013 partition is the live partition, mostly insert, a few select based on the above indexed field. The 2013, 2014, 2015 partitions are empty - constraint execution is on. I have 2 weeks CPU usage reports and the pattern definately changed after I made the partitions. Any idea? Well, the first question that comes to my mind is whether it's the inserts that are causing the load or the reads. If it's the inserts then you should show us the whole trigger. Does it by any chance use 'execute'? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] backend suddenly becomes slow, then remains slow
On 12/26/2012 11:03 PM, Jeff Janes wrote: On Fri, Dec 14, 2012 at 10:40 AM, Andrew Dunstan andrew.duns...@pgexperts.com wrote: One of my clients has an odd problem. Every so often a backend will suddenly become very slow. The odd thing is that once this has happened it remains slowed down, for all subsequent queries. Zone reclaim is off. There is no IO or CPU spike, no checkpoint issues or stats timeouts, no other symptom that we can see. By no spike, do you mean that the system as a whole is not using an unusual amount of IO or CPU, or that this specific slow back-end is not using an unusual amount? both, really. Could you strace is and see what it is doing? Not very easily, because it's a pool connection and we've lowered the pool session lifetime as part of the amelioration :-) So it's not happening very much any more. The problem was a lot worse that it is now, but two steps have alleviated it mostly, but not completely: much less aggressive autovacuuming and reducing the maximum lifetime of backends in the connection pooler to 30 minutes. Do you have a huge number of tables? Maybe over the course of a long-lived connection, it touches enough tables to bloat the relcache / syscache. I don't know how the autovac would be involved in that, though. Yes, we do indeed have a huge number of tables. This seems a plausible thesis. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] backend suddenly becomes slow, then remains slow
One of my clients has an odd problem. Every so often a backend will suddenly become very slow. The odd thing is that once this has happened it remains slowed down, for all subsequent queries. Zone reclaim is off. There is no IO or CPU spike, no checkpoint issues or stats timeouts, no other symptom that we can see. The problem was a lot worse that it is now, but two steps have alleviated it mostly, but not completely: much less aggressive autovacuuming and reducing the maximum lifetime of backends in the connection pooler to 30 minutes. It's got us rather puzzled. Has anyone seen anything like this? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] backend suddenly becomes slow, then remains slow
On 12/14/2012 02:56 PM, Tom Lane wrote: Andrew Dunstan andrew.duns...@pgexperts.com writes: One of my clients has an odd problem. Every so often a backend will suddenly become very slow. The odd thing is that once this has happened it remains slowed down, for all subsequent queries. Zone reclaim is off. There is no IO or CPU spike, no checkpoint issues or stats timeouts, no other symptom that we can see. The problem was a lot worse that it is now, but two steps have alleviated it mostly, but not completely: much less aggressive autovacuuming and reducing the maximum lifetime of backends in the connection pooler to 30 minutes. It's got us rather puzzled. Has anyone seen anything like this? Maybe the kernel is auto-nice'ing the process once it's accumulated X amount of CPU time? That was my initial thought, but the client said not. We'll check again. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
On 12/13/2012 05:12 PM, AI Rumman wrote: Why does the number of rows are different in actual and estimated? Isn't that in the nature of estimates? An estimate is a heuristic guess at the number of rows it will find for the given query or part of a query. It's not uncommon for estimates to be out by several orders of magnitude. Guaranteeing estimates within bounded accuracy and in a given short amount of time (you don't want your planning time to overwhelm your execution time) isn't possible. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] track_activity_query_size
Is there a performance downside to setting track_activity_query_size to a significantly larger value than the default 1024 (say 10240), given that there's plenty of memory to spare? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Read rows deleted
On 12/12/2012 03:24 PM, Alejandro Carrillo wrote: Hi, Anybody knows how to create a table using a table file? It isn't a fdw, is a file that compose the table in postgresql and get with the pg_relation_filepath function. Ex: select pg_relation_filepath('pg_proc'); Anybody knows a JDBC or a multiplatform code that let read the delete rows of a table without writing of a table file? This isn't a performance related question. Please ask on the correct list (probably pgsql-general). cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] encouraging index-only scans
A client is testing a migration from 9.1 to 9.2, and has found that a large number of queries run much faster if they use index-only scans. However, the only way he has found to get such a plan is by increasing the seq_page_cost to insanely high levels (3.5). Is there any approved way to encourage such scans that's a but less violent than this? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] encouraging index-only scans
On 12/12/2012 04:32 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: A client is testing a migration from 9.1 to 9.2, and has found that a large number of queries run much faster if they use index-only scans. However, the only way he has found to get such a plan is by increasing the seq_page_cost to insanely high levels (3.5). Is there any approved way to encourage such scans that's a but less violent than this? Is the pg_class.relallvisible estimate for the table realistic? They might need a few more VACUUM and ANALYZE cycles to get it into the neighborhood of reality, if not. That was the problem - I didn't know this hadn't been done. Keep in mind also that small values of random_page_cost necessarily decrease the apparent advantage of index-only scans. If you think 3.5 is an insanely high setting, I wonder whether you haven't driven those numbers too far in the other direction to compensate for something else. Right. Thanks for the help. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] encouraging index-only scans
On 12/12/2012 05:12 PM, Andrew Dunstan wrote: On 12/12/2012 04:32 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: A client is testing a migration from 9.1 to 9.2, and has found that a large number of queries run much faster if they use index-only scans. However, the only way he has found to get such a plan is by increasing the seq_page_cost to insanely high levels (3.5). Is there any approved way to encourage such scans that's a but less violent than this? Is the pg_class.relallvisible estimate for the table realistic? They might need a few more VACUUM and ANALYZE cycles to get it into the neighborhood of reality, if not. That was the problem - I didn't know this hadn't been done. Actually, the table had been analysed but not vacuumed, so this kinda begs the question what will happen to this value on pg_upgrade? Will people's queries suddenly get slower until autovacuum kicks in on the table? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: xfs perform a lot better than ext4 [WAS: Re: [PERFORM] Two identical systems, radically different performance]
On 12/05/2012 11:51 AM, Jean-David Beyer wrote: I thought that postgreSQL did its own journalling, if that is the proper term, so why not use an ext2 file system to lower overhead? Postgres journalling will not save you from a corrupt file system. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres configuration for 8 CPUs, 6 GB RAM
On 11/27/2012 02:47 AM, Syed Asif Tanveer wrote: Hi, I am using PostgreSQL 9.1.5 for Data warehousing and OLAP puposes. Data size is around 100 GB and I have tuned my PostgreSQL accordingly still I am facing performance issues. The query performance is too low despite tables being properly indexed and are vacuumed and analyzed at regular basis. CPU usage never exceeded 15% even at peak usage times. Kindly guide me through if there are any mistakes in setting configuration parameters. Below are my system specs and please find attached my postgresql configuration parameters for current system. There is at least anecdotal evidence that Windows servers degrade when shared_buffers is set above 512Mb. Personally, I would not recommend using Windows for a high performance server. Also, it makes no sense to have a lower setting for maintenance_work_mem than for work_mem. You would normally expect maintenance_work_mem to be higher - sometimes much higher. Apart from that, it's going to be impossible to tell what your problem is without seeing actual slow running queries and their corresponding explain analyse output. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 11/21/2012 08:04 AM, Heikki Linnakangas wrote: On 21.11.2012 01:53, Tom Lane wrote: I think the more interesting question is what cases wouldn't be covered by such a rule. Typically you need to use OFFSET 0 in situations where the planner has guessed wrong about costs or rowcounts, and I think people are likely using WITH for that as well. Should we be telling people that they ought to insert OFFSET 0 in WITH queries if they want to be sure there's an optimization fence? Yes, I strongly feel that we should. Writing a query using WITH often makes it more readable. It would be a shame if people have to refrain from using it, because the planner treats it as an optimization fence. If we're going to do it can we please come up with something more intuitive and much, much more documented than OFFSET 0? And if/when we do this we'll need to have big red warnings all over then release notes, since a lot of people I know will need to do some extensive remediation before moving to such a release. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 11/21/2012 09:59 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: If we're going to do it can we please come up with something more intuitive and much, much more documented than OFFSET 0? And if/when we do this we'll need to have big red warnings all over then release notes, since a lot of people I know will need to do some extensive remediation before moving to such a release. The probability that we would actually *remove* that behavior of OFFSET 0 is not distinguishable from zero. I'm not terribly excited about having an alternate syntax to specify an optimization fence, but even if we do create such a thing, there's no need to break the historical usage. I wasn't talking about removing it. My point was that if the optimization fence around CTEs is removed a lot of people will need to rework apps where they have used them for that purpose. And I continue to think that spelling it OFFSET 0 is horribly obscure. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 11/21/2012 11:32 AM, Claudio Freire wrote: On Wed, Nov 21, 2012 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-21 13:16:25 -0300, Claudio Freire wrote: On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund and...@anarazel.de wrote: +1 WITH foo AS (SELECT ...) (barrier=on|off)? 9.3 introduces the syntax, defaulting to on 9.4 switches the default to off. Why syntax? What about a guc? collapse_cte_limit? Because there are very good reasons to want to current behaviour. A guc is a global either/or so I don't see it helping much. set collapse_cte_limit=8; with blah as (blah) select blah; Not global at all. Then you have to unset it again, which is ugly. You might even want it applying to *part* of a query, not the whole thing, so this strikes me as a dead end. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 11/21/2012 02:30 PM, Gavin Flower wrote: WITH FENCE foo AS (SELECT ...) default? WITHOUT FENCE foo AS (SELECT ...) :-) Nah! I prefer this, but it is too specific to 'WITH', and very unSQL standardish! Alternatively one of the following 1. WITH UNFENCED foo AS (SELECT ...) 2. WITH NO FENCE foo AS (SELECT ...) 3. WITH NOT FENCE foo AS (SELECT ...) I loke the firsat variant, but the 3rd is most SQL standardish! As Tom (I think) pointed out, we should not have a syntax tied to CTEs. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] partitioning versus clustering
Postgres Performance Wizards, I am a few years into developing and operating a system underpinned by Postgres that sees the arrival a significant number of events around the clock, not an epic amount of data, but enough to be challenging to me, in particular when system downtime is not very palatable and the data is retained quasi-indefinitely. I have various tables that house different kinds of events, and in addition to wanting to look at a small number of rows of data, users often want to generate summary reports on large swaths of data that span days or weeks. At present, these reports can trigger index scans that take minutes to service, and the parameters of the reports are user specified, making their pre-generation infeasible. Generally the rows in these tables are write-once, but they contain a pointer to the related BLOB from which they were constructed, and every now and again some new field in the originating BLOB becomes of interest, causing me to alter the table and then do a sweep of the full table with corresponding updates, violating the otherwise INSERT-only nature. These event tables generally have an event time column that is indexed and which is an obvious candidate for either partitioning or clustering of the table. I'm trying to make sense of which is the better option for me. As best I can tell, the decision points are as follows... PARTITIONING Pros: * no outage; data just starts flowing into new partitions seamlessly * allows more control over where the data goes, creating retrieval parallelization opportunities * clustering cannot be inadvertently undone in a way that requires scheduled downtime to repair * probably more resilient in the case of the event time being different from the time that I processed the event Cons: * does not deal with legacy data without extra migration (over time this becomes less relevant) * requires some kind of background process to manage partition creation * partition size will affect performance and choosing its size is not a science CLUSTERING Pros: * no particularly custom development work on my part * once done, it puts all existing data in a good state for efficient querying without extra work Cons: * will lock up the system for the duration of the CLUSTER command * somehow need to make sure that ANALYZE commands run often enough * does not give me much control of the underlying storage layout * may have problems when the occasional mass-UPDATE is done * unclear whether a VACUUM FULL is required to prevent subsequent un-clustered-ness despite having a fill factor of 100, stemming from the mass-UPDATE operations * could generate a huge number of WAL segments to archive * could possibly be sabotaged by the event time property not being well correlated with the time that the event is processed in the face of upstream systems have momentary issues As far as questions to the group go: * Is my understanding of the pros and cons of the options reasonably correct and comprehensive? * What has governed your decisions in making such a choice on past projects of your own? * If I go the clustering route, will the occasional mass update really mess with things, requiring a re-cluster and possibly even a full vacuum (to prevent re-un-clustering)? * Might it make more sense to cluster when the event time property is the time that I processed the event but partition when it is the time that the event occurred in some other system? * Is running a CLUSTER command actually necessary to get the performance benefits if the table ought already be in a good order, or is just running a CLUSTER command on a well ordered table enough to get query execution to yield nice sequential access to the disk? Many thanks in advance for your insights... -- AWG -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 11/14/2012 10:23 AM, David Greco wrote: Have a query using a CTE that is performing very poorly. The equivalent query against the same data in an Oracle database runs in under 1 second, in Postgres it takes 2000 seconds. The smp_pkg.get_invoice_charges queries fedexinvoices for some data and normalizes it into a SETOF some record type. It is declared to be STABLE. Fedexinvoices consists of about 1.3M rows of medium width. Fedexinvoices.id is the primary key on that table, and trim(fedexinvoices.trackno) is indexed via the function trim. The plan for the equivalent query in Oracle is much smaller and simpler. No sequential (or full table) scans on fedexinvoices. WITH charges as ( SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info from fedexinvoices fi2 ) select fedexinvoices.* from fedexinvoices inner join charges on charges.id = fedexinvoices.id AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION') where trim(fedexinvoices.trackno)='799159791643' ; Can you explain what you're actually trying to do here? The query looks rather odd. Why are you joining this table (or an extract from it) to itself? In any case, you could almost certainly recast it and have it run fast by first filtering on the tracking number. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 11/14/2012 10:56 AM, David Greco wrote: You're right. I was translating an oracle query , but looks like PG will allow some syntax that is different. Trying to find entries in fedexinvoices where smp_pkg.get_invoice_charges(id) returns a record containing charge_name in ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION'). Should return the fedexinvoices row and the row from smp_pkg.get_invoice_charges that contains the address correction. Something like this, though this is syntactically incorrect as smp_pkg.get_invoice_charges returns a set: select fedexinvoices.*, (smp_pkg.get_invoice_charges(id)).* from fedexinvoices WHERE trim(fedexinvoices.trackno)='799159791643' and (smp_pkg.get_invoice_charges(id)).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION') First, please don't top-post when someone has replied underneath your post. It makes the thread totally unreadable. See http://idallen.com/topposting.html You could do something like this: WITH invoices as ( select * from fedexinvoices where trim(fedexinvoices.trackno)='799159791643' ), charges as ( SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info from fedexinvoices fi2 join invoices i on i.id = f12.id ) select invoices.* from invoices inner join charges on charges.id = invoices.id AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION') ; Or probably something way simpler but I just did this fairly quickly and mechanically cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: SOLVED - RE: [PERFORM] Poor performance using CTE
On 11/14/2012 08:17 PM, Craig Ringer wrote: On 11/15/2012 12:29 AM, Tom Lane wrote: David Greco david_gr...@harte-hanks.com writes: Thanks, that did the trick. Though I'm still not clear as to why. PG treats WITH as an optimization fence --- the WITH query will be executed pretty much as-is. It may be that Oracle flattens the query somehow; though if you're using black-box functions in both cases, it's not obvious where the optimizer could get any purchase that way. I was looking through the latest spec drafts I have access to and couldn't find any reference to Pg's optimisation-fence-for-CTEs behaviour being required by the standard, though I've repeatedly seen it said that there is such a requirement. Do you know where it's specified? All I can see is that the optimised result must have the same effect as the original. That'd mean that wCTEs and CTE terms that use VOLATILE functions or functions with side-effects couldn't be optimised into other queries. Simple CTEs could be, though, and there are times I've really wished I could use a CTE but I've had to use a set-returning subquery to get reasonable plans. It cuts both ways. I have used CTEs a LOT precisely because this behaviour lets me get better plans. Without that I'll be back to using the offset 0 hack. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On 11/13/2012 10:12 PM, Denis wrote: Please don't think that I'm trying to nitpick here, but pg_dump has options for dumping separate tables and that's not really consistent with the idea that pg_dump is primarily designed for dumping entire databases. Sure it is. The word primarily is not just a noise word here. The fact that we have options to do other things doesn't mean that its primary design goal has changed. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Remote access to Postgresql slow
Is your network link between server and client across the public internet? You need to check bandwidth and latency characteristics of your network. A simple test run following on server host and run it again on the client host. time psql [connect details] -c 'select now()' I access postgresql database across the public internet (by tunnelling port 5432 across compressed ssh sessions). In my case latency is a significant penalty. Locally time response is for above is 10ms but remotely it is 30 times slower (350ms) You may need to install wireshark or similar and monitor client traffic in order to figure out the network penalty. Maybe your app goes back and forward to postgres multiple time; does lots of chatter. If so then latency cost becomes very significant. You want to try and minimise the number of postgresql calls; retrieve more data will less SQL operations. On Fri, Sep 14, 2012 at 7:02 PM, Manoj Agarwal m...@ockham.be wrote: ** Hi, I have a Centos 6.2 Virtual machine that contains Postgresql version 8.4 database. The application installed in this Virtual machine uses this database that is local to this Virtual machine. I have tried to offload the database, by installing it on a remote Virtual machine, on another server, and tried to connect to it from my local Virtual machine. The application configuration remains the same, only database is offloaded to a remote Virtual machine on another server and the connection parameters have changed. The connection is all fine and the application can access the remote database. I have observed that the Postgresql is responding extremely slow. What should I do to improve its performance? Please suggest. Kind Regards, Manoj Agarwal
Re: [PERFORM] add column with default value is very slow
On 09/11/2012 09:55 AM, AI Rumman wrote: I added the excel file for locks data. I was surprised to see that while I was updating a single column value for all records in a tables, all indexes are locked by the server. Any ALTER TABLE command locks the whole table in ACCESS EXCLUSIVE mode, indexes included. See the description of ACCESS EXCLUSIVE lock at http://www.postgresql.org/docs/current/static/explicit-locking.html cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] best practice to avoid table bloat?
On 08/16/2012 04:33 PM, Anibal David Acosta wrote: Hi, if I have a table that daily at night is deleted about 8 millions of rows (table maybe has 9 millions) is recommended to do a vacuum analyze after delete completes or can I leave this job to autovacuum? This table is very active during the day but less active during night I think that the only only thing where Postgres is weak, is in this area (table and index bloat). For some reason for the same amount of data every day postgres consume a little more. Check out pg_reorg. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN
On 07/23/2012 04:41 AM, Jim Vanns wrote: We're seeing SELECT statements and even EXPLAIN (no ANAYLZE) statements hang indefinitely until *something* (we don't know what) releases some kind of resource or no longer becomes a massive bottle neck. These are the symptoms. I have seen this sort of behaviour on systems with massive catalogs (millions of tables and indexes). Could that be your problem? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] queries are fast after dump-restore but slow again after some days dispite vacuum
On 07/19/2012 07:33 AM, Felix Scheicher wrote: Hi, I am running a 9.1 server at Ubuntu. When I upgraded to the current version I did a pg_dump followed by pg_restore and found that the db was much faster. But slowed down again after two days. I did the dump-restore again and could now compare the two (actually identical) databases. This is a query of the old one directly after a VACUUM ANALYSE: ... No matter how much I vacuum or analyse the slow db, I don't get it faster. I also checked for dead tuples - there are none. Try running CLUSTER on the relevant tables and see if it makes a difference. If it does you might want to look into using pg_reorg periodically. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] queries are fast after dump-restore but slow again after some days dispite vacuum
On 07/19/2012 11:13 AM, Felix Scheicher wrote: Andrew Dunstan andrew at dunslane.net writes: Try running CLUSTER on the relevant tables and see if it makes a difference. If it does you might want to look into using pg_reorg periodically. That worked like a charm! Many thanks. But how comes, the queries are also fast after a restore without the cluster? There is probably a lot of unused space in your table. CLUSTER rewrites a fresh copy, as do restore and pg_reorg. You might also want to try changing the settings on the table so it gets much more aggressively auto-vacuumed, so that dead space is made available much more quickly, and the table has less chance to get bloated. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DELETE vs TRUNCATE explanation
On 07/11/2012 04:47 PM, Shaun Thomas wrote: On 07/11/2012 03:18 PM, Craig James wrote: It strikes me as a contrived case rather than a use case. What sort of app repeatedly fills and truncates a small table thousands of times ... other than a test app to see whether you can do it or not? Test systems. Any company with even a medium-size QA environment will have continuous integration systems that run unit tests on a trash database hundreds or thousands of times through the day. Aside from dropping/creating the database via template, which would be *really* slow, truncate is the easiest/fastest way to reset between tests. Why is recreating the test db from a (populated) template going to be slower than truncating all the tables and repopulating from an external source? I had a client who achieved a major improvement in speed and reduction in load by moving to this method of test db setup. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Paged Query
On 07/09/2012 01:41 PM, Misa Simic wrote: From my experience users even very rare go to ending pages... easier to them would be to sort data by field to get those rows in very first pages... Yeah, the problem really is that most client code wants to know how many pages there are, even if it only wants one page right now. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Sequencial scan in a JOIN
Hi everyone, I am trying to run the following query: SELECT count(1) --DISTINCT l_userqueue.queueid FROM e_usersessions JOIN l_userqueue ON l_userqueue.userid = e_usersessions.entityid JOIN a_activity ON a_activity.activequeueid = l_userqueue.queueid AND a_activity.vstatus= 1 AND a_activity.ventrydate 0 AND a_activity.sbuid = e_usersessions.sbuid AND a_activity.assignedtoid = 0 AND a_activity.status '0' WHERE e_usersessions.sessionkeepalivedatetime 20120605082131943 Explain analyze: 'Aggregate (cost=100402.10..100402.11 rows=1 width=0) (actual time=2249.051..2249.051 rows=1 loops=1)' ' - Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual time=0.541..2249.027 rows=33 loops=1)' 'Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND (a_activity.sbuid = e_usersessions.sbuid))' '- Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167 width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)' ' Filter: ((ventrydate 0) AND ((status)::text '0'::text) AND (vstatus = 1) AND (assignedtoid = 0::numeric))' '- Hash (cost=10.86..10.86 rows=5 width=22) (actual time=0.053..0.053 rows=4 loops=1)' ' - Hash Join (cost=9.38..10.86 rows=5 width=22) (actual time=0.033..0.048 rows=4 loops=1)' 'Hash Cond: (l_userqueue.userid = e_usersessions.entityid)' '- Seq Scan on l_userqueue (cost=0.00..1.23 rows=23 width=27) (actual time=0.003..0.009 rows=23 loops=1)' '- Hash (cost=9.31..9.31 rows=5 width=21) (actual time=0.018..0.018 rows=2 loops=1)' ' - Index Scan using i06_e_usersessions on e_usersessions (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 rows=2 loops=1)' 'Index Cond: (sessionkeepalivedatetime 20120605082131943::bigint)' 'Total runtime: 2249.146 ms' I am trying to understand the reason why the a sequencial scan is used on a_activity instead of using the index by activequeueid (i08_a_activity). If I run the this other query, I get a complete different results: SELECT * FROM a_activity WHERE a_activity.activequeueid = 123456 AND a_activity.vstatus= 1 AND a_activity.ventrydate 0 Explain analyze: 'Index Scan using i08_a_activity on a_activity (cost=0.00..303.57 rows=162 width=7287) (actual time=0.019..0.019 rows=0 loops=1)' ' Index Cond: ((activequeueid = 123456::numeric) AND (vstatus = 1) AND (ventrydate 0))' 'Total runtime: 0.076 ms' This is the definition of the index : CREATE INDEX i08_a_activity ON a_activity USING btree (activequeueid , vstatus , ventrydate ); a_activity table has 1,216,134 rows Thanks in advance, Andrew
Re: [PERFORM] Sequencial scan in a JOIN
the default_statistics_target is set to 200, and I have run the analyze and reindex on these tables before writing the email. Andrew Date: Tue, 5 Jun 2012 08:15:45 -0500 From: stho...@optionshouse.com To: andrewjai...@hotmail.com CC: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Sequencial scan in a JOIN On 06/05/2012 07:48 AM, Andrew Jaimes wrote: ' - Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual time=0.541..2249.027 rows=33 loops=1)' 'Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND (a_activity.sbuid = e_usersessions.sbuid))' '- Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167 width=22) (actual time=0.010..1662.142 I'd be willing to bet your stats are way, way off. It expected 242,803 rows in the hash, but only got 33. In that kind of scenario, I could easily see the planner choosing a sequence scan over an index scan, as doing that many index seeks would be much more expensive than scanning the table. What's your default_statistics_target, and when is the last time you analyzed the tables in this query? -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query optimization
On 04/26/2012 04:08 PM, Tom Lane wrote: Thomas Kellererspam_ea...@gmx.net writes: Tom Lane wrote on 26.04.2012 21:17: Um ... did you analyze all the tables, or just some of them? I get sub-millisecond runtimes if all four tables have been analyzed, but it does seem to pick lousy plans if, say, only a and b have been analyzed. Here it's similar to Richard's experience: Before analyzing the four tables, the first statement yields this plan: [ merge joins ] This continues to stay the plan for about 10-15 repetitions, then it turns to this plan [ hash joins ] Hmm. I see it liking the merge-join plan (with minor variations) with or without analyze data, but if just some of the tables have been analyzed, it goes for the hash plan which is a good deal slower. The cost estimates aren't that far apart though. In any case, the only reason the merge join is so fast is that the data is perfectly ordered in each table; on a less contrived example, it could well be a lot slower. It's not so terribly contrived, is it? It's common enough to have tables which are append-only and to join them by something that corresponds to the append order (serial field, timestamp etc.) cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] scale up (postgresql vs mssql)
On 04/11/2012 06:11 PM, Eyal Wilde wrote: hi, i had a stored procedure in ms-sql server. this stored procedure gets a parameter (account-id), dose about 20 queries, fills some temporary tables, and finally, returns a few result-sets. this stored procedure converted to stored function in postgresql (9.1). the result-sets are being returned using refcursors. this stored function is logically, almost identical to the ms-sql stored procedure. a LOT of work had been done to make postgresql getting close to ms-sql speed (preparing temp-tables in advance, using analyze in special places inside the stored function in order to hint the optimizer that the temp-tables have very few records, thus eliminating unnecessary and expansive hash-join, and a lot more..). after all that, the stored function is running in a reasonable speed (normally ~60 milliseconds). now, i run a test that simulates 20 simultaneous clients, asking for account-id randomly. once a client get a result, it immediately asks for another one. the test last 5 seconds. i use a connection pool (with Tomcat web-server). the pool is automatically increased to ~20 connections (as expected). the result is postgresql dose ~60 account-ids, whereas ms-sql dose ~330 account-ids. postgresql shows that each account-id took about 400-1000 msec ,which is so much slower than the ~60 msec of a single execution. in a single execution postgresql may be less the twice slower than ms-sql, but in 20 simultaneous clients, it's about 6 times worse. why is that? the hardware is one 4-core xeon. 8GB of ram. the database size is just a few GB's. centos-6.2. do you think the fact that postgresql use a process per connection (instead of multi-threading) is inherently a weakness of postgrsql, regarding scale-up? would it be better to limit the number of connections to something like 4, so that executions don't interrupt each other? thanks in advance for any help! I doubt that the process-per-connection has much effect, especially on Linux where process creation is extremely cheap, and you're using a connection pooler anyway. The server is pretty modest, though. If you can add enough RAM that you can fit the whole db into Postgres shared buffers you might find things run a whole lot better. You should show us your memory settings, among other things - especially shared_buffers, temp_buffers and work_mem. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance of SQL Function versus View
On 04/03/2012 10:21 AM, Robert Haas wrote: You should probably test this in your environment, but I'd expect the view to be better. Wrapping logic inside PL/pgsql functions needlessly rarely turn outs to be a win. Right, But also note that auto_explain is very useful in getting plans and times of queries nested in functions which can't easily be got otherwise. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Update join performance issues
On 04/03/2012 01:29 PM, Kevin Kempter wrote: Hi All; I have a query that wants to update a table based on a join like this: update test_one set f_key = t.f_key from upd_temp1 t, test_one t2 where t.id_number = t2.id_number Why is test_one in the from clause? update joins whatever is in the from clause to the table being updated. You almost never need it repeated in the from clause. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] TCP Overhead on Local Loopback
On 04/01/2012 09:11 PM, Andrew Dunstan wrote: On 04/01/2012 08:29 PM, Claudio Freire wrote: On Sun, Apr 1, 2012 at 8:54 PM, Andrew Dunstanand...@dunslane.net wrote: You could try using Unix domain socket and see if the performance improves. A relevant link: He said Windows. There are no Unix domain sockets on Windows. (And please don't top-post) Windows supports named pipes, which are functionally similar, but I don't think pg supports them. Correct, so telling the OP to have a look at them isn't at all helpful. And they are not supported on all Windows platforms we support either (specifically not on XP, AIUI). Apparently I was mistaken about the availability. However, my initial point remains. Since all our client/server comms on Windows are over TCP, telling the OP to look at Unix domain sockets is unhelpful. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] TCP Overhead on Local Loopback
On 04/01/2012 06:01 PM, Andy wrote: You could try using Unix domain socket and see if the performance improves. A relevant link: He said Windows. There are no Unix domain sockets on Windows. (And please don't top-post) cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] TCP Overhead on Local Loopback
On 04/01/2012 08:29 PM, Claudio Freire wrote: On Sun, Apr 1, 2012 at 8:54 PM, Andrew Dunstanand...@dunslane.net wrote: You could try using Unix domain socket and see if the performance improves. A relevant link: He said Windows. There are no Unix domain sockets on Windows. (And please don't top-post) Windows supports named pipes, which are functionally similar, but I don't think pg supports them. Correct, so telling the OP to have a look at them isn't at all helpful. And they are not supported on all Windows platforms we support either (specifically not on XP, AIUI). cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tablespaces on a raid configuration
On 03/30/2012 10:45 AM, Campbell, Lance wrote: PostgreSQL 9.0.x When PostgreSQL storage is using a relatively large raid 5 or 6 array is there any value in having your tables distributed across multiple tablespaces if those tablespaces will exists on the same raid array? I understand the value if you were to have the tablespaces on different raid arrays. But what about on the same one? Not answering your question, but standard advice is not to use RAID 5 or 6, but RAID 10 for databases. Not sure if that still hold if you're using SSDs. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] database slowdown while a lot of inserts occur
On 03/29/2012 03:27 PM, Bob Lunney wrote: Lance, May small inserts cause frequent fsyncs. Is there any way those small inserts can be batched into some larger sets of inserts that use copy to perform the load? Or possibly a prepared statement called many times in a single transaction, if you're not using that already. It's not as efficient as COPY, but it's often a much less extensive change to the code. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow self-join query
On 03/18/2012 10:37 AM, Robert Poor wrote: On Sat, Mar 17, 2012 at 23:09, Scott Marlowe scott.marl...@gmail.com mailto:scott.marl...@gmail.com wrote: Also it looks like you're still not using the index on this: Subquery Scan u1 (cost=0.00..313.55 rows=50 width=4) (actual time=0.030..147.136 rows=1 loops=1) Filter: ((u1.type)::text = 'User::Twitter'::text) Are you sure you're using an indexable condition? I know that users.type is indexed -- what would keep that from being honored? FWIW, I believe that all user.type fields are set to User::Twitter, but that will change in the future. If all the rows have that value, then using the index would be silly. Postgres knows from the stats that ANALYZE calculates whether or not using an index is likely to be more efficient, and avoids doing so in cases where it isn't. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
On 03/01/2012 05:52 PM, Stefan Keller wrote: These are the current modified settings in postgresql.conf: shared_buffers = 128MB work_mem = 3MB These are extremely low settings on virtually any modern computer. I usually look to set shared buffers in numbers of Gb and work_mem at least in tens if not hundreds of Mb for any significantly sized database. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
On 03/01/2012 07:58 PM, Claudio Freire wrote: On Thu, Mar 1, 2012 at 9:28 PM, Peter van Hardenbergp...@pvh.ca wrote: Setting work_mem to hundreds of MB in a 4G system is suicide. Tens even is dangerous. Why do you say that? We've had work_mem happily at 100MB for years. Is there a particular degenerate case you're concerned about? Me too. But I've analyzed the queries I'll be sending to the database and I've carefully bound the effective amount of memory used given the load I'll be experiencing. Saying that it should be set to 100M without consideration for those matters is the suicide part. work_mem applies to each sort operation. Suppose, just for the sake of argument, that each connection is performing 5 such sorts (ie, 5 joins of big tables - not unthinkable), then suppose you have your max_connections to the default of 100, then the system could request as much as 50G of ram. I set work_mem higher in my database system since I *know* most of the connections will not perform any merge or hash joins, nor will they sort the output, so they won't use work_mem even once. The ones that will, I have limited on the application side to a handful, hence I *know* that 50G theoretical maximum will not be reached. Can the OP say that? I have no reason to think so. Hence I don't suggest 100M is OK on a 4G system. Well, obviously you need to know your workload. Nobody said otherwise. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very long deletion time on a 200 GB database
On 02/27/2012 09:45 AM, Shaun Thomas wrote: On 02/27/2012 02:08 AM, Reuven M. Lerner wrote: In the end, it was agreed that we could execute the deletes over time, deleting items in the background, or in parallel with the application's work. After all, if the disk is filling up at the rate of 2 GB/day, then so long as we delete 4 GB/day (which is pretty easy to do), we should be fine. Please tell me you understand deleting rows from a PostgreSQL database doesn't work like this. :) The MVCC storage system means you'll basically just be marking all those deleted rows as reusable, so your database will stop growing, but you'll eventually want to purge all the accumulated dead rows. One way to see how many there are is to use the pgstattuple contrib module. You can just call it on the table name in question: SELECT * FROM pgstattuple('my_table'); You may find that after your deletes are done, you'll have a free_pct of 80+%. In order to get rid of all that, you'll need to either run CLUSTER on your table(s) or use the select-truncate-insert method anyway. If he has autovacuum on he could well be just fine with his proposed strategy. Or he could have tables partitioned by time and do the delete by just dropping partitions. There are numerous way he could get this to work. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very long deletion time on a 200 GB database
On 02/23/2012 05:07 AM, Marcin Mańk wrote: DELETE FROM B WHERE r_id IN (SELECT R.id FROM R, B WHERE r.end_date (NOW() - (interval '1 day' * 30)) AND r.id = b.r_id How about: DELETE FROM B WHERE r_id IN (SELECT distinct R.id FROM R WHERE r.end_date (NOW() - (interval '1 day' * 30)) ? Or possibly without the DISTINCT. But I agree that the original query shouldn't have B in the subquery - that alone could well make it crawl. What is the distribution of end_dates? It might be worth running this in several steps, deleting records older than, say, 90 days, 60 days, 30 days. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query slow as Function
On 02/18/2012 11:37 AM, Tom Lane wrote: Andreas Kretschmerakretsch...@spamfence.net writes: You can check the plan with the auto_explain - Extension, and you can force the planner to create a plan based on the actual input-value by using dynamic SQL (EXECUTE 'your query string' inside the function) Steve *is* using EXECUTE, so that doesn't seem to be the answer. I'm wondering about datatype mismatches myself --- the function form is forcing the parameter to be char(9), which is not a constraint imposed in the written-out query. There are lots of other possibilities though. It would be hard to say much without a self-contained example to try. He's using EXECUTE ... USING. Does that plan with the used parameter? cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Exploring memory usage
On 12/27/2011 11:00 AM, Scott Marlowe wrote: On Tue, Dec 27, 2011 at 8:33 AM, Claudio Freireklaussfre...@gmail.com wrote: On Sat, Dec 24, 2011 at 4:22 PM, Michael Smolskysitr...@email.com wrote: work_mem = 128MB (tried 257MB, didn't change anything) This is probably your problem. Without an EXPLAIN output, I cannot be sure, but 'work_mem' is not the total amount of memory a query can use, it's the amount of memory it can use for *one* sort/hash/whatever operation. A complex query can have many of those, so your machine is probably swapping due to excessive memory requirements. Try *lowering* it. You can do so only for that query, by executing: set work_mem = '8MB';your query He can lower it for just that query but honestly, even on a machine with much more memory I'd never set it as high as he has it. On a busy machine with 128G RAM the max I ever had it set to was 16M, and that was high enough I kept a close eye on it (well, nagios did anway.) It depends on the workload. Your 16M setting would make many of my clients' systems slow to an absolute crawl for some queries, and they don't run into swap issues, because we've made educated guesses about usage patterns. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance costs of various PL languages
On 12/27/2011 05:54 PM, Merlin Moncure wrote: On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehulepavel.steh...@gmail.com wrote: Hello 2011/12/27 Carlo Stonebanksstonec.regis...@sympatico.ca: We are currently using pltclu as our PL of choice AFTER plpgSql. I'd like to know if anyone can comment on the performance costs of the various PL languages BESIDES C. For example, does pltclu instantiate faster than pltcl (presumably because it uses a shared interpreter?) Is Perl more lightweight? I know that everything depends on context - what you are doing with it, e.g. choose Tcl for string handling vs. Perl for number crunching - but for those who know about this, is there a clear performance advantage for any of the various PL languages - and if so, is it a difference so big to be worth switching? I ask this because I had expected to see pl/pgsql as a clear winner in terms of performance over pltclu, but my initial test showed the opposite. I know this may be an apples vs oranges problem and I will test further, but if anyone has any advice or insight, I would appreciate it so I can tailor my tests accordingly. A performance strongly depends on use case. PL/pgSQL has fast start but any expression is evaluated as simple SQL expression - and some repeated operation should be very expensive - array update, string update. PL/pgSQL is best as SQL glue. Positive to performance is type compatibility between plpgsql and Postgres. Interpret plpgsql is very simply - there are +/- zero optimizations - plpgsql code should be minimalistic, but when you don't do some really wrong, then a speed is comparable with PHP. http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language PL/Perl has slower start - but string or array operations are very fast. Perl has own expression evaluator - faster than expression evaluation in plpgsql. On second hand - any input must be transformed from postgres format to perl format and any result must be transformed too. Perl and other languages doesn't use data type compatible with Postgres. One big advantage pl/pgsql has over scripting languages is that it understands postgresql types natively. It knows what a postgres array is, and can manipulate one directly. pl/perl would typically have to have the database convert it to a string, parse it into a perl structure, do the manipulation, then send it to the database to be parsed again. If your procedure code is mainly moving data between tables and doing minimal intermediate heavy processing, this adds up to a big advantage. Which pl to go with really depends on what you need to do. pl/pgsql is always my first choice though. perl and tcl are not particularly fast languages in the general case -- you are largely at the mercy of how well the language's syntax or library features map to the particular problem you're solving. if you need a fast general purpose language in the backend and are (very understandably) skeptical about C, I'd look at pl/java. PLV8, which is not yet ready for prime time, maps many common Postgres types into native JS types without the use of Input/Output functions, which means the conversion is very fast. It's work which could very well do with repeating for the other PL's. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Query planner suggestion, for indexes with similar but not exact ordering.
Hi all. Been using postgres for years, and lurking on this newsgroup for a short while now to help me gain the benefit of your expertise and experience and learn how to get most out of postgresql possible. I do a fair bit of work on tables using composite keys. I have discovered a couple of things lately that may or may not be worth looking at in terms of query planner. Consider my archetypal table. Based on real data. table master franchise smallint, partnumber varchar It is a spare parts list, combining part numbers from multiple suppliers (franchise). Part numbers are typically unique but sometimes there are duplicates. I have use cases which concern both finding parts by a specific franchise or finding parts system wide. In my table I have follow stats: * Number of records : 2,343,569 * Number of unique partnumber records : 2,130,379 (i.e. for a given partnumber there is on average, 1.1 records. i.e. a partnumber is used by 1.1 suppliers. The partnumber with the most number of records = 8 records. * Number of unique suppliers : 35 Now consider following query: its purpose is to render next 20 rows at an aribtrary position. The position being after record matching franchise=10, partnumber='1' in partnumber then franchise order. select * from master where partnum='1' and (partnum'1' or franchise10) order by partnum,franchise limit 20; Now if I have a composite index on partnum + franchise. This query performs the way you would expect and very quickly. But if I have an index on partnum only the system seqscan's master. And yields poor performance. i.e.: Limit (cost=143060.23..143060.28 rows=20 width=93) (actual time=2307.986..2307.998 rows=20 loops=1) - Sort (cost=143060.23..148570.14 rows=2203967 width=93) (actual time=2307.982..2307.986 rows=20 loops=1) Sort Key: partnum, franchise Sort Method: top-N heapsort Memory: 19kB - Seq Scan on master (cost=0.00..84413.46 rows=2203967 width=93) (actual time=0.019..1457.001 rows=2226792 loops=1) Filter: (((partnum)::text = '1'::text) AND (((partnum)::text '1'::text) OR (franchise 10))) Total runtime: 2308.118 ms I wonder, if it is possible and worthwhile, to setup the query planner to recognize that because of the stats I indicate above, that a sort by partnum is almost exactly the same as a sort by partnum+franchise. And doing a Index scan on partnum index, and sorting results in memory will be dramatically faster. The sort buffer only needs to be very small, will only grow to 8 records only at most in my above example. The buffer will scan partnum index, and as long as partnum is the same, it will sort that small segment, as soon as the partnum increments when walking the index, the buffer zeros out again for next sort group. Artificially simulating this in SQL (only works with foreknowledge of max count of records for a given part. i.e. +8 ) shows the dramatic theoretical performance gain over the above. explain analyze select * from (select * from master where partnum='1' order by partnum limit 20+8) x where partnum'1' or franchise10 order by partnum,franchise limit 20; = Limit (cost=77.71..77.75 rows=16 width=230) (actual time=0.511..0.555 rows=20 loops=1) - Sort (cost=77.71..77.75 rows=16 width=230) (actual time=0.507..0.524 rows=20 loops=1) Sort Key: x.partnum, x.franchise Sort Method: quicksort Memory: 21kB - Subquery Scan x (cost=0.00..77.39 rows=16 width=230) (actual time=0.195..0.367 rows=28 loops=1) Filter: (((x.partnum)::text '1'::text) OR (x.franchise 10)) - Limit (cost=0.00..76.97 rows=28 width=93) (actual time=0.180..0.282 rows=28 loops=1) - Index Scan using master_searchpartkey on master (cost=0.00..6134000.35 rows=2231481 width=93) (actual time=0.178..0.240 rows=28 loops=1) Index Cond: ((partnum)::text = '1'::text) Total runtime: 0.695 ms Of course I could just make sure I create indexes with match my order by fields perfectly; which is exactly what I am doing right now. But I thought that maybe it might be worth while considering looking at allowing some sort of in memory sort to be overlaid on an index if the statistics indicate that the sorts are very nearly ordered. Andrew
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 02/04/2011 02:32 AM, da...@lang.hm wrote: when a copy command is issued, I assume that there is some indication of how much data is going to follow. No of course there isn't. How would we do that with a stream like STDIN? Read the code. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 02/02/2011 07:17 PM, Greg Smith wrote: I direct anyone who thought Mladen was making a serious comment to http://www.nydailynews.com/news/politics/2009/01/08/2009-01-08_misunderestimate_tops_list_of_notable_bu-3.html if you want to get his little joke there. I plan to start using misunderestimate more in the future when talking about planner errors. Might even try to slip it into the docs at some point in the future and see if anybody catches it. My wings take dream ... cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best replication solution?
On Tue, Apr 07, 2009 at 10:31:02PM +1200, Mark Kirkwood wrote: From my experience - gained from unwittingly being in the wrong place at the wrong time and so being volunteered into helping people with Slony failures - it seems to be quite possible to have nodes out of sync and not be entirely aware of it I should have stated that differently. First, you're right that if you don't know where to look or what to look for, you can easily be unaware of nodes being out of sync. What's not a problem with Slony is that the nodes can get out of internally consistent sync state: if you have a node that is badly lagged, at least it represents, for sure, an actual point in time of the origin set's history. Some of the replication systems aren't as careful about this, and it's possible to get the replica into a state that never happened on the origin. That's much worse, in my view. In addition, it is not possible that Slony's system tables report the replica as being up to date without them actually being so, because the system tables are updated in the same transaction as the data is sent. It's hard to read those tables, however, because you have to check every node and understand all the states. Complexity seems to be the major evil here. Yes. Slony is massively complex. simpler to administer. Currently it lacks a couple of features Slony has (chained slaves and partial DDL support), but I'll be following its development closely - because if these can be added - whilst keeping the operator overhead (and the foot-gun) small, then this looks like a winner. Well, those particular features -- which are indeed the source of much of the complexity in Slony -- were planned in from the beginning. Londiste aimed to be simpler, so it would be interesting to see whether those features could be incorporated without the same complication. A -- Andrew Sullivan a...@crankycanuck.ca -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best replication solution?
On Sun, Apr 05, 2009 at 11:36:33AM -0700, Lists wrote: *Slony-I* - I've used this in the past, but it's a huge pain to work with, caused serious performance issues under heavy load due to long running transactions (may not be the case anymore, it's been a while since I used it on a large database with many writes), and doesn't seem very reliable (I've had replication break on me multiple times). It is indeed a pain to work with, but I find it hard to believe that it is the actual source of performance issues. What's more likely true is that it wasn't tuned to your write load -- that _will_ cause performance issues. Of course, tuning it is a major pain, as mentioned. I'm also somewhat puzzled by the claim of unreliability: most of the actual replication failures I've ever seen under Slony are due to operator error (these are trivial to induce, alas -- aforementioned pain to work with again). Slony is baroque and confusing, but it's specifically designed to fail in safe ways (which is not true of some of the other systems: several of them have modes in which it's possible to have systems out of sync with each other, but with no way to detect as much. IMO, that's much worse, so we designed Slony to fail noisily if it was going to fail at all). *Mammoth Replicator* - This is open source now, is it any good? It sounds like it's trigger based like Slony. Is it based on Slony, or simply use a similar solution? It's completely unrelated, and it doesn't use triggers. I think the people programming it are first-rate. Last I looked at it, I felt a little uncomfortable with certain design choices, which seemed to me to be a little hacky. They were all on the TODO list, though. *SkyTools/Londiste* - Don't know anything special about it. I've been quite impressed by the usability. It's not quite as flexible as Slony, but it has the same theory of operation. The documentation is not as voluminous, although it's also much handier as reference material than Slony's (which is, in my experience, a little hard to navigate if you don't already know the system pretty well). A -- Andrew Sullivan a...@crankycanuck.ca -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] inheritance, and plans
Tom == Tom Lane t...@sss.pgh.pa.us writes: Andrew Gierth and...@tao11.riddles.org.uk writes: Type-dependent selection of operators has already been done as part of parse analysis, no? And the domain - base conversion is purely a relabelling, no? So what semantic change is possible as a result? Tom Domain conversions are *not* simply relabellings. It's possible Tom now to have domain-specific functions/operators, Right, but that's irrelevent to the planner in this case because the resolution of precisely which operator is being called has _already happened_ (in parse analysis), no? Tom It's possible that there are specific cases where the UNION Tom optimization checks could allow domains to be treated as their Tom base types, The domain - base conversion is an important one (to anyone who uses domains) because it happens implicitly in a wide range of contexts, and so it's unsatisfactory for it to have major performance impacts such as interfering with important optimizations. Tom but blindly smashing both sides of the check to base is going to Tom break more cases than it fixes. The posted code was only looking up the base type for one side, not both (though I don't know that code well enough to know whether it was the correct side); the case of interest is when the subquery has the domain type but the outer query is seeing the base type, _not_ the reverse. -- Andrew. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] inheritance, and plans
Tom == Tom Lane t...@sss.pgh.pa.us writes: [domain - base type conversion interfering with optimization] Tom You seem to be laboring under the delusion that this is Tom considered a bug. Of course it's a bug, or at least a missing feature - there is no justification for putting performance deathtraps in the way of using domains. Tom It's a necessary semantic restriction, because the pushed-down Tom expression could mean different things when applied to different Tom data types. How? Type-dependent selection of operators has already been done as part of parse analysis, no? And the domain - base conversion is purely a relabelling, no? So what semantic change is possible as a result? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Deleting millions of rows
Hello All, TL If you're deleting very many but not all rows, people tend TL to drop the FK constraints and re-establish them afterwards. I find BEGIN; CREATE TEMP TABLE remnant AS SELECT * FROM bigtable WHERE (very_restrictive_condition); TRUNCATE TABLE bigtable; INSERT INTO bigtable SELECT * FROM remnant; COMMIT; ANALYSE bigtable; works well because there is no possibility of my forgetting FKs. -- Sincerely, Andrew Lazarusmailto:and...@pillette.comBEGIN:VCARD VERSION:2.1 N:Lazarus;Andrew;;;Ph.D. FN:Andrew Lazarus, Ph.D. EMAIL;PREF;INTERNET:and...@pillette.com TITLE:Director of RD ADR;WORK:;800-366-0688;3028 Fillmore Street;San Francisco;CA;94123;USA LABEL;WORK;ENCODING=QUOTED-PRINTABLE:800-366-0688=0D=0A3028 Fillmore S= treet=0D=0ASan Francisco=0D=0ACA=0D=0A94123=0D=0AUSA X-GENDER:Male REV:18991230T08Z END:VCARD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Occasional Slow Commit
On Mon, Oct 27, 2008 at 05:23:37PM -0700, David Rees wrote: However, occasionally, processing time will jump up significantly - the average processing time is around 20ms with the maximum processing time taking 2-4 seconds for a small percentage of transactions. Ouch! Turning on statement logging and analyzing the logs of the application itself shows that step #4 is the culprit of the vast majority of the slow transactions. My bet is that you're waiting on checkpoints. Given that you're on 8.3, start fiddling with the checkpoint_completion_target parameter. 0.7 might help. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow updates, poor IO
Hi, On Fri, Sep 26, 2008 at 07:24:55AM +1200, John Huttley wrote: I've just had an interesting encounter with the slow full table update problem that is inherent with MVCC Quite apart from the other excellent observations in this thread, what makes you think this is an MVCC issue exactly? A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Choosing a filesystem
On Thu, Sep 11, 2008 at 06:29:36PM +0200, Laszlo Nagy wrote: The expert told me to use RAID 5 but I'm hesitating. I think that RAID 1+0 would be much faster, and I/O performance is what I really need. I think you're right. I think it's a big mistake to use RAID 5 in a database server where you're hoping for reasonable write performance. In theory RAID 5 ought to be fast for reads, but I've never seen it work that way. I would like to put the WAL file on the SAS disks to improve performance, and create one big RAID 1+0 disk for the data directory. But maybe I'm completely wrong. Can you please advise how to create logical partitions? I would listen to yourself before you listen to the expert. You sound right to me :) A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance