Re: [PERFORM] parse - bind take more time than execute
Thanks a lot for your answer :)

2) default_statistics_target is set to the default (100), and there are no special per-column statistics.
3) I will do that very soon.
4) In absolute units I can see the same query having stats similar to these: parse - 600 ms, bind - 300 ms, execute - 50 ms.

The query mentioned above is a simple select from one table using two where conditions, and the table has one additional index (besides the primary key) on the columns that are part of the where clause.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/parse-bind-take-more-time-than-execute-tp5102940p5103116.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] How to clock the time spent for query parsing and planning?
I'd like to find some measurements/figures of query preparation and planning time to justify the use of prepared statements and stored procedures. I know that complex queries have larger preparation time. Though, is it possible to explicitly measure the time the optimizer spends parsing and planning for query execution? Thank you, James
Re: [PERFORM] Exploring memory usage
On Sat, Dec 24, 2011 at 4:22 PM, Michael Smolsky sitr...@email.com wrote:
> work_mem = 128MB (tried 257MB, didn't change anything)

This is probably your problem. Without an EXPLAIN output I cannot be sure, but work_mem is not the total amount of memory a query can use; it's the amount of memory it can use for *one* sort/hash/whatever operation. A complex query can have many of those, so your machine is probably swapping due to excessive memory requirements. Try *lowering* it. You can do so for just that query, by executing:

set work_mem = '8MB';
<your query>
[PERFORM] parse - bind take more time than execute
There are some performance issues on a server, and by searching the logs I noticed that the parse and bind phases take considerably more time than execute for most of the queries. I guess the right thing to do in this case is to use functions or prepared statements, but in any case, what could be the cause of this?

Information about the server:
- CentOS 5.6
- 4 cores
- 12 GB RAM
- shared_buffers: 1 GB
- temp_buffers: 100 MB
- work_mem: 30 MB
- maintenance_work_mem: 512 MB
- database size: 1.5 GB
- archive_mode is on
- autovacuum/analyze (vacuum_scale_factor 0.1, analyze 0.05)

This behaviour is not related to checkpoints on the database (as indicated by the logs, I don't see this latency only when a checkpoint occurs; I see it most of the time). So my question is the following: what can cause the bind/parse phases to take so much longer than the execute? If you need any more info about the server I'll be glad to provide it. Thank you in advance for your advice.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/parse-bind-take-more-time-than-execute-tp5102940p5102940.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] parse - bind take more time than execute
The version of Postgres is 8.4.7 :)

--
View this message in context: http://postgresql.1045698.n5.nabble.com/parse-bind-take-more-time-than-execute-tp5102940p5102954.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] Exploring memory usage
On Sat, Dec 24, 2011 at 12:22 PM, Michael Smolsky sitr...@email.com wrote:
> shared_buffers = 2GB (tried 8GB, didn't change anything)
> work_mem = 128MB (tried 257MB, didn't change anything)

As someone mentioned, lower is better here. 128MB is quite high.

> effective_cache_size = 12GB (tried 2GB didn't change anything)

This doesn't affect memory usage. It only tells the planner roughly how big the OS and pg caches are for the db. It's a very coarse adjustment knob, so don't get too worried about it.

> In order to resolve my issue, I tried to search for postgres profiling tools and found no relevant ones. This is rather disappointing.

Look for pg_buffercache. I'm sure there are some others I'm forgetting. Grab a copy of Greg Smith's PostgreSQL High Performance; it's got a lot of great info in it on handling heavily loaded servers.

> I realize that postgres is a free software and one cannot demand new features from people who invest their own free time in developing and maintaining it. I am hoping that my feedback could be useful for future development.

It's not just free as in beer, it's free as in do what you will with it. So if you whip out your checkbook and start waving it around, you can certainly pay someone to write the code to instrument this stuff. Whether you release it back into the wild is up to you. But yeah, first see if someone's already done some work on that, like the pg_buffercache module, before spending money reinventing the wheel.
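For reference, pg_buffercache ships as a contrib module; once installed, a typical query against it looks something like the following (this is the common example pattern for 8.4-era servers, adjust to your version):

```sql
-- Which relations occupy the most shared buffers right now?
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```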
[PERFORM] Exploring memory usage
Hello, I'm running a fairly complex query on my postgres-8.4.9 Ubuntu box. The box has an 8-core CPU, 18 GB of RAM and no virtualization layer. The query takes many hours to run. It essentially involves a join of two large tables on a common string column, but it also includes joins with other smaller tables. I looked at the query plan, as produced by EXPLAIN, and found it reasonable.

When the query starts, Linux `top' shows fairly low RAM usage (in the hundreds-of-MB range) and about 100% CPU usage for the relevant postgres process. I'm also seeing some temp files being generated by this postgres process in the postgres temp directory. All this makes sense to me.

As the query progresses, the memory usage of this postgres process shoots up to 12 GB resident and 17 GB virtual, while CPU usage falls to single-digit percents. The need to use more memory at some point during query execution agrees with the query plan. I feel that my server configuration is not optimal: I would like to observe close to 100% CPU utilization on my queries, but am seeing values 20 times lower. My query forks a single large-RAM process running on the server. There are other queries running on the same server, but they are quick and light on memory.

I cannot explain the following observations:
* Postgres is not writing temp files into its temp directory once RAM usage goes up, but vmstat shows heavy disk usage, mostly a high swap-in rate. Top shows 6 GB of swap space in use.
* All my attempts to limit postgres' memory usage by playing with postgres config parameters failed.

Here are the relevant parameters from postgresql.conf (I did use SHOW <parameter> to check that the parameters have been read by the server; I think I'm using the defaults for all other memory-related settings):

shared_buffers = 2GB (tried 8GB, didn't change anything)
work_mem = 128MB (tried 257MB, didn't change anything)
wal_buffers = 16MB
effective_cache_size = 12GB (tried 2GB, didn't change anything)

In order to resolve my issue, I tried to search for postgres profiling tools and found no relevant ones, which is rather disappointing. This is what I expected to find:
* A tool that could explain to me why postgres is swapping.
* A tool that showed what kind of memory (work mem vs. buffers, etc.) was taking all that virtual memory space.
* A tool for examining the plans of running queries. It would be helpful to see what stage of the query plan the server is stuck on (e.g. mark the query plan nodes with symbols indicating currently running, completed, results in memory/disk, etc.).

I realize that postgres is free software and one cannot demand new features from people who invest their own free time in developing and maintaining it. I am hoping that my feedback could be useful for future development. Thanks!
Re: [PERFORM] parse - bind take more time than execute
Hello.

1. Planning time > execute time can happen normally for some fast-executing queries, so it is not bad per se.
2. What are your statistics settings? They influence planning time. I mean default_statistics_target and per-column SET STATISTICS.
3. Upgrade to 8.4.10; it's a quick upgrade (minimal downtime) and there were some planner improvements.
4. What is "considerably more time" in absolute units?

Filip

2011/12/27 MirrorX mirr...@gmail.com:
> there are some performance issues on a server and by searching in the logs i noticed that the phases of parse and bind take considerably more time than execute for most of the queries. i guess that the right thing to do in this case is to use functions or prepare statements but in any case, what could be the cause of this? [...]
Re: [PERFORM] parse - bind take more time than execute
Hello

2011/12/27 MirrorX mirr...@gmail.com:
> there are some performance issues on a server and by searching in the logs i noticed that the phases of parse and bind take considerably more time than execute for most of the queries. i guess that the right thing to do in this case is to use functions or prepare statements but in any case, what could be the cause of this? [...]

One reason could be the blind optimization of a prepared statement: a prepared statement is planned without knowledge of the actual parameter values, so the plan is tuned toward the most frequent values. Try to look at the plan - EXPLAIN can be used for prepared statements too.

Regards
Pavel Stehule
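Pavel's suggestion can be applied directly in psql; the table and parameter below are hypothetical stand-ins for the poster's query:

```sql
-- PREPARE parses and plans once, with no knowledge of the actual
-- parameter value; EXPLAIN EXECUTE then shows the plan the prepared
-- statement will really use.
PREPARE get_messages (int) AS
    SELECT * FROM messages WHERE account_id = $1;

EXPLAIN EXECUTE get_messages(314);

DEALLOCATE get_messages;
```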
Re: [PERFORM] Exploring memory usage
On Tue, Dec 27, 2011 at 1:00 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> He can lower it for just that query but honestly, even on a machine with much more memory I'd never set it as high as he has it. On a busy machine with 128G RAM the max I ever had it set to was 16M, and that was high enough I kept a close eye on it (well, nagios did anyway.)

I have it quite high, because I know the blend of queries going into the server allows it. But yes, it's not a sensible setting if you haven't analyzed the activity carefully.
Re: [PERFORM] Exploring memory usage
On Tue, Dec 27, 2011 at 8:33 AM, Claudio Freire klaussfre...@gmail.com wrote:
> This is probably your problem. Without an EXPLAIN output, I cannot be sure, but 'work_mem' is not the total amount of memory a query can use, it's the amount of memory it can use for *one* sort/hash/whatever operation. A complex query can have many of those, so your machine is probably swapping due to excessive memory requirements. Try *lowering* it. [...]

He can lower it for just that query, but honestly, even on a machine with much more memory I'd never set it as high as he has it. On a busy machine with 128 GB RAM the max I ever had it set to was 16 MB, and that was high enough that I kept a close eye on it (well, nagios did anyway).
Re: [PERFORM] parse - bind take more time than execute
I am not using prepared statements for now :) I just said that, probably, if I do use them I will get rid of that extra time, since the plan will already be 'decided' in advance.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/parse-bind-take-more-time-than-execute-tp5102940p5103182.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] How to clock the time spent for query parsing and planning?
Hello

2011/12/23 Igor Schtein ischt...@gmail.com:
> I'd like to find some measurements/figures of query preparation and planning time to justify the use of prepared statements and stored procedures. I know that complex queries have larger preparation time. Though, is it possible to explicitly measure the time the optimizer spends parsing and planning for query execution?

You can use the time of a plain EXPLAIN statement - EXPLAIN parses and plans the query without executing it.

Regards
Pavel Stehule
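A sketch of that approach in psql (the table name here is hypothetical): since plain EXPLAIN parses and plans but does not execute, its elapsed time under \timing approximates parse + plan cost, and EXPLAIN ANALYZE adds execution on top:

```sql
\timing on

-- Parse + plan only:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Parse + plan + execute; the difference from the timing above
-- roughly isolates execution cost:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
```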
[PERFORM] Subquery flattening causing sequential scan
PostgreSQL 9.0.2, Mac OS X Server 10.6.8. Autovacuum is on, and I have a script that runs vacuum analyze verbose every night along with the backup.

I have a situation where I'm experiencing a seq scan on a table with almost 3M rows when my condition is based on a subquery. A google search turned up a way to prevent flattening the subquery into a join using OFFSET 0. This does work, reducing the query from around 1 s to around 250 ms, most of which is the subquery.

My question is: why does it do a seq scan when it flattens this subquery into a JOIN? Is it because the emsg_messages table is around 1M rows? Are there some guidelines as to when the planner will prefer not to use an available index? I just had a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable for a machine with 16GB of memory. Would the default setting of 128MB cause this behavior? I can't bounce the production server midday to test that change.

EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
  SELECT id FROM emsg_messages msg
  WHERE msg.account_id = 314 AND msg.outgoing = FALSE
    AND msg.message_type = 1 AND msg.spam_level < 2
    AND msg.deleted_at IS NULL
    AND msg.id NOT IN (
      SELECT emf.message_id FROM emsg_message_folders emf
      WHERE emf.account_id = 314
    )
)

QUERY PLAN
Hash Semi Join (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1)
  Hash Cond: (ema.message_id = msg.id)
  -> Seq Scan on emsg_message_addresses ema (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241 rows=2875437 loops=1)
  -> Hash (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1)
       Buckets: 1024 Batches: 1 Memory Usage: 1kB
       -> Bitmap Heap Scan on emsg_messages msg (cost=7979.35..84475.45 rows=3783 width=4) (actual time=273.224..273.387 rows=1 loops=1)
            Recheck Cond: (account_id = 314)
            Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
            -> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.98 rows=34611 width=0) (actual time=9.633..9.633 rows=34997 loops=1)
                 Index Cond: (account_id = 314)
            SubPlan 1
              -> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016 rows=34594 loops=1)
                   Recheck Cond: (account_id = 314)
                   -> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169 width=0) (actual time=5.175..5.175 rows=34594 loops=1)
                        Index Cond: (account_id = 314)
Total runtime: 1083.890 ms

EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
  SELECT id FROM emsg_messages msg
  WHERE msg.account_id = 314 AND msg.outgoing = FALSE
    AND msg.message_type = 1 AND msg.spam_level < 2
    AND msg.deleted_at IS NULL
    AND msg.id NOT IN (
      SELECT emf.message_id FROM emsg_message_folders emf
      WHERE emf.account_id = 314
    )
  OFFSET 0
)

QUERY PLAN
Nested Loop (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1)
  -> HashAggregate (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1)
       -> Limit (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1)
            -> Bitmap Heap Scan on emsg_messages msg (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333 rows=1 loops=1)
                 Recheck Cond: (account_id = 314)
                 Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
                 -> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.99 rows=34612 width=0) (actual time=9.693..9.693 rows=34998 loops=1)
                      Index Cond: (account_id = 314)
                 SubPlan 1
                   -> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.795..39.420 rows=34594 loops=1)
                        Recheck Cond: (account_id = 314)
                        -> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169
Re: [PERFORM] Subquery flattening causing sequential scan
Jim Crate jim...@gmail.com writes:
> My question is why does it do a seq scan when it flattens this subquery into a JOIN?

Because it thinks there will be 3783 rows out of the msg scan, which if true would make your desired nestloop join a serious loser. You need to see about getting that estimate to be off by less than three orders of magnitude. Possibly raising the stats target on emsg_messages would help. I'd also try converting the inner NOT IN into a NOT EXISTS, just to see if that makes the estimate any better. Using something newer than 9.0.2 might help too, as we fixed some outer-join estimation bugs a few months ago.

regards, tom lane
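Sketches of Tom's two suggestions, using the table and column names from the query upthread (the statistics target value is an arbitrary example):

```sql
-- 1. Raise the per-column stats target and re-analyze:
ALTER TABLE emsg_messages ALTER COLUMN account_id SET STATISTICS 500;
ANALYZE emsg_messages;

-- 2. Rewrite the inner NOT IN as a correlated NOT EXISTS:
SELECT id FROM emsg_messages msg
WHERE msg.account_id = 314
  AND NOT EXISTS (
    SELECT 1 FROM emsg_message_folders emf
    WHERE emf.account_id = 314
      AND emf.message_id = msg.id
  );
```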
Re: [PERFORM] Exploring memory usage
On Tue, Dec 27, 2011 at 9:14 AM, Andrew Dunstan and...@dunslane.net wrote:
> It depends on the workload. Your 16M setting would make many of my clients' systems slow to an absolute crawl for some queries, and they don't run into swap issues, because we've made educated guesses about usage patterns.

Exactly. I've had an old Pentium 4 machine that did reporting and only had 2 GB RAM with a 256 MB work_mem setting, while the heavily loaded machine I mentioned earlier handles something on the order of several hundred concurrent users and thousands of queries a second, and 16 MB was a pretty big setting on that machine; but since most of the queries were of the form select * from sometable where pkid=123456, it wasn't too dangerous. It's all about the workload. For that, we need more info from the OP.
Re: [PERFORM] Exploring memory usage
On 12/27/2011 11:00 AM, Scott Marlowe wrote:
> He can lower it for just that query but honestly, even on a machine with much more memory I'd never set it as high as he has it. On a busy machine with 128G RAM the max I ever had it set to was 16M, and that was high enough I kept a close eye on it (well, nagios did anyway.)

It depends on the workload. Your 16M setting would make many of my clients' systems slow to an absolute crawl for some queries, and they don't run into swap issues, because we've made educated guesses about usage patterns.

cheers

andrew
[PERFORM] Performance costs of various PL languages
We are currently using pltclu as our PL of choice AFTER PL/pgSQL. I'd like to know if anyone can comment on the performance costs of the various PL languages BESIDES C. For example, does pltclu instantiate faster than pltcl (presumably because it uses a shared interpreter)? Is Perl more lightweight? I know that everything depends on context - what you are doing with it, e.g. choose Tcl for string handling vs. Perl for number crunching - but for those who know about this, is there a clear performance advantage for any of the various PL languages - and if so, is the difference big enough to be worth switching? I ask this because I had expected to see PL/pgSQL as a clear winner in terms of performance over pltclu, but my initial test showed the opposite. I know this may be an apples vs. oranges problem and I will test further, but if anyone has any advice or insight, I would appreciate it so I can tailor my tests accordingly.

Thanks, Carlo
Re: [PERFORM] Performance costs of various PL languages
Hello

2011/12/27 Carlo Stonebanks stonec.regis...@sympatico.ca:
> We are currently using pltclu as our PL of choice AFTER plpgSql. I'd like to know if anyone can comment on the performance costs of the various PL languages BESIDES C. [...]

Performance strongly depends on the use case. PL/pgSQL has a fast start, but every expression is evaluated as a simple SQL expression - so some repeated operations can be very expensive: array updates, string updates. PL/pgSQL is best as SQL glue. A plus for performance is type compatibility between PL/pgSQL and Postgres. The PL/pgSQL interpreter is very simple - there are more or less zero optimizations - so PL/pgSQL code should be minimalistic; but as long as you don't do anything really wrong, its speed is comparable with PHP.

http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language

PL/Perl has a slower start, but string and array operations are very fast. Perl has its own expression evaluator, faster than expression evaluation in PL/pgSQL. On the other hand, every input must be transformed from the Postgres format to the Perl format, and every result must be transformed back. Perl and the other languages don't use data types compatible with Postgres.

Regards
Pavel Stehule
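A contrived sketch of the "repeated operations are expensive" point: building a string in a PL/pgSQL loop evaluates an SQL expression per iteration, while the set-based equivalent does the same work in one statement (the function and query are illustrative, not from the thread):

```sql
-- Anti-pattern: each concatenation is evaluated as an SQL expression,
-- so cost grows with every iteration.
CREATE OR REPLACE FUNCTION slow_csv(n int) RETURNS text AS $$
DECLARE
    result text := '';
BEGIN
    FOR i IN 1..n LOOP
        result := result || i::text || ',';
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

-- Set-based equivalent, typically much faster for large n
-- (array_agg/array_to_string are available in 8.4):
SELECT array_to_string(array_agg(i::text), ',')
FROM generate_series(1, 10000) AS g(i);
```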
Re: [PERFORM] Subquery flattening causing sequential scan
Hi,

On 28 December 2011 05:12, Tom Lane t...@sss.pgh.pa.us wrote:
> Possibly raising the stats target on emsg_messages would help.

In the function std_typanalyze() is this comment:

/*
 * The following choice of minrows is based on the paper
 * "Random sampling for histogram construction: how much is enough?"
 * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
 * Proceedings of ACM SIGMOD International Conference on Management
 * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
 * says that for table size n, histogram size k, maximum relative
 * error in bin size f, and error probability gamma, the minimum
 * random sample size is
 *      r = 4 * k * ln(2*n/gamma) / f^2
 * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
 *      r = 305.82 * k
 * Note that because of the log function, the dependence on n is
 * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
 * bin size error with probability 0.99.  So there's no real need to
 * scale for n, which is a good thing because we don't necessarily
 * know it at this point.
 */

The question is why the parameter f is not exposed as a GUC? Sometimes it could make sense to have fewer bins with better estimation (for the same r).

--
Ondrej Ivanic (ondrej.iva...@gmail.com)
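For reference, the constant in that comment can be re-derived directly; the arithmetic below just checks the 305.82 figure:

```sql
-- r / k = 4 * ln(2*n/gamma) / f^2  with  f = 0.5, gamma = 0.01, n = 10^6
SELECT 4 * ln(2 * 1e6 / 0.01) / (0.5 ^ 2) AS r_per_k;  -- ~ 305.82
```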
Re: [PERFORM] Performance costs of various PL languages
On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
> A performance strongly depends on use case. PL/pgSQL has fast start but any expression is evaluated as simple SQL expression [...] Perl and other languages doesn't use data type compatible with Postgres.

One big advantage pl/pgsql has over scripting languages is that it understands postgresql types natively. It knows what a postgres array is and can manipulate one directly. pl/perl would typically have to have the database convert it to a string, parse it into a perl structure, do the manipulation, then send it back to the database to be parsed again. If your procedure code is mainly moving data between tables and does minimal intermediate heavy processing, this adds up to a big advantage.

Which pl to go with really depends on what you need to do. pl/pgsql is always my first choice, though. perl and tcl are not particularly fast languages in the general case -- you are largely at the mercy of how well the language's syntax or library features map to the particular problem you're solving. If you need a fast general-purpose language in the backend and are (very understandably) skeptical about C, I'd look at pl/java.

merlin
Re: [PERFORM] Performance costs of various PL languages
On 12/27/2011 05:54 PM, Merlin Moncure wrote:
> On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> Hello
>>
>> 2011/12/27 Carlo Stonebanks stonec.regis...@sympatico.ca:
>>> We are currently using pltclu as our PL of choice AFTER plpgsql. I'd like to know if anyone can comment on the performance costs of the various PL languages BESIDES C. For example, does pltclu instantiate faster than pltcl (presumably because it uses a shared interpreter)? Is Perl more lightweight?
>>>
>>> I know that everything depends on context - what you are doing with it, e.g. choose Tcl for string handling vs. Perl for number crunching - but for those who know about this, is there a clear performance advantage for any of the various PL languages, and if so, is the difference big enough to be worth switching?
>>>
>>> I ask this because I had expected to see pl/pgsql as a clear winner in terms of performance over pltclu, but my initial test showed the opposite. I know this may be an apples vs. oranges problem and I will test further, but if anyone has any advice or insight, I would appreciate it so I can tailor my tests accordingly.
>>
>> Performance depends strongly on the use case. PL/pgSQL starts fast, but every expression is evaluated as a simple SQL expression, so some repeated operations can be very expensive - array updates, string updates. PL/pgSQL is best as SQL glue. Type compatibility between PL/pgSQL and Postgres is a plus for performance. The PL/pgSQL interpreter is very simple - there are essentially zero optimizations - so PL/pgSQL code should be minimalistic; but as long as you don't do something really wrong, its speed is comparable with PHP.
>>
>> http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language
>>
>> PL/Perl starts more slowly, but string and array operations are very fast. Perl has its own expression evaluator, which is faster than expression evaluation in PL/pgSQL. On the other hand, every input must be transformed from Postgres format to Perl format, and every result must be transformed back. Perl and other languages don't use data types compatible with Postgres.
>
> One big advantage pl/pgsql has over scripting languages is that it understands postgresql types natively. It knows what a postgres array is, and can manipulate one directly. pl/perl would typically have to have the database convert it to a string, parse it into a perl structure, do the manipulation, then send it to the database to be parsed again. If your procedure code is mainly moving data between tables and doing minimal intermediate heavy processing, this adds up to a big advantage.
>
> Which pl to go with really depends on what you need to do. pl/pgsql is always my first choice though. perl and tcl are not particularly fast languages in the general case -- you are largely at the mercy of how well the language's syntax or library features map to the particular problem you're solving. if you need a fast general purpose language in the backend and are (very understandably) skeptical about C, I'd look at pl/java.

PLV8, which is not yet ready for prime time, maps many common Postgres types into native JS types without the use of input/output functions, which means the conversion is very fast. It's work which could very well do with repeating for the other PLs.

cheers

andrew

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
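Merlin's point about native type handling can be illustrated with a pair of toy functions (a sketch, not from the thread; it assumes PostgreSQL 9.1 or later with the plperl extension installed, where PL/Perl receives arrays as Perl array references):

```sql
-- PL/pgSQL: the int[] parameter is handled natively; no conversion step.
CREATE OR REPLACE FUNCTION sum_elems_plpgsql(arr int[]) RETURNS bigint AS $$
DECLARE
    total bigint := 0;
    x int;
BEGIN
    FOREACH x IN ARRAY arr LOOP   -- iterates the Postgres array directly
        total := total + x;
    END LOOP;
    RETURN total;
END;
$$ LANGUAGE plpgsql;

-- PL/Perl: the same array only arrives after Postgres converts it into a
-- Perl structure on every call, and the result is converted back afterwards.
CREATE OR REPLACE FUNCTION sum_elems_plperl(arr int[]) RETURNS bigint AS $$
    my $arr = shift;          -- array reference built from the Postgres array
    my $total = 0;
    $total += $_ for @$arr;
    return $total;
$$ LANGUAGE plperl;
```

Timing both with something like `SELECT sum_elems_plpgsql(array_agg(i)) FROM generate_series(1, 100000) i;` would expose the per-call conversion overhead Merlin describes, though for heavy in-function string or array munging the Perl body itself may still win.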
Re: [PERFORM] Performance costs of various PL languages
Thanks guys. Ah, PL/Java - of course. I would miss writing the code right in the SQL script, but that would have been true of C as well.

None of these procedures really qualify as stored procs that move data; rather, they are scalar functions used for fuzzy string comparisons based on our own domain logic - imagine something like:

SELECT * FROM fathers AS f, sons AS s WHERE same_name(f.last_name, s.last_name) ...

where same_name() has business logic that corrects for O'Reilly vs. oreilly, Van De Lay vs. Vandelay, etc. The point is that as we learn about the domain, we would add the rules into the function same_name() so that all apps would benefit from the new rules.

Some of the functions are data-driven, for example a table of common abbreviations with regex or LIKE expressions that would be run against both strings so that each string is reduced to common abbreviations (i.e. a lowest common denominator) and then compared, e.g.

SELECT * FROM companies AS c WHERE same_business_name(c.name, 'ACME Business Supplies, Incorporated')

would reduce both parameters down to the most common abbreviation and then compare again with fuzzy logic. Of course, even if this were written in C, the function would be data-bound as it reads from the abbreviation table - unless you guys tell me that there is a not inconsiderable cost involved in type conversion from PG to internal vars.

Carlo
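The data-driven reduce-then-compare approach Carlo describes could be sketched roughly as follows (illustration only; the abbreviations table, normalize_name(), and the function bodies are invented here, not taken from the thread):

```sql
-- Hypothetical rule table: regex pattern -> canonical short form.
CREATE TABLE abbreviations (
    pattern     text,   -- e.g. 'incorporated|incorp\.?|inc\.?'
    replacement text    -- e.g. 'inc'
);

-- Reduce a name to its canonical form by applying every rule in turn.
CREATE OR REPLACE FUNCTION normalize_name(raw text) RETURNS text AS $$
DECLARE
    result text := lower(raw);
    r record;
BEGIN
    FOR r IN SELECT pattern, replacement FROM abbreviations LOOP
        result := regexp_replace(result, r.pattern, r.replacement, 'g');
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql STABLE;

-- Both sides are reduced to the same canonical form, then compared.
CREATE OR REPLACE FUNCTION same_business_name(a text, b text) RETURNS boolean AS $$
    SELECT normalize_name($1) = normalize_name($2);
$$ LANGUAGE sql STABLE;
```

As Carlo notes, a function like this is data-bound - it scans the rule table on every call - so the PL-to-Postgres type conversion cost is likely to be dwarfed by the lookups unless the rules are cached inside the function.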
Re: [PERFORM] Subquery flattening causing sequential scan
Ondrej Ivanič ondrej.iva...@gmail.com writes:
> The question is why the parameter f is not exposed as a GUC?

What would that accomplish that default_statistics_target doesn't? (Other than being much harder to explain...)

regards, tom lane
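For anyone following along, the default_statistics_target Tom mentions can be adjusted both globally and per column (a minimal sketch; `mytable` and `mycol` are placeholder names):

```sql
-- Raise the global default for this session (set it in postgresql.conf
-- to make the change permanent).
SET default_statistics_target = 500;

-- Or raise it only for the column whose selectivity estimates are off:
ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 500;

-- Re-collect statistics so the new target takes effect.
ANALYZE mytable;
```

Per-column targets are usually preferable, since a higher global target increases ANALYZE time and planning cost for every table.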