Re: [PERFORM] parse - bind take more time than execute

2011-12-27 Thread MirrorX
Thanks a lot for your answer :)

2) default_statistics_target is set to the default of 100, and there are no
special per-column statistics

3) I will do that very soon

4) in absolute units, I see the same query with timings similar to these:
parse - 600 ms
bind - 300 ms
execute - 50 ms

The query mentioned above is a simple SELECT from one table with two WHERE
conditions, and the table has one additional index (besides the primary key)
on the columns that are part of the WHERE clause.




[PERFORM] How to clock the time spent for query parsing and planning?

2011-12-27 Thread Igor Schtein
I'd like to find some measurements/figures of query preparation and
planning time to justify the use of prepared statements and stored
procedures.
I know that complex queries take longer to prepare. However, is it
possible to explicitly measure the time the optimizer spends parsing and
planning a query before execution?

Thank you,
James


Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Claudio Freire
On Sat, Dec 24, 2011 at 4:22 PM, Michael Smolsky sitr...@email.com wrote:
> work_mem = 128MB (tried 257MB, didn't change anything)

This is probably your problem.

Without an EXPLAIN output, I cannot be sure, but 'work_mem' is not the
total amount of memory a query can use, it's the amount of memory it
can use for *one* sort/hash/whatever operation. A complex query can
have many of those, so your machine is probably swapping due to
excessive memory requirements.

Try *lowering* it. You can do so only for that query, by executing:

set work_mem = '8MB'; <your query>
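
For what it's worth, a minimal sketch of both ways to scope that setting
(SET affects the rest of the session; SET LOCAL only the current transaction):

-- session-scoped: stays in effect until RESET or disconnect
SET work_mem = '8MB';
-- ... run the problem query ...
RESET work_mem;

-- transaction-scoped: reverts automatically at COMMIT/ROLLBACK
BEGIN;
SET LOCAL work_mem = '8MB';
-- ... run the problem query ...
COMMIT;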



[PERFORM] parse - bind take more time than execute

2011-12-27 Thread MirrorX
There are some performance issues on a server, and by searching the logs I
noticed that the parse and bind phases take considerably more time than
execute for most of the queries. I guess that the right thing to do in this
case is to use functions or prepared statements, but in any case, what could
be the cause of this?

Information about the server:
- CentOS 5.6
- 4 cores
- 12GB RAM


shared_buffers = 1GB
temp_buffers = 100MB
work_mem = 30MB
maintenance_work_mem = 512MB

database size: 1.5GB
archive_mode is on
autovacuum is enabled (autovacuum_vacuum_scale_factor 0.1, autovacuum_analyze_scale_factor 0.05)


This behaviour is not related to checkpoints on the database: as the logs
indicate, the latency does not coincide with checkpoints; I see it most
of the time.

So my question is the following: what can cause the bind/parse phases to
take so much longer than execute? If you need any more info about the server,
I'll be glad to provide it. Thank you in advance for your advice.
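
For illustration, a minimal sketch of the prepared-statement idea mentioned
above (the table and column names are made up):

PREPARE get_rows (int, text) AS
    SELECT * FROM some_table WHERE col_a = $1 AND col_b = $2;

EXECUTE get_rows(42, 'foo');
EXECUTE get_rows(43, 'bar');

The statement is parsed and planned once, at PREPARE time, so each EXECUTE
pays only the execute phase; the price is a generic plan that does not see
the actual parameter values.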




Re: [PERFORM] parse - bind take more time than execute

2011-12-27 Thread MirrorX
The version of Postgres is 8.4.7 :)




Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Scott Marlowe
On Sat, Dec 24, 2011 at 12:22 PM, Michael Smolsky sitr...@email.com wrote:
> shared_buffers = 2GB (tried 8GB, didn't change anything)
> work_mem = 128MB (tried 257MB, didn't change anything)

As someone mentioned, lower is better here.  128M is quite high.

> effective_cache_size = 12GB (tried 2GB didn't change anything)

This doesn't affect memory usage.  It only tells the planner about how
big the OS and pg caches are for the db.  It's a very coarse
adjustment knob, so don't get too worried about it.

> In order to resolve my issue, I tried to search for postgres profiling tools
> and found no relevant ones. This is rather disappointing. That's what I
> expected to find:

Look for pg_buffercache.  I'm sure there are some others I'm forgetting.
Grab a copy of Greg Smith's PostgreSQL 9.0 High Performance; it's got a lot of
great info in it on handling heavily loaded servers.
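
For example, assuming the pg_buffercache contrib module is installed, a query
along the lines of the one in its documentation shows which relations are
occupying shared buffers:

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;

Note that this only covers shared_buffers; it says nothing about work_mem or
the OS cache.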

> I realize that postgres is free software and one cannot demand new
> features from people who invest their own free time in developing and
> maintaining it. I am hoping that my feedback could be useful for future
> development.

It's not just free as in beer.  It's free as in do what you will with
it.  So, if you whip out your checkbook and start waving it around,
you can certainly pay someone to write the code to instrument this
stuff.  Whether you release it back into the wild is up to you.  But
yeah, first see if someone's already done some work on that, like the
pg_buffercache module, before spending money reinventing the wheel.



[PERFORM] Exploring memory usage

2011-12-27 Thread Michael Smolsky
Hello,

 I'm running a fairly complex query on my postgres-8.4.9 Ubuntu box. The box
has an 8-core CPU, 18GB of RAM and no virtualization layer. The query takes many
hours to run. The query essentially involves a join of two large tables on a 
common string column, but it also includes joins with other smaller tables. I 
looked at the query plan, as produced by EXPLAIN and found it reasonable.

 When the query starts, Linux `top' shows fairly low RAM usage (in the hundreds 
of MB range) and about 100% CPU usage for the relevant postgres process. I'm 
also seeing some temp files being generated by this postgres process in the 
postgres temp directory. All this makes sense to me.

 As the query processes further, the memory usage by this postgres process 
shoots up to 12G resident and 17G virtual, while the CPU usage falls down to 
single-digit percents. The need to utilize more memory at some point during 
query execution seems in agreement with the query plan.

 I feel that my server configuration is not optimal: I would like to observe
close to 100% CPU utilization for my queries, but I am seeing values 20 times
lower.

 My query forks a single large-RAM process running on the server. There are 
other queries running on the same server, but they are quick and light on 
memory.

 I cannot explain the following observations:

 * Postgres is not writing temp files into its temp directory once the RAM
usage goes up, but vmstat shows heavy disk usage, mostly as a high swap-in
field. Top shows 6G of swap space in use.

 * All my attempts to limit postgres' memory usage by playing with postgres 
config parameters failed.

 Here are the relevant parameters from postgresql.conf (I did use SHOW for each
parameter to check that it had been read by the server). I think I'm using the
defaults for all other memory-related configuration.

 shared_buffers = 2GB (tried 8GB, didn't change anything)
 work_mem = 128MB (tried 257MB, didn't change anything)
 wal_buffers = 16MB
 effective_cache_size = 12GB (tried 2GB didn't change anything)

 In order to resolve my issue, I tried to search for postgres profiling tools 
and found no relevant ones. This is rather disappointing. That's what I 
expected to find:

 * A tool that could explain to me why postgres is swapping.

 * A tool that showed what kind of memory (work mem vs buffers, etc) was taking 
all that virtual memory space.

 * A tool for examining plans of the running queries. It would be helpful to 
see what stage of the query plan the server is stuck on (e.g. mark the query 
plans with some symbols that indicate currently running, completed, 
results in memory/disk, etc).

 I realize that postgres is free software and one cannot demand new features
from people who invest their own free time in developing and maintaining it. I
am hoping that my feedback could be useful for future development.

 Thanks!


Re: [PERFORM] parse - bind take more time than execute

2011-12-27 Thread Filip Rembiałkowski
hello.

1. planning time > execute time can happen normally for some
fast-executing queries, so it is not bad per se.

2. what are your statistics settings? They influence planning time. I
mean default_statistics_target and per-column SET STATISTICS (see the
sketch after this list).

3. upgrade to 8.4.10; it's a quick upgrade (minimal downtime) and there
were some planner improvements.

4. what is "considerably more time" in absolute units?
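
As a sketch of point 2 (the table and column names are illustrative): a
per-column target overrides default_statistics_target at the next ANALYZE.
Larger targets give better estimates, but also make planning itself more
expensive, which is exactly what is being measured in this thread:

ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 500;
ANALYZE some_table;

-- per-column overrides are visible in pg_attribute (-1 means "use the default"):
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'some_table'::regclass AND attstattarget <> -1;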


Filip


2011/12/27 MirrorX mirr...@gmail.com:
> There are some performance issues on a server, and by searching the logs I
> noticed that the parse and bind phases take considerably more time than
> execute for most of the queries. [...]



Re: [PERFORM] parse - bind take more time than execute

2011-12-27 Thread Pavel Stehule
Hello

2011/12/27 MirrorX mirr...@gmail.com:
> There are some performance issues on a server, and by searching the logs I
> noticed that the parse and bind phases take considerably more time than
> execute for most of the queries. I guess that the right thing to do in this
> case is to use functions or prepared statements, but in any case, what could
> be the cause of this?


One reason could be the blind optimization of prepared statements.
Prepared statements are optimized for the most frequent values rather than
the actual parameter values.

Try to look at the plan; the EXPLAIN statement can be used for prepared
statements too.
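
For example (names illustrative):

PREPARE q(int) AS SELECT * FROM some_table WHERE some_col = $1;
EXPLAIN EXECUTE q(42);   -- shows the generic plan the prepared statement uses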

Regards

Pavel Stehule



Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Claudio Freire
On Tue, Dec 27, 2011 at 1:00 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> He can lower it for just that query but honestly, even on a machine
> with much more memory I'd never set it as high as he has it.  On a
> busy machine with 128G RAM the max I ever had it set to was 16M, and
> that was high enough I kept a close eye on it (well, nagios did
> anyway.)

I have it quite high, because I know the blend of queries going into
the server allows it.

But yes, it's not a sensible setting if you didn't analyze the
activity carefully.



Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Scott Marlowe
On Tue, Dec 27, 2011 at 8:33 AM, Claudio Freire klaussfre...@gmail.com wrote:
> On Sat, Dec 24, 2011 at 4:22 PM, Michael Smolsky sitr...@email.com wrote:
>> work_mem = 128MB (tried 257MB, didn't change anything)
>
> This is probably your problem.
>
> Without an EXPLAIN output, I cannot be sure, but 'work_mem' is not the
> total amount of memory a query can use, it's the amount of memory it
> can use for *one* sort/hash/whatever operation. A complex query can
> have many of those, so your machine is probably swapping due to
> excessive memory requirements.
>
> Try *lowering* it. You can do so only for that query, by executing:
>
> set work_mem = '8MB'; <your query>

He can lower it for just that query but honestly, even on a machine
with much more memory I'd never set it as high as he has it.  On a
busy machine with 128G RAM the max I ever had it set to was 16M, and
that was high enough I kept a close eye on it (well, nagios did
anyway.)



Re: [PERFORM] parse - bind take more time than execute

2011-12-27 Thread MirrorX
I am not using prepared statements for now :)
I just said that, probably, if I do use them, I will get rid of that extra
time, since the plan will already be 'decided' in advance.




Re: [PERFORM] How to clock the time spent for query parsing and planning?

2011-12-27 Thread Pavel Stehule
Hello

2011/12/23 Igor Schtein ischt...@gmail.com:
> I'd like to find some measurements/figures of query preparation and planning
> time to justify the use of prepared statements and stored procedures.
> I know that complex queries take longer to prepare. However, is it
> possible to explicitly measure the time the optimizer spends parsing and
> planning a query before execution?

You can time an EXPLAIN statement; plain EXPLAIN parses and plans the
query without executing it.
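
For example, in psql (the query is a placeholder):

\timing on
EXPLAIN SELECT * FROM some_table WHERE some_col = 42;

The elapsed time \timing reports for the plain EXPLAIN approximates the
parse + plan cost; comparing it with EXPLAIN ANALYZE of the same query gives
a rough split between planning and execution.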

Regards

Pavel Stehule


 Thank you,
 James



[PERFORM] Subquery flattening causing sequential scan

2011-12-27 Thread Jim Crate
PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every 
night along with the backup.

I have a situation where I'm experiencing a seq scan on a table with almost 3M
rows when my condition is based on a subquery.  A Google search turned up a way
to prevent flattening the subquery into a join by using OFFSET 0.  This does
work, reducing the query from around 1s to around 250ms, most of which is the
subquery.

My question is why does it do a seq scan when it flattens this subquery into a 
JOIN?  Is it because the emsg_messages table is around 1M rows?  Are there some 
guidelines to when the planner will prefer not to use an available index?  I 
just had a look through postgresql.conf and noticed that I forgot to set 
effective_cache_size to something reasonable for a machine with 16GB of memory. 
 Would the default setting of 128MB cause this behavior?  I can't bounce the 
production server midday to test that change.



EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
    SELECT id
    FROM emsg_messages msg
    WHERE msg.account_id = 314 AND msg.outgoing = FALSE
      AND msg.message_type = 1 AND msg.spam_level < 2
      AND msg.deleted_at IS NULL
      AND msg.id NOT IN (
        SELECT emf.message_id
        FROM emsg_message_folders emf
        WHERE emf.account_id = 314
      )
)


QUERY PLAN
Hash Semi Join  (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1)
  Hash Cond: (ema.message_id = msg.id)
  ->  Seq Scan on emsg_message_addresses ema  (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241 rows=2875437 loops=1)
  ->  Hash  (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 1kB
        ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.35..84475.45 rows=3783 width=4) (actual time=273.224..273.387 rows=1 loops=1)
              Recheck Cond: (account_id = 314)
              Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
              ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.98 rows=34611 width=0) (actual time=9.633..9.633 rows=34997 loops=1)
                    Index Cond: (account_id = 314)
              SubPlan 1
                ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016 rows=34594 loops=1)
                      Recheck Cond: (account_id = 314)
                      ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169 width=0) (actual time=5.175..5.175 rows=34594 loops=1)
                            Index Cond: (account_id = 314)
Total runtime: 1083.890 ms




EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
    SELECT id
    FROM emsg_messages msg
    WHERE msg.account_id = 314 AND msg.outgoing = FALSE
      AND msg.message_type = 1 AND msg.spam_level < 2
      AND msg.deleted_at IS NULL
      AND msg.id NOT IN (
        SELECT emf.message_id
        FROM emsg_message_folders emf
        WHERE emf.account_id = 314
      )
    OFFSET 0
)


QUERY PLAN
Nested Loop  (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1)
  ->  HashAggregate  (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1)
        ->  Limit  (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1)
              ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333 rows=1 loops=1)
                    Recheck Cond: (account_id = 314)
                    Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
                    ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.99 rows=34612 width=0) (actual time=9.693..9.693 rows=34998 loops=1)
                          Index Cond: (account_id = 314)
                    SubPlan 1
                      ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.795..39.420 rows=34594 loops=1)
                            Recheck Cond: (account_id = 314)
                            ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169
Re: [PERFORM] Subquery flattening causing sequential scan

2011-12-27 Thread Tom Lane
Jim Crate jim...@gmail.com writes:
> My question is why does it do a seq scan when it flattens this
> subquery into a JOIN?

Because it thinks there will be 3783 rows out of the msg scan, which if
true would make your desired nestloop join a serious loser.  You need to
see about getting that estimate to be off by less than three orders of
magnitude.  Possibly raising the stats target on emsg_messages would
help.  I'd also try converting the inner NOT IN into a NOT EXISTS, just
to see if that makes the estimate any better.  Using something newer
than 9.0.2 might help too, as we fixed some outer-join estimation bugs a
few months ago.
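
Concretely, those two suggestions applied to the query upthread might look
roughly like this (which column needs the higher target is a guess):

ALTER TABLE emsg_messages ALTER COLUMN account_id SET STATISTICS 1000;
ANALYZE emsg_messages;

-- the inner NOT IN rewritten as NOT EXISTS:
AND NOT EXISTS (
    SELECT 1
    FROM emsg_message_folders emf
    WHERE emf.account_id = 314
      AND emf.message_id = msg.id
)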

regards, tom lane



Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Scott Marlowe
On Tue, Dec 27, 2011 at 9:14 AM, Andrew Dunstan and...@dunslane.net wrote:
> It depends on the workload. Your 16M setting would make many of my clients'
> systems slow to an absolute crawl for some queries, and they don't run into
> swap issues, because we've made educated guesses about usage patterns.

Exactly.  I've had an old Pentium4 reporting machine with only 2G RAM
and a 256M work_mem setting.  Meanwhile, the heavily loaded machine I
mentioned earlier handles something on the order of several hundred
concurrent users and thousands of queries a second, and 16MB was a
pretty big setting on that machine; but since most of the queries were
of the "select * from sometable where pkid=123456" variety, it wasn't
too dangerous.

It's all about the workload.  For that, we need more info from the OP.



Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Andrew Dunstan



On 12/27/2011 11:00 AM, Scott Marlowe wrote:
> [...]
> He can lower it for just that query but honestly, even on a machine
> with much more memory I'd never set it as high as he has it.  On a
> busy machine with 128G RAM the max I ever had it set to was 16M, and
> that was high enough I kept a close eye on it (well, nagios did
> anyway.)




It depends on the workload. Your 16M setting would make many of my 
clients' systems slow to an absolute crawl for some queries, and they 
don't run into swap issues, because we've made educated guesses about 
usage patterns.


cheers

andrew



[PERFORM] Performance costs of various PL languages

2011-12-27 Thread Carlo Stonebanks
We are currently using pltclu as our PL of choice AFTER PL/pgSQL.

I'd like to know if anyone can comment on the performance costs of the
various PL languages BESIDES C. For example, does pltclu instantiate faster
than pltcl (presumably because it uses a shared interpreter?) Is Perl more
lightweight?

I know that everything depends on context - what you are doing with it, e.g.
choose Tcl for string handling vs. Perl for number crunching - but for those
who know about this, is there a clear performance advantage for any of the
various PL languages - and if so, is it a difference so big to be worth
switching?

I ask this because I had expected to see pl/pgsql as a clear winner in terms
of performance over pltclu, but my initial test showed the opposite. I know
this may be an apples vs oranges problem and I will test further, but if
anyone has any advice or insight, I would appreciate it so I can tailor my
tests accordingly.


Thanks,

Carlo






Re: [PERFORM] Performance costs of various PL languages

2011-12-27 Thread Pavel Stehule
Hello

2011/12/27 Carlo Stonebanks stonec.regis...@sympatico.ca:
> We are currently using pltclu as our PL of choice AFTER PL/pgSQL.
>
> I'd like to know if anyone can comment on the performance costs of the
> various PL languages BESIDES C. [...]


Performance strongly depends on the use case.

PL/pgSQL has a fast start, but every expression is evaluated as a simple SQL
expression, so some repeated operations can be very expensive (array
updates, string updates). PL/pgSQL is best as SQL glue. A plus for
performance is the type compatibility between plpgsql and Postgres. The
plpgsql interpreter is very simple (there are more or less zero
optimizations), so plpgsql code should be minimalistic; but as long as you
don't do something really wrong, its speed is comparable with PHP.

http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language
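
As a hypothetical illustration of the "SQL glue" point (the orders table is
made up):

-- slow: row-by-row arithmetic, one SQL expression evaluation per row
CREATE OR REPLACE FUNCTION total_slow() RETURNS numeric AS $$
DECLARE
    s numeric := 0;
    r record;
BEGIN
    FOR r IN SELECT amount FROM orders LOOP
        s := s + r.amount;
    END LOOP;
    RETURN s;
END;
$$ LANGUAGE plpgsql;

-- fast: let a single SQL statement do the work, plpgsql is only the glue
CREATE OR REPLACE FUNCTION total_fast() RETURNS numeric AS $$
BEGIN
    RETURN (SELECT sum(amount) FROM orders);
END;
$$ LANGUAGE plpgsql;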

PL/Perl has a slower start, but string and array operations are very
fast. Perl has its own expression evaluator, faster than expression
evaluation in plpgsql. On the other hand, any input must be transformed
from Postgres format to Perl format, and any result must be transformed
back. Perl and the other languages don't use data types compatible with
Postgres.

Regards

Pavel Stehule





Re: [PERFORM] Subquery flattening causing sequential scan

2011-12-27 Thread Ondrej Ivanič
Hi,

On 28 December 2011 05:12, Tom Lane t...@sss.pgh.pa.us wrote:
> Possibly raising the stats target on emsg_messages would help.

In the function std_typanalyze() there is this comment:

    /*
     * The following choice of minrows is based on the paper
     * "Random sampling for histogram construction: how much is enough?"
     * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
     * Proceedings of ACM SIGMOD International Conference on Management
     * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
     * says that for table size n, histogram size k, maximum relative
     * error in bin size f, and error probability gamma, the minimum
     * random sample size is
     *      r = 4 * k * ln(2*n/gamma) / f^2
     * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
     *      r = 305.82 * k
     * Note that because of the log function, the dependence on n is
     * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
     * bin size error with probability 0.99.  So there's no real need to
     * scale for n, which is a good thing because we don't necessarily
     * know it at this point.
     */
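
(To make that concrete: with the default statistics target k = 100 and the
constants above, r = 4 * 100 * ln(2 * 10^6 / 0.01) / 0.5^2
= 1600 * ln(2 * 10^8) ≈ 1600 * 19.11 ≈ 30,582 rows sampled, which is the
305.82 * k figure in the comment.)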

The question is: why is the parameter f not exposed as a GUC? Sometimes
it could make sense to have fewer bins with better estimation (for the
same r).

-- 
Ondrej Ivanic
(ondrej.iva...@gmail.com)



Re: [PERFORM] Performance costs of various PL languages

2011-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
> [...]
> PL/Perl has a slower start, but string and array operations are very
> fast. Perl has its own expression evaluator, faster than expression
> evaluation in plpgsql. On the other hand, any input must be transformed
> from Postgres format to Perl format, and any result must be transformed
> back. Perl and the other languages don't use data types compatible with
> Postgres.

One big advantage pl/pgsql has over scripting languages is that it
understands postgresql types natively.  It knows what a postgres array
is, and can manipulate one directly.  pl/perl would typically have to
have the database convert it to a string, parse it into a perl
structure, do the manipulation, then send it to the database to be
parsed again.  If your procedure code is mainly moving data between
tables and doing minimal intermediate heavy processing, this adds up
to a big advantage.  Which pl to go with really depends on what you
need to do.  pl/pgsql is always my first choice though.

perl and tcl are not particularly fast languages in the general case
-- you are largely at the mercy of how well the language's syntax or
library features map to the particular problem you're solving.  if you
need a fast general purpose language in the backend and are (very
understandably) skeptical about C, I'd look at pl/java.

merlin



Re: [PERFORM] Performance costs of various PL languages

2011-12-27 Thread Andrew Dunstan



On 12/27/2011 05:54 PM, Merlin Moncure wrote:
> [...]
> perl and tcl are not particularly fast languages in the general case
> -- you are largely at the mercy of how well the language's syntax or
> library features map to the particular problem you're solving.  if you
> need a fast general purpose language in the backend and are (very
> understandably) skeptical about C, I'd look at pl/java.




PLV8, which is not yet ready for prime time, maps many common Postgres
types to native JS types without the use of input/output functions,
which means the conversion is very fast. It's work that could very
well be worth repeating for the other PLs.


cheers

andrew



Re: [PERFORM] Performance costs of various PL languages

2011-12-27 Thread Carlo Stonebanks
Thanks guys. 

Ah, PL/Java - of course. I would miss writing the code right in the SQL
script, but that would have been true of C as well.

None of these procedures really qualify as stored procs that move data;
rather they are scalar functions used for fuzzy string comparisons based on
our own domain logic - imagine something like,

   SELECT *
   FROM fathers AS f, sons AS s
   WHERE same_name(f.last_name, s.last_name)

... and same_name had business logic that corrected for O'reilly vs oreilly,
Van De Lay vs Vandelay, etc.

The point is that as we learn about the domain, we would add the rules into
the function same_name() so that all apps would benefit from the new rules.

Some of the functions are data-driven: for example, a table of common
abbreviations with regex or LIKE expressions that would be run against both
strings, so that each string is reduced to common abbreviations (i.e. the
lowest common denominator) and then compared, e.g.

   SELECT *
   FROM companies AS c
   WHERE same_business_name(s, 'ACME Business Supplies, Incorporated')

This would reduce both parameters down to the most common abbreviations and
then compare them again with fuzzy logic.
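
A hypothetical sketch of that data-driven reduction step (the abbreviations
table and all names here are invented; the real rules would live in its rows):

CREATE TABLE name_abbreviations (
    pattern     text,   -- POSIX regex matched against the lowercased input
    replacement text
);
-- e.g. INSERT INTO name_abbreviations VALUES ('incorporated', 'inc');

CREATE OR REPLACE FUNCTION normalize_business_name(raw text) RETURNS text AS $$
DECLARE
    s text := lower(raw);
    r record;
BEGIN
    FOR r IN SELECT pattern, replacement FROM name_abbreviations LOOP
        s := regexp_replace(s, r.pattern, r.replacement, 'g');
    END LOOP;
    RETURN s;
END;
$$ LANGUAGE plpgsql STABLE;

-- same_business_name(a, b) would then compare normalize_business_name(a)
-- against normalize_business_name(b), exactly or with fuzzy matching.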

Of course, even if this were written in C, the function would be data-bound
as it reads from the abbreviation table - unless you guys tell me that there
is a not inconsiderable cost involved in type conversion from PG to internal
variables.

Carlo




Re: [PERFORM] Subquery flattening causing sequential scan

2011-12-27 Thread Tom Lane
Ondrej Ivanič ondrej.iva...@gmail.com writes:
> The question is why the parameter f is not exposed as a GUC?

What would that accomplish that default_statistics_target doesn't?
(Other than being much harder to explain...)

regards, tom lane
