Re: [PERFORM] strange pgbench results (as if blocked at the end)

2011-08-14 Thread tv
On Sun, 14 Aug 2011 07:15:00 -0600, Scott Marlowe
scott.marl...@gmail.com wrote:
 On Sun, Aug 14, 2011 at 6:51 AM,  t...@fuzzy.cz wrote:

 I've increased the test duration to 10 minutes, decreased the
 checkpoint timeout to 4 minutes and a checkpoint is issued just before
 the pgbench. That way the starting position should be more or less the
 same for all runs.
 
 Also look at increasing checkpoint completion target to something
 closer to 1. 0.8 is a nice starting place.

Yes, I've increased that already:

checkpoint_segments = 64
checkpoint_completion_target = 0.9

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread tv
 Hi,

 DB : POSTGRES 8.4.8
 OS  : Debian
 HD : SAS 10k rpm

 shared_buffers is 4096MB (25% of RAM), effective_cache_size is 8GB (75% of RAM)

 The AFTER INSERT trigger in turn calls 2 more triggers and inserts a record
 into another table, depending on a condition.

 With all triggers enabled there are 8 inserts and 32 updates (the exact
 number of updates depends on the hierarchy).

Hi,

it's very difficult to give you reliable recommendations with this little
info, but the triggers are obviously the bottleneck. We have no idea what
queries are executed in them, but I guess there are some slow queries.

Find out what queries are executed in the triggers, benchmark each of them
and make them faster. Just don't forget that those SQL queries are
executed as prepared statements, so they may behave a bit differently than
plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.
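For example, if one of the triggers does a lookup by some column, you can
benchmark it roughly the way PL/pgSQL will actually run it like this (the
table and parameter names here are made up - substitute the real queries
from your triggers):

  -- hypothetical lookup taken from a trigger body
  PREPARE trig_lookup (integer) AS
      SELECT count(*) FROM some_other_table WHERE parent_id = $1;

  -- shows the plan used for the prepared statement, not for a literal value
  EXPLAIN EXECUTE trig_lookup (42);

  DEALLOCATE trig_lookup;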

 Please explain multiple connections. In the current scenario the application
 server is sending all requests.

PostgreSQL does not support parallel queries (i.e. a query distributed on
multiple CPUs) so each query may use just a single CPU. If you're CPU
bound (one CPU is 100% utilized but the other CPUs are idle), you can
usually parallelize the workload on your own - just use multiple
connections.

But if you're using an application server and there are multiple
connections used, this is not going to help you. How many connections are
active at the same time? Are the CPUs idle or utilized?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Long Running Update - My Solution

2011-06-27 Thread tv
 The mystery remains, for me: why updating 100,000 records could complete
 in as quickly as 5 seconds, whereas an attempt to update a million
 records was still running after 25 minutes before we killed it?

Hi, there are a lot of possible causes. Usually this is caused by a plan
change - imagine for example that you need to sort a table and the amount
of data just fits into work_mem, so that it can be sorted in memory. If
you need to perform the same query with 10x the data, you'll have to sort
the data on disk. Which is way slower, of course.

And there are other such problems ...

 One thing remains crystal clear: I love Postgresql :-)

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-10 Thread tv
 If I had set the primary key to (diag_id, create_time) would simple
 queries on
 diag_id still work well i.e.
 select * from tdiag where diag_id = 1234;

Yes. IIRC the performance penalty for using a non-leading column of an index
is negligible. But why don't you try that on your own - just run an
EXPLAIN and you'll get an immediate answer as to whether it works.
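A minimal sketch of that check, assuming the composite primary key
(diag_id, create_time) from your post exists:

  EXPLAIN SELECT * FROM tdiag WHERE diag_id = 1234;

  -- an "Index Scan using tdiag_pkey" node in the output means the composite
  -- index is used for this query; a "Seq Scan" means it is not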

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how much postgres can scale up?

2011-06-10 Thread tv
 I have a function in pgsql language; this function does some selects on some
 tables to verify some conditions and then does one insert into a table with
 NO index. Updates are not performed in the function.

 With 1 client connected postgres does 180 executions per second
 With 2 clients connected postgres does 110 executions per second
 With 3 clients connected postgres does 90 executions per second

 Finally with 6 connected clients postgres does 60 executions per second
 (360 executions per second in total)

 While testing, I monitored disk, memory and CPU and did not find any overload.

There's always a bottleneck - otherwise the system might run faster (and
hit another bottleneck eventually). It might be CPU, I/O, memory, locking
and maybe some less frequent things.

 I know that with this information you can figure out some things, but under
 normal conditions, is this degradation of performance per connection normal
 as connections are added?
 Or should I expect 180 on the first and something similar on the second
 connection? Maybe 170?


 The server is a dual xeon quad core with 16 GB of ram and a very fast
 storage
 The OS is a windows 2008 R2 x64

Might be, but we need more details about how the system works. On Linux
I'd ask for output from 'iostat -x 1' and 'vmstat 1' but you're on Windows
so there are probably other tools.

What version of PostgreSQL is this? What are the basic config values
(shared_buffers, work_mem, effective_cache_size, ...)? Have you done some
tuning? There's a wiki page about this:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Have you tried to log slow queries? Maybe there's one query that makes the
whole workload slow? See this:
http://wiki.postgresql.org/wiki/Logging_Difficult_Queries
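Just to illustrate - slow query logging can be enabled per session (superuser
only) or, more usefully, in postgresql.conf:

  -- log every statement that takes longer than 1000 ms;
  -- the same value can go into postgresql.conf as
  -- log_min_duration_statement = 1000, followed by a reload
  SET log_min_duration_statement = 1000;
  SHOW log_min_duration_statement;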

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread tv
 What seems odd to me is that the only difference between the two is the
 limit
 clause:

 select * from tdiag where (create_time >= '2011-06-03
 09:49:04.00+0' and create_time < '2011-06-06 09:59:04.00+0') order by
 diag_id limit 1;

 select * from tdiag where (create_time >= '2011-06-03
 09:49:04.00+0' and create_time < '2011-06-06 09:59:04.00+0') order by
 diag_id;

 and yet the plan completely changes.

As Claudio Freire already pointed out, this is expected behavior. With
LIMIT the planner prefers plans with a low starting cost, as it expects to
finish early, so building an index bitmap / hash table would be a waste. So
actually it would be very odd if the plan did not change in this case ...

Anyway I have no idea how to fix this cleanly - without messing with
enable_* or cost variables or other such dirty tricks.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread tv
 On Wednesday 08 June 2011 17:39, Claudio Freire wrote:
 Of course optimally executing a plan with limit is a lot different
 than one without.

 I imagined that limit just cuts out a slice of the query results.
 If it can find 8 rows in 0.5 seconds then I would have thought that
 returning just the first 100 of them should be just as easy.

But that's exactly the problem with the LIMIT clause. The planner considers
two choices - index scan with this estimate

Index Scan using tdiag_pkey on tdiag  (cost=0.00..19114765.76
rows=1141019 width=114)

and bitmap index scan with this estimate

Bitmap Heap Scan on tdiag  (cost=25763.48..638085.13 rows=1141019
width=114)

and says - hey, the index scan has much lower starting cost, and I'm using
limit so it's much better! Let's use index scan. But then it finds out it
needs to scan most of the table and that ruins the performance.

Have you tried to create a composite index on those two columns? Not sure
if that helps but I'd try that.
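A sketch of what I mean (the index names are made up, and you may want to try
both column orderings):

  -- covers the create_time range condition plus the ORDER BY diag_id
  CREATE INDEX tdiag_create_time_diag_id ON tdiag (create_time, diag_id);
  -- or, for the "ORDER BY diag_id ... LIMIT n" case, the other way around:
  -- CREATE INDEX tdiag_diag_id_create_time ON tdiag (diag_id, create_time);
  ANALYZE tdiag;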

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread tv
 Thanks for including explain analyze output.

 Is there any chance you can pop the full explains (not just excerpts) in
 here:

 http://explain.depesz.com/

 ?

I believe he already did that - there's a link below each query.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread tv
  shared_buffers | 16MB
  work_mem       | 250MB

This seems a bit suspicious. Are you sure you want to keep the
shared_buffers so small and work_mem so large at the same time? There
probably are workloads where this is the right thing to do, but I doubt
this is the case. Why have you set it like this?

I don't have much experience with running Pg on AWS, but I'd try to
increase the shared buffers to say 512MB and decrease the work_mem to 16MB
(or something like that).

Undersized shared_buffers might actually be part of the problem - to
access a row, the page needs to be loaded into shared_buffers. Even though
the I/O is very fast (or the page is already in the filesystem page
cache), there's some locking etc. that needs to be done. When the cache is
small (e.g. 16MB) then the pages need to be removed and read again
frequently. This might be one of the reasons why the CPU is 100% utilized.
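If you want to see how often pages are found in shared_buffers vs. requested
from the OS, a rough illustration (not a precise measurement) is the hit/read
ratio in pg_stat_database:

  -- blks_hit  = pages found in shared_buffers
  -- blks_read = pages requested from the OS (may still come from the kernel cache)
  SELECT datname, blks_hit, blks_read,
         round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
    FROM pg_stat_database
   WHERE datname = current_database();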

 SELECT   logparser_entry.id   ,
  logparser_entry.log_id   ,
  logparser_entry.encounter_id ,
  logparser_entry.entry_order  ,
  logparser_entry.timestamp,
  logparser_entry.seconds_since_start  ,
  logparser_entry.event_type   ,
  logparser_entry.actor_id ,
  logparser_entry.actor_relation   ,
  logparser_entry.target_id,
  logparser_entry.target_relation  ,
  logparser_entry.pet_owner_id ,
  logparser_entry.pet_owner_relation   ,
  logparser_entry.pet_target_owner_id  ,
  logparser_entry.pet_target_owner_relation,
  logparser_entry.ability_id   ,
  logparser_entry.effective_value  ,
  logparser_entry.blocked  ,
  logparser_entry.absorbed ,
  logparser_entry.overkill ,
  logparser_entry.overheal ,
  logparser_entry.total_value
 FROM logparser_entry
 WHERE(
   logparser_entry.log_id = 2
  AND  NOT
   (
(
 logparser_entry.actor_relation
 IN (E'Other',

 E'N/A')
AND  logparser_entry.actor_relation
 IS NOT NULL
)
   )
  AND  logparser_entry.event_type IN (E'Attack'  ,
  E'DoT Tick',
  E'Critical Attack')
  )
 ORDER BY logparser_entry.entry_order ASC
 LIMIT1
 http://explain.depesz.com/s/vEx

Well, the problem with this is that it needs to evaluate the whole result
set, sort it by entry_order and then get the 1st row. And there's no
index on entry_order, so it has to evaluate the whole result set and then
perform a traditional sort.

Try to create an index on the entry_order column - that might push it
towards an index scan (to be honest I don't know if PostgreSQL knows it can
do it this way, so maybe it won't work).
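A sketch of the index I have in mind (index names made up; the second variant
also covers the log_id filter):

  CREATE INDEX logparser_entry_order_idx ON logparser_entry (entry_order);
  -- or a composite one matching the WHERE + ORDER BY of this query:
  -- CREATE INDEX logparser_entry_log_order_idx ON logparser_entry (log_id, entry_order);
  ANALYZE logparser_entry;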

 SELECT   (ROUND(logparser_entry.seconds_since_start / 42)) AS interval,
  SUM(logparser_entry.effective_value)  AS
 effective_value__sum
 FROM logparser_entry
 WHERE(
   logparser_entry.log_id = 2
  AND  NOT
   (
(
 logparser_entry.actor_relation
 IN (E'Other',

 E'N/A')
AND  logparser_entry.actor_relation
 IS NOT NULL
)
   )
  AND  logparser_entry.event_type IN (E'Attack'  ,
  E'DoT Tick',
  E'Critical Attack')
  )
 GROUP BY (ROUND(logparser_entry.seconds_since_start / 42)),
  ROUND(logparser_entry.seconds_since_start  / 42)
 ORDER BY interval ASC
 http://explain.depesz.com/s/Rhb

Hm, this is probably the best plan possible - not sure how to make it
faster. I'd expect better performance with larger shared_buffers.

 http://explain.depesz.com/s/JUo

Same as above. Good plan, maybe increase shared_buffers?

 http://explain.depesz.com/s/VZA

Same as above. Good plan, maybe increase shared_buffers.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-07 Thread tv
 Version: PostgreSQL 8.3.5 (mammoth replicator)

 Schema:

 CREATE TABLE tdiag (
 diag_id integer DEFAULT nextval('diag_id_seq'::text),
 create_time   timestamp with time zone default now(), /* time this record was created */
 diag_time   timestamp with time zone not null,
 device_id   integer,/* optional */
 fleet_idinteger,/* optional */
 customer_id integer,/* optional */
 module  character varying,
 node_kind   smallint,
 diag_level  smallint,
 tag character varying not null default '',
 message character varying not null default '',
 options text,

 PRIMARY KEY (diag_id)
 );

 create index tdiag_create_time   ON tdiag(create_time);

 The number of rows is over 33 million with time stamps over the past two
 weeks.

 The create_time order is almost identical to the id order.  What I want
 to find is the first or last entry by id in a given time range. The
 query I am having a problem with is:

Hi,

why are you reposting this? Pavel Stehule already recommended that you run
ANALYZE on the tdiag table - have you done that? What was the effect?

The stats are off - e.g. the bitmap scan says

   ->  Bitmap Heap Scan on tdiag  (cost=25763.48..638085.13 rows=1141019
width=114) (actual time=43.232..322.441 rows=86530 loops=1)

so it expects to get 1141019 rows but it gets 86530, i.e. about 7% of the
expected number. That might be enough to cause a bad plan choice and thus
performance issues.

And yet another recommendation - the sort is performed on disk, so give it
more work_mem and it should be much faster (it should switch from an on-disk
merge sort to an in-memory quicksort). Try something like work_mem='20MB'
and see if it does the trick.
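Putting both suggestions together, something like this (the work_mem value is
just a starting point, and the query mirrors the one from your post):

  ANALYZE tdiag;            -- refresh the planner statistics first
  SET work_mem = '20MB';    -- per session, no restart needed

  EXPLAIN ANALYZE
  SELECT * FROM tdiag
   WHERE create_time >= '2011-06-03 09:49:04.00+0'
     AND create_time <  '2011-06-06 09:59:04.00+0'
   ORDER BY diag_id LIMIT 1;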

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?

2011-05-13 Thread tv
 It's not synchronous at all.  The clients create a temporary file for
 the statistics collector and move on.  The actual statistics don't get
 updated until the statistics collector decides enough time has passed to
 bother, which defaults to at most every 500ms.

Really? I thought the clients send the updates using a socket, at least
that's what I see in backend/postmaster/pgstat.c (e.g. in
pgstat_send_bgwriter where the data are sent, and in PgstatCollectorMain
where it's read from the socket and applied).

But no matter how exactly this works, this kind of stats has nothing to do
with ANALYZE - it's asynchronously updated every time you run a query.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8.2.13 commit is taking too much time

2011-05-10 Thread tv

 Any idea how to improve the performance?

Hmmm, I guess we'll need more info about resource usage (CPU, I/O, locks)
used when the commit happens. Run these two commands

$ iostat -x 1
$ vmstat 1

and then execute the commit. See what's causing problems. Is the drive
utilization close to 100%? If so, you have problems with the disks (I'd bet
this is the cause). Etc.

There's a very nice chapter about this in Greg's book.

BTW what filesystem are you using? Ext3, ext4, reiserfs, xfs? I do
remember there were some problems with sync - some filesystems are
unable to sync individual files and always sync everything (which is going
to suck if you want to sync just the WAL).

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-04-18 Thread tv
 Thanks. But let me do the top stuff later. I think I have a bigger
 problem now.

 While doing a PG dump, I seem to get this error:

 ERROR: invalid memory alloc request size 4294967293

 Upon googling, this seems to be a data corruption issue!

 One of the older messages suggests that I do file level backup and
 restore the data -
 http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php

 How does one do this -- should I copy the data folder? What are the
 specific steps to restore from here, would I simply copy the files
 from the data folder back to the new install or something? Can't find
 these steps in the PG documentation.

Just stop the database, and copy the 'data' directory somewhere else (to a
different machine preferably). You can then start the database from this
directory copy (not sure how that works in CentOS, but you can always run
'postmaster -D directory').


 I'm on PG 8.2.9, CentOS 5, with 8GB of RAM.


This is a massive thread (and part of the important info is in other
threads / other mailing lists), so maybe I've missed something important, but
it seems like:

1) You're I/O bound (according to the 100% utilization reported by iostat).

2) Well, you're running a RAID1 setup, which basically means it's 1 drive
(and you're doing a reindex, which means a lot of reads/writes).

3) The raid controller should handle this, unless it's broken, the battery
is empty (and thus the writes are not cached) or something like that. I'm
not that familiar with 3ware - is there any diagnostic tool that you use
to check the health of the controller / drives?

4) I know you've mentioned there is no bloat (according to off-the-list
discussion with Merlin) - is this true for the table only? Because if the
index is not bloated, then there's no point in running reindex ...

BTW what is the size of the database and that big table? I know it's 125
million rows,  but how much is that? 1GB, 1TB, ... how much? What does
this return

  SELECT reltuples FROM pg_class WHERE relname = 'links';

Do you have any pg_dump backups? What size are they, compared to the live
database? Have you tried to rebuild the database from these backups? That
would give you fresh indexes, so you could see how a 'perfectly clean'
database looks (whether the indexes are bloated, what speed is expected etc.).
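And for the size itself, something along these lines should answer it
(pg_total_relation_size and pg_size_pretty are available on 8.2):

  -- total size of the table including its indexes and TOAST data
  SELECT pg_size_pretty(pg_total_relation_size('links'));

  -- heap vs. everything else (a rough bloat indicator)
  SELECT pg_size_pretty(pg_relation_size('links')) AS table_only,
         pg_size_pretty(pg_total_relation_size('links')
                        - pg_relation_size('links')) AS indexes_and_toast;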

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance

2011-04-14 Thread tv
 On Thu, Apr 14, 2011 at 1:26 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Workload A: Touches just a very small portion of the database, so the
 'active' part actually fits into memory. In this case the cache hit
 ratio can easily be close to 99%.

 Workload B: Touches large portion of the database, so it hits the drive
 very often. In this case the cache hit ratio is usually around RAM/(size
 of the database).

 You've answered it yourself without even realizing it.

 This particular factor is not about an abstract and opaque Workload
 the server can't know about. It's about cache hit rate, and the server
 can indeed measure that.

OK, so it's not a matter of tuning random_page_cost/seq_page_cost? Because
tuning based on cache hit ratio is something completely different (IMHO).

Anyway I'm not an expert in this field, but AFAIK something like this
already happens - btw that's the purpose of effective_cache_size. But I'm
afraid there might be serious fail cases where the current model works
better, e.g. what if you ask for data that's completely uncached (was
inactive for a long time). But if you have an idea on how to improve this,
great - start a discussion in the hackers list and let's see.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance

2011-04-13 Thread tv
 Thomas,

 Thank you for your very detailed and well written description. In
 conclusion, I should keep my random_page_cost (3.0) to a value more than
 seq_page_cost (1.0)? Is this bad practice or will this suffice for my
 setup (where the database is much bigger than the RAM in the system)? Or
 is this not what you are suggesting at all?

Yes, keep it that way. Setting 'random_page_cost > seq_page_cost' reflects
the fact that random reads generally are more expensive than sequential
reads. The actual values are hardware dependent, but 4:1 is usually OK,
unless your db fits into memory etc.

The decrease in performance after decreasing random_page_cost to 3 is due to
changes in some execution plans (the index scan becomes slightly less
expensive than a seq scan), but in your case that's a false assumption. So
keep it at 4 (you may even try to increase it, just to see if that
improves the performance).
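If you want to experiment safely, the cost constants can be changed just for
the current session, e.g. (the query here is only a placeholder - use your
real one):

  SET random_page_cost = 4;    -- the default; try other values and compare plans
  EXPLAIN SELECT * FROM some_table WHERE some_indexed_column = 42;
  RESET random_page_cost;      -- back to the value from postgresql.conf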

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance problem with LIMIT (order BY in DESC order). Wrong index used?

2011-04-12 Thread tv
 On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
 dieter.rehb...@skiline.cc wrote:
 I just executed a VACUUM ANALYZE and now everything performs well. hm,
 strange.

 That probably means you need more statistics - try increasing the
 newsfeed's statistics target count.

 ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS n;

 Try different n numbers, you can crank it up to 4000 or perhaps more
 in 9.0, but you should start lower I guess.

AFAIK the max value is 10000 and the default is 100. Higher numbers mean
higher overhead, so do not jump to 10000 directly. Set it to 1000 and see
if that helps, etc.
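A concrete sketch of the suggestion above (table and column names taken from
the quoted example; the target value is just a first guess):

  ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS 1000;
  ANALYZE newsfeed_item;   -- the new target only takes effect after the next ANALYZE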

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query postgres 8.3

2011-04-09 Thread tv
 Hi,

 I am trying to tune a query that is taking too much time on a large
 dataset (postgres 8.3).


Hi, run ANALYZE on the tables used in the query - the stats are very off,
so the db chooses a really bad execution plan.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread tv
 some additional info.
 the table inventory is about 4481 MB and also has postgis types.
 the table gran_ver is about 523 MB
 the table INVSENSOR is about 217 MB

 the server itself has 32G RAM with the following set in the postgres conf
 shared_buffers = 3GB
 work_mem = 64MB
 maintenance_work_mem = 512MB
 wal_buffers = 6MB

Not sure how to improve the query itself - it's rather simple and the
execution plan seems reasonable. You're dealing with a lot of data, so it
takes time to process.

Anyway, I'd try to bump up the shared buffers a bit (the tables you've
listed have about 5.5 GB, so 3GB of shared buffers won't cover it). OTOH
most of the data will be in pagecache maintained by the kernel anyway.

Try to increase the work_mem a bit, that might speed up the hash joins
(the two hash joins consumed about 15s, the whole query took 17s). This
does not require a restart, just do

set work_mem = '128MB'

(or 256MB) and then run the query in the same session. Let's see if that
works.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Performance Tuning

2011-04-04 Thread tv
 max_connections = 700
 shared_buffers = 4096MB
 temp_buffers = 16MB
 work_mem = 64MB
 maintenance_work_mem = 128MB
 wal_buffers = 32MB
 checkpoint_segments = 32
 random_page_cost = 2.0
 effective_cache_size = 4096MB

First of all, there's no reason to increase wal_buffers to 32MB. AFAIK
the largest sensible value is about 16MB - I doubt increasing it further will
improve performance.

Second - effective_cache_size is just a hint about how much memory is used by
the operating system for the filesystem cache. So this does not influence the
amount of allocated memory in any way.

 but still the Postgres server uses swap memory during SELECT and INSERT into
 database tables.

Are you sure it's PostgreSQL? What else is running on the box? Have you
analyzed why the SQL queries are slow (using EXPLAIN)?

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Performance Tuning

2011-04-04 Thread tv


 Thanks Scott :

 My iostat package is not installed but have a look on below output:

 [root@s8-mysd-2 8.4SS]# vmstat 10
 procs ---memory-- ---swap-- -io --system--
 -cpu--
  r  b   swpd   free   buff  cache   si   sobibo   incs us sy
 id wa st
  1  0 147664  93920  72332 1558074801   113   170   47   177  6
 1 92  1  0
  0  0 147664  94020  72348 1558074800 0 4  993   565  0
 0 100  0  0
  0  0 147664  93896  72364 1558074800 0 5  993   571  0
 0 100  0  0
  0  0 147664  93524  72416 1558086000 0   160 1015   591  0
 0 100  0  0
  0  0 147664  93524  72448 1558086000 0 8 1019   553  0
 0 100  0  0
  0  0 147664  93648  72448 1558086000 0 0 1019   555  0
 0 100  0  0
  0  0 147664  93648  72448 1558086000 0 3 1023   560  0
 0 100  0  0

Is this from a busy or idle period? I guess it's from an idle one, because
the CPU is 100% idle and there's very little I/O activity. That's useless
- we need to see vmstat output from a period when there's something wrong.

 [root@s8-mysd-2 8.4SS]# iostat
 -bash: iostat: command not found
 [root@s8-mysd-2 8.4SS]#

Then install it. Not sure what distro you use, but it's usually packaged in
the sysstat package.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread tv
 I just want to retrieve the ids from page_content which do not have
 any entry in the clause2 table.

In that case the query probably does not work (at least the query you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition). At least
that's how I understand it.

So instead of this

select distinct(p.crawled_page_id)
from page_content p, clause2 c where p.crawled_page_id != c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)
from page_content p left join clause2 c on (p.crawled_page_id =
c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread tv

 Actually the plans are equal, so I suppose it depends on what were
 run first :). Slow query operates with data mostly on disk, while
 fast one with data in memory.

 yeah. maybe the easiest way, is to start a fresh session and fire the
 queries.


 After the fresh start , the results obtained are :

As Chetan Suttraway already pointed out, the execution plans are exactly
the same. And by exactly I mean there's no difference in evaluating
those two queries.

The difference is due to cached data - not just in shared buffers (which
will be lost on a postgres restart) but also in the filesystem cache (which
is managed by the kernel, not postgres).

So the first execution had to load (some of) the data into shared buffers,
while the second execution already had a lot of data in shared buffers.
That's why the first query ran in 7.7 sec while the second took 6.2 sec.

 This seems a slight upper hand of the second query .

Again, there's no difference between those two queries, they're exactly
the same. It's just a matter of which of them is executed first.

 Would it be possible to tune it further.

I don't think so. The only possibility I see is to add a flag to the
page_content table and update it using a trigger (when something is
inserted/deleted in clause2). Then you don't need to do the join.
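Just to illustrate the idea, a rough (untested) sketch of such a flag
maintained by triggers on clause2 - column names are taken from the queries
above, the flag and function names are made up:

  ALTER TABLE page_content ADD COLUMN has_clause boolean NOT NULL DEFAULT false;

  -- one-time backfill for existing rows
  UPDATE page_content p SET has_clause = EXISTS
      (SELECT 1 FROM clause2 c WHERE c.source_id = p.crawled_page_id);

  CREATE OR REPLACE FUNCTION clause2_sync_flag() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'INSERT' THEN
          UPDATE page_content SET has_clause = true
           WHERE crawled_page_id = NEW.source_id;
      ELSIF TG_OP = 'DELETE' THEN
          -- re-check whether any clause2 rows remain for this page
          UPDATE page_content
             SET has_clause = EXISTS (SELECT 1 FROM clause2
                                       WHERE source_id = OLD.source_id)
           WHERE crawled_page_id = OLD.source_id;
      END IF;
      RETURN NULL;   -- AFTER trigger, the return value is ignored
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER clause2_sync_flag AFTER INSERT OR DELETE ON clause2
      FOR EACH ROW EXECUTE PROCEDURE clause2_sync_flag();

  -- the original query then becomes a simple scan, no join needed:
  -- SELECT crawled_page_id FROM page_content WHERE NOT has_clause;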

 My postgresql.conf parameters are as follows : ( Total RAM = 16 GB )

 shared_buffers = 4GB
 max_connections=700
 effective_cache_size = 6GB
 work_mem=16MB
 maintenance_mem=64MB

 I think to change

 work_mem=64MB
 maintenance_mem=256MB

 Does it has some effects now.

Generally a good idea, but we don't know if there are other processes
running on the same machine and what kind of system this is (how many
users are there, what kind of queries do they run). If there are a lot of
users, keep work_mem low. If there are just a few users, decrease
max_connections, bump up work_mem and consider increasing
shared_buffers.

Maintenance_work_mem is used for vacuum/create index etc. so it really
does not affect regular queries.

Some of those values (e.g. work_mem/maintenance_work_mem) are dynamic, so
you can set them for the current connection and see how it affects the
queries.

Just do something like

db=# SET work_mem='32MB'
db=# EXPLAIN ANALYZE SELECT ...

But I don't think this will improve the query we've been talking about.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread tv
 On 03/23/2011 04:17 AM, Adarsh Sharma wrote:

 explain analyze select distinct(p.crawled_page_id) from page_content
 p where NOT EXISTS (select 1 from clause2 c where c.source_id =
 p.crawled_page_id);

 You know... I'm surprised nobody has mentioned this, but DISTINCT is
 very slow unless you have a fairly recent version of Postgres that
 replaces it with something faster. Try this:

Nobody mentioned that because the explain plan already uses hash aggregate
(instead of the old sort)

 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
time=7047.259..7050.261 rows=72 loops=1)

which means this is at least 8.4. Plus the 'distinct' step uses less than
1% of total time, so even if you improve it the impact will be minimal.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] buffercache/bgwriter

2011-03-23 Thread tv
 Hi,

 I have very bad bgwriter statistics on a server which has been running for
 many weeks, and they are still the same after a recent restart.
 There are roughly 50% of buffers written by the backend processes and the
 rest by checkpoints.
 The statistics below are from a server with 140GB RAM, 32GB shared_buffers
 and a runtime of one hour.

 As you can see in the pg_buffercache view, most buffers are without a
 usagecount - so they are as free (or even virgin) as they can be.
 At the same time I have 53 percent of the dirty buffers written by the
 backend process.

There are some nice old threads dealing with this - see for example

http://postgresql.1045698.n5.nabble.com/Bgwriter-and-pg-stat-bgwriter-buffers-clean-aspects-td2071472.html

http://postgresql.1045698.n5.nabble.com/tuning-bgwriter-in-8-4-2-td1926854.html

and there are even some nice external links to more detailed explanations

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread tv
 Thanks , it works now .. :-)

 Here is the output :

 pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like
 '%2010%' and
 pdc_uima-# content_language='en' and content is not null and
 isprocessable = 1 and
 pdc_uima-# to_tsvector('english',content) @@
 to_tsquery('english','Mujahid' || ' | '
 pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' |
 '
 pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' ||
 ' | '
 pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' ||
 'crpf' || ' | '
 pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

  count
 --------
  137193
 (1 row)

 Time: 195441.894 ms


 But my original query is to use AND also i.e

Hi, just replace AND and OR (used with the LIKE operator) with & and |
(used with to_tsquery).

So this

(content like '%Militant%' OR content like '%jihad%') AND (content like
'%kill%' OR content like '%injure%')

becomes

to_tsvector('english',content) @@ to_tsquery('english', '(Militant |
jihad) & (kill | injure)')

BTW it seems you somehow believe you'll get exactly the same result from
those two queries (LIKE vs. tsearch) - that's a false expectation. I believe
the fulltext query is much better and more appropriate in this case, just
don't expect the same results.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with Query Tuning

2011-03-17 Thread tv
 *Modified Query :-

 *SELECT count(*)  from page_content
 WHERE publishing_date like '%2010%' and content_language='en' and
 content is not null and isprocessable = 1 and
 to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' ||
 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' ||
 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' ||
 'dsf' || 'ssb');

I guess there should be spaces between the words. This way it's just one
very long word 'MujahidjihadMilitantfedayeen' and I doubt that's what
you're looking for.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Two different execution plans for similar requests

2011-03-01 Thread tv
Hi, and why do you think this is a problem?

The explain plan is expected to change for different parameter values,
that's OK. The merge in the first query is expected to produce
significantly more rows (91774) than the other one (229). That's why the
second query chooses nested loop instead of merge join ...

But it's difficult to say if those plans are OK, as you have posted just
EXPLAIN output - please, provide 'EXPLAIN ANALYZE' output so that we can
see if the stats are off.

regards
Tomas

 *Hi all !

 Postgresql (8.2) has a strange behaviour in some of my environments.
 *
 *A request follows two execution plans ( but not always !!! ). I encounter
 some difficulties to reproduce the case.*

 *J-2*
 Aggregate  (*cost=2323350.24..2323350.28 rows=1 width=24*)
   -  Merge Join  (cost=2214044.98..2322432.49 rows=91774 width=24)
 Merge Cond: ((azy_header.txhd_azy_nr = azy_detail.txhd_azy_nr) AND
 ((azy_header.till_short_desc)::text = inner.?column8?) AND
 ((azy_header.orgu_xxx)::text = inner.?column9?) AND
 ((azy_header.orgu_xxx_cmpy)::text = inner.?column10?))
 -  Sort  (cost=409971.56..410050.39 rows=31532 width=77)
   Sort Key: azy_queue.txhd_azy_nr,
 (azy_queue.till_short_desc)::text, (azy_queue.orgu_xxx)::text,
 (azy_queue.orgu_xxx_cmpy)::text
   -  Nested Loop  (cost=0.00..407615.41 rows=31532 width=77)
 -  Nested Loop  (cost=0.00..70178.58 rows=52216
 width=46)
   Join Filter: (((azy_queue.orgu_xxx_cmpy)::text =
 (firma_session.orgu_xxx_cmpy)::text) AND ((azy_queue.orgu_xxx)::text =
 (firma_session.orgu_xxx)::text))
   -  Seq Scan on firma_session
 (cost=0.00..599.29
 rows=401 width=25)
 Filter: ((cssn_trading_date =
 '20110226'::bpchar) AND (cssn_trading_date = '20110226'::bpchar))
   -  Index Scan using azyq_ix2 on azy_queue
 (cost=0.00..165.92 rows=434 width=41)
 Index Cond: (azy_queue.cssn_session_id =
 firma_session.cssn_session_id)
 -  Index Scan using txhd_pk on azy_header
 (cost=0.00..6.44 rows=1 width=31)
   Index Cond: (((azy_queue.orgu_xxx_cmpy)::text =
 (azy_header.orgu_xxx_cmpy)::text) AND ((azy_queue.orgu_xxx)::text =
 (azy_header.orgu_xxx)::text) AND ((azy_queue.till_short_desc)::text =
 (azy_header.till_short_desc)::text) AND (azy_queue.txhd_azy_nr =
 azy_header.txhd_azy_nr))
   Filter: (txhd_voided = 0::numeric)
 -  Sort  (cost=1804073.42..1825494.05 rows=8568252 width=55)
   Sort Key: azy_detail.txhd_azy_nr,
 (azy_detail.till_short_desc)::text, (azy_detail.orgu_xxx)::text,
 (azy_detail.orgu_xxx_cmpy)::text
   -  Seq Scan on azy_detail  (cost=0.00..509908.30
 rows=8568252
 width=55)
 Filter: (txde_item_void = 0::numeric)



 *J-1*
 Aggregate  (*cost=10188.38..10188.42 rows=1 width=24*)
   -  Nested Loop  (cost=0.00..10186.08 rows=229 width=24)
 -  Nested Loop  (cost=0.00..2028.51 rows=79 width=77)
   -  Nested Loop  (cost=0.00..865.09 rows=130 width=46)
 Join Filter: (((azy_queue.orgu_xxx_cmpy)::text =
 (firma_session.orgu_xxx_cmpy)::text) AND ((azy_queue.orgu_xxx)::text =
 (firma_session.orgu_xxx)::text))
 -  Seq Scan on firma_session  (cost=0.00..599.29
 rows=1
 width=25)
   Filter: ((cssn_trading_date =
 '20110227'::bpchar)
 AND (cssn_trading_date = '20110227'::bpchar))
 -  Index Scan using azyq_ix2 on azy_queue
 (cost=0.00..258.20 rows=434 width=41)
   Index Cond: (azy_queue.cssn_session_id =
 firma_session.cssn_session_id)
   -  Index Scan using txhd_pk on azy_header  (cost=0.00..8.93
 rows=1 width=31)
 Index Cond: (((azy_queue.orgu_xxx_cmpy)::text =
 (azy_header.orgu_xxx_cmpy)::text) AND ((azy_queue.orgu_xxx)::text =
 (azy_header.orgu_xxx)::text) AND ((azy_queue.till_short_desc)::text =
 (azy_header.till_short_desc)::text) AND (azy_queue.txhd_azy_nr =
 azy_header.txhd_azy_nr))
 Filter: (txhd_voided = 0::numeric)
 -  Index Scan using txde_pk on azy_detail  (cost=0.00..102.26
 rows=50 width=55)
   Index Cond: (((azy_detail.orgu_xxx_cmpy)::text =
 (azy_header.orgu_xxx_cmpy)::text) AND ((azy_detail.orgu_xxx)::text =
 (azy_header.orgu_xxx)::text) AND ((azy_detail.till_short_desc)::text =
 (azy_header.till_short_desc)::text) AND (azy_detail.txhd_azy_nr =
 azy_header.txhd_azy_nr))
   Filter: (txde_item_void = 0::numeric)



 *
 Where shall I investigate ?*
 Thanks for your help




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Two different execution plans for similar requests

2011-03-01 Thread tv
 I've already used an 'EXPLAIN ANALYZE' to post the message. So I don't
 clearly understand what you are expecting for, when you tell me to provide
 'EXPLAIN ANALYZE'  (please excuse me for the misunderstood)

No, you haven't. You've provided 'EXPLAIN' output, but that just prepares
an execution plan and displays it. So it shows just estimates of row
counts etc. and not actual values.

Do the same thing but use 'EXPLAIN ANALYZE' instead of 'EXPLAIN' - it will
run the query and provide more details about it (run time for each node,
actual number of rows etc.).

Anyway the sudden changes of estimated costs are suspicious ...

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which gives good performance? separate database vs separate schema

2010-11-25 Thread tv
Hello,

 Now, should I put these tables in 1 Database's different schemas or in
 separate
 databases itself for good performance?
 I am using libpq for connection.

 Pictorial Representation:

 Process1 - DB1.schema1.table1

 Process2 - DB1.schema2.table1

   Vs.

 Process1 - DB1.default.table1

 Process2 - DB2.default.table1

 Which one is better?

Well, that depends on what you mean by database. In many other products
each database is completely separate (with its own cache, processes etc).
In PostgreSQL, there's a cluster of databases, and all of them share the
same cache (shared buffers) etc.

I don't think you'll get a performance improvement from running two
PostgreSQL clusters (one for DB1, one for DB2). And when running two
databases within the same cluster, there's no measurable performance
difference AFAIK.

So the two options are exactly the same.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which gives good performance? separate database vs separate schema

2010-11-25 Thread tv
 I don't think it will make a big difference in performance.

 The real question is: do you need queries that cross boundaries? If that
 is the case you have to use schema, because Postgres does not support
 cross-database queries.

Well, there's the dblink contrib module, but that won't improve performance.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which gives good performance? separate database vs separate schema

2010-11-25 Thread tv
 On Thursday 25 November 2010 13:02:08 t...@fuzzy.cz wrote:
 I don't think you'll get performance improvement from running two
 PostgreSQL clusters (one for DB1, one for DB2). And when running two
 databases within the same cluster, there's no measurable performance
 difference AFAIK.
 That one is definitely not true in many circumstances. As soon as you
 start to hit contention (shared memory, locks) you may very well be better
 off with two separate clusters.

 Andres

Good point, I forgot about that. Anyway it's hard to predict what kind of
performance issue he's facing and whether two clusters would fix it.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which gives good performance? separate database vs separate schema

2010-11-25 Thread tv
 I am not facing any issues, but yes I want to have optimal performance for
 SELECT and INSERT, especially when I am doing these ops repeatedly.
 Actually I am porting from Oracle to PG. Oracle starts a lot of processes
 when
 it needs to run many schemas. I do not think PG would need much more
 resources
 (mem, cpu) if I go for a different database for each process...? Also, is
 there any
 limit on the number of databases I can run on a PG server?

Hm, I would try to run that using a single cluster, and only if that does
not perform well would I try multiple clusters. Yes, Oracle starts a lot of
processes for an instance, and then some processes for each connection.

But again - in PostgreSQL, you do not start databases. You start a
cluster containing databases, and then there are connections. This is
similar to Oracle, where you start instances (something like a cluster in
PostgreSQL) containing schemas (something like databases in PostgreSQL).
And then you create connections, which are the objects consuming processes
and memory.

PostgreSQL will create one process for each connection (roughly the same
as Oracle in the case of a dedicated server). And yes, the number of
connections is limited - see the max_connections parameter in postgresql.conf.

Tomas




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread tv


 Correct, the optimizer did not take the settings with the pg_ctl reload
 command. I did a pg_ctl restart and work_mem now displays the updated
 value. I had to bump up all the way to 2047 MB to get the response below
 (with work_mem at 1024 MB I see 7 seconds response time) and with 2047 MB
 (which is the max value that can be set for work_mem - anything more than
 that results in a FATAL error because of the limit) the results are below.

Hm, can you post the explain plan for the case work_mem=1024MB? I guess the
difference is due to caching. According to the explain analyze, there are
just cache hits, no reads.

Anyway the hash join uses only about 40MB of memory, so 1024MB should be
perfectly fine and the explain plan should be exactly the same as with
work_mem=2047MB. And the row estimates seem quite precise, so I don't
think there's any severe overestimation.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread tv
 I believe you can set work_mem to a different value just for the duration
 of a single query, so you needn't have work_mem set so high for every query
 on the system.  A single query may well use a multiple of work_mem, so you
 really probably don't want it that high all the time unless all of your
 queries are structured similarly.  Just set work_mem='2047MB'; query;
 reset
 all;

Yes, executing set work_mem='64MB' right before the query should be just
fine. Setting work_mem to 2GB is overkill most of the time (99.9%).

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 4) INDEXES: I can certainly add an index but given the table sizes I am not
 sure if that is a factor. This by no means is a large dataset - less than
 350,000 rows in total and 3 columns. Also this was just a quick dump of
 data for comparison purposes. When I saw the poor performance on the
 COALESCE, I pointed the data load to SQL Server and ran the same query
 except with the TSQL specific ISNULL function.

350,000 rows definitely is a lot of rows, although with 3 INT columns it's
just about 13MB of data (including overhead). But indexes can be quite
handy when doing joins, as in this case.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 First, I modified the work_mem setting to 1GB (reloaded config) from the
 default 1MB and I see a response time of 33 seconds. Results below from
 EXPLAIN ANALYZE:

...

 Second, I modified the work_mem setting to 2GB (reloaded config) and I see
 a response time of 38 seconds. Results below from EXPLAIN ANALYZE:

...

How did you reload the config? Using 'kill -HUP pid'? That should work
fine. Have you checked 'work_mem' after the reload?
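One way to do both from a psql session (pg_reload_conf() needs superuser):

  SELECT pg_reload_conf();   -- same effect as 'pg_ctl reload' / kill -HUP
  SHOW work_mem;             -- verify the value the session actually sees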

Because the explain plans are exactly the same (structure, estimated
costs). The really interesting bit is this and it did not change at all

   Buckets: 1024 Batches: 64  Memory Usage: 650kB

As Tom Lane already mentioned, splitting the hash join into batches (due to
insufficient memory) adds overhead; the optimal number of batches is 1. But I
guess 1GB of work_mem is overkill - something like 64MB should be fine.

The suspicious thing is the query plans have not changed at all
(especially the number of batches). I think you're not telling us
something important (unintentionally of course).

 By no means I am trying to compare the 2 products. When I noticed the slow
 behavior of COALESCE I tried it on SQL Server. And since they are running
 on the same machine my comment regarding apples to apples. It is possible
 that this is not an apples to apples comparison other than the fact that
 it is running on the same machine.

OK. The point of my post was that you've provided very little info about
the settings etc. so it was difficult to identify why PostgreSQL is so
slow.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 4) INDEXES: I can certainly add an index but given the table sizes I am
 not sure if that is a factor. This by no means is a large dataset - less than
 350,000 rows in total and 3 columns. Also this was just a quick dump of
 data for comparison purposes. When I saw the poor performance on the
 COALESCE, I pointed the data load to SQL Server and ran the same query
 except with the TSQL specific ISNULL function.

 350,000 rows definitely is a lot of rows, although with 3 INT columns it's
 just about 13MB of data (including overhead). But indexes can be quite
 handy when doing joins, as in this case.

OK, I've just realized the tables have 3 character columns, not integers.
In that case the tables are probably much bigger (and there are things
like TOAST). In that case indexes may be even more important.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 Pavel Stehule  wrote:
 2010/11/21 Humair Mohammed :

 shared_buffers = 2

 shared_buffers = 2 ???

 Yeah, if that's not a typo, that's a very serious misconfiguration.

I guess that's a typo, as the explain plan in one of the previous posts
contains

   Buffers: shared hit=192 read=4833

for a sequential scan. But I still don't know why is the query so slow :-(

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-19 Thread tv
 Excerpts from kuopo's message of jue nov 18 04:10:36 -0300 2010:
 However, when I analyze the table A, the autovacuum or vacuum on the
 table B cannot find any removable row version (the number of
 nonremoveable row versions and pages keeps increasing). After the
 analysis finishes, the search operations on the table B is still
 inefficient. If I call full vacuum right now, then I can have quick
 response time of the search operations on the table B again.

Hi, I don't know how to fix the long VACUUM/ANALYZE, but have you tried to
minimize the growth using HOT?

HOT means that if you update only columns that are not indexed, and if the
update can fit into the same page (into an update chain), this would not
create a dead row.

Are there any indexes on the small table? How large is it? You've
mentioned there are about 2049 rows - that might be just a few pages so
the indexes would not be very efficient anyway.

Try to remove the indexes, and maybe create the table with a smaller
fillfactor (so that there is more space for the updates).

That should be much more efficient and the table should not grow.

You can see if HOT works through pg_stat_all_tables view (columns
n_tup_upd and n_tup_hot_upd).
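For illustration, a sketch of the two pieces (the table name is a placeholder;
ALTER TABLE ... SET (fillfactor) needs 8.2+, n_tup_hot_upd needs 8.3+):

  -- leave ~30% free space in each page so HOT updates can stay on the same page
  ALTER TABLE small_table SET (fillfactor = 70);
  -- the new fillfactor applies to newly written pages; a one-off VACUUM FULL or
  -- CLUSTER would rewrite the existing ones (at the cost of locking the table)

  -- check whether updates are actually done as HOT
  SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_all_tables
   WHERE relname = 'small_table';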

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Low disk performance?

2010-11-18 Thread tv
Hi, what is the size of the table and index (in terms of pages and
tuples)? Try something like

SELECT relpages, reltuples FROM pg_class WHERE relname = 'table or index
name';

And what indexes have you created? It seems to me there's just an index on
variable_id. It might be useful to create an index on (variable_id, ts)
or even (variable_id, ts, good_through).
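A sketch of the composite index (the index name is made up; the table name
must be quoted because of the dashes):

  CREATE INDEX records_varid_ts_idx
      ON "records_437954e9-e048-43de-bde3-057658966a9f" (variable_id, ts);
  ANALYZE "records_437954e9-e048-43de-bde3-057658966a9f";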

Tomas

 Hello everybody,
 having this SQL query:

 --
 select
 variable_id,float_value,ts,good_through,interval,datetime_value,string_value,int_value,blob_value,history_value_type
 from records_437954e9-e048-43de-bde3-057658966a9f where variable_id
 in (22727) and (ts >= '2010-10-02 11:19:55' or good_through >=
 '2010-10-02 11:19:55') and (ts <= '2010-10-14 11:19:55' or
 good_through <= '2010-10-14 11:19:55')
 union all
 select
 variable_id,float_value,ts,good_through,interval,datetime_value,string_value,int_value,blob_value,history_value_type
 from records_1d115712-e943-4ae3-bb14-b56a95796111 where variable_id
 in (24052) and (ts >= '2010-10-02 11:19:55' or good_through >=
 '2010-10-02 11:19:55') and (ts <= '2010-10-14 11:19:55' or
 good_through <= '2010-10-14 11:19:55')
 order by ts
 limit 2501 offset 0

 ---

 and these two results:

 1st run:
 http://explain.depesz.com/s/1lT

 2nd run:
 http://explain.depesz.com/s/bhA

 is there anything I can do about the speed? Only buying faster
 hard-disk seems to me as the solution... Am I right?

 Thank you in advance
   Martin

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why dose the planner select one bad scan plan.

2010-11-11 Thread tv
 But I doubt your answer. I think the essence of the problem is when the
 planner selects 'Bitmap Index Scan' and how the planner computes the cost
 of 'Bitmap Index Scan'.

The essence of the problem obviously is a bad estimate of the cost. The
planner considers the two plans, computes the costs and then chooses the
one with the lower cost. But obviously the cost does not reflect the
reality (the first time the query is executed, when the filesystem cache is
empty).

 Tom Lane said "In principle a bitmap index scan should be significantly
 faster if the index can return the bitmap more or less natively rather
 than having to construct it. My recollection though is that a significant
 amount of work is needed to make that happen, and that there is no
 existing patch that tackled the problem. So I'm not sure that this report
 should be taken as indicating that there's no chance of a SELECT
 performance improvement. What it does say is that we have to do that work
 if we want to make bitmap indexes useful."

Tom Lane is right (as usual). The point is that when computing the cost, the
planner does not know whether the data are already in the filesystem cache
or if it has to fetch them from the disk (which is much slower).

 Okay, I want to know how the planner computes the cost of constructing the
 bitmap. And when the planner computes the cost of 'Bitmap Index Scan', does
 it consider the influence of the memory cache? When I do not clear the
 memory cache, I find the 'Bitmap Index Scan' is really faster than 'Seq
 Scan'.

There are two things here - loading the data from a disk into a cache
(filesystem cache at the OS level / shared buffers at the PG level), and
then the execution itself.

PostgreSQL estimates the first part using the effective_cache_size hint,
and uses that to estimate the probability that the data are already in the
filesystem cache. But you're confusing it with the 'reboot', which results
in an empty cache.

The plan itself seems fine to me - you might play with the cost variables,
but I think it won't improve the overall performance.

Actually what you see is a worst case scenario - the plan is not bad if
the data are in a cache (filesystem or shared buffers), but when Pg has to
read the data from the disk, performance sucks. But is this reflecting
reality? How often is the query executed? What other queries are executed
on the box? What is the size of shared_buffers?

If the query is executed often (compared to other queries) and the shared
buffers is set high enough, most of the table will remain in the shared
buffers and everything will work fine.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why dose the planner select one bad scan plan.

2010-11-10 Thread tv
 Okay, 225044.255ms VS 83813.808 ms, it obviously seems that the planner
 select one bad scan plan by default.

Actually no, the planner chose the cheapest plan (more precisely a plan
with the lowest computed cost). The first plan has a cost 600830.86 while
the second one has a cost 634901.28, so the first one is chosen.

To fix this, you'll have to tweak the cost variables, and maybe work_mem.
See this -
http://www.postgresql.org/docs/9.0/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
(but I'm not sure which of those influence the Bitmap Heap Scan /
HashAggregate plans).

So you'll have to modify these values until the hash aggregate plan is
cheaper. And you don't need to reboot the machine between EXPLAIN
executions. And even if you do EXPLAIN ANALYZE it's not necessary - there
are better ways to clear the filesystem cache.
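
For example, on Linux you can drop the OS page cache like this (as root; note
that it does not empty PostgreSQL's shared buffers - restart the server if you
want those cold too):

sync
echo 3 > /proc/sys/vm/drop_caches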

BTW this is not a bug, so it's pointless to send it to the 'bugs' mailing list.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] planner index choice

2010-07-29 Thread tv
 http://explain.depesz.com/s/br9
 http://explain.depesz.com/s/gxH

Well, I don't have time to do a thorough analysis right now, but in all
the plans you've posted there are quite high values in the Rows x column
(e.g. the 5727.5 value).

That means a significant difference between the estimated and the actual row
counts, which may lead to a poor choice of indexes etc. The planner may simply
think the index is better due to imprecise statistics.

Try to increase the statistics target for the columns, e.g.

ALTER TABLE table ALTER COLUMN column SET STATISTICS integer

where integer is between 0 and 1000 (the default value is 10 so use 100
or maybe 1000), run analyze and try to run the query again.
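
For example (the table and column names here are just placeholders):

ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;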

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Identical query slower on 8.4 vs 8.3

2010-07-16 Thread tv

 I'd suggest to increase the value up to ~80MB, if not for the system,
 may be just for the session running this query.
 Then see if performance improved.

Don't forget you can do this for the given query without affecting the
other queries - just do something like

SET work_mem = '128MB';

and then run the query - it should work fine. This is great for testing
and to set environment for special users (batch processes etc.).
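
And if the batch jobs run under a dedicated user, the setting can be made
permanent just for that role (the role name is only an example):

ALTER ROLE batch_user SET work_mem = '128MB';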

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small Queries Really Fast, Large Queries Really Slow...

2010-06-24 Thread tv
 Any suggestions on what I can do to speed things up? I presume if I turn
 off
 Sequential Scan then it might default to Index Scan.. Is there anything
 else?

 Cheers,
 Tom

Well, I doubt turning off the sequential scan will improve the performance
in this case - actually the first case (running 400 sec) uses an index
scan, while the 'fast' one uses sequential scan.

Actually I'd try exactly the opposite - disabling the index scan, i.e.
forcing it to use sequential scan in the first case. You're selecting
about 4% of the rows, but we don't know how 'spread' are those rows
through the table. It might happen PostgreSQL actually has to read all the
blocks of the table.

This might be improved by clustering - create an index on the
'match_data_id' column and then run

CLUSTER match_data_id_idx ON match_data;

This will sort the table according to the match_data_id column, which should
improve the performance. But it won't last forever - it degrades over
time, so you'll have to perform the clustering once in a while (and it locks the
table, so be careful).

How large is the table anyway? How many rows / pages are there? Try
something like this

SELECT reltuples, relpages FROM pg_class WHERE relname = 'match_data';

Multiply the blocks by 8k and you'll get the occupied space. How much is
it? How much memory (shared_buffers) is there?
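
If your PostgreSQL is reasonably recent (8.1+), this is a quicker way to get
the total size including indexes and TOAST:

SELECT pg_size_pretty(pg_total_relation_size('match_data'));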

You could try partitioning according to the match_data_id column, but there
are various disadvantages related to foreign keys etc. and it's often a
major change in the application, so I'd consider other solutions first.

BTW I have no experience with running PostgreSQL inside a Virtual Box VM,
so it might be another source of problems. I do remember we had some
serious problems with I/O (network and disks) when running vmware, but it
was a long time ago and now it works fine. But maybe this is the root cause?
Can you run dstat / vmstat / iostat or something like that in the host OS
to see which of the resources is causing problems (if any)?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query

2010-06-04 Thread tv
 I am reposting as my original query was mangled

 The link to the explain plan is here as it does not paste well into
 the email body.

 http://explain.depesz.com/s/kHa


 The machine is a 2 cpu quad core 5430 with 32G RAM and 6x450G 15K
 single raid-10 array

 1G work_mem
 default_statistics_target=1000
 random_page_cost=1

Are you sure it's wise to set the work_mem to 1G? Do you really need it?
Don't forget this is not a 'total' or 'per query' - each query may
allocate multiple work areas (and occupy multiple GB). But I guess this
does not cause the original problem.

The last row 'random_page_cost=1' - this basically says that reading data
at random is just as cheap as reading data sequentially. This may result
in poor performance due to bad plans. Why have you set this value?

Sure, there are rare cases where 'random_page_cost=1' is OK.


 I am curious why the hash join takes so long. The main table
 dev4_act_dy_fact_2010_05_t has 25 million rows. The table is
  partitioned into 3 parts per month. Remaining tables are very small (
   < 1000 rows)

Well, the real cause that makes your query slow is the 'index scan' part.

Index Scan using dev4_act_dy_fact_2010_05_t3_thedate on
dev4_act_dy_fact_2010_05_t3 a (cost=0.00..94041.89 rows=204276 width=60)
(actual time=164533.725..164533.725 rows=0 loops=1)

The first thing to note here is the difference in expected and actual
number of rows - the planner expects 204276 but gets 0 rows. How large is
this partition?

Try to analyze it, set the random_page_cost to something reasonable (e.g.
4) and try to run the query again.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread tv

 Hi, Tom.

 Yes, that is what happened, making the tests rather meaningless, and
 giving
 me the false impression that the indexes were being used. They were but
 only
 because of cached results. When multiple users making different queries,
 the
 performance will return to ~80s per query.

 I also tried Kevin's suggestion, which had no noticeable effect:
 effective_cache_size = 512MB

 That said, when using the following condition, the query is fast (1
 second):

  extract(YEAR FROM sc.taken_start) >= 1963 AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

 -  Index Scan using measurement_013_stc_idx on
 measurement_013 m  (cost=0.00..511.00 rows=511 width=15) (actual
 time=0.018..3.601 rows=3356 loops=104)
   Index Cond: ((m.station_id = sc.station_id) AND
  (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND
 (m.category_id
 = 7))

 This condition makes it slow (13 seconds on first run, 8 seconds
 thereafter):

  *extract(YEAR FROM sc.taken_start) >= 1900 AND
  *extract(YEAR FROM sc.taken_end) <= 2009 AND

   Filter: (category_id = 7)
 -  Seq Scan on measurement_013 m
 (cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692
 rows=18118395 loops=1)

 At this point, I'm tempted to write a stored procedure that iterates over
 each station category for all the years of each station. My guess is that
 the planner's estimate for the number of rows that will be returned by
 *extract(YEAR
  FROM sc.taken_start) >= 1900* is incorrect and so it chooses a full table
 scan for all rows. Even though the lower bound appears to be a constant
 value of the 1900, the average year a station started collecting data was
 44
 years ago (1965), and did so for an average of 21.4 years.

 The part I am having trouble with is convincing PG to use the index for
 the
 station ID and the date range for when the station was active. Each
 station
 has a unique ID; the data in the measurement table is ordered by
 measurement
 date then by station.

Well, don't forget indexes may not be the best way to evaluate the query -
if the query returns a large portion of the table, a sequential scan is
actually faster. The problem is that using an index means you have to read
the index blocks too, and then the table blocks, and this is actually random
access. So the belief that using indexes always makes the query run faster
may be false.

And this is what happens in the queries above - the first query covers
years 1963-2009, while the second one covers 1900-2009. Given the fact
this table contains ~40m rows, the first query returns about 0.01% (3k
rows) while the second one returns almost 50% of the data (18m rows). So I
doubt this might be improved using an index ...

But you can try that by setting enable_seqscan=off or proper setting of
the random_page_cost / seq_page_cost variables (so that the plan with
indexes is cheaper than the sequential scan). You can do that in the
session (e.g. use SET enable_seqscan=off) so that you won't harm other
sessions.

 Should I add a clustered index by station then by date?

 Any other suggestions are very much appreciated.

Well, the only thing that crossed my mind is partitioning with properly
defined constraints and constraint_exclusion=on. I'd recommend partitioning
by time (each year a separate partition) but you'll have to investigate
that on your own (depends on your use-cases).
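
Just to illustrate, one yearly partition might look roughly like this (a
sketch only - the partition name and CHECK range are made up, and you still
need a rule or trigger to route inserts into the right child table):

CREATE TABLE measurement_y1965 (
    CHECK (taken >= DATE '1965-01-01' AND taken < DATE '1966-01-01')
) INHERITS (measurement);

SET constraint_exclusion = on;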

BTW the effective_cache_size mentioned in the previous posts is just an
'information parameter' - it does not increase the amount of memory
allocated by PostgreSQL. It merely informs PostgreSQL of the expected disk
cache size maintained by the OS (Linux), so that PostgreSQL may estimate
the chance that the requested data are actually cached (and won't be read
from the disk).

regards
Tomas




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread tv
 Hi,

 And this is what happens in the queries above - the first query covers
 years 1963-2009, while the second one covers 1900-2009. Given the fact
 this table contains ~40m rows, the first query returns about 0.01% (3k
 rows) while the second one returns almost 50% of the data (18m rows). So
 I
 doubt this might be improved using an index ...


 I don't think that's what I'm doing.

 There are two tables involved: station_category (sc) and measurement (m).

 The first part of the query:

  extract(YEAR FROM sc.taken_start) >= 1900 AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

 That is producing a limit on the station_category table. There are, as far
 as I can tell, no stations that have been taking weather readings for 110
 years. Most of them have a lifespan of 24 years. The above condition just
 makes sure that I don't get data before 1900 or after 2009.



OK, I admit I'm a little bit confused by the query, especially by these
rows:

sc.taken_start >= '1900-01-01'::date AND
sc.taken_end <= '1996-12-31'::date AND
m.taken BETWEEN sc.taken_start AND sc.taken_end AND

Which seems to me a little bit convoluted. Well, I think I understand
what that means - give me all stations for a given city, collecting the
category of data at a certain time. But I'm afraid this makes the planning
much more difficult, as the select from measurements depend on the data
returned by other parts of the query (rows from category).

See this http://explain.depesz.com/s/H1 and this
http://explain.depesz.com/s/GGx

I guess the planner is confused in the second case - believes it has to
read a lot more data from the measurement table, and so chooses the
sequential scan. The question is if this is the right decision (I believe
it is not).

How many rows does the query return without the group by clause? About
14 in both cases, right?

 by time (each year a separate partition) but you'll have to investigate
 that on your own (depends on your use-cases).


 I cannot partition by time. First, there are 7 categories, which would
 mean
 770 partitions if I did it by year -- 345000 rows per partition. This will
 grow in the future. I have heard there are troubles with having lots of
 child tables (too many files for the operating system). Second, the user
 has
 the ability to pick arbitrary day ranges for arbitrary year spans.

 There's a year wrapping issue that I won't explain because I never get
 it
 right the first time. ;-)

OK, I haven't noticed the table is already partitioned by category_id and
I didn't mean to partition by (taken, category_id) - that would produce a
lot of partitions. Yes, that might cause problems related to number of
files, but that's rather a filesystem related issue.

I'd expect rather issues related to RULEs or triggers (not sure which of
them you use to redirect the data into partitions). But when partitioning
by time (and not by category_id) the number of partitions will be much
lower and you don't have to keep all of the rules active - all you need is
a rule for the current year (and maybe the next one).

I'm not sure what you mean by 'year wrapping issue' but I think it might
work quite well - right now the problem is PostgreSQL decides to scan the
whole partition (all data for a given category_id).

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread tv
I guess we need some more details about the test. Is the
connection/disconnection part of each test iteration? And how are the
databases connected (using a socket / localhost / a different host)?

Anyway measuring such simple queries will tell you almost nothing about
the general app performance - use the queries that are used in the
application.

 I also wonder why the reported runtime of 5.847 ms is so much different
 to the runtime reported of my scripts (both php and ruby are almost the
 same). What's the best tool to time queries in postgresql? Can this be
 done from pgadmin?

I doubt there's a 'best tool' to time queries, but I'd vote for logging
from the application itself, as it measures the performance from the end
user view-point (and that's what you're interested in). Just put some
simple logging into the database access layer.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query after upgrade to 8.4

2009-09-24 Thread tv
 Hello postgres wizards,

 We recently upgraded from 8.1.5 to 8.4
 We have a query (slow_query.sql) which took about 9s on 8.1.5
 On 8.4, the same query takes 17.7 minutes.

 The code which generated this query is written to support the
 calculation of arbitrary arithmetic expressions across variables and
 data within our application.  The example query is a sum of three
 variables, but please note that because the code supports arbitrary
 arithmetic, we do not use an aggregate function like sum()

 We have collected as much information as we could and zipped it up here:

 http://pgsql.privatepaste.com/download/a3SdI8j2km

 Thank you very much in advance for any suggestions you may have,
 Jared Beck

Tom Lane already replied, so I'm posting just the parsed explain plans - I'd
created them before noticing the reply, and I think they might be useful.

good (8.1): http://explain.depesz.com/s/1dT
bad (8.4): http://explain.depesz.com/s/seT

As you can see, the real problem is the 'index scan / sort'.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Different query plans for the same query

2009-09-18 Thread tv
 Hi all,

 on our PostgreSQL 8.3.1 (CentOS 5.3 64-bit) two different query plans
 for one of our (weird) queries are generated. One of the query plans
 seems to be good (and is used most of the time). The other one is bad -
 the query takes about 2 minutes and the database process, which is
 executing the query, is cpu bound during this time.

 After several tries I was able to reproduce the problem when executing
 the query with EXPLAIN ANALYZE. The bad query plan was generated only
 seconds after the good one was used when executing the query. What's the
 reasond for the different query plans? Statistics are up to date.

 ...

Hi,

please, when posting an explain plan, either save it into a file and
provide a URL (attachments are not allowed here), or use
explain.depesz.com or something like that. This wrapping makes the plan
unreadable so it's much more difficult to help you.

I've used the explain.depesz.com (this time):

- good plan: http://explain.depesz.com/s/HX
- bad plan: http://explain.depesz.com/s/gcr

It seems the whole problem is caused by the 'Index Scan using ind_atobjval
on atobjval t9' - in the first case it's executed only 775x, but in the
second case it's moved to the nested loop (one level deeper) and suddenly
it's executed 271250x. And that causes the huge increase in cost.

Why is this happening? I'm not quite sure the statistics are up to date
and precise enough - some of the steps show a 'rows=1' estimate, but
'rows=775' in the actual results.

Have you tried to increase the statistics target on the tables? That might
provide more accurate stats, thus better estimates.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] possible wrong query plan on pg 8.3.5,

2009-09-14 Thread tv
 Hi Tom,

 Yes, 24 is relative ok ( the real number is 20).
 And the statistic target for the database is 800 at the moment. If
 needet I can set it to 1000 ( the maximum).

 Also I waited to the end of this query to gather info for explain analyze.
 It is it:

   explain analyze  select d.ids from a_doc d  join a_sklad s on
 (d.ids=s.ids_doc)  join a_nomen n on (n.ids=s.ids_num)  join a_nom_gr
 nmgr on (nmgr.ids=n.ids_grupa)  join a_gar_prod_r gr on
 (gr.ids_a_sklad=s.ids and gr.sernum!='ok')  join a_location l on
 (l.ids=s.ids_sklad)  join a_klienti kl on (kl.ids=d.ids_ko)  left
 outer join a_slujiteli sl on (sl.ids=d.ids_slu_ka)  left outer join
 a_slujiteli slu on (slu.ids=d.ids_slu_targ)  where d.op=1  AND
  d.date_op >= 12320 AND d.date_op <= 12362 and n.num like '191%';

 QUERY PLAN
 
   Nested Loop Left Join  (cost=63.64..133732.47 rows=4 width=64)
 (actual time=616059.833..1314396.823 rows=91 loops=1)
 -  Nested Loop  (cost=63.64..133699.35 rows=4 width=128) (actual
 time=616033.205..1313991.756 rows=91 loops=1)
   -  Nested Loop  (cost=63.64..133688.22 rows=4 width=192)
 (actual time=616033.194..1313991.058 rows=91 loops=1)
 -  Nested Loop Left Join  (cost=63.64..133687.10
 rows=4 width=256) (actual time=616033.183..1313936.577 rows=91 loops=1)
   -  Nested Loop  (cost=63.64..133685.78 rows=4
 width=320) (actual time=616033.177..1313929.258 rows=91 loops=1)
 -  Nested Loop  (cost=63.64..133646.56
 rows=6 width=384) (actual time=616007.069..1313008.701 rows=91 loops=1)
   -  Nested Loop
 (cost=63.64..127886.54 rows=2833 width=192) (actual
 time=376.309..559763.450 rows=211357 loops=1)
 -  Nested Loop
 (cost=63.64..107934.83 rows=13709 width=256) (actual
 time=224.058..148475.499 rows=370803 loops=1)
   -  Index Scan using
 i_nomen_num on a_nomen n  (cost=0.00..56.39 rows=24 width=128) (actual
 time=15.702..198.049 rows=20 loops=1)
 Index Cond:
  (((num)::text >= '191'::text) AND ((num)::text < '192'::text))
 Filter:
 ((num)::text ~~ '191%'::text)
   -  Bitmap Heap Scan on
 a_sklad s  (cost=63.64..4480.23 rows=1176 width=256) (actual
 time=93.223..7398.764 rows=18540 loops=20)
 Recheck Cond:
 (s.ids_num = n.ids)
 -  Bitmap Index
 Scan on i_sklad_ids_num  (cost=0.00..63.34 rows=1176 width=0) (actual
 time=78.430..78.430 rows=18540 loops=20)
   Index Cond:
 (s.ids_num = n.ids)
 -  Index Scan using
 i_a_gar_prod_r_ids_a_sklad on a_gar_prod_r gr  (cost=0.00..1.44 rows=1
 width=64) (actual time=1.098..1.108 rows=1 loops=370803)
   Index Cond:
 (gr.ids_a_sklad = s.ids)
    Filter: (gr.sernum <> 'ok'::text)
   -  Index Scan using a_doc_pkey on
 a_doc d  (cost=0.00..2.02 rows=1 width=256) (actual time=3.563..3.563
 rows=0 loops=211357)
 Index Cond: (d.ids = s.ids_doc)
  Filter: ((d.date_op >= 12320)
  AND (d.date_op <= 12362) AND (d.op = 1))
 -  Index Scan using a_klienti_pkey on
 a_klienti kl  (cost=0.00..6.53 rows=1 width=64) (actual
 time=10.109..10.113 rows=1 loops=91)
   Index Cond: (kl.ids = d.ids_ko)
   -  Index Scan using a_slujiteli_pkey on
 a_slujiteli sl  (cost=0.00..0.32 rows=1 width=64) (actual
 time=0.078..0.078 rows=0 loops=91)
 Index Cond: (sl.ids = d.ids_slu_ka)
 -  Index Scan using a_location_pkey on a_location l
 (cost=0.00..0.27 rows=1 width=64) (actual time=0.596..0.597 rows=1
 loops=91)
   Index Cond: (l.ids = s.ids_sklad)
   -  Index Scan using a_nom_gr_pkey on a_nom_gr nmgr
 (cost=0.00..2.77 rows=1 width=64) (actual time=0.005..0.006 rows=1
 loops=91)
 Index Cond: (nmgr.ids = n.ids_grupa)
 -  Index Scan using a_slujiteli_pkey on a_slujiteli slu
 (cost=0.00..8.27 rows=1 width=64) (actual time=4.448..4.449 rows=1
 loops=91)
   Index Cond: (slu.ids = d.ids_slu_targ)
   Total runtime: 1314397.153 ms
 (32 rows)


 And if I try this query for second time it is working very fast:


 -
   Nested Loop Left Join  (cost=63.64..133732.47 rows=4 width=64)
 

Re: [PERFORM] Sub-optimal plan chosen

2009-09-10 Thread tv
 default_statistics_target = 100 (tried with 500, no change). Vacuum
 analyzed
 before initial query, and after each change to default_statistics_target.

Modifying the statistics target is useful only if the estimates are
seriously off, which is not your case - so it won't help, at least not
reliably.

 The same query, with a different ofid, will occasionally get the more
 optimal plan -- I assume that the distribution of data is the
 differentiator
 there.

Yes, the difference between the costs of the two plans is quite small (11796
vs. 13153), so it's very sensitive to the data distribution.

 Is there any other data I can provide to shed some light on this?

You may try to play with the 'cost' constants - see this:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

You just need to modify them so that the bitmap index scan / bitmap heap
scan is preferred to the plain index scan.

Just be careful - if set in the postgresql.conf, it affects all the
queries and may cause serious problems with other queries. So it deserves
proper testing ...
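
Something like this lets you experiment safely within a single session (the
value is just a starting point, not a recommendation):

SET random_page_cost = 2.0;
-- re-run EXPLAIN ANALYZE on the problematic query here
RESET random_page_cost;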

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sub-optimal plan chosen

2009-09-10 Thread tv
 Playing around with seq_page_cost (1) and random_page_cost (1), I can get
 the correct index selected. Applying those same settings to our production
 server does not produce the optimal plan, though.

I doubt setting seq_page_cost and random_page_cost to the same value is
reasonable - random access is almost always more expensive than sequential
access.

Anyway, post the EXPLAIN ANALYZE output from the production server. Don't
forget there are other _cost values - try to modify them too, but I'm not
sure how these values relate to the bitmap heap scan / bitmap index plans.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] moving data between tables causes the db to overwhelm the system

2009-09-01 Thread tv
 Hi all;

  We have a table that's > 2 billion rows big and growing fast.  We've setup
 monthly partitions for it. Upon running the first of many select * from
 bigTable insert into partition statements (330million rows per month) the
 entire box eventually goes out to lunch.

 Any thoughts/suggestions?

 Thanks in advance


Sorry, but your post does not provide enough information, so it's
practically impossible to give you some suggestions :-(

Provide at least these information:

1) basic info about the hardware (number and type of cpus, amount of RAM,
controller, number of disk drives)

2) more detailed information about the table size and structure (see the
pg_class and pg_stat_* views, example queries below), and information about
indexes and triggers created on the table

3) explain plan of the problematic queries - in this case the 'select *
from bigtable' etc.

4) detailed description of what 'going out to lunch' means - does that mean the
CPU is 100% occupied, or is there a problem with I/O (use vmstat / dstat
or something like that)

I've probably forgotten something, but this might be a good starting point.
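
Regarding point (2), something like this usually gives a quick overview
(replace 'bigtable' with the actual table name):

SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'bigtable';
SELECT * FROM pg_stat_user_tables WHERE relname = 'bigtable';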

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] moving data between tables causes the db to overwhelm the system

2009-09-01 Thread tv
 On Tuesday 01 September 2009 03:26:08 Pierre Frédéric Caillaud wrote:
  We have a table that's > 2 billion rows big and growing fast.  We've
 setup
  monthly partitions for it. Upon running the first of many select *
 from
  bigTable insert into partition statements (330million rows per month)
 the
  entire box eventually goes out to lunch.
 
  Any thoughts/suggestions?
 
  Thanks in advance

  Did you create the indexes on the partition before or after inserting
 the
 330M rows into it ?
  What is your hardware config, where is xlog ?


 Indexes are on the partitions, my bad.  Also HW is a Dell server with 2
 quad
 cores and 32G of ram

 we have a DELL MD3000 disk array with an MD1000 expansion bay, 2
 controllers,
 2 hbs's/mount points runing RAID 10

 The explain plan looks like this:
 explain SELECT * from bigTable
 where
  time >= extract ('epoch' from timestamp '2009-08-31 00:00:00')::int4
  and time <= extract ('epoch' from timestamp '2009-08-31 23:59:59')::int
 ;

QUERY PLAN
 
  Index Scan using bigTable_time_index on bigTable  (cost=0.00..184.04
 rows=1
 width=129)
    Index Cond: ((time >= 1251676800) AND (time <= 1251763199))
 (2 rows)

This looks like a single row matches your conditions. Have you run ANALYZE
on the table recently? Try to run ANALYZE BigTable and then the explain
again.

BTW what version of PostgreSQL are you running?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread tv
 Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
 before and everything worked fine for me.

 And now i have ~93% cpu load. Here's changed values of config:

 default_statistics_target = 50
 maintenance_work_mem = 1GB
 constraint_exclusion = on
 checkpoint_completion_target = 0.9
 effective_cache_size = 22GB
 work_mem = 192MB
 wal_buffers = 8MB
 checkpoint_segments = 16
 shared_buffers = 7680MB
 max_connections = 80


 My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only
 postgresql working on it.

 For connection pooling i'm using pgbouncer's latest version with
 pool_size 20 (used 30 before, but now lowered) and 10k connections.

 What parameters i should give more attention on?


All the values seem quite reasonable to me. What about the _cost variables?

I guess one or more queries are evaluated using a different execution
plan, probably sequential scan instead of index scan, hash join instead of
merge join, or something like that.

Try to log the slow statements - see log_min_duration_statement. That
might give you the slow queries (although not necessarily the ones causing
the problems), and you can analyze them.
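
For example in postgresql.conf (the 200 ms threshold is just an example - pick
whatever counts as 'slow' for your application):

log_min_duration_statement = 200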

What is the general I/O activity? Is there a lot of data read/written to
the disks, is there a lot of I/O wait?

regards
Tomas

PS: Was the database analyzed recently?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread tv
 Unfortunately had to downgrade back to 8.3. Now having troubles with
 that and still solving them.

 For future upgrade, what is the basic steps?

1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at the beginning)
5. check that everything is working (that the correct execution plans are
used, etc.)

You may even run the (2) and (3) at once - use pipe instead of a file.
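
For example (host and database names are placeholders):

pg_dump -h old_host old_db | psql -h new_host new_db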


Was the database analyzed recently?
 Hm... there was smth like auto analyzer in serverlog when i started it
 first time, but i didn't mention that.
 Should I analyze whole db? How to do it?

Just execute 'ANALYZE' and the whole database will be analyzed, but when
the autovacuum daemon is running this should be performed automatically (I
guess - check the pg_stat_user_tables, there's information about last
manual/automatic vacuuming and/or analysis).
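
Something like this shows when each table was last vacuumed / analyzed:

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables;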

 And how should I change _cost variables?

As far as I can see you have not modified those variables, so don't change them.

 I/O was very high. at first memory usage grew up and then began to full
 swap.

OK, this seems to be the cause. What were the original values of the
config variables? If you've lowered the work_mem and you need to sort a
lot of data, this may be a problem. What amounts of data are you working
with? If the data were not analyzed recently, the execution plans will be
inefficient and this may be the result.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Fwd: Casting issue!!

2009-01-07 Thread tv
Hi Jose,

why haven't you posted an example of the failing query, the data and the exact
error message? The casting should work on 8.3 (and it works for me) so I
guess there are some invalid data, invalid SQL or something like that.

Anyway I doubt this is a performance issue - this belongs on the generic SQL
mailing list.

regards
Tomas


 -- Forwarded message --
 From: jose fuenmayor jaf...@gmail.com
 Date: Wed, Jan 7, 2009 at 2:56 PM
 Subject: Casting issue!!
 To: psql-ad...@postgresql.org


 Hi all I am trying to migrate from postgresql 8.2.x to  8.3.x, i have an
 issue with casting values when i try to perform the auto cast , it does
 not
 work and I get an error, how can i perform  auto casting on 8.3 without
 rewrite my source code, I am using pl/pgsql. I mean i dont want to write
 value::dataType. I dont want to use explicit type cast. Maybe change
 something in the config files? to make it work like 8.2 on tha regard(cast
 values).
 thanks a lot!!!
 Kind Regards;
 Jose Fuenmayor
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres using more memory than it should

2008-12-03 Thread tv

 Hi. I have a problem on one of our production servers. A fairly
 complicated query is running, and the backend process is using 30 GB of
 RAM. The machine only has 32GB, and is understandably swapping like crazy.
 My colleague is creating swap files as quickly as it can use them up.

 The work_mem setting on this machine is 1000MB, running Postgres 8.3.0.

Are you aware that this is a per-session / per-sort setting? That means,
if you have 10 sessions, each of them running a query with 2 sort steps in
the plan, it may occupy up to 20 GB of RAM (if both sorts use the whole
1GB of RAM).

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-24 Thread tv
 Given the fact that the performance issues are caused by bloated tables
 and / or slow I/O subsystem, moving to a similar system won't help I
 guess.

 I have ran VACUUM FULL ANALYZE VERBOSE
 and set MAX_FSM_PAGES = 15

 So there is no any bloat except pg_shdepend indexes which should not
 affect to query speed.

OK, what was the number of unused pointer items in the VACUUM output?

Is the query performance still the same as when the tables were bloated?
What are the outputs of iostat/vmstat/dstat/top when running the query?

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread tv
Just the most important points:

1) dok table contains 1235086 row versions in 171641 pages (with 8kB
pages this means about 1.4GB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted)

2) rid table contains 3275189 rows in 165282 pages (with 8kB pages this means
about 1.3GB of data), but there are 1878923 unused item pointers (i.e.
about 30% of the space is wasted)

3) don't forget to execute analyze after vacuuming (or vacuum analyze)

4) I'm not sure why the sizes reported by you differ from these (for example
2.3GB vs 1.5GB for the dok table) - the difference seems too large to me.

Anyway the amount of wasted rows seems significant to me - I'd try to
solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will
lock the table exclusively, but the results may be better (when sorting by
a well chosen index). Don't forget to run ANALYZE afterwards.
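
For example (dok_pkey exists according to the VACUUM output above, but another
index may be a better choice for the ordering):

CLUSTER dok_pkey ON dok;
ANALYZE dok;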

Several other things to consider:

1) Regarding the toode column - why are you using CHAR(20) when the values
are actually shorter? This may significantly increase the amount of space
required.

2) I've noticed the CPU used is a Celeron, which may negatively affect the
speed of hash computation. I'd try to replace the CHAR(20) key with something
faster to hash - say an INTEGER as an artificial primary key of the toode
table, used as a FK in the other tables.  This might improve the Bitmap Heap Scan on rid
part, but yes - it's just a minor improvement compared to the Hash Join
part of the query.

Materialized views seem like a good idea to me, but maybe I'm not seeing
something. What do you mean by 'reports are different'? If there are a lot
of rows for a given product / day, then creating an aggregated table with
(product code / day) as a primary key is quite simple. It may require a
lot of disk space, but it'll remove the hash join overhead. But if the
queries are very different, then it may be difficult to build such
materialized view(s).

regards
Tomas

 PFC,

 thank you.

 OK so vmstat says you are IO-bound, this seems logical if the same plan
 has widely varying timings...

 Let's look at the usual suspects :

 - how many dead rows in your tables ? are your tables data, or bloat ?
 (check vacuum verbose, etc)

 set search_path to firma2,public;
 vacuum verbose dok; vacuum verbose rid

 INFO:  vacuuming firma2.dok
 INFO:  index dok_pkey now contains 1235086 row versions in 9454 pages
 DETAIL:  100 index row versions were removed.
 0 index pages have been deleted, 0 are currently reusable.
 CPU 0.16s/0.38u sec elapsed 0.77 sec.
 INFO:  index dok_dokumnr_idx now contains 1235086 row versions in 9454
 pages
 DETAIL:  100 index row versions were removed.
 0 index pages have been deleted, 0 are currently reusable.
 CPU 0.14s/0.40u sec elapsed 0.78 sec.
 INFO:  index dok_klient_idx now contains 1235086 row versions in 18147
 pages
 DETAIL:  887 index row versions were removed.
 3265 index pages have been deleted, 3033 are currently reusable.
 CPU 0.36s/0.46u sec elapsed 31.87 sec.
 INFO:  index dok_krdokumnr_idx now contains 1235086 row versions in
 11387
 pages
 DETAIL:  119436 index row versions were removed.
 1716 index pages have been deleted, 1582 are currently reusable.
 CPU 0.47s/0.55u sec elapsed 63.38 sec.
 INFO:  index dok_kuupaev_idx now contains 1235101 row versions in 10766
 pages
 DETAIL:  119436 index row versions were removed.
 659 index pages have been deleted, 625 are currently reusable.
 CPU 0.62s/0.53u sec elapsed 40.20 sec.
 INFO:  index dok_tasudok_idx now contains 1235104 row versions in 31348
 pages
 DETAIL:  119436 index row versions were removed.
 0 index pages have been deleted, 0 are currently reusable.
 CPU 1.18s/1.08u sec elapsed 118.97 sec.
 INFO:  index dok_tasudok_unique_idx now contains 99 row versions in 97
 pages
 DETAIL:  98 index row versions were removed.
 80 index pages have been deleted, 80 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.48 sec.
 INFO:  index dok_tasumata_idx now contains 1235116 row versions in 11663
 pages
 DETAIL:  119436 index row versions were removed.
 5340 index pages have been deleted, 5131 are currently reusable.
 CPU 0.43s/0.56u sec elapsed 53.96 sec.
 INFO:  index dok_tellimus_idx now contains 1235122 row versions in 11442
 pages
 DETAIL:  119436 index row versions were removed.
 1704 index pages have been deleted, 1569 are currently reusable.
 CPU 0.45s/0.59u sec elapsed 76.91 sec.
 INFO:  index dok_yksus_pattern_idx now contains 1235143 row versions in
 5549 pages
 DETAIL:  119436 index row versions were removed.
 529 index pages have been deleted, 129 are currently reusable.
 CPU 0.19s/0.46u sec elapsed 2.72 sec.
 INFO:  index dok_doktyyp now contains 1235143 row versions in 3899 pages
 DETAIL:  119436 index row versions were removed.
 188 index pages have been deleted, 13 are currently reusable.
 CPU 0.14s/0.44u sec elapsed 1.40 sec.
 INFO:  index dok_sihtyksus_pattern_idx now contains 1235143 row versions
 in 5353 pages
 DETAIL:  119436 index row versions were removed.
 286 index pages 

Re: [PERFORM] Very Urgent : Sequences Problem

2008-11-19 Thread tv
 On Wed, Nov 19, 2008 at 10:54 AM, Kranti K K Parisa
 [EMAIL PROTECTED] wrote:
 Hi,

 I have defined sequence on a table something like this


 CREATE SEQUENCE items_unqid_seq
   INCREMENT 1
   MINVALUE 0
   MAXVALUE 9223372036854775807
   START 7659
   CACHE 1;

 this is on a table called items. where i have currently the max(unq_id)
 as
 7659.

 and in the stored procedure when i am inserting values into the items
 table
 for the unq_id column i am using the sequence as follows:

 nextval('items_unqid_seq'::text)


 it seems to be working some times. and the sequences are not getting
 updated
 sometime. which is casuing primary key exceptions.

 please advise as soon as possible.

 is there any trivial problem with sequences in postgresql??

 no (at least none that I know of).

 maybe if you posted the source of your procedure?  I bet your error is
 coming form some other source.

Are you sure you're using the nextval() properly whenever you insert data
into the table? This usually happens when a developer does not use it
properly, i.e. he just uses a (select max(id) + 1 from ...) or something like
that. One of the more creative ways of breaking sequences I've seen was calling
nextval() only for the first insert, and then adding 1 to the ID.
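
One robust way to rule out broken application code is to make the sequence the
column default, so every plain INSERT uses it automatically (assuming the
column really is called unq_id, as in your post):

ALTER TABLE items ALTER COLUMN unq_id SET DEFAULT nextval('items_unqid_seq');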

BTW do you have RULEs defined on the table? Some time ago I ran into a
problem with RULEs defined on a table, as all the rules are evaluated -
I had used nextval() in all the rules, so the sequence was incremented for
each rule and it was not clear which value was actually used. So it was not
clear which value to use in a following insert (as a FK value).

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance and IN clauses

2008-11-18 Thread tv
I bet there is no 'critical' length - this is just another case of index
scan vs. seqscan. The efficiency depends on the size of the table / row,
amount of data in the table, variability of the column used in the IN
clause, etc.

Splitting the query with 1000 items into 10 separate queries, the smaller
queries may be faster but the total time consumed may be actually higher.
Something like

10 * (time of small query) + (time to combine them) > (time of large query)

If the performance of the 'split' solution is actually better than the
original query, it just means that the planner does not use index scan
when it actually should. That means that either

(a) the planner is not smart enough
(b) it has not current statistics of the table (run ANALYZE on the table)
(c) the statistics are not detailed enough (ALTER TABLE ... SET STATISTICS)
(d) the cost variables are not set properly (do not match the hardware -
decrease index scan cost / increase seq scan cost)

regards
Tomas

 On Tue, 18 Nov 2008, Kynn Jones wrote:
 Also, assuming that the optimal way to write the query depends on the
 length of $node_list, how can I estimate the
 critical length at which I should switch from one form of the query to
 the other?

 In the past, I have found the fastest way to do this was to operate on
 groups of a bit less than a thousand values, and issue one query per
 group. Of course, Postgres may have improved since then, so I'll let more
 knowledgable people cover that for me.

 Matthew

 --
  Heat is work, and work's a curse. All the heat in the universe, it's
  going to cool down, because it can't increase, then there'll be no
  more work, and there'll be perfect peace.  -- Michael Flanders

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance Question

2008-11-12 Thread tv
 max_connections = 100
 shared_buffers = 16MB
 work_mem = 64MB
 everything else is set to the default

OK, but what about effective_cache_size for example?

Anyway, we need more information about the table itself - the number of
rows is nice, but it does not say how large the table is. The rows might
be small (say 100B each) or large (say several kilobytes), affecting the
amount of data to be read.

We need to know the structure of the table, and the output of the
following commands:

ANALYZE table;
SELECT relpages, reltuples FROM pg_class WHERE relname = 'table';
EXPLAIN SELECT * FROM table;


 One of my tables has 660,000 records and doing a SELECT * from that table
 (without any joins or sorts) takes 72 seconds. Ordering the table based on
 3 columns almost doubles that time to an average of 123 seconds. To me,
 those numbers are crazy slow and I don't understand why the queries are
 taking so long. The tables are UTF-8 encode and contain a mix of languages
 (English, Spanish, etc). I'm running the query from pgadmin3 on a remote
 host. The server has nothing else running on it except the database.

 As a test I tried splitting up the data across a number of other tables. I
 ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join
 the results together. This was even slower, taking an average of 103
 seconds to complete the generic select all query.

Well, splitting the tables just to read all of them won't help. It will
make the problem even worse, due to the necessary processing (UNION ALL).

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Simple indexed IN query takes 40 seconds

2008-11-10 Thread tv
Obviously, most of the total cost (cost=327569, time=39749ms) comes from
two operations in the execution plan:

(a) sequential scan on the 'rid' table (cost=195342, time=26347ms) that
produces almost 3.200.000 rows
(b) hash join of the two subresults (cost=24, time=14000ms)

How many rows are there in the 'rid' table? If the 'IN' clause selects
more than a few percent of the table, the index won't be used as the
sequential scan of the whole table will be faster than random access
(causing a lot of seeks).

Try to:

(a) analyze the table - might help if the stats are too old and don't
reflect current state
(b) increase the statistics target of the table (will give more precise
stats, allowing to select a better plan)
(c) tune the 'cost' parameters of the planner - the default values are
quite conservative, so if you have fast disks (regarding seeks) the
sequential scan may be chosen too early, you may even 'turn off' the
sequential scan

regards
Tomas

 I found that simple IN query on indexed tables takes too much time.

 dok and rid both have indexes on the int dokumnr column and dokumnr is not
 null.
 PostgreSql can use index on dok or even on rid so it can be executed fast.

 How to make this query to run fast ?

 Andrus.



 note: list contain a lot of integers, output below is abbreviated in this
 part.

 explain analyze select
 sum(rid.kogus)
from dok JOIN rid USING(dokumnr)
where dok.dokumnr in
 (869906,869907,869910,869911,869914,869915,869916,869917,869918,869921,869925,869926,869928,869929,869934,869935,869936,...)

 Aggregate  (cost=327569.15..327569.16 rows=1 width=9) (actual
 time=39749.842..39749.846 rows=1 loops=1)
   -  Hash Join  (cost=83872.74..327537.74 rows=12563 width=9) (actual
 time=25221.702..39697.249 rows=11857 loops=1)
 Hash Cond: (outer.dokumnr = inner.dokumnr)
 -  Seq Scan on rid  (cost=0.00..195342.35 rows=3213135 width=13)
 (actual time=0.046..26347.959 rows=3243468 loops=1)
 -  Hash  (cost=83860.76..83860.76 rows=4792 width=4) (actual
 time=128.366..128.366 rows=4801 loops=1)
   -  Bitmap Heap Scan on dok  (cost=9618.80..83860.76
 rows=4792 width=4) (actual time=58.667..108.611 rows=4801 loops=1)
 Recheck Cond: ((dokumnr = 869906) OR (dokumnr =
 869907)
 OR (dokumnr = 869910) OR (dokumnr = 869911) OR (dokumnr = 869914) OR
 (dokumnr = 869915) OR (dokumnr = 869916) OR (dokumnr = 869917) OR (dokumnr
 =
 869918) OR (dokumnr = 869921) OR (dokumnr = 869925) OR (dokumnr = 869926)
 OR
 (dokumnr = 869928) OR (dokumnr = 869929) OR (dokumnr = 869934) OR (dokumnr
 =
 869935) OR (dokumnr = 869936) OR (dokumnr = 869937) OR (dokumnr = 869940)
 OR
 (dokumnr = 869941) OR (dokumnr = 869945) OR (dokumnr = 869951) OR (dokumnr
 =
 869964) OR (dokumnr = 869966) OR (dokumnr = 869969) OR (dokumnr = 869974)
 OR
 (dokumnr = 869979) OR (dokumnr = 869986) OR (dokumnr = 869992) OR (dokumnr
 =
 869993) OR (dokumnr = 869995) OR (dokumnr = 869997) OR (dokumnr = 870007)
 OR
 (dokumnr = 870018) OR (dokumnr = 870021) OR (dokumnr = 870023) OR (dokumnr
 =
 870025) OR (dokumnr = 870033) OR (dokumnr = 870034) OR (dokumnr = 870036)
 OR
 (dokumnr = 870038) OR (dokumnr = 870043) OR (dokumnr = 870044) OR (dokumnr
 =
 870046) OR (dokumnr = 870050) OR (dokumnr = 870051) OR (dokumnr = 870053)
 OR
 (dokumnr = 870054) OR (dokumnr = 870055) OR (dokumnr = 870064) OR (dokumnr
 =
 870066) OR (dokumnr = 870069) OR (dokumnr = 870077) OR (dokumnr = 870079)
 OR
 (dokumnr = 870081) OR (dokumnr = 870084) OR (dokumnr = 870085) OR (dokumnr
 =
 870090) OR (dokumnr = 870096) OR (dokumnr = 870110) OR (dokumnr = 870111)
 OR
 (dokumnr = 870117) OR (dokumnr = 870120) OR (dokumnr = 870124) OR (dokumnr
 =
 870130)
 ...
 OR (dokumnr = 890907) OR (dokumnr = 890908))
 -  BitmapOr  (cost=9618.80..9618.80 rows=4801
 width=0)
 (actual time=58.248..58.248 rows=0 loops=1)
   -  Bitmap Index Scan on dok_dokumnr_idx
 (cost=0.00..2.00 rows=1 width=0) (actual time=0.052..0.052 rows=3
 loops=1)
 Index Cond: (dokumnr = 869906)
   -  Bitmap Index Scan on dok_dokumnr_idx
 (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=3
 loops=1)
 Index Cond: (dokumnr = 869907)
   -  Bitmap Index Scan on dok_dokumnr_idx
 (cost=0.00..2.00 rows=1 width=0) (actual time=0.020..0.020 rows=3
 loops=1)
 Index Cond: (dokumnr = 869910)
   -  Bitmap Index Scan on dok_dokumnr_idx
 (cost=0.00..2.00 rows=1 width=0) (actual time=0.010..0.010 rows=3
 loops=1)
 Index Cond: (dokumnr = 869911)
   -  Bitmap Index Scan on dok_dokumnr_idx
 (cost=0.00..2.00 rows=1 width=0) (actual time=0.008..0.008 rows=3
 loops=1)
 Index Cond: (dokumnr = 869914)
 ...
   -  Bitmap Index Scan on dok_dokumnr_idx
 

Re: [PERFORM] slow delete

2008-07-04 Thread tv
 My next question is: what is the difference between select and delete?
 There is another table that has one foreign key to reference the test
 (parent) table that I am deleting from and this foreign key does not have
 an index on it (a 330K row table).

The difference is that with SELECT you're not performing any modifications
to the data, while with DELETE you are. That means that with DELETE you
may have a lot of overhead due to FK checks etc.

Someone already pointed out that if you reference a table A from table B
(using a foreign key), then you have to check FK in case of DELETE, and
that may knock the server down if the table B is huge and does not have an
index on the FK column.
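
The usual fix is simply an index on the referencing column - something like
this (the table and column names are just placeholders, use the child table
and its FK column from your schema):

CREATE INDEX child_table_test_fk_idx ON child_table (test_id);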

 Deleting one row at a time is fine: delete from test where pk_col = n1;

 but deleting the big chunk all together (with  80K rows to delete) always
 hangs: delete from test where cola = 'abc';

 I am wondering if I don't have enough memory to hold and carry on the
 80k-row delete.
 but how come I can select those 80k-row very fast? what is the difference
  between select and delete?

 Maybe the foreign key without an index does play a big role here, a
 330K-row table references a 29K-row table will get a lot of table scan on
 the foreign table to check if each row can be deleted from the parent
 table... Maybe select from the parent table does not have to check the
 child table?

Yes, and PFC already pointed this out.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql is very slow

2008-06-24 Thread tv
 I was not aware of the VACUUM functionality earlier, but some times back
 i read and run this on the server but i did not achieve anything in
 terms of performance. The server is running from 1 to 1.5 years and we
 have done VACUUM only once.

 vacuuming isn't so much about performance as about maintenance.  You
 don't change the oil in your car to make it go faster, you do it to
 keep it running smoothly.  Don't change it for 1.5 years and you could
 have problems.  sludge build up / dead tuple build up.  Kinda similar.


I have to disagree - the VACUUM is a maintenance task, but with a direct
impact on performance. The point is that Postgresql holds dead rows (old
versions, deleted, etc.) until freed by vacuum, and these rows need to be
checked every time (are they still visible to the transaction?). So on a
heavily modified table you may easily end up with most of the tuples being
dead, i.e. a table consisting mostly of dead tuples.

 The output of EXPLAIN query;

 select * from USERS where email like '%bijayant.kumar%';
 This simplest query tooks 10 minutes and server loads goes from 0.35 to
 16.94.

 EXPLAIN select * from USERS where email like '%bijayant.kumar%';
  QUERY PLAN
 --
  Seq Scan on USERS  (cost=0.00..54091.84 rows=1 width=161)
   Filter: ((email)::text ~~ '%bijayant.kumar%'::text)
 (2 rows)

 You're scanning ~ 54094 sequential pages to retrieve 1 row.   Note
 that explain analyze is generally a better choice, it gives more data
 useful for troubleshooting.

Not necessarily, the 'cost' depends on seq_page_cost, which might be set to
a value other than 1 (the default). A better approach is

SELECT relpages, reltuples FROM pg_class WHERE relname = 'users';

which reads the values from system catalogue.

 Definitely need a vacuum full on this table, likely followed by a reindex.

Yes, that's true. I guess the table holds a lot of dead tuples. I'm not
sure why this happens on one server (the new one) and not on the other
one. I guess the original one uses some automatic vacuuming (autovacuum,
cron job, or something like that).

As someone already posted, clustering the table (by primary key for
example) should be much faster than vacuuming and give better performance
in the end. See

http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html

The plain reindex won't help here - it won't remove dead tuples.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql is very slow

2008-06-24 Thread tv
 Not necessarily, the 'cost' depends on
 seq_page_cost and there might be
 other value than 1 (which is the default). A better
 approach is

 SELECT relpages, reltuples FROM pg_class WHERE relname =
 'users';

 which reads the values from system catalogue.

 The Output of query on the Slow Server

 SELECT relpages, reltuples FROM pg_class WHERE relname ='users';
  relpages | reltuples
 --+---
 54063 |  2307
 (1 row)

 The Output of query on the old server which is fast

  relpages | reltuples
 --+---
42 |  1637



This definitely confirms the suspicion about dead tuples etc. On the old
server the table has 1637 tuples and occupies just 42 pages (i.e. 330kB
with 8k pages), which gives about 0.025 of a page (roughly 0.2kB) per row.

Let's suppose the characteristics of the data (row sizes, etc.) are the same
on both servers - in that case the 2307 rows should occupy about 58
pages, but as you can see from the first output they occupy 54063 pages, i.e.
about 400MB instead of 450kB.

  Definitely need a vacuum full on this table, likely
 followed by a reindex.


 The Slow server load increases whenever i run a simple query, is it the
 good idea to run VACUUM full on the live server's database now or it
 should be run when the traffic is very low may be in weekend.

The load increases because with the queries you've sent the database has
to read the whole table (sequential scan), which may be spread across the
disk (thus the disk has to seek).

I'd recommend running CLUSTER instead of VACUUM FULL - it should be much
faster in this case. It will lock the table, but the performance already
sucks, so I'd probably prefer a short downtime followed by much faster
processing afterwards.


 Yes, that's true. I guess the table holds a lot of dead
 tuples. I'm not
 sure why this happens on one server (the new one) and not
 on the other
 one. I guess the original one uses some automatic vacuuming
 (autovacuum,
 cron job, or something like that).

 There was nothing related to VACUUM of database in the crontab.

In that case something else must be running VACUUM - probably autovacuum
(check the autovacuum settings in postgresql.conf).

 As someone already posted, clustering the table (by primary
 key for
 example) should be much faster than vacuuming and give
 better performance
 in the end. See

 http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html

 The plain reindex won't help here - it won't remove
 dead tuples.

 I am new to Postgres database, i didnt understand the indexing part. Is
 it related to PRIMARY_KEY column of the table?

Not sure what you mean by the 'indexing part', so let me explain a bit.

The principle of clustering is quite simple - by sorting the table according
to an index (i.e. by the columns of that index) you may get better
performance when using the index. Another bonus is that it compacts the table
on the disk, so disk seeks are less frequent. These two effects may mean a
serious increase in performance. You may cluster according to any index on
the table, not just the primary key - just choose the most frequently used
one.

Sure, there are some drawbacks - it locks the table, so it cannot be used
while the command is running. It's not an incremental operation either: the
order is not maintained when the table is modified - a new row version won't
respect the ordering, so you have to run the CLUSTER command again from time
to time. And it's possible to cluster by only one index at a time.


 Should i have to run:- CLUSTER USERS using 'username';

I guess 'username' is a column, so that won't work. You have to name an
index (I'd recommend the primary key index, typically the one ending with
_pkey).
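
Just for illustration - the index name below is a guess, check the actual
name first (e.g. with \d users in psql). On 8.1 the syntax is the older
form:

CLUSTER users_pkey ON users;
ANALYZE users;

(On 8.3 and later the same thing can be written as CLUSTER users USING
users_pkey.) Running ANALYZE afterwards doesn't hurt, it refreshes the
statistics for the rewritten table.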

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql is very slow

2008-06-23 Thread tv
Hi,

 Hello to list,

 We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling
 with postgresql performance. Any query say select * from tablename takes
 10-15 mins to give the output, and while executing the query system loads
 goes up like anything. After the query output, system loads starts
 decresing.

I doubt 'select * from tablename' is a good candidate for tuning, but give
us more information about the table. What is its size - how many rows does
it have and how much space does it occupy on disk? What is the typical usage
of the table - is it modified (update / delete) frequently? How is it
maintained - is autovacuum running, or did you set up a routine vacuum (and
analyze) job to maintain the database?

I guess one of the servers (the slow one) has been running for a long time
without proper db maintenance (vacuum / analyze) and you dumped / loaded the
db onto a new server. So the 'new server' has much more 'compact' tables and
thus gives the responses much faster. And this holds for more complicated
queries (with indexes etc.) too.

Output from the EXPLAIN (or better, EXPLAIN ANALYZE) command would give a
much better overview.
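
For example (just substitute one of the queries that are actually slow for
you):

EXPLAIN ANALYZE SELECT * FROM tablename WHERE some_column = 'some value';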

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability question

2008-06-11 Thread tv
 Hi,

 I got a question about scalability in high volume insert situation
 where the table has a primary key and several non-unique indexes
 on other columns of the table. How does PostgreSQL behave
 in terms of scalability? The high volume of inserts comes from
 multiple transactions.

 Best regards,
 Zoltán Böszörményi

Well, that's a difficult question as it depends on the hardware and software,
but with proper tuning the results may be very good. Just do the basic
PostgreSQL tuning first and then tune for INSERT performance if needed.
It's difficult to give more specific recommendations without more detailed
knowledge of the problem, but consider these hints:

1) move the pg_xlog to a separate drive (so the WAL writes stay sequential)
2) move the table with the large volume of inserts to a separate tablespace
3) minimize the number of indexes etc.

The basic rule is that each index adds some overhead to every insert, but
how much depends on the data types etc. Just prepare some data to import,
run the insert both with and without the indexes, and compare the times.
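
A rough sketch of such a comparison (the table, index and data are made up
just for illustration; \timing is a psql command that prints execution
times):

CREATE TABLE events (
    id       bigserial PRIMARY KEY,
    user_id  integer,
    payload  text
);
CREATE INDEX events_user_idx ON events (user_id);

\timing on

-- insert a batch with the secondary index in place
INSERT INTO events (user_id, payload)
    SELECT (random() * 10000)::int, 'some payload'
      FROM generate_series(1, 1000000);

-- then TRUNCATE the table, DROP INDEX events_user_idx, run the same INSERT
-- again and compare the two timings to see the per-index overhead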

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] query performance question

2008-06-03 Thread tv
Hi,

Hubert already answered your question - it's expected behavior, the
count(*) has to read all the tuples from the table (even dead ones!). So
if you have a really huge table, it will take a long time to read it.

There are several ways to speed it up - some of them are simple (but the
speedup is limited), while others require changes to the application logic
and rewriting part of the application (using triggers to count the rows,
etc.).

1) If the transactions have sequential IDs without gaps, you may simply
select MAX(id) and that'll give you the count. This won't work if some of
the transactions were deleted or if you need other filtering criteria. The
needed changes in the application are quite small (basically just a single
SQL query).

2) Move the table to a separate tablespace (a separate disk if possible).
This will speed up the reads, as the table will be 'compact'. This is just a
db change, it does not require changes in the application logic. It will
give you some speedup, but not as much as 1) or 3).

3) Build a table with the total (or maybe subtotals), updated by triggers.
This requires serious changes in the application as well as the database,
but it avoids the limitations of 1) and may give you even better results -
a sketch is below.
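
For illustration, such a counter table kept up to date by a trigger could
look like this (all names are made up; note that the single counter row
becomes a point of contention when many transactions insert concurrently):

CREATE TABLE transactions_count (total bigint NOT NULL);
INSERT INTO transactions_count SELECT count(*) FROM transactions;

CREATE OR REPLACE FUNCTION transactions_count_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE transactions_count SET total = total + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE transactions_count SET total = total - 1;
    END IF;
    RETURN NULL;   -- AFTER trigger, the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_count_trg
    AFTER INSERT OR DELETE ON transactions
    FOR EACH ROW EXECUTE PROCEDURE transactions_count_update();

-- then the count is a cheap single-row read:
SELECT total FROM transactions_count;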

Tomas

 Hello,

 I have a table (transactions) containing 61 414 503 rows. The basic
 count query (select count(transid) from transactions) takes 138226
 milliseconds.
 This is the query analysis output:

 Aggregate  (cost=2523970.79..2523970.80 rows=1 width=8) (actual
 time=268964.088..268964.090 rows=1 loops=1);
   -  Seq Scan on transactions  (cost=0.00..2370433.43 rows=61414943
 width=8) (actual time=13.886..151776.860 rows=61414503 loops=1);
 Total runtime: 268973.248 ms;

 Query has several indexes defined, including one on transid column:

 non-unique;index-qualifier;index-name;type;ordinal-position;column-name;asc-or-desc;cardinality;pages;filter-condition

 f;null;transactions_id_key;3;1;transid;null;61414488;168877;null;
 t;null;trans_ip_address_index;3;1;ip_address;null;61414488;168598;null;
 t;null;trans_member_id_index;3;1;member_id;null;61414488;169058;null;
 t;null;trans_payment_id_index;3;1;payment_id;null;61414488;168998;null;
 t;null;trans_status_index;3;1;status;null;61414488;169005;null;
 t;null;transactions__time_idx;3;1;time;null;61414488;168877;null;
 t;null;transactions_offer_id_idx;3;1;offer_id;null;61414488;169017;null;

 I'm not a dba so I'm not sure if the time it takes to execute this query
 is OK or not, it just  seems a bit long to me.
 I'd appreciate it if someone could share his/her thoughts on this. Is
 there a way to make this table/query perform better?
 Any query I'm running that joins with transactions table takes forever
 to complete, but maybe this is normal for a table this size.
 Regards,

 Marcin


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum: recommended?

2007-11-19 Thread tv
 FWIW, 20k rows isn't all that big, so I'm assuming that the
 descriptions make the table very wide. Unless those descriptions are
 what's being updated frequently, I suggest you put those in a
 separate table (vertical partitioning). That will make the main table
 much easier to vacuum, as well as reducing the impact of the high
 churn rate.

Yes, you're right - the table is quite wide, as it's a pharmacy catalogue
with all the detailed descriptions, additional info etc. So I guess it's
50 MB of data or something like that. That may not seem bad, but as I
already said the table grew to about 12x that size during the day (so about
500MB of data, 450MB of it dead rows). This is the 'central' table of the
system, and there are other quite heavily used databases as well. Add some
really inefficient queries (for example LIKE searches) on this table and you
easily end up with about 100MB of constant I/O during the day.

Vertical partitioning would be overengineering in this case - we even
considered that, but proper optimization of the update process (updating
only those rows that really changed), along with a little bit of autovacuum
tuning, solved all the performance issues.
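
For illustration, the 'update only the rows that really changed' part can be
done directly in SQL if the new feed is first loaded into a staging table
(all the names here are made up):

UPDATE catalogue
   SET price = i.price,
       description = i.description
  FROM catalogue_import i
 WHERE catalogue.id = i.id
   AND (catalogue.price IS DISTINCT FROM i.price
        OR catalogue.description IS DISTINCT FROM i.description);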

Tomas


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] autovacuum: recommended?

2007-11-16 Thread tv
 That being said, we have some huge tables in our database and pretty
 much traffic, and got quite some performance problems when the
 autovacuum kicked in and started vacuuming those huge tables, so we're
 currently running without.  Autovacuum can be tuned to not touch those
 tables, but we've chosen to leave it off.

We had some performance problems with the autovacuum on large and
frequently modified tables too - but after a little bit of playing with
the parameters the overall performance is much better than it was before
the autovacuuming.

The table was quite huge (say 20k products along with detailed descriptions
etc.) and was completely updated about 12x each day, i.e. it grew to about
12x the original size (and 11/12 of the rows were dead). This caused a
serious slowdown of the application each day, as the database had to scan
12x more data.

We set up autovacuuming with the default parameters, but it interfered
with the usual traffic - we had to play a little with the parameters
(increase the delays, decrease the duration or something like that) and
now it runs much better than before. No nightly vacuuming, no serious
performance degradation during the day, etc.

So yes - autovacuuming is recommended, but in some cases the default
parameters have to be tuned a little bit.
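
For illustration, the knobs live in postgresql.conf and look roughly like
this - the values below are just examples, not the ones we ended up with,
and they need to be tuned for the particular workload:

autovacuum = on
autovacuum_naptime = 60                # seconds between checks
autovacuum_vacuum_scale_factor = 0.1   # vacuum after ~10% of rows change
autovacuum_vacuum_threshold = 1000     # base number of changed rows
autovacuum_vacuum_cost_delay = 20      # ms to sleep between I/O bursts
autovacuum_vacuum_cost_limit = 200     # how much work before sleeping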

tomas


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] performance - triggers, row existence etc.

2005-04-09 Thread tv
Hello,

I'm just in the middle of performance tuning of our database running on
PostgreSQL, and I have several questions (I've searched the online docs,
but without success).

1) When I first use the EXPLAIN ANALYZE command, the reported time is much
   larger than for subsequent invocations of EXPLAIN ANALYZE. I suppose the
   plan prepared during the first invocation is cached somewhere, but I'm
   not sure where and for how long.

   I suppose the execution plans are connection specific, but I'm not sure
   whether this holds for the SQL queries inside the triggers too. I've
   done some testing, but things are harder to measure because of
   persistent connections (the commands will be executed from PHP).

2) Is there some (performance) difference between BEFORE and AFTER
   triggers? I believe there's no measurable difference.

3) The vast majority of SQL commands inside the triggers check whether a
   row exists that matches some conditions (same IP, visitor ID etc.).
   Currently I do this by

   SELECT INTO tmp id FROM ... JOIN ... WHERE ... LIMIT 1;
   IF NOT FOUND THEN

   END IF;

   and so on. I believe this is a fast and low-cost solution (compared to
   the COUNT(*) approach I used before), but is there an even better
   (faster) way to check row existence?
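
   To make this concrete, a filled-in version of the pattern looks roughly
   like this (the table and column names are purely illustrative):

   SELECT INTO tmp v.id
     FROM visits v
     JOIN visitors u ON u.id = v.visitor_id
    WHERE v.ip_address = NEW.ip_address
    LIMIT 1;
   IF NOT FOUND THEN
       -- the row does not exist yet, e.g. insert it here
   END IF;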

Thanks
t.v.

---(end of broadcast)---
TIP 8: explain analyze is your friend