Hello,
I'm just in the middle of performance tuning of our database running
on PostgreSQL, and I have several questions (I've searched the online
docs, but without success).
1) When I first use the EXPLAIN ANALYZE command, the time is much
larger than in the case of subsequent invocations of EXPLAIN ANALYZE.
I guess we need some more details about the test. Is the
connection/disconnection part of each test iteration? And how are the
databases connected (using a socket / localhost / different host)?
Anyway, measuring such simple queries will tell you almost nothing about
the general app performance - use something closer to your real workload.
> Hi,
>
>> And this is what happens in the queries above - the first query covers
>> years 1963-2009, while the second one covers 1900-2009. Given the fact
>> this table contains ~40m rows, the first query returns about 0.01% (3k
>> rows) while the second one returns almost 50% of the data (18m rows).
> I am reposting as my original query was mangled
>
> The link to the explain plan is here as it does not paste well into
> the email body.
>
> http://explain.depesz.com/s/kHa
>
>
> The machine is a 2 CPU quad core 5430 with 32G RAM and 6x450G 15K disks in a
> single RAID-10 array
>
> 1G work_mem
> default_st
> Any suggestions on what I can do to speed things up? I presume if I turn
> off
> Sequential Scan then it might default to Index Scan.. Is there anything
> else?
>
> Cheers,
> Tom
Well, I doubt turning off the sequential scan will improve the performance
in this case - actually the first case (ru
> I'd suggest increasing the value up to ~80MB, if not for the whole system,
> maybe just for the session running this query.
> Then see if performance improved.
Don't forget you can do this for the given query without affecting the
other queries - just do something like
SET work_mem = '128MB';
and then run the query in the same session.
> http://explain.depesz.com/s/br9
> http://explain.depesz.com/s/gxH
Well, I don't have time to do a thorough analysis right now, but in all
the plans you've posted there are quite high values in the "Rows x" column
(e.g. the 5727.5 value).
That means a significant difference between the estimated and actual row counts.
> Okay, 225044.255 ms vs 83813.808 ms - it obviously seems that the planner
> selects a bad scan plan by default.
Actually no, the planner chose the cheapest plan (more precisely a plan
with the lowest computed cost). The first plan has a cost of 600830.86 while
the second one has a cost of 634901.28, so the planner picked the first one.
> But I doubt your answer. I think the essence of the problem is when the
> planner selects 'Bitmap Index Scan' and how the planner computes the cost
> of 'Bitmap Index Scan'.
The essence of the problem obviously is a bad estimate of the cost. The
planner considers the two plans, computes the cost of each, and simply
picks the one with the lower cost.
Hi, what is the size of the table and index (in terms of pages and
tuples)? Try something like
SELECT relpages, reltuples FROM pg_class WHERE relname = 'table or index
name';
And what indexes have you created? It seems to me there's just an index on
the variable_id. It might be useful to create an index on more columns than
just the variable_id - for example, something like this:
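A minimal sketch, assuming the query also filters on a second column - both
the table name (my_table) and the extra column (measured_at) are placeholders,
only variable_id comes from the thread:

CREATE INDEX my_table_variable_id_measured_at_idx
    ON my_table (variable_id, measured_at);
ANALYZE my_table;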
> Excerpts from kuopo's message of jue nov 18 04:10:36 -0300 2010:
>> However, when I analyze the table A, the autovacuum or vacuum on the
>> table B cannot find any removable row version (the number of
>> nonremovable row versions and pages keeps increasing). After the
>> analysis finishes, the s
> 4) INDEXES: I can certainly add an index but given the table sizes I am not
> sure if that is a factor. This is by no means a large dataset - less than
> 350,000 rows in total and 3 columns. Also this was just a quick dump of
> data for comparison purposes. When I saw the poor performance on the
> COA
> First, I modified the work_mem setting to 1GB (reloaded config) from the
> default 1MB and I see a response time of 33 seconds. Results below from
> EXPLAIN ANALYZE:
...
> Second, I modified the work_mem setting to 2GB (reloaded config) and I see
> a response time of 38 seconds. Results below f
>> 4) INDEXES: I can certainly add an index but given the table sizes I am
>> not
>> sure if that is a factor. This is by no means a large dataset - less than
>> 350,000 rows in total and 3 columns. Also this was just a quick dump of
>> data for comparison purposes. When I saw the poor performance on th
> Pavel Stehule wrote:
>> 2010/11/21 Humair Mohammed :
>
>>> shared_buffers = 2
>
>> shared_buffers = 2 ???
>
> Yeah, if that's not a typo, that's a very serious misconfiguration.
I guess that's a typo, as the explain plan in one of the previous posts
contains
Buffers: shared hit=192 read=48
>
>
> Correct, the optimizer did not take the settings with the pg_ctl reload
> command. I did a pg_ctl restart and work_mem now displays the updated
> value. I had to bump up all the way to 2047 MB to get the response below
> (with work_mem at 1024 MB I see 7 seconds response time) and with 2047 M
> I believe you can set work_mem to a different value just for the duration
> of a single query, so you needn't have work_mem set so high for every
> query
> on the system. A single query may well use a multiple of work_mem, so you
> really probably don't want it that high all the time unless
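A minimal sketch of that per-query approach - SET LOCAL lasts only until the
end of the transaction, so the rest of the system keeps the lower default
(t1/t2 stand in for the real tables):

BEGIN;
SET LOCAL work_mem = '256MB';   -- affects only this transaction
SELECT t1.id, count(*)          -- the expensive query goes here
FROM t1
JOIN t2 ON t2.t1_id = t1.id
GROUP BY t1.id;
COMMIT;                         -- work_mem reverts to the server-wide value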
Hello,
> Now, should I put these tables in different schemas of one database, or in
> separate databases, for good performance?
> I am using libpq for connection.
>
> Pictorial Representation:
>
> Process1 -> DB1.schema1.table1
>
> Process2 -> DB1.schema2.table1
>
> Vs.
>
> Process1 -> DB1.d
> I don't think it will make a big difference in performance.
>
> The real question is: do you need queries that "cross boundaries"? If that
> is the case you have to use schema, because Postgres does not support
> cross-database queries.
Well, there's the dblink contrib module, but that won't improve the performance.
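For completeness, a rough sketch of what a cross-database query through dblink
looks like - the connection string, table and column names are made up, and the
dblink module has to be installed in the database you run this from:

SELECT t.id, t.name
FROM dblink('dbname=db2 host=localhost',
            'SELECT id, name FROM schema1.table1')
     AS t(id integer, name text);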
> On Thursday 25 November 2010 13:02:08 t...@fuzzy.cz wrote:
>> I don't think you'll get performance improvement from running two
>> PostgreSQL clusters (one for DB1, one for DB2). And when running two
>> databases within the same cluster, there's no measurable performance
>> difference AFAIK.
> Th
> I am not facing any issues, but yes I want to have optimal performance for
> SELECT and INSERT, especially when I am doing these ops repeatedly.
> Actually I am porting from Oracle to PG. Oracle starts a lot of processes
> when
> it needs to run many schemas. I do not think PG would need much mor
Hi, and why do you think this is a problem?
The explain plan is expected to change for different parameter values,
that's OK. The merge in the first query is expected to produce
significantly more rows (91774) than the other one (229). That's why the
second query chooses a nested loop instead of a merge join.
> I've already used 'EXPLAIN ANALYZE' to post the message. So I don't
> clearly understand what you are expecting when you tell me to provide
> 'EXPLAIN ANALYZE' (please excuse me for the misunderstanding)
No, you haven't. You've provided 'EXPLAIN' output, but that just prepares
an execution plan without actually running the query - EXPLAIN ANALYZE
executes the query and reports the actual row counts and timings.
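To illustrate the difference (orders is just a placeholder table here):

-- only builds the plan: shows estimated costs and estimated row counts
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- builds the plan AND runs the query: adds actual times and actual row counts
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;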
> *Modified Query :-
>
> *SELECT count(*) from page_content
> WHERE publishing_date like '%2010%' and content_language='en' and
> content is not null and isprocessable = 1 and
> to_tsvector('english',content) @@ to_tsquery('english','Mujahid' ||
> 'jihad' || 'Militant' || 'fedayeen' || 'insurgent'
> Thanks , it works now .. :-)
>
> Here is the output :
>
> pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like
> '%2010%' and
> pdc_uima-# content_language='en' and content is not null and
> isprocessable = 1 and
> pdc_uima-# to_tsvector('english',content) @@
> to_tsquery('eng
> I just want to retrieve those IDs from page_content which do not have
> any entry in the clause2 table.
In that case the query probably does not work (at least the query you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition).
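The usual way to get only the IDs with no matching row at all is an anti-join;
a sketch using the table and column names that appear later in this thread:

-- returns page_content rows that have no row in clause2 at all
SELECT p.crawled_page_id
FROM page_content p
WHERE NOT EXISTS (
    SELECT 1 FROM clause2 c WHERE c.source_id = p.crawled_page_id
);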
>
>> Actually the plans are equal, so I suppose it depends on which one was
>> run first :). The slow query operates on data mostly on disk, while the
>> fast one on data already in memory.
>>
>> Yeah, maybe the easiest way is to start a fresh session and fire the
>> queries.
>
>
> After the fresh sta
> On 03/23/2011 04:17 AM, Adarsh Sharma wrote:
>
>> explain analyze select distinct(p.crawled_page_id) from page_content
>> p where NOT EXISTS (select 1 from clause2 c where c.source_id =
>> p.crawled_page_id);
>
> You know... I'm surprised nobody has mentioned this, but DISTINCT is
> very slow unl
> Hi,
>
> I have very bad bgwriter statistics on a server which has been running for
> many weeks,
> and it is still the same after a recent restart.
> Roughly 50% of the buffers are written by the backend processes and the
> rest by checkpoints.
> The statistics below are from a server with 140GB RAM, 32GB
>> Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053)
>> Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id =
>> c.source_id) AND (s.sentence_id = c.sentence_id))
>> -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65
>> rows=27471560 width=1
> That being said, we have some huge tables in our database and pretty
> heavy traffic, and got quite some performance problems when the
> autovacuum kicked in and started vacuuming those huge tables, so we're
> currently running without. Autovacuum can be tuned to not touch those
> tables, but we'
> FWIW, 20k rows isn't all that big, so I'm assuming that the
> descriptions make the table very wide. Unless those descriptions are
> what's being updated frequently, I suggest you put those in a
> separate table (vertical partitioning). That will make the main table
> much easier to vacuum, as we
Hi,
Hubert already answered your question - it's expected behavior, the
count(*) has to read all the tuples from the table (even dead ones!). So
if you have a really huge table, it will take a long time to read it.
There are several ways to speed it up - some of them are simple (but the
speedup is only approximate), others require more work.
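One of the simple options is reading the planner's estimate from the catalog
instead of counting exactly - it is only an approximation and only as fresh as
the last ANALYZE (the table name is a placeholder):

-- approximate row count maintained by autovacuum / ANALYZE
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'huge_table';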
> Hi,
>
> I got a question about scalability in high volume insert situation
> where the table has a primary key and several non-unique indexes
> on other columns of the table. How does PostgreSQL behave
> in terms of scalability? The high volume of inserts comes from
> multiple transactions.
>
> B
Hi,
> Hello to list,
>
> We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling
> with postgresql performance. Any query say select * from tablename takes
> 10-15 mins to give the output, and while executing the query the system
> load goes up like anything. After the query output
>> I was not aware of the VACUUM functionality earlier, but some time back
>> I read about it and ran it on the server, but I did not achieve anything
>> in terms of performance. The server has been running for 1 to 1.5 years
>> and we have done VACUUM only once.
>
> vacuuming isn't so much about performance as about reclaiming the space occupied by dead rows.
>> Not necessarily, the 'cost' depends on
>> seq_page_cost and there might be
>> other value than 1 (which is the default). A better
>> approach is
>>
>> SELECT relpages, reltuples FROM pg_class WHERE relname =
>> 'users';
>>
>> which reads the values from system catalogue.
>>
> The Output of query
> My next question is: what is the difference between "select" and "delete"?
> There is another table that has one foreign key to reference the test
> (parent) table that I am deleting from and this foreign key does not have
> an index on it (a 330K row table).
The difference is that with SELECT you only read the rows, while a DELETE on
the parent table also has to check the referencing table for each deleted row
- and without an index on that foreign key column, every check means a
sequential scan of the 330K row table.
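A sketch of the usual remedy, with placeholder names since the real schema is
not shown - index the referencing column so each FK check during the DELETE can
use an index scan:

-- child_table references the parent table being deleted from via parent_id
CREATE INDEX child_table_parent_id_idx ON child_table (parent_id);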
Sorry, but you have to provide much more information about the table. The
information you've provided is really not sufficient - the rows might be
large or small. I guess it's the second option, with a lot of dead rows.
Try this:
ANALYZE table;
SELECT relpages, reltuples FROM pg_class WHERE relname = 'table';
Obviously, most of the total cost (cost=327569, time=39749ms) comes from
two operations in the execution plan:
(a) sequential scan on the 'rid' table (cost=195342, time=26347ms) that
produces almost 3,200,000 rows
(b) hash join of the two subresults (cost=24, time=14000ms)
How many rows are t
Hi,
so the table occupies about 50 MB, i.e. each row has about 1 kB, right?
Updating 1000 rows should mean about 1MB of data to be updated.
There might be a problem with the execution plan of the updates - I guess the
100-row update uses an index scan while the 1000-row update might use a seq
scan.
Anyw
> max_connections = 100
> shared_buffers = 16MB
> work_mem = 64MB
> everything else is set to the default
OK, but what about effective_cache_size for example?
Anyway, we need more information about the table itself - the number of
rows is nice, but it does not say how large the table is. The rows
I bet there is no 'critical' length - this is just another case of index
scan vs. seqscan. The efficiency depends on the size of the table / row,
amount of data in the table, variability of the column used in the IN
clause, etc.
Splitting the query with 1000 items into 10 separate queries, the sma
> On Wed, Nov 19, 2008 at 10:54 AM, Kranti K K Parisa
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I have defined sequence on a table something like this
>>
>>
>> CREATE SEQUENCE items_unqid_seq
>> INCREMENT 1
>> MINVALUE 0
>> MAXVALUE 9223372036854775807
>> START 7659
>> CACHE 1;
>>
>> thi
Just the most important points:
1) "dok" table contains 1235086 row versions in 171641 pages (with 8kB
pages this means about 1.4 GB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted)
2) "rid" table contains 3275189 rows in 165282 pages (with 8kB pages this means
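Those numbers come straight from the catalog; you can check them yourself with
something like this (pg_size_pretty just turns the page count into a readable
size):

SELECT relname,
       relpages,
       reltuples,
       pg_size_pretty(relpages::bigint * 8192) AS approx_size
FROM pg_class
WHERE relname IN ('dok', 'rid');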
>> Given the fact that the performance issues are caused by bloated tables
>> and / or slow I/O subsystem, moving to a similar system won't help I
>> guess.
>
> I have run VACUUM FULL ANALYZE VERBOSE
> and set MAX_FSM_PAGES = 15
>
> So there is no bloat except the pg_shdepend indexes, which shou
>
> Hi. I have a problem on one of our production servers. A fairly
> complicated query is running, and the backend process is using 30 GB of
> RAM. The machine only has 32GB, and is understandably swapping like crazy.
> My colleague is creating swap files as quickly as the system uses them up.
>
> The
Hi Jose,
why haven't you posted an example of the failing query, the data and the
exact error message? The casting should work on 8.3 (and it works for me) so I
guess there is some invalid data, invalid SQL or something like that.
Anyway I doubt this is a performance issue - this falls into generic SQ
> Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
> before and everything worked fine for me.
>
> And now I have ~93% CPU load. Here are the changed values in the config:
>
> default_statistics_target = 50
> maintenance_work_mem = 1GB
> constraint_exclusion = on
> checkpoint_completion_t
> Unfortunately I had to downgrade back to 8.3. Now I'm having trouble with
> that and am still solving it.
>
> For future upgrade, what is the basic steps?
1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at t
> Hi all;
>
> We have a table that's > 2 billion rows big and growing fast. We've set up
> monthly partitions for it. Upon running the first of many "select * from
> bigTable insert into partition" statements (330 million rows per month) the
> entire box eventually goes out to lunch.
>
> Any thoughts/s
> On Tuesday 01 September 2009 03:26:08 Pierre Frédéric Caillaud wrote:
>> > We have a table that's > 2billion rows big and growing fast. We've
>> setup
>> > monthly partitions for it. Upon running the first of many select *
>> from
>> > bigTable insert into partition statements (330million rows
> default_statistics_target = 100 (tried with 500, no change). Vacuum
> analyzed
> before initial query, and after each change to default_statistics_target.
Modifying the statistics target is useful only if the estimates are
seriously off, which is not your case - so it won't help, at least not
re
> Playing around with seq_page_cost (1) and random_page_cost (1), I can get
> the correct index selected. Applying those same settings to our production
> server does not produce the optimal plan, though.
I doubt setting seq_page_cost and random_page_cost to the same value is
reasonable - random access is generally more expensive than sequential access.
> Hi Tom,
>
> Yes, 24 is relatively OK (the real number is 20).
> And the statistics target for the database is 800 at the moment. If
> needed I can set it to 1000 (the maximum).
>
> Also, I waited for this query to finish in order to gather the EXPLAIN
> ANALYZE output. Here it is:
>
> explain analyze select
> Hi all,
>
> on our PostgreSQL 8.3.1 (CentOS 5.3 64-bit) two different query plans
> for one of our (weird) queries are generated. One of the query plans
> seems to be good (and is used most of the time). The other one is bad -
> the query takes about 2 minutes and the database process, which is
>
> Hello postgres wizards,
>
> We recently upgraded from 8.1.5 to 8.4
> We have a query (slow_query.sql) which took about 9s on 8.1.5
> On 8.4, the same query takes 17.7 minutes.
>
> The code which generated this query is written to support the
> calculation of arbitrary arithmetic expressions acros
> max_connections = 700
> shared_buffers = 4096MB
> temp_buffers = 16MB
> work_mem = 64MB
> maintenance_work_mem = 128MB
> wal_buffers = 32MB
> checkpoint_segments = 32
> random_page_cost = 2.0
> effective_cache_size = 4096MB
First of all, there's no reason to increase wal_buffers above 32MB. AFAI
>
>
> Thanks Scott :
>
> The iostat package is not installed, but have a look at the vmstat output below:
>
> [root@s8-mysd-2 8.4SS]# vmstat 10
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> some additional info.
> the table inventory is about 4481 MB and also has postgis types.
> the table gran_ver is about 523 MB
> the table INVSENSOR is about 217 MB
>
> the server itself has 32G RAM with the following set in the postgres conf
> shared_buffers = 3GB
> work_mem = 64MB
> maintena
> Hi,
>
> I am trying to tune a query that is taking too much time on a large
> dataset (postgres 8.3).
>
Hi, run ANALYZE on the tables used in the query - the stats are very off,
so the db chooses a really bad execution plan.
Tomas
> On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
> wrote:
>> I just executed a VACUUM ANALYZE and now everything performs well. hm,
>> strange.
>
> That probably means you need more statistics - try increasing the
> newsfeed's statistics target count.
>
> ALTER TABLE newsfeed_item ALTER COLUMN n
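The complete form of that command looks roughly like this - the column name is
a placeholder since the original line is cut off, and the new target only takes
effect after the next ANALYZE:

ALTER TABLE newsfeed_item ALTER COLUMN news_feed_id SET STATISTICS 500;
ANALYZE newsfeed_item;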
> Thomas,
>
> Thank you for your very detailed and well written description. In
> conclusion, I should keep my random_page_cost (3.0) at a value higher than
> seq_page_cost (1.0)? Is this bad practice or will this suffice for my
> setup (where the database is much bigger than the RAM in the system)?
> On Thu, Apr 14, 2011 at 1:26 AM, Tomas Vondra wrote:
>> Workload A: Touches just a very small portion of the database, so the
>> 'active' part actually fits into the memory. In this case the cache hit
>> ratio can easily be close to 99%.
>>
>> Workload B: Touches a large portion of the database, s
> Thanks. But let me do the "top" stuff later. I think I have a bigger
> problem now.
>
> While doing a PG dump, I seem to get this error:
>
> ERROR: invalid memory alloc request size 4294967293
>
> Upon googling, this seems to be a data corruption issue!
>
> One of the older messages suggests
> Any idea how to improve the performance?
Hmmm, I guess we'll need more info about resource usage (CPU, I/O, locks)
at the time the commit happens. Run these two commands
$ iostat -x 1
$ vmstat 1
and then execute the commit. See what's causing problems. Is the drive
utilization close to 100%? Yo
> It's not synchronous at all. The clients create a temporary file for
> the statistics collector and move on. The actual statistics don't get
> updated until the statistics collector decides enough time has passed to
> bother, which defaults to at most every 500ms.
Really? I thought the clients
> Version: PostgreSQL 8.3.5 (mammoth replicator)
>
> Schema:
>
> CREATE TABLE tdiag (
>     diag_id     integer DEFAULT nextval('diag_id_seq'::text),
>     create_time timestamp with time zone DEFAULT now(), /* time this record was created */
>     diag_time   t
> What seems odd to me is that the only difference between the two is the
> limit
> clause:
>
> select * from tdiag where (create_time >= '2011-06-03 09:49:04.00+0'
>     and create_time < '2011-06-06 09:59:04.00+0')
> order by diag_id limit 1;
>
> select * from tdiag where (create_time >= '2
> On Wednesday 08 June 2011 17:39, Claudio Freire wrote:
>> Of course optimally executing a plan with limit is a lot different
>> than one without.
>
> I imagined that limit just cuts out a slice of the query results.
> If it can find 8 rows in 0.5 seconds then I would have thought that
> retur
> Thanks for including explain analyze output.
>
> Is there any chance you can pop the full explains (not just excerpts) in
> here:
>
> http://explain.depesz.com/
>
> ?
I believe he already did that - there's a link below each query.
Tomas
> ----------------+---------
> shared_buffers | 16MB
> work_mem | 250MB
This seems a bit suspicious. Are you sure you want to keep the
shared_buffers so small and work_mem so large at the same time? There
probably
> On Postgres, this same query takes about 58 minutes (could not run
> explain analyze because it is in progress):
>
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-# from openactivity o,ecr_sents s
>
> If I had set the primary key to (diag_id, create_time) would simple
> queries on
> diag_id still work well i.e.
> select * from tdiag where diag_id = 1234;
Yes. IIRC the performance penalty for using a non-leading column of an index
is negligible. But why don't you try that on your own - just create the index
and compare the plans, for example as sketched below.
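A rough sketch of that experiment - the index name is made up, the table and
query come from this thread:

-- composite index with diag_id as the leading column
CREATE INDEX tdiag_diag_id_create_time_idx ON tdiag (diag_id, create_time);

-- check whether the simple lookup still uses the index
EXPLAIN ANALYZE SELECT * FROM tdiag WHERE diag_id = 1234;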
> I have a function in the plpgsql language; this function does some selects
> on some tables to verify some conditions and then does one insert into a
> table with NO index. Updates are not performed in the function.
>
> With 1 client connected postgres does 180 executions per second
> With 2 clients connected p
> Hi,
>
> DB : POSTGRES 8.4.8
> OS : Debian
> HD : SAS 10k rpm
>
> shared_buffers is 4096MB (25% of RAM), effective_cache_size is 8GB (75% of RAM)
>
> An AFTER INSERT trigger in turn calls 2 more triggers and inserts a record
> into another table depending on a condition.
>
> with all triggers enabled there are 8 inser
> The mystery remains, for me: why updating 100,000 records could complete
> in as quickly as 5 seconds, whereas an attempt to update a million
> records was still running after 25 minutes before we killed it?
Hi, there are a lot of possible causes. Usually this is caused by a plan
change - imagine
On 13 August 2011, 5:09, Greg Smith wrote:
> The long pauses are most likely every client blocking once the
> checkpoint sync runs. When those fsync calls go out, Linux will freeze
> for quite a while there on ext3. In this example, the drop in TPS/rise
> in latency at around 50:30 is either the b
On Sun, 14 Aug 2011 07:15:00 -0600, Scott Marlowe
wrote:
> On Sun, Aug 14, 2011 at 6:51 AM, wrote:
>>
>> I've increased the test duration to 10 minutes, decreased the
>> checkpoint timeout to 4 minutes and a checkpoint is issued just before
>> the pgbench. That way the starting position should b